Metadata in Academic Papers: What Blind Review Submissions Expose

The blind review paradox

Double-blind peer review is widely treated as the gold standard in academic publishing. The premise is straightforward: reviewers do not know who wrote the paper, and authors do not know who reviewed it. This anonymity is intended to ensure that papers are evaluated on their merits, free from the biases of reputation, institutional affiliation, or personal relationships.

The paradox is that while researchers carefully remove their names from the manuscript text — replacing "Author et al. (2024)" with "[Anonymized]" and stripping affiliations from the title page — they routinely submit PDFs that contain their full names and institutional affiliations in the file's metadata fields.

A reviewer who checks File > Properties on the submitted PDF can identify the author in seconds. The anonymization effort applied to the visible content is undermined by invisible metadata.

What metadata academic submission portals handle

Major submission systems

Academic journals use manuscript management systems to handle submissions. The major platforms are ScholarOne (Clarivate), Editorial Manager (Aries Systems), and Open Journal Systems (OJS). Their handling of file metadata varies.

ScholarOne does not automatically strip metadata from uploaded files. The author uploads a PDF or Word file, and the system stores and distributes it to reviewers as-is. Some journals that use ScholarOne add custom instructions asking authors to remove metadata before uploading, but this is not enforced by the platform.

Editorial Manager has a PDF conversion step that generates a merged PDF from uploaded files. This conversion may modify some metadata fields (Creator and Producer will reflect the conversion tool), but does not systematically strip Author, Title, or Subject fields from the source documents.

OJS stores uploaded files without metadata processing. Submissions are made available to reviewers with their original metadata intact.

The practical result: in most cases, the submission portal does not strip metadata. The responsibility falls on the author.

What LaTeX-generated PDFs include

LaTeX is the dominant authoring tool in mathematics, physics, computer science, and many engineering disciplines. PDFs generated from LaTeX carry metadata determined by the LaTeX configuration and the PDF engine used.

Default metadata in LaTeX PDFs

When you compile a LaTeX document with pdflatex, xelatex, or lualatex, the resulting PDF can include:

Author: populated from the \author{} command if hyperref is configured to copy it into the PDF metadata (for example through its pdfusetitle option), or set directly with pdfauthor
Title: populated from the \title{} command under the same conditions, or set directly with pdftitle
Subject and Keywords: populated if set via hyperref's \hypersetup{} configuration
Creator: typically "LaTeX with hyperref" when hyperref is loaded
Producer: the PDF backend (e.g., "pdfTeX-1.40.25" for pdflatex; xelatex output usually names its xdvipdfmx driver instead)
Creation date: the date and time of compilation

The hyperref problem

The hyperref package, which most LaTeX documents load for clickable cross-references, can copy the document's \author and \title into the PDF metadata. It does so when loaded with the pdfusetitle option, and some document classes and templates either enable that option or set pdfauthor and pdftitle themselves.

If your LaTeX source contains:

\author{Jane Smith}
\title{A Novel Approach to Graph Coloring}

and hyperref is set up to copy these values (through pdfusetitle or a template that does it for you), the resulting PDF will contain "Jane Smith" in the Author metadata field, regardless of whether the visible title page has been anonymized.

How to fix LaTeX metadata for blind review

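Override the metadata before compilation by adding the following to your preamble, after the line that loads hyperref:

\hypersetup{
  pdfauthor={},                     % no author name in the PDF properties
  pdftitle={Submission for Review}, % generic title instead of the real one
  pdfsubject={},
  pdfkeywords={},
  pdfcreator={},
}

This clears or neutralizes the metadata fields that hyperref would otherwise populate. If your template loads hyperref with pdfusetitle, an empty pdfauthor may still be refilled from \author, so either drop that option or use a placeholder such as Anonymous in both places. Some conferences provide LaTeX templates with these overrides already in place, but many do not.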

What Word-generated PDFs include

In the social sciences, humanities, and many professional fields, manuscripts are prepared in Microsoft Word and converted to PDF for submission.

Word-to-PDF metadata inheritance

When a Word document is saved as PDF, the following metadata can carry over, depending on the export method and its options:

Author — the name associated with the Word license or system profile
Company — the organization name configured in Word
Title — the document's Title property, which may differ from the text on the first page
Last Modified By — the name of the person who last saved the file
Creator — "Microsoft Word" with the version number
Creation and modification dates — when the Word file was created and last modified

What the "remove author name" check misses

Word's File > Info panel shows the Author and Last Modified By fields. Many researchers know to check these. What they miss:

The Company field — which identifies their university or research institution
Template paths — which may include their username or department name
Comments and tracked changes — which may remain from co-author review and contain names
Embedded images — which may contain EXIF data from lab equipment or cameras identifying the research facility
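
A .docx file is a ZIP archive, and the properties listed above live in small XML parts inside it. As a rough sketch using only the Python standard library, the following reads the core and extended properties and notes whether comments or tracked changes appear to be present. The helper name docx_properties and the choice of fields are illustrative, not part of any Word tooling.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the property parts of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# (archive part, element to look up, label used in the report)
FIELDS = [
    ("docProps/core.xml", "dc:creator", "Author"),
    ("docProps/core.xml", "cp:lastModifiedBy", "Last Modified By"),
    ("docProps/core.xml", "dc:title", "Title"),
    ("docProps/app.xml", "ep:Company", "Company"),
    ("docProps/app.xml", "ep:Template", "Template"),
]


def docx_properties(path_or_file):
    """Return identifying properties found in a .docx file (hypothetical helper)."""
    report = {}
    with zipfile.ZipFile(path_or_file) as archive:
        names = set(archive.namelist())
        for part, tag, label in FIELDS:
            if part not in names:
                continue
            element = ET.fromstring(archive.read(part)).find(tag, NS)
            if element is not None and element.text:
                report[label] = element.text
        # Comments are stored in their own part; tracked changes show up as
        # w:ins / w:del elements in the main document part.
        report["Has comments"] = "word/comments.xml" in names
        body = archive.read("word/document.xml") if "word/document.xml" in names else b""
        report["Has tracked changes"] = b"<w:ins " in body or b"<w:del " in body
    return report
```

Calling docx_properties("manuscript.docx") on your own file (the filename is a placeholder) gives a quick view of what Document Inspector would need to remove. It is a check, not a cleaner: it does not modify the file.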

Conference-specific risks

Conference submissions carry additional metadata risks:

Submission file naming

Some researchers name their submission files descriptively: Smith_GraphColoring_ICML2025.pdf. The filename is not metadata in the strict sense, but it travels with the file and reveals authorship. Most submission systems rename the file upon upload, but some do not.
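
A filename check is easy to automate before upload. The sketch below splits the filename into tokens and reports any that match a list of author names you supply. The function name and the token-based matching are illustrative choices: matching whole tokens avoids false alarms on short names, but it will not catch a name fused into a longer word such as SmithGraphColoring.pdf.

```python
import re
from pathlib import Path


def names_in_filename(filename: str, author_names: list) -> list:
    """Return the author names that appear as separate tokens in the filename."""
    stem = Path(filename).stem.lower()
    # Split on anything that is not a letter or digit, including underscores.
    tokens = set(re.split(r"[\W_]+", stem))
    return [name for name in author_names if name.lower() in tokens]


if __name__ == "__main__":
    print(names_in_filename("Smith_GraphColoring_ICML2025.pdf", ["Smith", "Jones"]))
```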

Supplementary materials

Supplementary materials (code, data, additional figures) often receive less attention to anonymization than the main manuscript. A Jupyter notebook in the supplementary materials may contain the author's username in file paths, a README file may reference the lab's website, and code comments may include the author's name.
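
Usernames in file paths are the easiest of these leaks to search for mechanically. The following sketch walks a supplementary-materials folder and reports any home-directory usernames it finds in Linux, macOS, or Windows style paths. The function names and the path patterns are assumptions made for illustration; the scan does not look for names in comments or lab URLs in READMEs, which still need a manual read.

```python
import re
from pathlib import Path

# Home-directory prefixes on Linux, macOS, and Windows. The captured group is
# the username. Backslashes may be doubled inside JSON files such as notebooks.
USER_PATH = re.compile(r"(?:/home/|/Users/|[A-Za-z]:\\+Users\\+)([^/\\\s\"']+)")


def usernames_in_text(text: str) -> set:
    """Return the usernames that appear in home-directory paths in the text."""
    return {match.group(1) for match in USER_PATH.finditer(text)}


def scan_directory(root: str) -> dict:
    """Map each file under root to the usernames found in it (if any)."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            names = usernames_in_text(text)
            if names:
                hits[str(path)] = names
    return hits
```

Running scan_directory("supplementary") on the folder you plan to upload (the folder name is a placeholder) lists the files that need editing. Notebook outputs are a common source of hits, because tracebacks and printed paths are saved along with the code.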

Preprint servers

Researchers who post preprints on arXiv or SSRN before submitting to a journal create a situation where the paper is publicly associated with the author even if the journal submission is anonymized. This is a procedural anonymization issue rather than a metadata issue, but it interacts with metadata: if the journal submission and the preprint have matching creation dates in their metadata, a reviewer who suspects the identity can confirm it.
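
To see how such a comparison works, here is a minimal sketch that parses two PDF date strings and checks whether they refer to the same moment. It assumes full-precision values in the usual D:YYYYMMDDHHmmSS form with an optional time zone offset, and it treats a missing offset as UTC, which is a simplification. The function names are illustrative.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumes dates such as D:20240301101500+01'00' or D:20240301091500Z.
PDF_DATE = re.compile(r"D:(\d{14})(?:([+\-Z])(?:(\d{2})'?(\d{2})?'?)?)?")


def parse_pdf_date(value: str) -> datetime:
    """Convert a PDF date string into a timezone-aware datetime."""
    match = PDF_DATE.match(value)
    if match is None:
        raise ValueError(f"unsupported PDF date: {value!r}")
    stamp, sign, hours, minutes = match.groups()
    offset = timedelta(hours=int(hours or 0), minutes=int(minutes or 0))
    if sign == "-":
        offset = -offset
    # A missing offset is treated as UTC here, which is an assumption.
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone(offset))


def same_moment(date_a: str, date_b: str, tolerance_seconds: int = 0) -> bool:
    """Return True if two PDF dates are within the given tolerance of each other."""
    delta = parse_pdf_date(date_a) - parse_pdf_date(date_b)
    return abs(delta.total_seconds()) <= tolerance_seconds
```

If the preprint and the submission were produced by the same compilation, same_moment returns True for their creation dates. Recompiling the submission separately, or fixing its date, removes that link.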

Step-by-step guide for anonymizing submissions

For LaTeX users

Override hyperref metadata using \hypersetup{} with empty author, subject, and keywords fields and an empty or generic title
Check the compiled PDF by opening it in a PDF reader and checking File > Properties to verify that metadata fields are empty or generic
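Remove or fix the compilation date if your threat model includes timestamp analysis. With pdfTeX, setting \pdfinfoomitdate=1 in the preamble omits the date fields, and current TeX Live engines use the SOURCE_DATE_EPOCH environment variable, when set, as the timestamp instead of the real compilation time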
Check supplementary files for usernames in file paths, author names in code comments, and institution names in READMEs
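
For the fixed-date step, one option is to compile with the SOURCE_DATE_EPOCH environment variable set. The sketch below wraps that in Python. It assumes pdflatex from a reasonably recent TeX Live is on your PATH; the function names and the placeholder timestamp are illustrative, and you should still confirm the result in the compiled PDF's properties.

```python
import os
import subprocess


def fixed_date_env(epoch_seconds: int) -> dict:
    """Copy the current environment and pin the timestamp TeX will record."""
    env = dict(os.environ)
    # Used by the engine for the PDF creation and modification dates.
    env["SOURCE_DATE_EPOCH"] = str(epoch_seconds)
    # Asks pdfTeX to apply the same timestamp to \today and related primitives.
    env["FORCE_SOURCE_DATE"] = "1"
    return env


def compile_with_fixed_date(tex_file: str, epoch_seconds: int = 1704067200):
    """Run pdflatex with a fixed date (default: 2024-01-01 00:00:00 UTC)."""
    return subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", tex_file],
        env=fixed_date_env(epoch_seconds),
        check=True,
    )
```

Calling compile_with_fixed_date("paper.tex") (the filename is a placeholder) produces a PDF whose dates no longer reflect when you actually compiled it.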

For Word users

Run Document Inspector (File > Info > Check for Issues > Inspect Document) and remove document properties, comments, and revision data
Check the Company field specifically — Document Inspector handles it, but verify manually
Save as PDF after cleaning metadata in the Word file
Check the PDF's metadata separately — some fields may be added during conversion
Check embedded images for EXIF data, especially photographs from lab settings
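
The embedded-image check can be scripted as well. Images in a .docx file are stored under word/media/ inside the ZIP archive, and EXIF data in a JPEG sits in an APP1 segment that begins with the bytes "Exif". The following sketch, using only the Python standard library, lists the embedded JPEGs that still carry such a segment. The function names are illustrative, and the parser is deliberately simple: it covers JPEG only and stops at the first thing it does not recognize.

```python
import struct
import zipfile


def jpeg_has_exif(data: bytes) -> bool:
    """Return True if a JPEG byte string contains an Exif APP1 segment."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False  # not at a marker; give up rather than guess
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: the metadata segments are over
            return False
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xE1 and data[pos + 4:pos + 10] == b"Exif\x00\x00":
            return True
        pos += 2 + length  # skip the marker bytes plus the segment
    return False


def docx_images_with_exif(path_or_file) -> list:
    """List embedded images in a .docx file that still contain EXIF data."""
    with zipfile.ZipFile(path_or_file) as archive:
        return [
            name
            for name in archive.namelist()
            if name.startswith("word/media/") and jpeg_has_exif(archive.read(name))
        ]
```

Any image this reports is worth re-exporting without metadata and re-inserting before you create the PDF.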

For both

Use a generic filename for the submission file
Remove self-citations that identify you from the reference list (replace with "[Anonymized]"), or cite your own work in the third person if the venue's guidelines ask for that instead
Check acknowledgments — a section thanking your specific funding source or lab colleagues reveals identity
Scan the final PDF with a metadata tool to verify that all identifying metadata has been removed
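
For a quick first pass on the final PDF, the crude sketch below searches the raw file bytes for the standard document information keys and prints any plain-text values it finds. It is an illustration with clear limits: it will miss values stored in compressed object streams, hex or UTF-16 encoded strings, and XMP metadata packets, so an empty result does not prove the file is clean. The function name is hypothetical.

```python
import re

# Keys of the PDF document information dictionary worth checking.
INFO_KEYS = (
    "Author", "Title", "Subject", "Keywords",
    "Creator", "Producer", "CreationDate", "ModDate",
)


def find_info_strings(pdf_bytes: bytes) -> dict:
    """Return plain-text values found for information keys in raw PDF bytes."""
    found = {}
    for key in INFO_KEYS:
        # Matches entries such as "/Author (Jane Smith)" and allows escaped
        # characters inside the parentheses. Nested parentheses are not handled.
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(((?:\\.|[^\\()])*)\)"
        match = re.search(pattern, pdf_bytes)
        if match:
            found[key] = match.group(1).decode("latin-1")
    return found
```

To use it, read the file with open("submission.pdf", "rb") (the filename is a placeholder) and pass the bytes to find_info_strings. Anything it prints is also visible to a reviewer.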

Purgit scans academic PDFs for author-identifying metadata — author fields, creator strings, company names, timestamps, and embedded image data. Verify your submission is truly anonymous before uploading.
