Metadata in Academic Papers: What Blind Review Submissions Expose
Double-blind peer review requires anonymous submissions. But metadata in submitted PDFs routinely reveals author identity through embedded fields.
The blind review paradox
Double-blind peer review is the gold standard in academic publishing. The premise is straightforward: reviewers do not know who wrote the paper, and authors do not know who reviewed it. This anonymity is intended to ensure that papers are evaluated on their merits, free from the biases of reputation, institutional affiliation, or personal relationships.
The paradox is that while researchers carefully remove their names from the manuscript text — replacing "Author et al. (2024)" with "[Anonymized]" and stripping affiliations from the title page — they routinely submit PDFs that contain their full names and institutional affiliations in the file's metadata fields.
A reviewer who checks File > Properties on the submitted PDF can identify the author in seconds. The anonymization effort applied to the visible content is undermined by invisible metadata.
What metadata academic submission portals handle
Major submission systems
Academic journals use manuscript management systems to handle submissions. The major platforms are ScholarOne (Clarivate), Editorial Manager (ARIES), and OJS (Open Journal Systems). Their handling of metadata varies.
ScholarOne does not automatically strip metadata from uploaded files. The author uploads a PDF or Word file, and the system stores and distributes it to reviewers as-is. Some journals that use ScholarOne add custom instructions asking authors to remove metadata before uploading, but this is not enforced by the platform.
Editorial Manager has a PDF conversion step that generates a merged PDF from uploaded files. This conversion may modify some metadata fields (Creator and Producer will reflect the conversion tool), but does not systematically strip Author, Title, or Subject fields from the source documents.
OJS stores uploaded files without metadata processing. Submissions are made available to reviewers with their original metadata intact.
The practical result: in most cases, the submission portal does not strip metadata. The responsibility falls on the author.
What LaTeX-generated PDFs include
LaTeX is the dominant authoring tool in mathematics, physics, computer science, and many engineering disciplines. PDFs generated from LaTeX carry metadata determined by the LaTeX configuration and the PDF engine used.
Default metadata in LaTeX PDFs
When you compile a LaTeX document with pdflatex, xelatex, or lualatex, the resulting PDF includes:
- Author: populated from the \author{} command in the LaTeX source, if the hyperref package is configured to set PDF metadata
- Title: populated from the \title{} command
- Subject and Keywords: populated if set via hyperref's \hypersetup{} configuration
- Creator: typically "LaTeX with hyperref" or similar
- Producer: the PDF engine (e.g., "pdfTeX-1.40.25" or "XeTeX 0.999996")
- Creation date: the date and time of compilation
The hyperref problem
The hyperref package, which most LaTeX documents load for clickable cross-references, can copy the document's \author and \title into the PDF metadata. This happens when hyperref is loaded with the pdfusetitle option, which some document classes and templates turn on for you, or when pdfauthor and pdftitle are set explicitly.
If your LaTeX source contains:
\author{Jane Smith}
\title{A Novel Approach to Graph Coloring}
and hyperref is set up to copy these values, the resulting PDF will contain "Jane Smith" in the Author metadata field, regardless of whether the visible title page has been anonymized.
How to fix LaTeX metadata for blind review
Override the metadata before compilation by adding the following to your preamble, after hyperref has been loaded:
\hypersetup{
pdfauthor={},
pdftitle={Submission for Review},
pdfsubject={},
pdfkeywords={},
pdfcreator={},
}
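This blanks the author, subject, keywords, and creator fields and replaces the title with a generic one. The Producer string and the timestamps are not affected by these settings. Some conferences provide LaTeX templates with these overrides already in place, but many do not.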
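A quick way to confirm that overrides like these took effect is to look for leftover Info entries in the compiled file. The sketch below is deliberately crude. It assumes the entries are stored as plain, uncompressed literal strings, it does not read XMP metadata, and for keys such as Title it may pick up a bookmark entry instead of the document title. The function name scan_pdf_info and the sample bytes are illustrative, and a proper PDF library is the better choice for anything beyond a sanity check.

```python
import re

# Info-dictionary keys that commonly carry identifying text.
INFO_KEYS = ("Author", "Title", "Subject", "Keywords", "Creator", "Producer")

def scan_pdf_info(pdf_bytes: bytes) -> dict:
    """Return {key: value} for entries stored as plain literal strings.

    This does not parse the PDF. It misses hex-encoded or compressed
    entries and stops at the first ')' it sees after the key.
    """
    found = {}
    for key in INFO_KEYS:
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(([^)]*)\)"
        match = re.search(pattern, pdf_bytes)
        if match:
            found[key] = match.group(1).decode("latin-1")
    return found

# Hypothetical fragment standing in for the Info dictionary of a compiled PDF.
# For a real file: scan_pdf_info(open("submission.pdf", "rb").read())
sample = b"<< /Author (Jane Smith) /Title (Submission for Review) /Creator () >>"
info = scan_pdf_info(sample)

# Keep only the fields that still contain text.
leaks = {key: value for key, value in info.items() if value.strip()}
print(leaks)  # {'Author': 'Jane Smith', 'Title': 'Submission for Review'}
```

An empty result from a scan like this does not prove the file is clean. It only means none of these keys appear in the simple form the pattern looks for.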
What Word-generated PDFs include
In the social sciences, humanities, and many professional fields, manuscripts are prepared in Microsoft Word and converted to PDF for submission.
Word-to-PDF metadata inheritance
When a Word document is saved as PDF, the following metadata carries over:
- Author — the name associated with the Word license or system profile
- Company — the organization name configured in Word
- Title — the document's Title property, which may differ from the text on the first page
- Last Modified By — the name of the person who last saved the file
- Creator — "Microsoft Word" with the version number
- Creation and modification dates — when the Word file was created and last modified
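Several of these values can be read from the Word file before it is converted. A .docx file is a ZIP archive, and its document properties live in the parts docProps/core.xml and docProps/app.xml. The sketch below reads a few of them using only the Python standard library. The function name read_docx_properties and the sample values are illustrative, and the set of fields checked is a starting point, not a complete list.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Property parts inside a .docx and the element names worth checking in each.
PARTS = {
    "docProps/core.xml": {"creator", "lastModifiedBy", "title"},
    "docProps/app.xml": {"Company"},
}

def read_docx_properties(docx_file) -> dict:
    """Return non-empty identifying properties from a .docx path or file object."""
    found = {}
    with zipfile.ZipFile(docx_file) as archive:
        present = set(archive.namelist())
        for part, wanted in PARTS.items():
            if part not in present:
                continue
            root = ET.fromstring(archive.read(part))
            for element in root.iter():
                # Drop the "{namespace}" prefix that ElementTree adds to tags.
                local_name = element.tag.rsplit("}", 1)[-1]
                text = (element.text or "").strip()
                if local_name in wanted and text:
                    found[local_name] = text
    return found

# Build a tiny in-memory stand-in for a .docx with made-up values.
core_xml = (
    '<cp:coreProperties '
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Draft</dc:title><dc:creator>Jane Smith</dc:creator>"
    "<cp:lastModifiedBy>J. Smith</cp:lastModifiedBy></cp:coreProperties>"
)
app_xml = (
    '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">'
    "<Company>Example University</Company></Properties>"
)
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("docProps/core.xml", core_xml)
    archive.writestr("docProps/app.xml", app_xml)
demo.seek(0)

properties = read_docx_properties(demo)
print(properties)
```

Running a check like this on the cleaned Word file, before exporting, shows which values are still available to be carried into the PDF.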
The "Remove Author Name" checklist miss
Word's File > Info panel shows the Author and Last Modified By fields. Many researchers know to check these. What they miss:
- The Company field — which identifies their university or research institution
- Template paths — which may include their username or department name
- Comments and tracked changes — which may remain from co-author review and contain names
- Embedded images — which may contain EXIF data from lab equipment or cameras identifying the research facility
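Some of the items above can be spotted mechanically in the .docx archive. The sketch below makes a few simplifying assumptions: comments live in word/comments.xml, tracked insertions and deletions appear as w:ins and w:del elements in word/document.xml, and an embedded JPEG with EXIF data contains the "Exif" marker near the start of the file. It does not look at template paths or at EXIF stored in other image formats. The function name find_docx_leftovers and the sample content are illustrative.

```python
import io
import re
import zipfile

# Author attributes on comments and on tracked insertions or deletions.
COMMENT_AUTHOR = re.compile(r'<w:comment\b[^>]*\bw:author="([^"]*)"')
REVISION_AUTHOR = re.compile(r'<w:(?:ins|del)\b[^>]*\bw:author="([^"]*)"')

def find_docx_leftovers(docx_file) -> dict:
    """List comment authors, revision authors, and media files with an Exif marker."""
    comment_authors, revision_authors, exif_images = set(), set(), []
    with zipfile.ZipFile(docx_file) as archive:
        for name in archive.namelist():
            data = archive.read(name)
            if name == "word/comments.xml":
                comment_authors.update(COMMENT_AUTHOR.findall(data.decode("utf-8", "replace")))
            elif name == "word/document.xml":
                revision_authors.update(REVISION_AUTHOR.findall(data.decode("utf-8", "replace")))
            elif name.startswith("word/media/") and b"Exif\x00\x00" in data[:65536]:
                # JPEG files announce EXIF data with this marker in an early segment.
                exif_images.append(name)
    return {
        "comment_authors": sorted(comment_authors),
        "revision_authors": sorted(revision_authors),
        "images_with_exif": sorted(exif_images),
    }

# In-memory stand-in for a manuscript that still holds review leftovers.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("word/comments.xml",
                     '<w:comments><w:comment w:id="0" w:author="Jane Smith"/></w:comments>')
    archive.writestr("word/document.xml",
                     '<w:document><w:ins w:id="1" w:author="R. Jones"/></w:document>')
    archive.writestr("word/media/image1.jpg", b"\xff\xd8\xff\xe1\x00\x10Exif\x00\x00MM")
    archive.writestr("word/media/image2.png", b"\x89PNG\r\n\x1a\n")
demo.seek(0)

report = find_docx_leftovers(demo)
print(report)
```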
Conference-specific risks
Conference submissions carry additional metadata risks:
Submission file naming
Some researchers name their submission files descriptively: Smith_GraphColoring_ICML2025.pdf. The filename is not metadata in the strict sense, but it travels with the file and reveals authorship. Most submission systems rename the file upon upload, but some do not.
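A filename check is simple enough to automate before upload. The sketch below looks for terms you supply, such as surnames or a lab acronym, anywhere in the filename. The function name filename_leaks and the term list are illustrative, and substring matching is a deliberately conservative choice that can produce false positives.

```python
def filename_leaks(filename: str, identifying_terms: list[str]) -> list[str]:
    """Return the supplied terms that occur in the filename, ignoring case."""
    lowered = filename.lower()
    return [term for term in identifying_terms if term.lower() in lowered]

# Terms are supplied by the author; these are made-up examples.
terms = ["Smith", "Jones"]
print(filename_leaks("Smith_GraphColoring_ICML2025.pdf", terms))  # ['Smith']
print(filename_leaks("submission.pdf", terms))                    # []
```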
Supplementary materials
Supplementary materials (code, data, additional figures) often receive less attention to anonymization than the main manuscript. A Jupyter notebook in the supplementary materials may contain the author's username in file paths, a README file may reference the lab's website, and code comments may include the author's name.
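Usernames in file paths are one of the easier leaks to search for, since home directories follow a few common patterns. The sketch below scans text for Linux, macOS, and Windows home-directory paths and reports the username part. The pattern, the function names, and the sample text are illustrative. The pattern will also flag shared folders such as /Users/Shared, and it does nothing about names in comments or READMEs, which still need a manual read.

```python
import re
from pathlib import Path

# Home-directory prefixes on Linux, macOS, and Windows. The captured group is
# the username. Backslashes may be doubled inside JSON files such as notebooks.
HOME_PATH = re.compile(r"(?:/home/|/Users/|[A-Za-z]:\\+Users\\+)([A-Za-z0-9._-]+)")

def find_usernames(text: str) -> set[str]:
    """Return usernames that appear in home-directory paths within the text."""
    return set(HOME_PATH.findall(text))

def scan_directory(root: str) -> dict[str, set[str]]:
    """Map each file under root (by relative path) to the usernames found in it."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            names = find_usernames(text)
            if names:
                hits[str(path.relative_to(root))] = names
    return hits

# Made-up excerpt resembling the raw text of a Jupyter notebook file.
sample = r'''
"source": ["df = load('/home/jsmith/project/data.csv')"],
"text": ["Saved to C:\\Users\\jane.smith\\results\\fig1.png"]
'''
print(sorted(find_usernames(sample)))  # ['jane.smith', 'jsmith']
```

Calling scan_directory on the folder that holds the supplementary files gives a per-file list to review before zipping the upload.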
Preprint servers
Researchers who post preprints on arXiv or SSRN before submitting to a journal create a situation where the paper is publicly associated with the author even if the journal submission is anonymized. This is a procedural anonymization issue rather than a metadata issue, but it interacts with metadata: if the journal submission and the preprint have matching creation dates in their metadata, a reviewer who suspects the identity can confirm it.
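To see how such a comparison works, note that PDF timestamps are strings of the form D:YYYYMMDDHHmmSS followed by an optional time zone offset. The sketch below parses two such strings and checks whether they refer to the same moment. It assumes the full date and time are present, although the format also allows shorter forms. The function names and sample values are illustrative.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS with an optional zone: Z, or +HH'mm' / -HH'mm'.
PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})"
    r"(?:([+\-Z])(?:(\d{2})'?(\d{2})?'?)?)?"
)

def parse_pdf_date(value: str) -> datetime:
    """Convert a full PDF date string to a timezone-aware datetime."""
    match = PDF_DATE.match(value)
    if not match:
        raise ValueError(f"not a full PDF date string: {value!r}")
    groups = match.groups()
    year, month, day, hour, minute, second = (int(g) for g in groups[:6])
    sign, offset_hours, offset_minutes = groups[6:]
    # A missing zone is treated as UTC here, which is an assumption.
    offset = timedelta(hours=int(offset_hours or 0), minutes=int(offset_minutes or 0))
    if sign == "-":
        offset = -offset
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone(offset))

def dates_match(first: str, second: str, tolerance_seconds: int = 0) -> bool:
    """True when two PDF date strings are within the tolerance of each other."""
    delta = abs(parse_pdf_date(first) - parse_pdf_date(second))
    return delta <= timedelta(seconds=tolerance_seconds)

# Made-up values: the same instant written in two different time zones.
print(dates_match("D:20240315142233+01'00'", "D:20240315132233Z"))  # True
```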
Step-by-step guide for anonymizing submissions
For LaTeX users
- Override hyperref metadata using \hypersetup{} with empty author, title, subject, and keywords fields
- Check the compiled PDF by opening it in a PDF reader and checking File > Properties to verify that metadata fields are empty or generic
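- Remove or fix the compilation date if your threat model includes timestamp analysis. Some LaTeX setups can be told to use a fixed date; recent pdfTeX releases, for example, read the SOURCE_DATE_EPOCH environment variable when writing PDF timestamps.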
- Check supplementary files for usernames in file paths, author names in code comments, and institution names in READMEs
For Word users
- Run Document Inspector (File > Info > Check for Issues > Inspect Document) and remove document properties, comments, and revision data
- Check the Company field specifically — Document Inspector handles it, but verify manually
- Save as PDF after cleaning metadata in the Word file
- Check the PDF's metadata separately — some fields may be added during conversion
- Check embedded images for EXIF data, especially photographs from lab settings
For both
- Use a generic filename for the submission file
- Remove self-citations that identify you from the reference list (replace with "[Anonymized]")
- Check acknowledgments — a section thanking your specific funding source or lab colleagues reveals identity
- Scan the final PDF with a metadata tool to verify that all identifying metadata has been removed
Purgit scans academic PDFs for author-identifying metadata — author fields, creator strings, company names, timestamps, and embedded image data. Verify your submission is truly anonymous before uploading.