5 Times Metadata Got Lawyers in Trouble

The invisible layer in every document

Every document you create — every PDF, every Word file, every spreadsheet — carries an invisible layer of data about itself. This metadata records who wrote it, when, with what software, how many times it was revised, and sometimes what was deleted.

In most professional contexts, this data is harmless background noise. In legal practice, it has ended careers, derailed cases, and exposed confidential client strategies.

Here are four documented cases and one recurring pattern in which document metadata caused real damage.

1. The Epstein court filing redaction failure (2019)

In the Virginia Giuffre v. Ghislaine Maxwell litigation, court filings were submitted with names redacted by placing black rectangles over the text. The redactions were visual overlays — the text beneath remained in the PDF's data layer, extractable by anyone who selected and copied the covered area.

The sealed names were published across news outlets and social media within hours of the filing becoming available.

What went wrong: The redaction was applied at the visual layer, not the data layer. PDF annotation tools draw over text without removing it. The filing was not verified after redaction.

What should have happened: Structural redaction (removing the text objects from the content stream), followed by a verification scan confirming the text was no longer extractable.
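The gap between the visual layer and the data layer can be shown with a toy example. The sketch below assumes an uncompressed page content stream that uses only simple literal strings. Real PDFs usually compress their streams and also use hex strings, escaped parentheses, and TJ arrays, so a production tool would need a proper PDF parser. The stream contents, the name in it, and the function names `extractable_text` and `redact_structurally` are hypothetical.

```python
import re

# Toy, uncompressed content stream (an assumption for illustration).
# "BT ... Tj ET" shows text; "re f" fills a rectangle.
ORIGINAL = (
    b"BT /F1 12 Tf 72 700 Td (Witness: Jane Doe) Tj ET\n"
    b"0 g 70 695 160 16 re f\n"  # black box drawn over the name
)

# Matches a simple literal string followed by the Tj (show text) operator.
TEXT_SHOW = re.compile(rb"\((.*?)\)\s*Tj")


def extractable_text(stream: bytes) -> list:
    """Return every literal string passed to the Tj operator."""
    return [m.decode("latin-1") for m in TEXT_SHOW.findall(stream)]


def redact_structurally(stream: bytes) -> bytes:
    """Remove the text-showing operations themselves; the black box stays."""
    return TEXT_SHOW.sub(b"", stream)


# Visual-only redaction: the box is drawn, but the text is still in the stream.
print(extractable_text(ORIGINAL))

# Structural redaction followed by a verification scan of the output.
redacted = redact_structurally(ORIGINAL)
print(extractable_text(redacted))
```

The first scan still finds the covered name even though a black rectangle is painted over it. Only after the text operation is removed from the stream does the verification scan come back empty.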

For a detailed technical analysis of this case, see our full article on the Epstein PDF redaction failure.

2. The DOJ antitrust brief with tracked changes (2004)

In 2004, the U.S. Department of Justice filed a brief in a major antitrust case. The filing was a Word document that had been converted to PDF for submission.

Journalists and opposing counsel discovered that the original Word file's tracked changes were embedded in the document's metadata. The revision history contained internal DOJ edits and deletions — including passages where department attorneys had debated the strength of their own arguments. Deleted paragraphs, rejected phrasing, and editorial comments were all recoverable.

The extracted metadata was cited by opposing counsel to challenge the government's position, arguing that the DOJ's own internal edits demonstrated uncertainty about the merits of the case.

What went wrong: The Word document was converted to PDF without first resolving all tracked changes and purging the revision history. Hiding the markup on screen does not remove it, and even after changes are accepted, comments, author names, and other hidden data stay in the file unless they are explicitly removed with Word's built-in inspection features or a dedicated cleaning tool.

What should have happened: Before conversion to PDF, the Word document should have been sanitized: tracked changes removed from the file itself (not just hidden from view), author fields cleared, and revision history purged. The resulting PDF should have been scanned for residual metadata.

3. The UK government's Iraq dossier (2003)

In February 2003, the British government published a dossier titled "Iraq: Its Infrastructure of Concealment, Deception and Intimidation" — part of the justification for military action in Iraq.

Shortly after publication, Glen Rangwala, a Cambridge University lecturer, showed that large sections of the dossier had been copied directly from an academic paper by Ibrahim al-Marashi, a postgraduate researcher at the Monterey Institute of International Studies.

The Word file itself filled in the rest. It had been published with its metadata intact, and that metadata exposed the chain of editing: who had handled the document, and in what order. The file contained ten revision entries, each associated with a named user account on a government computer.

The incident became known as the "Dodgy Dossier" affair and significantly damaged the British government's credibility on its case for war.

What went wrong: The Word document was published with its author metadata, revision history, and document properties intact. No metadata inspection or cleaning was performed before release.

What should have happened: The document properties should have been cleared, the author fields removed, and the revision history purged before publication. A verification scan would have confirmed that no identifying metadata remained.

4. The disputed sender identity (2009)

In a commercial dispute involving parties in Abu Dhabi and the United Kingdom, a key question was the identity of the person who had authored a particular communication. One party denied having sent the document.

Forensic examination of the Word document's metadata revealed the author field, the "last modified by" field, and the computer account name associated with the document's creation. These metadata fields directly identified the disputed sender. The metadata evidence was admitted and contributed to the resolution of the dispute.

What went wrong (from the sender's perspective): The document was sent without clearing the author and "last modified by" fields. Word automatically populates these fields with the logged-in user's account name — a default that most users never change or review.

What this demonstrates: Metadata is forensic evidence. In litigation, document properties are routinely examined and admitted. The author field in a Word document is rarely a label you consciously chose. By default, Word fills it in from the user name configured on the machine, so it often identifies the person or account that created the file.
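For modern .docx files, these fields are stored as plain XML inside the file's ZIP package. The sketch below is a minimal illustration, assuming an OOXML file (the document in the case above may well have been an older binary .doc, which is stored differently). It reads `dc:creator` and `cp:lastModifiedBy` from `docProps/core.xml`. The sample package built by `make_sample` contains only that one part and is not a complete Word document; the names in it and both function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OOXML core properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_identity_fields(docx_bytes: bytes) -> dict:
    """Return the author and last-modified-by values from docProps/core.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {
        "author": root.findtext("dc:creator", namespaces=NS),
        "last_modified_by": root.findtext("cp:lastModifiedBy", namespaces=NS),
    }


def make_sample() -> bytes:
    """Build a stand-in package holding only the core properties part."""
    core = (
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        "<dc:creator>Jane Smith</dc:creator>"
        "<cp:lastModifiedBy>jsmith</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core)
    return buffer.getvalue()


print(read_identity_fields(make_sample()))
```

Nothing here requires forensic software: any recipient with a ZIP reader and an XML parser can recover the same fields.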

5. The settlement agreement with visible revision history

This entry describes a recurring pattern rather than a single publicized incident. It happens regularly in legal practice but is rarely reported, because of the embarrassment involved.

A law firm sends a settlement agreement draft as a Word document to opposing counsel. The sending firm has negotiated internally — revising terms, adjusting dollar amounts, removing provisions, and editing language across multiple rounds of drafts. The final version appears clean.

Opposing counsel opens the document in Word and navigates to Review > Show Markup. The full revision history is visible. Deleted paragraphs are displayed with strikethrough. Previous dollar amounts are shown alongside current ones. Internal comments between the sending firm's attorneys are readable.

The opposing counsel now knows the sending firm's negotiation range, their internal reservations about specific terms, and the provisions they considered but removed. The negotiation dynamic shifts permanently.

What went wrong: The document was sent with tracked changes and comments still embedded in the file's XML structure. Hiding the markup (for example, by switching the view to "No Markup") changes only what is displayed. The insertions, deletions, and comments stay in the file until every change is accepted or rejected and every comment is deleted. Comments marked "resolved" are still stored in the XML.

What should have happened: The OOXML file should have been sanitized at the XML level — tracked changes removed from the document's XML parts, comment entries deleted, revision history cleared, and the output verified by re-parsing the file to confirm no revision data remained.
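As a rough sketch of what sanitizing at the XML level can look like, the example below works on the text of a `word/document.xml` part. It counts `w:ins` and `w:del` elements, produces a copy in which insertions are kept as ordinary runs and deletions are dropped, and then re-parses the result to confirm nothing remains. It is a simplification under stated assumptions: it handles only run-level insertions and deletions and ignores moves, formatting changes, paragraph-mark revisions, and comments. The sample XML, the author name, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)
INS, DEL = f"{{{W}}}ins", f"{{{W}}}del"


def count_revisions(document_xml: str) -> dict:
    """Count tracked insertions and deletions in a document.xml string."""
    root = ET.fromstring(document_xml)
    return {
        "insertions": len(list(root.iter(INS))),
        "deletions": len(list(root.iter(DEL))),
    }


def _clean(children: list) -> list:
    """Recursively drop w:del subtrees and unwrap w:ins wrappers."""
    kept = []
    for child in children:
        if child.tag == DEL:
            continue                           # deleted text is removed entirely
        if child.tag == INS:
            kept.extend(_clean(list(child)))   # inserted runs become ordinary runs
        else:
            child[:] = _clean(list(child))
            kept.append(child)
    return kept


def accept_all(document_xml: str) -> str:
    """Return the XML with run-level tracked changes resolved."""
    root = ET.fromstring(document_xml)
    root[:] = _clean(list(root))
    return ET.tostring(root, encoding="unicode")


SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    '<w:r><w:t xml:space="preserve">We will settle for </w:t></w:r>'
    '<w:del w:id="1" w:author="A. Partner"><w:r><w:delText>$900,000</w:delText></w:r></w:del>'
    '<w:ins w:id="2" w:author="A. Partner"><w:r><w:t>$750,000</w:t></w:r></w:ins>'
    "</w:p></w:body></w:document>"
)

print(count_revisions(SAMPLE))            # scan before cleaning
cleaned = accept_all(SAMPLE)
print(count_revisions(cleaned))           # verification: re-parse the output
```

In the sample, the earlier dollar amount and the reviser's name exist only inside the revision elements, so they disappear from the cleaned XML while the current amount stays.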

The common thread

These five cases span different jurisdictions, different document types, and different levels of sensitivity. But they share the same structural failure:

The document looked correct on screen. The visual presentation was exactly what the sender intended. Nothing appeared wrong.
The metadata was invisible to casual inspection. You cannot see metadata by reading a document. You have to actively inspect it — and most professional workflows do not include a metadata inspection step.
The tools did not warn the user. By default, Word does not warn you when you email a document that still contains tracked changes. Adobe Acrobat does not warn you when a black rectangle drawn over text leaves that text in place. The tools are silent about the risk.
The damage was irreversible. Once a document with metadata is in someone else's hands, the metadata can be extracted at any time. You cannot recall it. You cannot un-share it. The information has to be treated as permanently disclosed.

What professional document hygiene looks like

The legal profession has responded to metadata risks with bar association ethics opinions, firm-wide policies, and dedicated metadata cleaning tools. But adoption remains uneven, and incidents continue.

Effective document hygiene requires three elements:

Detection: Before a document leaves your control, scan it for metadata, revision history, tracked changes, comments, author fields, and embedded data. Not manually — systematically, using a tool that understands the format's internal structure.

Removal: Strip or replace the identified metadata. This must happen at the data layer (XML nodes, PDF objects, EXIF tags), not at the visual layer.

Verification: After removal, re-scan the output to confirm the metadata is gone. If you only remove and do not verify, you are trusting the tool without evidence. Verification is what turns cleanup into assurance.
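The three elements can be sketched as three small functions, with verification acting as a gate that refuses to release the file if anything is still found. This is a minimal illustration, not a description of how any particular product works. It assumes an OOXML package and looks only at the author and last-modified-by fields in `docProps/core.xml`; a real tool would cover many more locations. All function names and the sample data are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("cp", CP)
ET.register_namespace("dc", DC)
CORE_PART = "docProps/core.xml"
IDENTITY_TAGS = (f"{{{DC}}}creator", f"{{{CP}}}lastModifiedBy")


def detect(package_bytes: bytes) -> list:
    """Detection: list identity fields that still carry a value."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read(CORE_PART))
    return [el.tag for el in root.iter()
            if el.tag in IDENTITY_TAGS and (el.text or "").strip()]


def remove(package_bytes: bytes) -> bytes:
    """Removal: rewrite the package with the identity fields emptied."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == CORE_PART:
                root = ET.fromstring(data)
                for el in root.iter():
                    if el.tag in IDENTITY_TAGS:
                        el.text = ""
                data = ET.tostring(root, encoding="utf-8")
            dst.writestr(item.filename, data)
    return out.getvalue()


def clean_and_verify(package_bytes: bytes) -> bytes:
    """Verification: re-scan the output and fail closed on any finding."""
    cleaned = remove(package_bytes)
    leftovers = detect(cleaned)
    if leftovers:
        raise RuntimeError(f"metadata still present: {leftovers}")
    return cleaned


def make_sample() -> bytes:
    """Build a stand-in package holding only the core properties part."""
    core = (
        f'<cp:coreProperties xmlns:cp="{CP}" xmlns:dc="{DC}">'
        "<dc:creator>Jane Smith</dc:creator>"
        "<cp:lastModifiedBy>jsmith</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(CORE_PART, core)
    return buffer.getvalue()


sample = make_sample()
print(detect(sample))                 # findings before cleaning
cleaned = clean_and_verify(sample)
print(detect(cleaned))                # findings after cleaning
```

Raising an error instead of returning the file is a deliberate choice in this sketch: a cleaning step that cannot prove its own result should block the send, not let it through quietly.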

Purgit scans documents for hidden metadata, tracked changes, and revision history. It removes findings at the structural level and verifies removal by re-scanning the output. Before you send, scan.
