Understanding Scan Reports

Report Overview

Every scan produces a report — a structured record of what Purgit found in your file, what policy was applied, and what the verification status is after sanitization.

Reports are available in three formats:

  • JSON — Machine-readable, ideal for CI/CD pipelines and automation.
  • HTML — Human-readable, styled report for review and sharing.
  • Embedded — A summary is embedded in sanitized files for provenance tracking.

Full JSON Schema

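Here is an example of a scan report; each field is described in the Field Reference below. Note that the findings array shows only three of the seven findings counted in summary.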

{
  "reportId": "rpt_01HX7K9M...",
  "createdAt": "2026-03-06T14:23:11Z",

  "file": {
    "name": "contract-draft.pdf",
    "size": 1048576,
    "format": "pdf",
    "hash": "sha256:a1b2c3d4e5f6..."
  },

  "policy": {
    "name": "legal",
    "version": "1.2.0"
  },

  "summary": {
    "totalFindings": 7,
    "bySeverity": {
      "critical": 0,
      "high": 2,
      "medium": 3,
      "low": 1,
      "info": 1
    },
    "autoFixable": 5,
    "requiresReview": 2
  },

  "findings": [
    {
      "ruleId": "PDF-META-001",
      "severity": "high",
      "autofix": true,
      "field": "Author",
      "value": "Jane Smith",
      "path": "/Info/Author",
      "description": "PDF Author field contains personal name",
      "recommendation": "Remove Author field before sharing externally"
    },
    {
      "ruleId": "PDF-META-003",
      "severity": "high",
      "autofix": true,
      "field": "Creator",
      "value": "Microsoft Word 16.0",
      "path": "/Info/Creator",
      "description": "Creator field reveals authoring software",
      "recommendation": "Remove or normalize Creator field"
    },
    {
      "ruleId": "PDF-META-002",
      "severity": "medium",
      "autofix": true,
      "field": "Keywords",
      "value": "merger, acquisition, Project Falcon",
      "path": "/Info/Keywords",
      "description": "Keywords field may contain sensitive terms",
      "recommendation": "Review keywords and remove if sensitive"
    }
  ],

  "verification": {
    "status": "verified",
    "residualFindings": 0,
    "verifiedAt": "2026-03-06T14:23:14Z"
  }
}

Field Reference

Top-Level Fields

| Field | Type | Description |
|-------|------|-------------|
| reportId | string | Unique report identifier, prefixed with rpt_ |
| createdAt | string | ISO 8601 timestamp of when the scan was run |

file Object

| Field | Type | Description |
|-------|------|-------------|
| name | string | Original filename as uploaded |
| size | number | File size in bytes |
| format | string | Detected format: pdf, jpeg, png, heic, docx |
| hash | string | SHA-256 hash of the original file for integrity verification |
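The hash field can be used to confirm that a file on disk is the one the report describes. The following is a minimal sketch, not an official Purgit client: the function name digest_matches is hypothetical, and it assumes the value has the form "sha256:" followed by a hex digest of the full file contents, as the prefix in the example report suggests.

```python
import hashlib


def digest_matches(data: bytes, report: dict) -> bool:
    """Return True if the SHA-256 digest of data equals report["file"]["hash"]."""
    # Assumption: the field looks like "sha256:<hex digest>".
    algorithm, _, expected = report["file"]["hash"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    actual = hashlib.sha256(data).hexdigest()
    # Compare case-insensitively in case the report uses uppercase hex.
    return actual == expected.lower()
```

A typical call would pass the bytes of the original file, for example Path("contract-draft.pdf").read_bytes() from pathlib, together with the parsed report dictionary.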

policy Object

| Field | Type | Description |
|-------|------|-------------|
| name | string | Policy name used for the scan |
| version | string | Semantic version of the policy definition |

summary Object

| Field | Type | Description |
|-------|------|-------------|
| totalFindings | number | Total number of findings |
| bySeverity | object | Breakdown by severity: critical, high, medium, low, info |
| autoFixable | number | Findings that can be automatically removed during sanitization |
| requiresReview | number | Findings that need manual review |
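In the example report, the bySeverity counts add up to totalFindings (0 + 2 + 3 + 1 + 1 = 7), and so do autoFixable and requiresReview (5 + 2 = 7). A consumer might check both sums before trusting a report. The sketch below treats them as invariants, which is an assumption drawn from the example rather than a documented guarantee; summary_problems is a hypothetical helper name.

```python
def summary_problems(summary: dict) -> list[str]:
    """Return human-readable inconsistencies found in a summary object."""
    problems = []
    total = summary["totalFindings"]

    # Assumption: every finding is counted under exactly one severity.
    severity_total = sum(summary["bySeverity"].values())
    if severity_total != total:
        problems.append(f"bySeverity sums to {severity_total}, expected {total}")

    # Assumption: every finding is either auto-fixable or needs review.
    handled_total = summary["autoFixable"] + summary["requiresReview"]
    if handled_total != total:
        problems.append(
            f"autoFixable + requiresReview is {handled_total}, expected {total}"
        )
    return problems
```

An empty list means the summary is consistent under these assumptions.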

findings Array

Each finding is an object:

| Field | Type | Description |
|-------|------|-------------|
| ruleId | string | Unique rule identifier (e.g., PDF-META-001) |
| severity | string | critical, high, medium, low, or info |
| autofix | boolean | Whether this finding is automatically removed during sanitization |
| field | string | Human-readable name of the metadata field |
| value | string | The actual value found (redacted in some contexts) |
| path | string | Technical path to the field within the file structure |
| description | string | What this finding means |
| recommendation | string | Suggested action |
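Because each finding is a flat object, ordinary dictionary and list operations are enough to slice the array. The sketch below groups rule IDs by severity and picks out findings that sanitization will not remove automatically. Both function names are hypothetical. The code deliberately never reads the value field, since that field can hold the sensitive data itself.

```python
from collections import defaultdict


def rule_ids_by_severity(findings: list[dict]) -> dict[str, list[str]]:
    """Map each severity to the rule IDs reported at that severity."""
    grouped = defaultdict(list)
    for finding in findings:
        grouped[finding["severity"]].append(finding["ruleId"])
    return dict(grouped)


def not_autofixable(findings: list[dict]) -> list[dict]:
    """Return the findings whose autofix flag is false."""
    return [finding for finding in findings if not finding["autofix"]]
```

Applied to the three findings in the example report, rule_ids_by_severity places PDF-META-001 and PDF-META-003 under high and PDF-META-002 under medium, and not_autofixable returns an empty list because all three have autofix set to true.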

verification Object

Present only on reports for sanitized files:

| Field | Type | Description |
|-------|------|-------------|
| status | string | verified (all clear), partial (some findings remain), not_verified (verification not run) |
| residualFindings | number | Number of findings still present after sanitization |
| verifiedAt | string | ISO 8601 timestamp of the verification scan |
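Since the verification object is absent on reports for unsanitized files, code that reads it should handle the missing key. The following sketch shows one conservative way to do so; sanitization_verified is a hypothetical helper, and treating a missing object as "not verified" is a design choice of this example rather than documented behavior.

```python
def sanitization_verified(report: dict) -> bool:
    """Return True only if the report confirms a clean sanitized file."""
    verification = report.get("verification")
    if verification is None:
        # No verification object: the report is for an unsanitized file.
        return False
    return (
        verification.get("status") == "verified"
        and verification.get("residualFindings") == 0
    )
```

Requiring both a verified status and zero residual findings means a partial or not_verified report never passes.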

Interpreting Severity Levels

  • Critical (zero tolerance): Immediate privacy or security risks. A file with critical findings should never be shared externally without sanitization.
  • High (strongly recommended): Likely exposes personal or organizational identity. Remove before sharing in professional contexts.
  • Medium (context-dependent): May or may not be sensitive. Review based on your use case and audience.
  • Low (informational risk): Minimal direct privacy impact. Useful for thorough audits.
  • Info (metadata awareness): No action needed in most cases. Reported only under strict policy.
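The levels above form an ordering from info up to critical, so a threshold check reduces to summing the bySeverity counts at or above a chosen level. The sketch below illustrates this; SEVERITY_ORDER and count_at_or_above are hypothetical names, not part of any Purgit library.

```python
# Severity levels from least to most severe, as listed in this section.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def count_at_or_above(by_severity: dict, threshold: str) -> int:
    """Count findings whose severity is at least the given threshold."""
    start = SEVERITY_ORDER.index(threshold)  # raises ValueError if unknown
    # Levels missing from the report are counted as zero.
    return sum(by_severity.get(level, 0) for level in SEVERITY_ORDER[start:])
```

With the bySeverity object from the example report, a threshold of high yields 2 and a threshold of medium yields 5. A pipeline could fail whenever the result is greater than zero for the level it cares about.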

Using Reports in CI/CD

Use the JSON report format to gate CI/CD pipelines. Example shell script:

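# Scan and fail the pipeline if any findings exist
purgit scan document.pdf --format json --output report.json --quiet
status=$?  # capture the exit code before another command overwrites it

# This script treats exit code 1 as "findings detected"
if [ "$status" -eq 1 ]; then
  echo "Metadata findings detected. Failing pipeline."
  jq '.summary' report.json
  exit 1
fi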

Or check for specific severity levels:

# Read the critical count; fall back to 0 if the field is missing
CRITICAL=$(jq -r '.summary.bySeverity.critical // 0' report.json)
if [ "$CRITICAL" -gt 0 ]; then
  echo "Critical findings found. Blocking release."
  exit 1
fi

Next Steps

  • Integrations — GitHub Actions workflow, Node.js, Python, and webhook examples.
  • Policies & Rules — Learn about rule IDs and severity definitions.

Last updated: 2026-03-06