Understanding Scan Reports
Report Overview
Every scan produces a report — a structured record of what Purgit found in your file, what policy was applied, and what the verification status is after sanitization.
Reports are available in three formats:
- JSON — Machine-readable, ideal for CI/CD pipelines and automation.
- HTML — Human-readable, styled report for review and sharing.
- Embedded — A summary is embedded in sanitized files for provenance tracking.
Full JSON Schema
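Here is an example scan report. The findings array is shortened to three entries for brevity, so it lists fewer findings than the seven counted in summary: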
```json
{
  "reportId": "rpt_01HX7K9M...",
  "createdAt": "2026-03-06T14:23:11Z",
  "file": {
    "name": "contract-draft.pdf",
    "size": 1048576,
    "format": "pdf",
    "hash": "sha256:a1b2c3d4e5f6..."
  },
  "policy": {
    "name": "legal",
    "version": "1.2.0"
  },
  "summary": {
    "totalFindings": 7,
    "bySeverity": {
      "critical": 0,
      "high": 2,
      "medium": 3,
      "low": 1,
      "info": 1
    },
    "autoFixable": 5,
    "requiresReview": 2
  },
  "findings": [
    {
      "ruleId": "PDF-META-001",
      "severity": "high",
      "autofix": true,
      "field": "Author",
      "value": "Jane Smith",
      "path": "/Info/Author",
      "description": "PDF Author field contains personal name",
      "recommendation": "Remove Author field before sharing externally"
    },
    {
      "ruleId": "PDF-META-003",
      "severity": "high",
      "autofix": true,
      "field": "Creator",
      "value": "Microsoft Word 16.0",
      "path": "/Info/Creator",
      "description": "Creator field reveals authoring software",
      "recommendation": "Remove or normalize Creator field"
    },
    {
      "ruleId": "PDF-META-002",
      "severity": "medium",
      "autofix": true,
      "field": "Keywords",
      "value": "merger, acquisition, Project Falcon",
      "path": "/Info/Keywords",
      "description": "Keywords field may contain sensitive terms",
      "recommendation": "Review keywords and remove if sensitive"
    }
  ],
  "verification": {
    "status": "verified",
    "residualFindings": 0,
    "verifiedAt": "2026-03-06T14:23:14Z"
  }
}
```
Field Reference
Top-Level Fields
| Field | Type | Description |
|-------|------|-------------|
| reportId | string | Unique report identifier, prefixed with rpt_ |
| createdAt | string | ISO 8601 timestamp of when the scan was run |
file Object
| Field | Type | Description |
|-------|------|-------------|
| name | string | Original filename as uploaded |
| size | number | File size in bytes |
| format | string | Detected format: pdf, jpeg, png, heic, docx |
| hash | string | SHA-256 hash of the original file, written with a sha256: prefix, for integrity verification |
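
As a rough illustration of the integrity check the hash field enables, the sketch below recomputes a file's SHA-256 digest and compares it with the value stored in a report. It assumes the hash is written as `sha256:` followed by a full hexadecimal digest, as the sample report suggests. The function name `file_matches_report` is hypothetical and not part of Purgit.

```python
import hashlib


def file_matches_report(path: str, report: dict) -> bool:
    """Return True if the file's SHA-256 digest equals the report's file.hash."""
    # Assumption: the hash has the form "<algorithm>:<hex digest>".
    algorithm, _, expected = report["file"]["hash"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")

    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()
```

Because the hash describes the original file, this comparison is expected to fail against the sanitized output.
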
policy Object
| Field | Type | Description |
|-------|------|-------------|
| name | string | Policy name used for the scan |
| version | string | Semantic version of the policy definition |
summary Object
| Field | Type | Description |
|-------|------|-------------|
| totalFindings | number | Total number of findings |
| bySeverity | object | Breakdown by severity: critical, high, medium, low, info |
| autoFixable | number | Findings that can be automatically removed during sanitization |
| requiresReview | number | Findings that need manual review |
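
One way to sanity-check a summary object before relying on it is sketched below. It assumes two invariants that hold in the sample report but are not stated as guarantees here: the bySeverity counts add up to totalFindings, and autoFixable plus requiresReview also equals totalFindings. The helper name `summary_problems` is hypothetical.

```python
def summary_problems(report: dict) -> list[str]:
    """Return a list of human-readable inconsistencies found in report['summary']."""
    summary = report["summary"]
    total = summary["totalFindings"]
    problems = []

    # Assumed invariant: severity buckets partition all findings.
    severity_total = sum(summary["bySeverity"].values())
    if severity_total != total:
        problems.append(f"bySeverity sums to {severity_total}, expected {total}")

    # Assumed invariant: every finding is either auto-fixable or needs review.
    handled = summary["autoFixable"] + summary["requiresReview"]
    if handled != total:
        problems.append(f"autoFixable + requiresReview is {handled}, expected {total}")

    return problems


sample = {
    "summary": {
        "totalFindings": 7,
        "bySeverity": {"critical": 0, "high": 2, "medium": 3, "low": 1, "info": 1},
        "autoFixable": 5,
        "requiresReview": 2,
    }
}
print(summary_problems(sample))  # prints []
```
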
findings Array
Each finding is an object:
| Field | Type | Description |
|-------|------|-------------|
| ruleId | string | Unique rule identifier (e.g., PDF-META-001) |
| severity | string | critical, high, medium, low, or info |
| autofix | boolean | Whether this finding is automatically removed during sanitization |
| field | string | Human-readable name of the metadata field |
| value | string | The actual value found (redacted in some contexts) |
| path | string | Technical path to the field within the file structure |
| description | string | What this finding means |
| recommendation | string | Suggested action |
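
The fields above can be consumed along these lines. This is a minimal sketch, with hypothetical helper names, that separates findings sanitization would remove automatically from those left for manual review, and prints a one-line description without echoing the potentially sensitive value field.

```python
def split_by_autofix(findings: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split findings into (auto-fixable, manual-review) lists using the autofix flag."""
    auto = [f for f in findings if f["autofix"]]
    manual = [f for f in findings if not f["autofix"]]
    return auto, manual


def describe(finding: dict) -> str:
    """One-line summary of a finding; deliberately omits the 'value' field."""
    action = "auto-fix" if finding["autofix"] else "manual review"
    return (
        f'[{finding["severity"].upper()}] {finding["ruleId"]} '
        f'{finding["field"]} at {finding["path"]} ({action})'
    )


# Two findings copied from the sample report, trimmed to the fields used here.
findings = [
    {"ruleId": "PDF-META-001", "severity": "high", "autofix": True,
     "field": "Author", "path": "/Info/Author"},
    {"ruleId": "PDF-META-002", "severity": "medium", "autofix": True,
     "field": "Keywords", "path": "/Info/Keywords"},
]

auto, manual = split_by_autofix(findings)
for finding in auto + manual:
    print(describe(finding))
# first line printed: [HIGH] PDF-META-001 Author at /Info/Author (auto-fix)
```
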
verification Object
Present only on reports for sanitized files:
| Field | Type | Description |
|-------|------|-------------|
| status | string | verified (all clear), partial (some findings remain), not_verified (verification not run) |
| residualFindings | number | Number of findings still present after sanitization |
| verifiedAt | string | ISO 8601 timestamp of the verification scan |
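
Since the verification object is absent on reports for unsanitized files, consumers need to handle the missing key as well as the three status values. The sketch below shows one possible mapping. The outcome labels and the function name `verification_outcome` are illustrative choices, not Purgit terminology.

```python
def verification_outcome(report: dict) -> str:
    """Classify a report as 'unsanitized', 'clean', 'residual', or 'unverified'."""
    verification = report.get("verification")
    if verification is None:
        # No verification object: the report is for a file that was not sanitized.
        return "unsanitized"

    status = verification.get("status")
    if status == "verified" and verification.get("residualFindings", 0) == 0:
        return "clean"
    if status == "partial":
        return "residual"
    # Covers not_verified and any inconsistent or unknown data (conservative default).
    return "unverified"


report = {
    "verification": {
        "status": "verified",
        "residualFindings": 0,
        "verifiedAt": "2026-03-06T14:23:14Z",
    }
}
print(verification_outcome(report))  # prints clean
```
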
Interpreting Severity Levels
- Critical (zero tolerance): An immediate privacy or security risk. A file with critical findings should never be shared externally without sanitization.
- High (strongly recommended): Likely exposes personal or organizational identity. Remove before sharing in professional contexts.
- Medium (context-dependent): May or may not be sensitive. Review based on your use case and audience.
- Low (informational risk): Minimal direct privacy impact. Useful for thorough audits.
- Info (metadata awareness): No action needed in most cases. Reported only under the strict policy.
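
The severity levels form an ordered scale, which makes it straightforward to filter findings against a threshold that suits your audience. The following sketch assumes the ordering info < low < medium < high < critical implied by the list above. The names `SEVERITY_ORDER` and `at_or_above` are hypothetical.

```python
# Assumed ordering, lowest to highest, taken from the severity list above.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


def at_or_above(findings: list[dict], minimum: str) -> list[dict]:
    """Return findings whose severity is >= minimum, most severe first."""
    threshold = SEVERITY_ORDER.index(minimum)  # raises ValueError on unknown levels
    selected = [
        f for f in findings if SEVERITY_ORDER.index(f["severity"]) >= threshold
    ]
    return sorted(
        selected, key=lambda f: SEVERITY_ORDER.index(f["severity"]), reverse=True
    )


# Rule IDs and severities copied from the sample report.
sample_findings = [
    {"ruleId": "PDF-META-002", "severity": "medium"},
    {"ruleId": "PDF-META-001", "severity": "high"},
    {"ruleId": "PDF-META-003", "severity": "high"},
]

for finding in at_or_above(sample_findings, "high"):
    print(finding["ruleId"], finding["severity"])
```

For example, a professional-sharing workflow might review everything at `high` or above, while a thorough audit could lower the threshold to `low`.
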
Using Reports in CI/CD
Use the JSON report format to gate CI/CD pipelines. Example shell script:
```shell
# Scan and fail the pipeline if any findings exist
purgit scan document.pdf --format json --output report.json --quiet
if [ $? -eq 1 ]; then
  echo "Metadata findings detected. Failing pipeline."
  jq '.summary' report.json
  exit 1
fi
```
Or check for specific severity levels:
```shell
CRITICAL=$(jq '.summary.bySeverity.critical' report.json)
if [ "$CRITICAL" -gt 0 ]; then
  echo "Critical findings found. Blocking release."
  exit 1
fi
```
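
Where jq is not available, the same gate could be written in Python using only the standard library. This is a sketch under stated assumptions: it reads a report already written to disk, counts findings from summary.bySeverity, and treats the `fail_on` threshold and the function names as illustrative rather than as Purgit features.

```python
import json

# Assumed ordering of severity levels, lowest to highest.
SEVERITIES = ("info", "low", "medium", "high", "critical")


def blocking_count(report: dict, fail_on: str = "high") -> int:
    """Count findings at or above the fail_on severity using the summary counts."""
    counts = report["summary"]["bySeverity"]
    start = SEVERITIES.index(fail_on)
    return sum(counts.get(level, 0) for level in SEVERITIES[start:])


def gate(report_path: str = "report.json", fail_on: str = "high") -> int:
    """Return a process exit code: 1 to block the pipeline, 0 to let it continue."""
    with open(report_path, encoding="utf-8") as handle:
        report = json.load(handle)
    blocked = blocking_count(report, fail_on)
    if blocked:
        print(f"{blocked} finding(s) at or above '{fail_on}'. Blocking release.")
        return 1
    return 0
```

A pipeline step would call `sys.exit(gate("report.json", "critical"))` after the scan to reproduce the critical-only check shown above.
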
Next Steps
- Integrations — GitHub Actions workflow, Node.js, Python, and webhook examples.
- Policies & Rules — Learn about rule IDs and severity definitions.