Integrating Purgit Into Your Document Pipeline via API

Why integrate via API

Purgit's web interface handles individual document scanning and sanitization. But if your organization processes documents at volume — contract management systems, report generators, document publishing workflows, automated compliance pipelines — you need metadata handling integrated directly into your software.

The Purgit API provides programmatic access to the same scan, sanitize, and verify capabilities available in the web interface. You send a document, receive a metadata report, request sanitization, and get back a clean file with verification results.

Authentication

API keys

All API requests require authentication via Bearer token. Generate API keys from your Purgit dashboard under Settings > API Keys.

Authorization: Bearer purgit_live_sk_abc123...

Key management best practices

Generate separate keys for each integration (one for your CMS connector, one for your CI pipeline, one for your email gateway)
Store keys in environment variables or a secrets manager, never in source code
Rotate keys on a regular schedule and immediately if a key may have been exposed
Use test-mode keys (purgit_test_sk_...) during development — test keys process documents but do not count against your plan quota
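
As a minimal sketch of the environment-variable practice above, the helper below reads the key from the environment and builds the Authorization header. The function names and the prefix check are illustrative assumptions, not part of the Purgit API; the prefixes themselves come from the examples in this guide.

```python
import os

# Key prefixes as they appear in this guide's examples.
KEY_PREFIXES = ("purgit_live_sk_", "purgit_test_sk_")


def auth_headers(env_var: str = "PURGIT_API_KEY") -> dict:
    """Build the Authorization header from a key stored in the environment."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    if not key.startswith(KEY_PREFIXES):
        # Assumption: every valid key carries one of the two prefixes above.
        raise RuntimeError(f"{env_var} does not look like a Purgit API key")
    return {"Authorization": f"Bearer {key}"}


def is_test_key(headers: dict) -> bool:
    """Return True when the header carries a test-mode key."""
    return headers["Authorization"].startswith("Bearer purgit_test_sk_")
```

Failing at startup when the key is missing or malformed tends to be easier to diagnose than a 401 in the middle of a pipeline run.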

The scan, sanitize, verify flow

The Purgit API follows a three-step flow that mirrors the verification-first philosophy of the product.

Step 1: Upload and scan

Upload a document to receive a metadata scan report.

POST /api/v1/scan
Content-Type: multipart/form-data

file: [binary file data]
policy: "default"  // optional: policy ID for custom rules

The response includes a scan ID and the full metadata inventory:

{
  "scanId": "scan_8f3a2b1c",
  "status": "complete",
  "findings": {
    "total": 12,
    "categories": {
      "identity": 3,
      "timestamps": 2,
      "revision": 4,
      "comments": 2,
      "location": 1
    },
    "details": [
      {
        "ruleId": "PDF-META-001",
        "field": "Author",
        "value": "Jane Smith",
        "severity": "high",
        "category": "identity"
      }
    ]
  },
  "fileHash": "sha256:a1b2c3..."
}
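
One way to act on a scan report like the one above is to filter the details list by severity before deciding whether to sanitize. This is a sketch only: the guide shows a single severity value ("high"), so the "low" and "medium" levels and their ordering are assumptions, and the function name is hypothetical.

```python
# Assumption: severities are ordered low < medium < high.
# Only "high" appears in this guide's example response.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


def findings_at_or_above(scan: dict, min_severity: str = "high") -> list:
    """Return the findings whose severity meets or exceeds min_severity."""
    threshold = SEVERITY_RANK[min_severity]
    unknown = len(SEVERITY_RANK)  # rank unknown severities highest so they are never dropped
    return [
        detail
        for detail in scan["findings"]["details"]
        if SEVERITY_RANK.get(detail["severity"], unknown) >= threshold
    ]
```

Treating an unrecognized severity as the highest rank is a deliberately conservative choice: a new level introduced by the API would be surfaced rather than silently ignored.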

Step 2: Sanitize

Request sanitization of a scanned document. You can sanitize all findings or specify which categories or individual findings to address.

POST /api/v1/sanitize
Content-Type: application/json

{
  "scanId": "scan_8f3a2b1c",
  "mode": "all",
  "policy": "default"
}

Or selectively:

{
  "scanId": "scan_8f3a2b1c",
  "mode": "selective",
  "include": ["identity", "comments", "revision"],
  "exclude": ["timestamps"]
}

The response includes a sanitization ID and a download URL for the clean file:

{
  "sanitizeId": "san_4d5e6f7g",
  "status": "complete",
  "removedCount": 10,
  "preservedCount": 2,
  "downloadUrl": "/api/v1/download/san_4d5e6f7g",
  "expiresAt": "2026-04-01T00:00:00Z"
}
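
The two request shapes shown above ("all" and "selective") can be produced by one small helper. This is a sketch under the assumption that the field names are exactly those in the examples; the helper name and the overlap check are illustrative and not part of the API.

```python
def build_sanitize_request(scan_id, include=None, exclude=None, policy=None) -> dict:
    """Build the JSON body for POST /api/v1/sanitize."""
    include = list(include or [])
    exclude = list(exclude or [])
    overlap = set(include) & set(exclude)
    if overlap:
        # A category cannot be both sanitized and preserved.
        raise ValueError(f"categories both included and excluded: {sorted(overlap)}")

    body = {"scanId": scan_id}
    if include or exclude:
        body["mode"] = "selective"
        if include:
            body["include"] = include
        if exclude:
            body["exclude"] = exclude
    else:
        body["mode"] = "all"
    if policy is not None:
        body["policy"] = policy
    return body
```

For example, `build_sanitize_request("scan_8f3a2b1c", include=["identity", "comments"])` yields a selective request, while omitting both lists falls back to mode "all".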

Step 3: Verify

The sanitized file is automatically re-scanned to verify that metadata was successfully removed. Verification results are included in the sanitization response, but you can also request a standalone verification scan.

POST /api/v1/verify
Content-Type: application/json

{
  "sanitizeId": "san_4d5e6f7g"
}

Response:

{
  "verifyId": "ver_9h0i1j2k",
  "status": "pass",
  "remainingFindings": 0,
  "report": {
    "scannedFields": 47,
    "cleanFields": 47,
    "flaggedFields": 0
  }
}

If verification fails (remainingFindings is greater than 0), the response includes details on which metadata persisted and why.
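
A pipeline will usually want to stop when verification does not pass. The sketch below checks a verify response of the shape shown above and raises otherwise. The exception class and function name are hypothetical, and because the guide does not show the failure payload, the check relies only on the status and remainingFindings fields.

```python
class VerificationFailed(Exception):
    """Raised when a sanitized file still contains flagged metadata."""


def ensure_clean(verify: dict) -> dict:
    """Return the verify response unchanged if it passed, otherwise raise."""
    remaining = verify.get("remainingFindings")
    # A missing remainingFindings field is treated as a failure, not as zero.
    if verify.get("status") != "pass" or remaining != 0:
        raise VerificationFailed(
            f"verification {verify.get('verifyId')} did not pass: "
            f"status={verify.get('status')!r}, remainingFindings={remaining!r}"
        )
    return verify
```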

Code examples

Node.js

const fs = require('fs');
const path = require('path');

const API_BASE = 'https://api.purgit.io';
const API_KEY = process.env.PURGIT_API_KEY;

// Fail loudly on non-2xx responses instead of parsing an error body as a result.
function assertOk(res, step) {
  if (!res.ok) throw new Error(`${step} failed with HTTP ${res.status}`);
  return res;
}

// Requires Node.js 18+ (global fetch, FormData, and Blob).
async function scanAndSanitize(filePath) {
  // Step 1: Upload and scan
  const formData = new FormData();
  // Pass the original filename; otherwise the part is named "blob" and has no extension.
  formData.append('file', new Blob([fs.readFileSync(filePath)]), path.basename(filePath));
  formData.append('policy', 'default');

  const scanRes = await fetch(`${API_BASE}/api/v1/scan`, {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${API_KEY}` },
    body: formData,
  });
  const scan = await assertOk(scanRes, 'Scan').json();

  if (scan.findings.total === 0) {
    console.log('No metadata found.');
    return null;
  }

  console.log(`Found ${scan.findings.total} metadata findings.`);

  // Step 2: Sanitize
  const sanitizeRes = await fetch(`${API_BASE}/api/v1/sanitize`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      scanId: scan.scanId,
      mode: 'all',
    }),
  });
  const sanitize = await assertOk(sanitizeRes, 'Sanitize').json();

  // Download the clean file
  const fileRes = await fetch(`${API_BASE}${sanitize.downloadUrl}`, {
    headers: { 'Authorization': `Bearer ${API_KEY}` },
  });
  const cleanFile = Buffer.from(await assertOk(fileRes, 'Download').arrayBuffer());

  // Build "<name>.clean<ext>" so a file without an extension is never overwritten.
  const { dir, name, ext } = path.parse(filePath);
  const outputPath = path.join(dir, `${name}.clean${ext}`);
  fs.writeFileSync(outputPath, cleanFile);
  console.log(`Clean file saved to ${outputPath}`);

  return sanitize;
}

Python

import os
import requests

API_BASE = "https://api.purgit.io"
API_KEY = os.environ["PURGIT_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def scan_and_sanitize(file_path: str) -> dict | None:
    # Step 1: Upload and scan
    with open(file_path, "rb") as f:
        scan_res = requests.post(
            f"{API_BASE}/api/v1/scan",
            headers=HEADERS,
            files={"file": f},
            data={"policy": "default"},
            timeout=60,  # requests waits indefinitely without a timeout
        )
    scan_res.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
    scan = scan_res.json()

    if scan["findings"]["total"] == 0:
        print("No metadata found.")
        return None

    print(f"Found {scan['findings']['total']} metadata findings.")

    # Step 2: Sanitize (json= sets the Content-Type header automatically)
    sanitize_res = requests.post(
        f"{API_BASE}/api/v1/sanitize",
        headers=HEADERS,
        json={"scanId": scan["scanId"], "mode": "all"},
        timeout=60,
    )
    sanitize_res.raise_for_status()
    sanitize = sanitize_res.json()

    # Download the clean file
    file_res = requests.get(
        f"{API_BASE}{sanitize['downloadUrl']}",
        headers=HEADERS,
        timeout=60,
    )
    file_res.raise_for_status()

    base, ext = os.path.splitext(file_path)
    output_path = f"{base}.clean{ext}"
    with open(output_path, "wb") as f:
        f.write(file_res.content)

    print(f"Clean file saved to {output_path}")
    return sanitize

GitHub Actions workflow

Automate metadata scanning as part of your CI pipeline. This workflow scans documents in your repository before they are published or distributed.

name: Document Metadata Scan
on:
  push:
    paths:
      - 'docs/**'
      - 'assets/**'
  pull_request:
    paths:
      - 'docs/**'
      - 'assets/**'

jobs:
  scan-metadata:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Find documents
        id: find-docs
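        run: |
          : > /tmp/doc-list.txt
          # Search only directories that exist; find exits non-zero on a missing path.
          for dir in docs assets; do
            [ -d "$dir" ] || continue
            find "$dir" -type f \
              \( -name "*.pdf" -o -name "*.docx" -o -name "*.xlsx" \
                 -o -name "*.pptx" -o -name "*.jpg" -o -name "*.png" \) \
              >> /tmp/doc-list.txt
          done
          echo "count=$(wc -l < /tmp/doc-list.txt)" >> "$GITHUB_OUTPUT"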

      - name: Scan documents for metadata
        if: steps.find-docs.outputs.count > 0
        env:
          PURGIT_API_KEY: ${{ secrets.PURGIT_API_KEY }}
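        run: |
          FAILED=0
          while IFS= read -r file; do
            # --fail makes curl exit non-zero on HTTP errors so they are not mistaken for clean scans.
            if ! RESULT=$(curl -sS --fail -X POST "https://api.purgit.io/api/v1/scan" \
              -H "Authorization: Bearer $PURGIT_API_KEY" \
              -F "file=@$file" \
              -F "policy=default"); then
              echo "::error file=$file::Scan request failed"
              FAILED=1
              continue
            fi
            FINDINGS=$(echo "$RESULT" | jq -r '.findings.total // empty' || true)
            if ! [[ "$FINDINGS" =~ ^[0-9]+$ ]]; then
              echo "::error file=$file::Unexpected scan response"
              FAILED=1
            elif [ "$FINDINGS" -gt 0 ]; then
              echo "::warning file=$file::Found $FINDINGS metadata findings"
              FAILED=1
            fi
          done < /tmp/doc-list.txt
          if [ "$FAILED" -eq 1 ]; then
            echo "::error::One or more documents contain metadata or could not be scanned. Run Purgit to clean before committing."
            exit 1
          fi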

Webhook setup

For asynchronous processing of large files, configure webhooks to receive scan and sanitization results.

Register a webhook endpoint

POST /api/v1/webhooks
Content-Type: application/json

{
  "url": "https://your-app.com/webhooks/purgit",
  "events": ["scan.complete", "sanitize.complete", "verify.complete"],
  "secret": "whsec_your_signing_secret"
}

Webhook payload

{
  "event": "sanitize.complete",
  "timestamp": "2026-03-31T14:30:00Z",
  "data": {
    "sanitizeId": "san_4d5e6f7g",
    "scanId": "scan_8f3a2b1c",
    "status": "complete",
    "removedCount": 10,
    "downloadUrl": "/api/v1/download/san_4d5e6f7g",
    "verification": {
      "status": "pass",
      "remainingFindings": 0
    }
  }
}
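
A receiver can use the embedded verification block to decide what to do with a sanitize.complete event. The sketch below parses a payload of the shape shown above and returns a next step. The action names and the function name are hypothetical, and because the guide does not show the scan.complete or verify.complete payloads, those events are simply passed over here.

```python
import json


def next_action(raw_body: bytes) -> tuple:
    """Map a webhook payload to an (action, reference) pair."""
    event = json.loads(raw_body)
    name = event["event"]
    data = event.get("data", {})

    if name == "sanitize.complete":
        verification = data.get("verification", {})
        passed = (
            verification.get("status") == "pass"
            and verification.get("remainingFindings") == 0
        )
        if passed:
            # Safe to fetch the clean file from the relative download URL.
            return ("download", data["downloadUrl"])
        # Verification missing or failed: hold the file for manual review.
        return ("review", data["sanitizeId"])

    # Other event payloads are not documented in this guide.
    return ("ignore", name)
```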

Signature verification

Webhook payloads are signed with your webhook secret using HMAC-SHA256. Verify the signature from the X-Purgit-Signature header before processing:

const crypto = require('crypto');

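// payload must be the raw request body, exactly as received (not re-serialized JSON).
function verifyWebhookSignature(payload, signature, secret) {
  const expected = crypto
    .createHmac('sha256', secret)
    .update(payload)
    .digest('hex');
  const received = Buffer.from(signature);
  const computed = Buffer.from(expected);
  // timingSafeEqual throws when lengths differ, so reject mismatched lengths first.
  if (received.length !== computed.length) return false;
  return crypto.timingSafeEqual(received, computed);
}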
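
For Python receivers, the same check can be sketched with the standard library. It assumes, as the Node.js example does, that the X-Purgit-Signature header carries the bare hex digest with no prefix; confirm the exact header format before relying on it.

```python
import hashlib
import hmac


def verify_webhook_signature(payload: bytes, signature: str, secret: str) -> bool:
    """Compare the received signature with an HMAC-SHA256 of the raw body."""
    expected = hmac.new(
        secret.encode("utf-8"),
        payload,  # raw request body bytes, exactly as received
        hashlib.sha256,
    ).hexdigest()
    # compare_digest runs in constant time and accepts inputs of different lengths.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Verify against the raw bytes before any JSON parsing; re-serializing the parsed body can change whitespace or key order and invalidate the signature.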

Error handling and retry logic

HTTP status codes

200 — success
400 — invalid request (unsupported file format, missing required fields)
401 — invalid or missing API key
413 — file exceeds size limit
429 — rate limit exceeded
500 — server error (retry with backoff)

Retry strategy

For 429 and server-error (5xx) responses, implement exponential backoff:

async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url, options);
    if (res.status === 429 || res.status >= 500) {
      if (attempt === maxRetries) throw new Error(`Failed after ${maxRetries} retries`);
      const delay = Math.pow(2, attempt) * 1000;
      await new Promise(r => setTimeout(r, delay));
      continue;
    }
    return res;
  }
}
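
A Python counterpart to the retry helper above might look like the sketch below. It differs from the JavaScript version in one respect: it adds random jitter to each delay (a common refinement, not something the Purgit documentation prescribes) so that many clients do not retry in lockstep. The function names are hypothetical; send is any zero-argument callable returning an object with a status_code attribute, such as a requests response.

```python
import random
import time


def should_retry(status: int) -> bool:
    """Retry on rate limiting (429) and on server errors (5xx)."""
    return status == 429 or status >= 500


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Pick a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def send_with_retry(send, max_retries: int = 3, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or retries run out."""
    for attempt in range(max_retries + 1):
        response = send()
        if not should_retry(response.status_code):
            return response
        if attempt == max_retries:
            raise RuntimeError(
                f"Failed after {max_retries} retries (HTTP {response.status_code})"
            )
        sleep(backoff_delay(attempt))
```

Usage could be as simple as `send_with_retry(lambda: requests.post(url, headers=headers, json=body, timeout=60))`. When uploading a file, reopen or rewind it inside the callable so each attempt sends the full content.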

Rate limits

API rate limits depend on your plan:

| Plan | Requests/minute | Concurrent uploads |
|------|-----------------|--------------------|
| Pro | 60 | 5 |
| Team | 300 | 20 |
| Enterprise | Custom | Custom |

Rate limit headers are included in every response:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1711900800

Monitor X-RateLimit-Remaining and throttle requests as needed rather than hitting the limit and handling 429 responses.
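
The throttling advice above can be sketched as a function that inspects the rate limit headers and reports how long to pause. It assumes X-RateLimit-Reset is a Unix timestamp in seconds, which the example value suggests but this guide does not state; the function name and the min_remaining threshold are illustrative.

```python
import time


def seconds_to_wait(headers, min_remaining: int = 5, now=None) -> float:
    """Return how long to pause before the next request, or 0.0 if none is needed."""
    remaining = headers.get("X-RateLimit-Remaining")
    if remaining is None or int(remaining) > min_remaining:
        # Plenty of budget left, or no rate limit information available.
        return 0.0
    # Assumption: the reset header is a Unix timestamp in seconds.
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    current = time.time() if now is None else now
    return max(0.0, reset_at - current)
```

Calling `time.sleep(seconds_to_wait(response.headers))` after each response keeps a small request budget in reserve instead of running into 429 responses.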

Ready to integrate? Generate your API key at purgit.io/dashboard/api-keys and start scanning documents programmatically. Free tier includes 50 API scans per month.
