Redact

PII, PHI, and PCI redaction for transcripts and free text. Backed by the GLiNER-PII model family co-authored with Knowledgator. Runs inside your boundary on the same Helm chart as the rest of the stack.

The redaction object

json
{
  "id": "redact_abc123",
  "object": "redaction",
  "status": "completed",
  "created": 1712345678,
  "model": "gliner-pii-large",
  "vertical": "healthcare",
  "mode": "replace",
  "redacted_text": "Caller [PERSON] called about MRN [MEDICAL_RECORD], blood type [BLOOD_TYPE].",
  "entities": [
    {"type": "PERSON", "text": "Sarah Mitchell", "replacement": "[PERSON]", "start": 7, "end": 21, "score": 0.97},
    {"type": "MEDICAL_RECORD", "text": "7748321", "replacement": "[MEDICAL_RECORD]", "start": 39, "end": 46, "score": 0.94},
    {"type": "BLOOD_TYPE", "text": "a+", "replacement": "[BLOOD_TYPE]", "start": 59, "end": 61, "score": 0.91}
  ],
  "stats": {"total_entities": 3, "duration_ms": 142},
  "metadata": {"external_id": "call-55123"}
}
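The relationship between entities and redacted_text can be sketched in a few lines of Python. This is an illustration only, not the service's implementation: it assumes start and end are 0-based character offsets into the original text with an exclusive end, that spans do not overlap, and that mask mode replaces every character in the span (spaces included). The function name apply_entities and the sample text are hypothetical.

```python
from typing import Dict, List


def apply_entities(text: str, entities: List[Dict], mode: str = "replace") -> str:
    """Rebuild redacted text from detected spans.

    Assumption: start/end are 0-based character offsets into the original
    text, end is exclusive, and spans do not overlap.
    """
    out = text
    # Work right to left so earlier offsets stay valid after each substitution.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        start, end = ent["start"], ent["end"]
        if mode == "replace":
            sub = ent.get("replacement", f"[{ent['type']}]")
        elif mode == "mask":
            # Assumption: one '*' per character, so the length is preserved.
            sub = "*" * (end - start)
        else:
            raise ValueError(f"mode not covered by this sketch: {mode}")
        out = out[:start] + sub + out[end:]
    return out


# Hypothetical sample data, not taken from the API.
text = "Patient John Doe, MRN 12345."
entities = [
    {"type": "PERSON", "start": 8, "end": 16, "replacement": "[PERSON]"},
    {"type": "MEDICAL_RECORD", "start": 22, "end": 27, "replacement": "[MEDICAL_RECORD]"},
]

print(apply_entities(text, entities))          # Patient [PERSON], MRN [MEDICAL_RECORD].
print(apply_entities(text, entities, "mask"))  # Patient ********, MRN *****.
```

Running the same function with mode="detect" output (spans only) lets a client apply its own placeholder scheme while keeping the original offsets.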

Create a redaction

POST /v1/redact

Submit free text, a structured transcript, or an existing transcript_id. Returns inline when small; queues a job and dispatches a webhook when large. Same auth and webhook patterns as the rest of the Wordcab API.

Body

input string Optional

Plain text to redact. Exactly one of input, utterances, or transcript_id is required.

utterances array Optional

Structured transcript. Each item is an object with speaker, start, end, text. Redaction is applied to text; everything else is preserved.

transcript_id string Optional

Redact an existing Wordcab transcript by id. The redacted result is returned and (if you choose) written back to the transcript record.

model string Optional

Model id. Defaults to the deployment's configured Redact model, typically gliner-pii-large for the open reference or wordcab-redact-{vertical} for the enterprise build.

vertical string Optional

One of healthcare, finance, legal, contact_center, general. Selects the enterprise fine-tune. Defaults to general.

entities string[] Optional

Entity types to detect. Defaults to the HIPAA Safe Harbor set when vertical=healthcare, the PCI + KYC set when vertical=finance, and a general PII set otherwise. See the Redact product page for the vetted catalog per vertical.

custom_entities string[] Optional

Free-text entity labels matched zero-shot by the GLiNER detection model. Use for tenant-specific identifiers, e.g. INTERNAL_TICKET_ID, PATIENT_ROOM_NUMBER.

mode string Optional

One of detect (spans only), replace (default; [PERSON]-style placeholders), pseudonymize (stable per-job pseudonyms that preserve referential integrity), or mask (one * per character, preserving length).

threshold number Optional

Detection threshold in [0.0, 1.0]. Default 0.45. Higher values reduce false positives at the cost of recall.

chunking string Optional

One of auto (default), none, paragraph, utterance. Overrides the automatic windowing the model uses for long inputs.

language string Optional

ISO 639-1 code. Default en. Enterprise variants ship in English, French, Spanish, German, Portuguese; additional languages on request.

webhook_url string Optional

Destination for the redaction.completed webhook on this specific job. Overrides the deployment default.

stream boolean Optional

If true, stream entity spans as soon as each chunk is processed. Useful for inline redaction in a voice agent loop.

store boolean Optional

If false, the redacted artifact is delivered and immediately purged from server-side storage. Recommended for highly regulated workloads. Default true.

metadata object Optional

Free-form map of string keys to string values, returned on the resulting object. Up to 20 keys.
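The constraints listed above (exactly one input source, the enumerated values, the threshold range, the metadata limit) can be checked client-side before a request is sent. The following is a minimal sketch under that reading of the parameter list; validate_redact_body is a hypothetical helper, not part of any Wordcab SDK, and the server remains the authority on what it accepts.

```python
from typing import Any, Dict

SOURCES = ("input", "utterances", "transcript_id")
MODES = {"detect", "replace", "pseudonymize", "mask"}
VERTICALS = {"healthcare", "finance", "legal", "contact_center", "general"}
CHUNKING = {"auto", "none", "paragraph", "utterance"}


def validate_redact_body(body: Dict[str, Any]) -> Dict[str, Any]:
    """Client-side check of a /v1/redact body against the documented constraints."""
    # Exactly one of input, utterances, transcript_id must be supplied.
    present = [k for k in SOURCES if body.get(k) is not None]
    if len(present) != 1:
        raise ValueError(f"exactly one of {SOURCES} is required, got {present}")

    # Enumerated parameters, using the documented defaults when absent.
    if body.get("mode", "replace") not in MODES:
        raise ValueError("mode must be one of " + ", ".join(sorted(MODES)))
    if body.get("vertical", "general") not in VERTICALS:
        raise ValueError("vertical must be one of " + ", ".join(sorted(VERTICALS)))
    if body.get("chunking", "auto") not in CHUNKING:
        raise ValueError("chunking must be one of " + ", ".join(sorted(CHUNKING)))

    # Threshold must lie in [0.0, 1.0]; the documented default is 0.45.
    threshold = body.get("threshold", 0.45)
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within [0.0, 1.0]")

    # Metadata is a string-to-string map with at most 20 keys.
    metadata = body.get("metadata", {})
    if len(metadata) > 20:
        raise ValueError("metadata supports up to 20 keys")
    if not all(isinstance(k, str) and isinstance(v, str) for k, v in metadata.items()):
        raise ValueError("metadata keys and values must be strings")

    return body


body = validate_redact_body(
    {"input": "Call me at 555-0100.", "mode": "mask", "threshold": 0.6}
)
```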

Example: healthcare transcript, replace mode

bash
curl -X POST https://api.wordcab.com/v1/redact \
  -H "Authorization: Bearer $WORDCAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "vertical": "healthcare",
    "mode": "replace",
    "entities": ["PERSON", "DATE_OF_BIRTH", "MEDICAL_RECORD", "BLOOD_TYPE"],
    "utterances": [
      {"speaker": 0, "start": 12.0, "end": 15.0, "text": "Can I get your date of birth?"},
      {"speaker": 1, "start": 16.0, "end": 19.0, "text": "Yes, it's March 4, 1981."},
      {"speaker": 1, "start": 20.0, "end": 24.0, "text": "My MRN is 7748321, blood type a+."}
    ]
  }'
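For clients that do not shell out to curl, the same request might look like this in Python using only the standard library. It is a sketch: build_redact_request is a hypothetical helper name, the URL and headers are copied from the curl example above, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request

API_URL = "https://api.wordcab.com/v1/redact"  # taken from the curl example


def build_redact_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/redact request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = {
    "vertical": "healthcare",
    "mode": "replace",
    "entities": ["PERSON", "DATE_OF_BIRTH", "MEDICAL_RECORD", "BLOOD_TYPE"],
    "utterances": [
        {"speaker": 0, "start": 12.0, "end": 15.0, "text": "Can I get your date of birth?"},
        {"speaker": 1, "start": 16.0, "end": 19.0, "text": "Yes, it's March 4, 1981."},
        {"speaker": 1, "start": 20.0, "end": 24.0, "text": "My MRN is 7748321, blood type a+."},
    ],
}

req = build_redact_request(body, os.environ.get("WORDCAB_API_KEY", ""))

# To send the request and parse the JSON response:
#     with urllib.request.urlopen(req) as resp:
#         redaction = json.load(resp)
```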

Example: finance, pseudonymize mode, async via webhook

bash
curl -X POST https://api.wordcab.com/v1/redact \
  -H "Authorization: Bearer $WORDCAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript_id": "transcript_abc123",
    "vertical": "finance",
    "mode": "pseudonymize",
    "store": false,
    "webhook_url": "https://your.app/webhooks/wordcab/redact",
    "custom_entities": ["INTERNAL_TICKET_ID"]
  }'

Response

Synchronous inline jobs return a redaction object directly. Asynchronous jobs return {"id": "redact_...", "status": "queued"} and dispatch redaction.completed to your webhook when ready.

Retrieve a redaction

GET /v1/redact/{redaction_id}

Returns the full redaction object. Useful when you missed a webhook delivery or are reconciling state.
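When a webhook delivery is missed, one way to reconcile is to poll this endpoint until the job leaves the queue. The sketch below assumes the only in-progress status is "queued" (the one value this page documents besides "completed"); if your deployment reports other intermediate statuses, add them to PENDING. The fetch callable is a hypothetical stand-in for whatever performs the GET and returns the parsed JSON.

```python
import time
from typing import Callable, Dict

# Assumption: "queued" is the only in-progress status.
PENDING = {"queued"}


def wait_for_redaction(
    redaction_id: str,
    fetch: Callable[[str], Dict],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll GET /v1/redact/{redaction_id} via `fetch` until the job is no longer pending."""
    for _ in range(max_attempts):
        obj = fetch(redaction_id)
        if obj.get("status") not in PENDING:
            # Completed, or some other terminal status the caller should inspect.
            return obj
        sleep(interval_s)
    raise TimeoutError(f"{redaction_id} still pending after {max_attempts} attempts")
```

Passing fetch and sleep as arguments keeps the loop independent of any HTTP client and makes it easy to exercise with canned responses.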

List redactions

GET /v1/redact

Query parameters: limit, cursor, status, vertical, created_after, created_before.
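Cursor pagination over this endpoint can be wrapped in a generator. The response shape is not documented on this page, so the sketch below assumes each page looks like {"data": [...], "next_cursor": "..."} with next_cursor absent or null on the last page; adjust the two field names to match the actual response. fetch_page is a hypothetical callable that performs the GET with the given query parameters and returns parsed JSON.

```python
from typing import Callable, Dict, Iterator


def iter_redactions(fetch_page: Callable[[Dict], Dict], **filters) -> Iterator[Dict]:
    """Yield redaction objects across pages.

    Assumption: a page is {"data": [...], "next_cursor": str or None}.
    `filters` may include limit, status, vertical, created_after, created_before.
    """
    params = {k: v for k, v in filters.items() if v is not None}
    while True:
        page = fetch_page(dict(params))  # pass a copy so callers cannot mutate our state
        yield from page.get("data", [])
        cursor = page.get("next_cursor")
        if not cursor:
            return
        params["cursor"] = cursor
```

A call such as iter_redactions(fetch_page, status="completed", vertical="healthcare", limit=50) would then walk every matching record without the caller handling cursors.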

Delete a redaction

DELETE /v1/redact/{redaction_id}

Hard-deletes the redaction record and returns 204 on success. Deletions are permanent. If store=false was set at create time, the record was already purged on delivery.

Inline redaction inside a transcription job

If you only want redaction as a side effect of transcription, you do not need this endpoint at all. Pass redact: ["pii", "phi", "pci"] on the Transcripts API request. The same model runs inside the transcript pipeline, and the redacted spans appear on the resulting transcript object. Use /v1/redact when you need vertical-specific tuning or custom entities, or when you need to redact text that did not originate from Wordcab transcription.