Redact
PII, PHI, and PCI redaction for transcripts and free text. Backed by the GLiNER-PII model family co-authored with Knowledgator. Runs inside your boundary on the same Helm chart as the rest of the stack.
The redaction object
{
  "id": "redact_abc123",
  "object": "redaction",
  "status": "completed",
  "created": 1712345678,
  "model": "gliner-pii-large",
  "vertical": "healthcare",
  "mode": "replace",
  "redacted_text": "Caller [PERSON] called about MRN [MEDICAL_RECORD], blood type [BLOOD_TYPE].",
  "entities": [
    {"type": "PERSON", "text": "Sarah Mitchell", "replacement": "[PERSON]", "start": 7, "end": 21, "score": 0.97},
    {"type": "MEDICAL_RECORD", "text": "7748321", "replacement": "[MEDICAL_RECORD]", "start": 40, "end": 47, "score": 0.94},
    {"type": "BLOOD_TYPE", "text": "a+", "replacement": "[BLOOD_TYPE]", "start": 60, "end": 62, "score": 0.91}
  ],
  "stats": {"total_entities": 3, "duration_ms": 142},
  "metadata": {"external_id": "call-55123"}
}
Create a redaction
Submit free text, a structured transcript, or an existing transcript_id. Small inputs are redacted and returned inline. Large inputs are queued as a job, and a webhook is dispatched when the job completes. Authentication and webhooks follow the same patterns as the rest of the Wordcab API.
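None of the curl examples in this section submits free text, so here is a minimal sketch of a plain-text request using the input field. It assumes WORDCAB_API_KEY is set and introduces a hypothetical WORDCAB_API variable so a self-hosted deployment can point the script at its own base URL. The JSON escaping is deliberately naive and only suits single-line text.

```shell
# Sketch: redact a single-line plain-text string via the "input" field.
# WORDCAB_API is a hypothetical override for self-hosted base URLs;
# WORDCAB_API_KEY must be set, as in the curl examples in this section.
WORDCAB_API="${WORDCAB_API:-https://api.wordcab.com/v1}"

# Escape backslashes and double quotes for a JSON string literal.
# Naive on purpose: newlines and control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Usage: redact_text TEXT [VERTICAL] [MODE]
# VERTICAL and MODE fall back to the documented defaults (general, replace).
redact_text() {
  text=$(json_escape "$1")
  vertical="${2:-general}"
  mode="${3:-replace}"
  curl -sS -X POST "$WORDCAB_API/redact" \
    -H "Authorization: Bearer $WORDCAB_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"input\": \"$text\", \"vertical\": \"$vertical\", \"mode\": \"$mode\"}"
}

# Example:
#   redact_text "Caller Sarah Mitchell called about MRN 7748321." healthcare mask
```

For anything beyond short single-line strings, build the body with a proper JSON tool instead of string interpolation.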
Body
input: Plain text to redact. Exactly one of input, utterances, or transcript_id is required.
utterances: Structured transcript. Each item is an object with speaker, start, end, and text. Redaction is applied to text; all other fields are preserved.
transcript_id: Redact an existing Wordcab transcript by id. The redacted result is returned and, if you choose, written back to the transcript record.
model: Model id. Defaults to the deployment's configured Redact model, typically gliner-pii-large for the open reference build or wordcab-redact-{vertical} for the enterprise build.
vertical: One of healthcare, finance, legal, contact_center, general. Selects the enterprise fine-tune. Defaults to general.
entities: Entity types to detect. Defaults to the HIPAA Safe Harbor set when vertical=healthcare, the PCI + KYC set when vertical=finance, and a general PII set otherwise. See the Redact product page for the vetted catalog per vertical.
custom_entities: Free-text entity labels matched zero-shot by the GLiNER detection model. Use for tenant-specific identifiers, e.g. INTERNAL_TICKET_ID, PATIENT_ROOM_NUMBER.
mode: One of detect (spans only), replace (default; [PERSON]-style placeholders), pseudonymize (stable per-job pseudonyms that preserve referential integrity), or mask (per-character *, preserves length).
Detection threshold in [0.0, 1.0]. Default 0.45. Higher values reduce false positives at the cost of recall.
Windowing strategy for long inputs. One of auto (default), none, paragraph, utterance. Any value other than auto overrides the automatic windowing the model uses.
ISO 639-1 code. Default en. Enterprise variants ship in English, French, Spanish, German, Portuguese; additional languages on request.
webhook_url: Destination for the redaction.completed webhook for this job only. Overrides the deployment default.
If true, stream entity spans as soon as each chunk is processed. Useful for inline redaction in a voice agent loop.
store: If false, the redacted artifact is delivered and then immediately purged from server-side storage. Recommended for highly regulated workloads. Default true.
metadata: Free-form string-to-string map returned on the resulting object. Up to 20 keys.
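The mask and replace modes both act on the start and end offsets reported for each entity. The sample redaction object is consistent with 0-based, end-exclusive offsets ("Sarah Mitchell" runs from 7 to 21, which is 14 characters), and the sketch below assumes that convention to reproduce the two modes locally for a single span. It illustrates the documented behavior and is not the service's implementation. It also assumes ASCII text, where characters and bytes coincide; mask_span and replace_span are hypothetical helper names.

```shell
# Sketch: local illustration of "mask" and "replace" for one entity span.
# ASSUMPTION: offsets are 0-based and end-exclusive, as the sample object
# suggests; text is ASCII.

# Usage: mask_span TEXT START END -> span replaced by '*', length preserved
mask_span() {
  printf '%s' "$1" | awk -v s="$2" -v e="$3" '{
    stars = ""
    for (i = s; i < e; i++) stars = stars "*"   # one * per masked character
    printf "%s%s%s", substr($0, 1, s), stars, substr($0, e + 1)
  }'
}

# Usage: replace_span TEXT START END LABEL -> span replaced by [LABEL]
replace_span() {
  printf '%s' "$1" | awk -v s="$2" -v e="$3" -v l="$4" '{
    printf "%s[%s]%s", substr($0, 1, s), l, substr($0, e + 1)
  }'
}

# Example:
#   replace_span 'Caller Sarah Mitchell called' 7 21 PERSON   # -> Caller [PERSON] called
```

When applying several spans yourself in replace mode, work from the highest start offset down, so earlier offsets stay valid after each replacement changes the string length.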
Example, healthcare transcript, replace mode
curl -X POST https://api.wordcab.com/v1/redact \
  -H "Authorization: Bearer $WORDCAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "vertical": "healthcare",
    "mode": "replace",
    "entities": ["PERSON", "DATE_OF_BIRTH", "MEDICAL_RECORD", "BLOOD_TYPE"],
    "utterances": [
      {"speaker": 0, "start": 12.0, "end": 15.0, "text": "Can I get your date of birth?"},
      {"speaker": 1, "start": 16.0, "end": 19.0, "text": "Yes, it is March 4, 1981."},
      {"speaker": 1, "start": 20.0, "end": 24.0, "text": "My MRN is 7748321, blood type a+."}
    ]
  }'
Example, finance, pseudonymize mode, async via webhook
curl -X POST https://api.wordcab.com/v1/redact \
  -H "Authorization: Bearer $WORDCAB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript_id": "transcript_abc123",
    "vertical": "finance",
    "mode": "pseudonymize",
    "store": false,
    "webhook_url": "https://your.app/webhooks/wordcab/redact",
    "custom_entities": ["INTERNAL_TICKET_ID"]
  }'
Response
Synchronous inline jobs return a redaction object directly. Asynchronous jobs return {"id": "redact_...", "status": "queued"} and dispatch redaction.completed to your webhook when ready.
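Because the same endpoint can answer either way, a client has to branch on status. A minimal sketch, assuming the response shapes described here (a full redaction object with status completed, or an id plus status queued). The sed-based field extraction stands in for a real JSON parser, and json_field and handle_create_response are hypothetical helper names.

```shell
# Sketch: distinguish an inline result from a queued job.
# json_field is a crude extractor for flat "key": "value" string pairs;
# use a real JSON tool for anything beyond these documented shapes.

# Usage: json_field JSON KEY -> first string value stored under KEY
json_field() {
  printf '%s' "$1" | sed -n "s/.*\"$2\": *\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Usage: handle_create_response RESPONSE_JSON
handle_create_response() {
  status=$(json_field "$1" status)
  id=$(json_field "$1" id)
  case "$status" in
    queued)    echo "queued $id: wait for redaction.completed" ;;
    completed) echo "inline $id" ;;
    *)         echo "unexpected status '$status' for $id" >&2; return 1 ;;
  esac
}
```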
Retrieve a redaction
Returns the full redaction object. Useful when you missed a webhook delivery or are reconciling state.
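If a webhook delivery was missed, a client can fetch the object directly and poll until it completes. This section does not give the retrieve path, so the sketch assumes GET /v1/redact/{id}. It also assumes completed is the terminal success status (the only one shown here); the retry count and interval are arbitrary defaults, and WORDCAB_API is a hypothetical base-URL override.

```shell
# Sketch: retrieve a redaction and poll until its status is "completed".
# ASSUMPTION: the retrieve path is GET /v1/redact/{id}.
WORDCAB_API="${WORDCAB_API:-https://api.wordcab.com/v1}"

# Usage: get_redaction REDACT_ID
get_redaction() {
  curl -sS "$WORDCAB_API/redact/$1" \
    -H "Authorization: Bearer $WORDCAB_API_KEY"
}

# Usage: wait_for_redaction REDACT_ID [MAX_TRIES] [SLEEP_SECONDS]
# Prints the object once completed; returns 1 on curl failure or timeout.
wait_for_redaction() {
  tries=0
  while [ "$tries" -lt "${2:-30}" ]; do
    body=$(get_redaction "$1") || return 1
    case "$body" in
      *'"status": "completed"'*|*'"status":"completed"'*)
        printf '%s\n' "$body"
        return 0 ;;
    esac
    tries=$((tries + 1))
    sleep "${3:-2}"
  done
  echo "timed out waiting for $1" >&2
  return 1
}
```

Polling is a fallback here; the redaction.completed webhook remains the primary completion signal.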
List redactions
Query parameters: limit, cursor, status, vertical, created_after, created_before.
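A sketch of a filtered list call that passes each query parameter through curl's -G and --data-urlencode, so values are URL-encoded into the query string. It assumes the list lives at GET /v1/redact, which this section does not state, and it fetches a single page: the response field that carries the next cursor is not documented here, so pass it back yourself as cursor=VALUE.

```shell
# Sketch: fetch one page of redactions filtered by the documented query
# parameters (limit, cursor, status, vertical, created_after, created_before).
# ASSUMPTION: the list endpoint is GET /v1/redact.
# WORDCAB_API is a hypothetical base-URL override.
WORDCAB_API="${WORDCAB_API:-https://api.wordcab.com/v1}"

# Usage: list_redactions [key=value ...]
#   e.g. list_redactions limit=20 status=completed vertical=finance
list_redactions() {
  # Rewrite each key=value argument as: --data-urlencode key=value
  for pair in "$@"; do
    shift
    set -- "$@" --data-urlencode "$pair"
  done
  # -G appends the --data-urlencode pairs to the URL as a query string.
  curl -sS -G "$WORDCAB_API/redact" \
    -H "Authorization: Bearer $WORDCAB_API_KEY" \
    "$@"
}
```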
Delete a redaction
Hard-deletes the redaction record and returns 204 on success. Deletions are permanent. If store=false was set at create time, the record was already purged on delivery.
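A sketch that issues the delete and treats any status other than the documented 204 as a failure. The DELETE /v1/redact/{id} path is an assumption, since this section names the operation but not its URL; WORDCAB_API is a hypothetical base-URL override.

```shell
# Sketch: delete a redaction and require HTTP 204.
# ASSUMPTION: the delete path is DELETE /v1/redact/{id}.
WORDCAB_API="${WORDCAB_API:-https://api.wordcab.com/v1}"

# Usage: delete_redaction REDACT_ID
delete_redaction() {
  # -o /dev/null discards any body; -w prints only the HTTP status code.
  code=$(curl -sS -o /dev/null -w '%{http_code}' -X DELETE \
    "$WORDCAB_API/redact/$1" \
    -H "Authorization: Bearer $WORDCAB_API_KEY") || return 1
  if [ "$code" = "204" ]; then
    echo "deleted $1"
  else
    echo "delete of $1 failed with HTTP $code" >&2
    return 1
  fi
}
```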
Inline redaction inside a transcription job
If you only want redaction as a side effect of transcription, you do not need this endpoint at all. Pass redact: ["pii", "phi", "pci"] on the Transcripts API request; the same model runs inside the transcript pipeline, and the redacted spans appear on the resulting transcript object. Use /v1/redact when you need vertical-specific tuning, custom entities, or redaction of text that did not originate from Wordcab transcription.