Coming soon

Documents in.
The same JSON, every time.

Structured document extraction at scale. One API routes each page to the right engine and returns the fields you pinned. Batch ten documents or ten thousand.

Invoice (PDF)

INVOICE

Tech Services LLC

Invoice #TS-2026-001

Date: March 4, 2026

Due: April 3, 2026

#  Description                   Qty  Unit     Total
1  Laptop Repair, Dell XPS 15      1  $100.00  $100.00
2  Software Install, Office 365    2  $50.00   $100.00
3  Data Recovery, 500GB HDD        1  $150.00  $150.00
4  Network Setup, 3 Endpoints      3  $75.00   $225.00
5  Hardware Upgrade, RAM 32GB      2  $120.00  $240.00
6  Security Audit, Pen Test        2  $150.00  $300.00
7  Web Hosting, cPanel Annual     12  $25.00   $300.00

Subtotal: $3,145.00

Tax (10%): $314.50

Grand Total: $3,459.50

Pinned fields: vendor, invoice_number, line_items, tax, total
Clean result
POST /v1/extract → 200 OK
{
  "document_type": "invoice",
  "vendor": "Tech Services LLC",
  "invoice_number": "TS-2026-001",
  "issue_date": "2026-03-04",
  "due_date": "2026-04-03",
  "line_items": [
    { "description": "Laptop Repair, Dell XPS 15", "qty": 1, "total": "$100.00" },
    { "description": "Data Recovery, 500GB HDD", "qty": 1, "total": "$150.00" }
    // 5 more
  ],
  "subtotal": "$3,145.00",
  "tax": "$314.50",
  "total": "$3,459.50"
}

Simplified preview. API responses include a confidence score per field.

Same invoice. Two weeks later. Different shape.

Ask an LLM to extract the same document twice and you can get two differently shaped answers. Ask doXtract and you get the same shape every time, because the fields are pinned by a Template or Profile.


Pipe it through a model

Run 1
{
  "vendor_name": "Tech Services",
  "amount": "3459.50",
  "date": "April 3",
  "tax": "maybe included"
}
Run 2
{
  "vendor": "Tech Services LLC",
  Total": "$3,459.50",
  due: "2026-04-03"
  // invented field "late_fee"
}
  • Keys drift between runs
  • Values invented when the page is unclear
  • You pay the model for every page, even pages that are already digital text
doXtract

Pinned by a Profile

Every run
{
  "document_type": "invoice",
  "vendor": "Tech Services LLC",
  "invoice_number": "TS-2026-001",
  "issue_date": "2026-03-04",
  "due_date": "2026-04-03",
  "line_items": [ ... ],
  "subtotal": "$3,145.00",
  "tax": "$314.50",
  "total": "$3,459.50"
}
  • Same keys, same types, same order
  • Only what is on the page
  • Routes each page to the cheapest engine that can read it
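A minimal Python sketch of what "same keys, same types, same order" means in practice: a client-side check of any extraction result against a pinned field list. `PINNED_FIELDS` and `shape_problems` are hypothetical names for illustration, not doXtract's Profile format or SDK.

```python
from typing import Any

# Hypothetical pinned field list: name -> expected Python type.
# Dicts keep insertion order (Python 3.7+), so this also pins key order.
PINNED_FIELDS: dict[str, type] = {
    "document_type": str,
    "vendor": str,
    "invoice_number": str,
    "issue_date": str,
    "due_date": str,
    "line_items": list,
    "subtotal": str,
    "tax": str,
    "total": str,
}


def shape_problems(result: dict[str, Any], pinned: dict[str, type] = PINNED_FIELDS) -> list[str]:
    """Return human-readable problems; an empty list means the shape matches."""
    problems: list[str] = []
    missing = [k for k in pinned if k not in result]
    extra = [k for k in result if k not in pinned]
    problems += [f"missing key: {k}" for k in missing]
    problems += [f"unexpected key: {k}" for k in extra]
    for key, expected in pinned.items():
        if key in result and not isinstance(result[key], expected):
            problems.append(f"wrong type for {key}: {type(result[key]).__name__}")
    # Only compare order once the key sets already agree.
    if not missing and not extra and list(result) != list(pinned):
        problems.append("keys out of order")
    return problems


# A drifting LLM-style output, modeled on Run 1 above.
run_1 = {
    "vendor_name": "Tech Services",
    "amount": "3459.50",
    "date": "April 3",
    "tax": "maybe included",
}
for problem in shape_problems(run_1):
    print(problem)
```

Against Run 1, the check reports every missing pinned key plus the drifted `vendor_name`, `amount`, and `date` keys; a result that matches the pinned list returns an empty list.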

Structured extraction at scale.

Teams evaluating Textract, Azure Document Intelligence, Reducto, or a DIY stack usually need the same thing: reliable fields across document types, without rebuilding glue code every time a vendor changes their layout.

Cloud OCR APIs
Textract and Azure return text or isolated fields per processor. You still wire validation, retries, webhooks, and multi-stage pipelines yourself.
Dev-first APIs
Reducto and peers parse well, but OCR, extract, split, and edit are separate endpoints. You orchestrate stages, templates, and delivery across multiple calls.
doXtract
One developer API for OCR, structured extraction, redaction, and chunking. Profiles pin the schema. Same call shape for ten documents or ten thousand.

How it works

One upload. The right engine for every page.

Any document → Router
  • Fast text: digital PDFs and Office files
  • Vision OCR: scans and images
  • Convert: everything else

We only invoke a vision model when a page cannot be read any other way. You get exact text, faster, without paying for a model you did not need.
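The routing rule can be sketched as a per-page decision. This is a minimal Python sketch under stated assumptions: the `Page` type, the file-type groups, and the `min_chars` threshold for deciding that a PDF page has a usable text layer are all hypothetical; doXtract's actual routing rules are not described on this page.

```python
from dataclasses import dataclass
from enum import Enum


class Engine(Enum):
    FAST_TEXT = "fast_text"    # digital PDFs and Office files
    VISION_OCR = "vision_ocr"  # scans and images
    CONVERT = "convert"        # everything else (assumed: convert, then route the result)


# Hypothetical file-type groups for illustration only.
OFFICE = {"docx", "xlsx", "pptx", "odt", "ods", "rtf"}
IMAGES = {"png", "jpg", "jpeg", "tiff"}


@dataclass
class Page:
    file_type: str        # lowercase extension, e.g. "pdf"
    text_layer: str = ""  # embedded text; empty for scanned pages


def route(page: Page, min_chars: int = 20) -> Engine:
    """Pick the cheapest engine that can read the page (a sketch)."""
    if page.file_type in OFFICE:
        return Engine.FAST_TEXT  # the text is stored in the file itself
    if page.file_type == "pdf":
        # Digital PDFs carry a text layer; scanned PDFs have little or none.
        if len(page.text_layer.strip()) >= min_chars:
            return Engine.FAST_TEXT
        return Engine.VISION_OCR
    if page.file_type in IMAGES:
        return Engine.VISION_OCR
    return Engine.CONVERT


pages = [
    Page("pdf", "INVOICE Tech Services LLC Invoice #TS-2026-001"),
    Page("pdf"),   # a scanned page with no text layer
    Page("png"),
    Page("epub"),
]
print([route(p).value for p in pages])
# ['fast_text', 'vision_ocr', 'vision_ocr', 'convert']
```

The design point is the ordering: the cheap, exact path is tried first, and the vision model is only the fallback for pages with nothing to read directly.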

Compose a pipeline, not just an extract.

  1. Extract
  2. OCR
  3. Structure
  4. Redact
  5. Summarize
  6. Translate
  7. Chunk for RAG
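Composing stages can be sketched as plain function composition: each stage takes a document and returns a new one, and a pipeline applies them in order. The two toy stages below (`normalize`, and an email-only `redact`) are hypothetical stand-ins; the real OCR, summarize, and translate stages run inside the API, and this is not its interface.

```python
import re
from functools import reduce
from typing import Any, Callable

Doc = dict[str, Any]  # hypothetical document state: {"text": str, "steps": list[str]}
Stage = Callable[[Doc], Doc]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def normalize(doc: Doc) -> Doc:
    """Collapse runs of whitespace left over from OCR or conversion."""
    return {**doc, "text": " ".join(doc["text"].split()), "steps": doc["steps"] + ["normalize"]}


def redact(doc: Doc) -> Doc:
    """Mask email addresses; a toy stand-in for a real PII redaction stage."""
    return {**doc, "text": EMAIL.sub("[REDACTED]", doc["text"]), "steps": doc["steps"] + ["redact"]}


def pipeline(*stages: Stage) -> Stage:
    """Compose stages left to right: pipeline(a, b)(doc) == b(a(doc))."""
    return lambda doc: reduce(lambda acc, stage: stage(acc), stages, doc)


run = pipeline(normalize, redact)
out = run({"text": "Contact:   billing@techservices.example\n  Due April 3", "steps": []})
print(out["text"])   # Contact: [REDACTED] Due April 3
print(out["steps"])  # ['normalize', 'redact']
```

Because every stage has the same input and output type, stages can be added, dropped, or reordered without changing the others.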

The same fields, every time.

Templates ship ready from the library. Profiles are yours to define. Either pins the field list before your first API call.

130+ prebuilt Templates

Built from real forms across 14 categories. Consistent fields out of the box.

Invoices, Receipts, Tax forms, HR forms, Statements, Contracts

Build your own Profiles

Build an Extraction Profile from your own documents once. Every matching document afterward returns the same fields in the same shape.

Your document → your Profile → repeatable JSON

What teams build with doXtract.

Six pipelines teams wire up first, from invoice intake to agent-ready chunks.

Accounts payable

Turn invoices and purchase orders into clean line-item data for your ledger.

Lending and underwriting

Pull figures from bank statements, paystubs, and tax forms in seconds.

Expense management

Pull merchant, date, and totals from receipt photos and scans.

Healthcare and claims

Extract fields from patient intake, lab results, and insurance claim forms.

Contracts and legal

Capture parties, dates, and key terms, and redact PII before storage.

AI and RAG pipelines

Convert any document into clean, chunked text your agents can use.
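For the RAG use case, a common chunking approach is fixed-size windows with overlap, so that text cut at one boundary still appears whole in the neighbouring chunk. A minimal Python sketch, assuming whitespace-separated words as a stand-in for model tokens; the page does not say which chunking strategy doXtract uses.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
print(chunk_words(text, size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Larger overlaps improve recall at chunk boundaries at the cost of storing and embedding more duplicated text.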

One API, 30+ file types.

PDF, DOCX, XLSX, PPTX, ODT, ODS, RTF, PNG, JPG, TIFF, HTML, CSV, MD, XML, JSON, EML, MSG, ePub, and more

Built for production volume.

  • Batch thousands of documents in one call
  • Webhooks when each job finishes
  • Idempotency keys and clear rate limits
  • Python and JavaScript SDKs
  • Agent-ready over MCP (Model Context Protocol)
  • Scales out automatically under load
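Idempotency keys let a client retry an upload after a timeout without the job running twice. One approach, sketched here in Python, is to derive the key from the file bytes and the Profile name, so every retry of the same upload sends the same key. The `Idempotency-Key` header name is an assumption (a common convention), not something this page documents, and `idempotency_key` and `request_headers` are hypothetical helpers.

```python
import hashlib


def idempotency_key(file_bytes: bytes, profile: str) -> str:
    """Derive a stable key from the profile and file contents, so a retried upload reuses it."""
    digest = hashlib.sha256()
    digest.update(profile.encode("utf-8"))
    digest.update(b"\x00")  # separator keeps the profile and file bytes from running together
    digest.update(file_bytes)
    return digest.hexdigest()


def request_headers(api_key: str, file_bytes: bytes, profile: str) -> dict[str, str]:
    # Assumed header name; confirm against the API reference once it is published.
    return {
        "Authorization": f"Bearer {api_key}",
        "Idempotency-Key": idempotency_key(file_bytes, profile),
    }


headers = request_headers("dxt_test_example", b"%PDF-1.7 example bytes", "ap-invoices")
print(headers["Idempotency-Key"][:12])
```

With curl, the same key would be passed as one more `-H` flag on the request below.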
extract.sh (REST API)
# one request, any document
curl https://api.doxtract.io/v1/extract \
  -H "Authorization: Bearer dxt_live_..." \
  -F "file=@invoice.pdf" \
  -F "profile=ap-invoices"

# 200 OK
{
  "job_id":        "job_8fK2mNp9vR",
  "status":        "completed",
  "pages":         2,
  "quality_score": 0.97,
  "fields": {
    "vendor":         { "value": "Tech Services LLC", "confidence": 0.98 },
    "invoice_number": { "value": "TS-2026-001",        "confidence": 0.99 },
    "issue_date":     { "value": "2026-03-04",          "confidence": 0.97 },
    "due_date":       { "value": "2026-04-03",          "confidence": 0.97 },
    "line_items": {
      "value": [
        { "description": "Laptop Repair, Dell XPS 15", "qty": 1, "total": "$100.00" },
        { "description": "Data Recovery, 500GB HDD",   "qty": 1, "total": "$150.00" }
      ],
      "confidence": 0.94
    },
    "subtotal":       { "value": "$3,145.00",          "confidence": 0.96 },
    "tax":            { "value": "$314.50",            "confidence": 0.95 },
    "total":          { "value": "$3,459.50",          "confidence": 0.97 }
  }
}
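The per-field confidence scores in the response above lend themselves to a simple review gate: flag any field below a threshold, and cross-check that subtotal plus tax equals the total. A minimal Python sketch, assuming the response shape exactly as shown and dollar amounts formatted like `"$3,145.00"`; the 0.95 threshold and the `review` helper are hypothetical.

```python
import json
from decimal import Decimal

REVIEW_THRESHOLD = 0.95  # hypothetical cut-off; tune per field and use case


def money(text: str) -> Decimal:
    """Parse a display amount like "$3,145.00" into an exact Decimal."""
    return Decimal(text.replace("$", "").replace(",", ""))


def review(response: dict, threshold: float = REVIEW_THRESHOLD) -> list[str]:
    """List reasons a human should look at this extraction before it is posted."""
    fields = response["fields"]
    issues = [
        f"{name}: confidence {field['confidence']:.2f}"
        for name, field in fields.items()
        if field["confidence"] < threshold
    ]
    subtotal, tax, total = (money(fields[k]["value"]) for k in ("subtotal", "tax", "total"))
    if subtotal + tax != total:
        issues.append(f"totals do not add up: {subtotal} + {tax} != {total}")
    return issues


# A trimmed copy of the response above.
sample = json.loads("""{
  "fields": {
    "line_items": {"value": [], "confidence": 0.94},
    "subtotal": {"value": "$3,145.00", "confidence": 0.96},
    "tax": {"value": "$314.50", "confidence": 0.95},
    "total": {"value": "$3,459.50", "confidence": 0.97}
  }
}""")
print(review(sample))  # ['line_items: confidence 0.94']
```

On the sample, only `line_items` (0.94) falls below the threshold, and the totals add up, since $3,145.00 + $314.50 = $3,459.50. `Decimal` keeps the money arithmetic exact, where binary floats could introduce rounding error.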

Be first when the API opens.

Powered by Cloudflare, Stripe, and Supabase