Coming soon

Send a document.
Get back clean text.

Not an LLM wrapper. doXtract routes every document to the right engine and returns structured data you can trust, at any scale.

You are on the list. We will email you the moment the API opens.
Invoice (PDF)
Clean result
POST /v1/extract
200 OK
{
  "document_type": "invoice",
  "invoice_number": "INV-2026-0788",
  "vendor": "Northwind Trading Co.",
  "total": "$48,200.00",
  "due_date": "2026-02-14"
}
Statement (XLSX)
Clean result
POST /v1/extract
200 OK
{
  "document_type": "bank_statement",
  "account": "****4417",
  "period": "Dec 2026",
  "closing_balance": "$12,840.55",
  "transactions": 54
}
Receipt (scan)
Clean result
POST /v1/extract
200 OK
{
  "document_type": "receipt",
  "merchant": "Rivertown Coffee",
  "date": "2026-01-09",
  "total": "$18.40",
  "tax": "$1.40"
}
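
The sample results above show amounts as display strings such as "$48,200.00". If you plan to do arithmetic on them, a small conversion step helps. The sketch below is only an illustration: `parse_amount` is a hypothetical helper, not part of doXtract, and it assumes US-dollar formatting with a leading "$" and comma thousands separators.

```python
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Convert a display string like "$48,200.00" into a Decimal.

    Assumes a leading "$" and comma thousands separators (illustrative only).
    """
    return Decimal(text.replace("$", "").replace(",", ""))


# Values copied from the sample receipt result.
receipt = {"total": "$18.40", "tax": "$1.40"}
subtotal = parse_amount(receipt["total"]) - parse_amount(receipt["tax"])
```

Decimal is used instead of float so that cents are preserved exactly.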

An LLM can read one document. doXtract is the extraction layer.

Pipe it through an LLM yourself

  • Pays the model for every page, even plain digital text
  • A different JSON shape on every run
  • Invents values when the page is unclear
  • No control over scanned pages or images
  • You build the queue, retries, and scaling

doXtract

  • Routes each document to the right engine, so you pay for vision only when a page needs it
  • The same fields, every time, via Templates and Profiles
  • Returns exactly what is on the page
  • Reads digital, scanned, and image documents in one API
  • Batching, webhooks, and scale-out built in

How it works

One inbox. The right engine for every page.

Any document
Router
Fast text: digital PDFs and Office files
Vision OCR: scans and images
Convert: everything else

We only invoke a vision model when a page cannot be read any other way. You get exact text, faster, without paying for a model you did not need.
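
The routing idea above can be sketched in a few lines. This is a simplified illustration, not doXtract's actual router: the `Engine` names mirror the three lanes in the diagram, while the extension sets and the `has_text_layer` flag are assumptions made for the example.

```python
from enum import Enum
from pathlib import PurePath


class Engine(Enum):
    FAST_TEXT = "fast_text"    # digital PDFs and Office files
    VISION_OCR = "vision_ocr"  # scans and images
    CONVERT = "convert"        # everything else


# Assumed extension groups, for illustration only.
OFFICE_EXTS = {".docx", ".xlsx", ".pptx"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tiff"}


def route(filename: str, has_text_layer: bool = False) -> Engine:
    """Pick an engine for one document.

    has_text_layer: whether a PDF carries embedded, selectable text.
    """
    ext = PurePath(filename).suffix.lower()
    if ext in OFFICE_EXTS:
        return Engine.FAST_TEXT
    if ext == ".pdf":
        # A PDF without a text layer is treated as a scan.
        return Engine.FAST_TEXT if has_text_layer else Engine.VISION_OCR
    if ext in IMAGE_EXTS:
        return Engine.VISION_OCR
    return Engine.CONVERT
```

The vision lane is chosen only when no cheaper path can read the file, which is the cost argument made above.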

Compose a pipeline, not just an extract.

  1. Extract
  2. OCR
  3. Structure
  4. Redact
  5. Summarize
  6. Translate
  7. Chunk for RAG
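
One way to picture a composed pipeline is as a chain of small steps, each taking a document and returning an updated one. The sketch below is a local toy, not the doXtract API: `compose`, the step functions, and the dictionary shape are all hypothetical, and only three of the seven steps are shown.

```python
import re
from functools import reduce
from typing import Callable

Doc = dict
Step = Callable[[Doc], Doc]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def extract(doc: Doc) -> Doc:
    # Stand-in for text extraction: trim surrounding whitespace.
    return {**doc, "text": doc["raw"].strip()}


def redact(doc: Doc) -> Doc:
    # Mask email addresses as a simple example of PII redaction.
    return {**doc, "text": EMAIL.sub("[REDACTED]", doc["text"])}


def chunk_for_rag(doc: Doc, words_per_chunk: int = 3) -> Doc:
    # Split the text into fixed-size word windows.
    words = doc["text"].split()
    chunks = [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]
    return {**doc, "chunks": chunks}


def compose(*steps: Step) -> Step:
    """Return one function that applies the steps left to right."""
    def run(doc: Doc) -> Doc:
        return reduce(lambda acc, step: step(acc), steps, doc)
    return run


pipeline = compose(extract, redact, chunk_for_rag)
result = pipeline({"raw": "  Contact jane@example.com for terms.  "})
```

Because every step has the same signature, steps can be reordered or dropped without touching the others.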

The same fields, every time.

Start with a prebuilt Template, or build your own Profile. Either way, every matching document returns the same structured shape.

130+ prebuilt Templates

Built from real forms across 14 categories. Consistent fields out of the box.

Invoices, Receipts, Tax forms, HR forms, Statements, Contracts

Build your own Profiles

Build an Extraction Profile from your own documents once. Every matching document afterward returns the same fields in the same shape.

Your document to your Profile to repeatable JSON
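
To make "the same fields in the same shape" concrete, here is a minimal sketch that treats a Profile as a fixed, ordered list of field names. The `AP_INVOICE_PROFILE` tuple and the `conform` helper are hypothetical and only illustrate the idea; real Profiles may be defined quite differently.

```python
from typing import Any

# Hypothetical Profile: the fields every matching document should return.
AP_INVOICE_PROFILE = ("invoice_number", "vendor", "total", "due_date")


def conform(raw: dict[str, Any], profile: tuple[str, ...]) -> dict[str, Any]:
    """Project raw extracted values onto the Profile's fields.

    Fields missing from the document become None instead of being guessed,
    and fields outside the Profile are dropped.
    """
    return {name: raw.get(name) for name in profile}


first = conform(
    {"vendor": "Northwind Trading Co.", "total": "$48,200.00",
     "invoice_number": "INV-2026-0788", "due_date": "2026-02-14"},
    AP_INVOICE_PROFILE,
)
second = conform(
    {"total": "$310.00", "vendor": "Acme Supplies", "po_number": "PO-17"},
    AP_INVOICE_PROFILE,
)
```

Both results carry the same keys in the same order, so downstream code can rely on one shape.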

What teams build with doXtract.

From invoices to lab results to lease agreements, doXtract returns the fields you need.

Accounts payable

Turn invoices and purchase orders into clean line-item data for your ledger.

Lending and underwriting

Pull figures from bank statements, paystubs, and tax forms in seconds.

Real estate and mortgage

Read appraisals, inspections, and closing disclosures into structured records.

Property rentals

Capture applicant details and lease terms from rental applications.

Healthcare and claims

Extract fields from patient intake, lab results, and insurance claim forms.

Expense management

Pull merchant, date, and totals from receipt photos and scans.

Contracts and legal

Capture parties, dates, and key terms, and redact PII before storage.

KYC and onboarding

Read IDs and application forms into structured customer profiles.

AI and RAG pipelines

Convert any document into clean, chunked text your agents can use.

One API, 30+ file types.

PDF, DOCX, XLSX, PPTX, ODT, ODS, RTF, PNG, JPG, TIFF, HTML, CSV, MD, XML, JSON, EML, MSG, ePub, and more

Built for production, not a notebook.

  • Batch thousands of documents in one call
  • Webhooks when each job finishes
  • Idempotency keys and clear rate limits
  • Python and JavaScript SDKs
  • Agent-ready over MCP
  • Scales out automatically under load
extract.sh (REST API)
# one request, any document
curl https://api.doxtract.io/v1/extract \
  -H "Authorization: Bearer dxt_live_..." \
  -F "file=@invoice.pdf" \
  -F "profile=ap-invoices"

# 200 OK
{
  "fields": { "total": "$48,200.00" },
  "pages": 12
}
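
Idempotency keys, listed among the production features above, let a client retry a request without it being processed twice. The sketch below shows one possible way to derive a stable key on the client side. Everything in it is an assumption: the `Idempotency-Key` header name follows a common convention and is not confirmed for doXtract, and `idempotency_key` and `build_headers` are hypothetical helpers.

```python
import hashlib


def idempotency_key(file_bytes: bytes, profile: str) -> str:
    """Derive a deterministic key from the file contents and profile name."""
    digest = hashlib.sha256()
    digest.update(profile.encode("utf-8"))
    digest.update(b"\0")  # separator so profile and file bytes cannot blur together
    digest.update(file_bytes)
    return digest.hexdigest()


def build_headers(api_key: str, file_bytes: bytes, profile: str) -> dict[str, str]:
    # "Idempotency-Key" is an assumed header name, used here for illustration.
    return {
        "Authorization": f"Bearer {api_key}",
        "Idempotency-Key": idempotency_key(file_bytes, profile),
    }


headers = build_headers("dxt_live_example", b"%PDF-1.7 sample bytes", "ap-invoices")
```

Deriving the key from the payload means a retry of the same file and profile reuses the same key, while any change to either produces a new one.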

Be first when the API opens.

Powered by Cloudflare, Stripe, and Supabase