When to use this
- Document preview — see the markdown before writing extraction fields
- Custom pipelines — feed parsed markdown into your own LLM, search index, or RAG system
- Debugging — understand how a document is parsed (blocks, tables, reading order)
- Lightweight integration — you only need the text + per-block confidence
End-to-end
Python package + class names are provisional.
`pip install anyformat-sdk` and `from anyformat.sdk import Client` work today, but both are expected to change before the official launch, so pin the version you ship with.

A parse-only workflow has a single `parse` node and no edges. For documents with mixed content (tables, figures, dense text), switch to agentic mode by setting `mode` to `"agentic"`. Each block is then routed through a typed strategy (text, table, or figure) instead of a single batched call.
The `extractions` array in the response is empty; only the `parse` section is populated.
Confidence
The response carries two document-level confidence rollups plus a per-block attribute inside the rendered markdown.

| Field | Source | Range | Use for |
|---|---|---|---|
| `parse.parse_confidence` | Char-weighted mean of per-block LLM logprobs. `null` when no blocks have logprob-based confidence. | 80–99 typical | Triage: “is this doc trustworthy enough to extract from?” |
| `parse.layout_confidence` | Char-weighted mean of YOLO layout-segmentation scores. Present whenever blocks exist. | 30–60 typical | Fallback when `parse_confidence` is null. Measures “is this region a table?”, not “is the parsed content accurate?” |
| `data-confidence` attribute on each `<section>` | Per-block: calibrated logprobs when available, YOLO fallback otherwise | 0–100 | UI highlight: “dim low-confidence regions” |
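The following sketch illustrates what "char-weighted mean" means in this table. Each block's 0–100 score is weighted by its length in characters, and blocks without a score are skipped. If no block has a score, the result is `None`, mirroring how `parse_confidence` becomes null.

This is not the service's implementation. The `Block` type is hypothetical, and the scores are made up; how the service converts logprobs into block scores is not covered here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Block:
    # Hypothetical stand-in for a parsed block: its text and a 0-100 score,
    # or None when no logprob-based confidence is available.
    text: str
    confidence: Optional[float]


def char_weighted_mean(blocks: Sequence[Block]) -> Optional[float]:
    """Mean of block scores, weighted by each block's character count.

    Blocks with no score (or empty text) contribute nothing. If no block
    contributes, return None, like a null parse_confidence.
    """
    total_chars = 0
    weighted_sum = 0.0
    for block in blocks:
        if block.confidence is None or not block.text:
            continue
        weight = len(block.text)
        weighted_sum += weight * block.confidence
        total_chars += weight
    if total_chars == 0:
        return None
    return weighted_sum / total_chars


blocks = [
    Block("Invoice #123", 95.0),         # 12 chars
    Block("Total due: 1,234.00", 85.0),  # 19 chars
    Block("", 10.0),                     # empty text: ignored
    Block("Figure caption", None),       # no score: ignored
]
print(round(char_weighted_mean(blocks), 2))  # prints 88.87
```

Weighting by characters means a long, confidently parsed paragraph outweighs a short, shaky caption. That is usually what you want when judging the document as a whole.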
Agentic mode caveat: agentic strategies don't always populate per-block logprobs (for example, text-bytes-first never calls an LLM). As a result, `parse_confidence` is often null, and callers fall back to `layout_confidence`. For calibrated parser confidence (apples-to-apples with extraction confidence), use `mode="standard"`.

Example output

Each `<section>` includes:
- `id`: block identifier (page and block number)
- `data-type`: semantic type, one of `title`, `text`, `table`, `picture`, `other`
- `data-confidence`: 0–100 confidence in this block (parser-calibrated when available, YOLO fallback otherwise)
- `data-bbox`: bounding-box coordinates (normalised 0–1)
- `data-cell-id`: table-cell identifiers for precise referencing
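You can read these attributes with Python's standard-library `html.parser`, for example to flag or dim low-confidence regions in a UI. The sketch below makes these assumptions:

- Each page is wrapped in a `<DOCUMENT page="N">` block with its sections nested inside, as the Tips describe.
- The bbox string is kept raw, because its exact encoding is not specified here.
- The sample markup, the `id` format, and the 50-point threshold are invented for illustration. They are not real output.

```python
from html.parser import HTMLParser
from typing import Optional


class SectionCollector(HTMLParser):
    """Collect <section> attributes, tagging each with its enclosing page.

    Assumes the page number lives in a `page` attribute on <DOCUMENT> and
    that sections are nested inside their page's <DOCUMENT> block.
    """

    def __init__(self) -> None:
        super().__init__()
        self.current_page: Optional[str] = None
        self.sections: list = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # HTMLParser lowercases tag names, so <DOCUMENT> arrives as "document".
        if tag == "document":
            self.current_page = attributes.get("page")
        elif tag == "section":
            raw_conf = attributes.get("data-confidence")
            self.sections.append({
                "id": attributes.get("id"),
                "page": self.current_page,
                "type": attributes.get("data-type"),
                "confidence": float(raw_conf) if raw_conf is not None else None,
                # Kept as the raw string; its exact encoding is not assumed.
                "bbox": attributes.get("data-bbox"),
            })

    def handle_endtag(self, tag):
        if tag == "document":
            self.current_page = None


def low_confidence_sections(markdown: str, threshold: float = 50.0) -> list:
    """Return sections whose data-confidence is below `threshold` (hypothetical cutoff)."""
    collector = SectionCollector()
    collector.feed(markdown)
    collector.close()
    return [
        s for s in collector.sections
        if s["confidence"] is not None and s["confidence"] < threshold
    ]


# Illustrative markup only; real ids and bbox encodings may differ.
sample = """
<DOCUMENT page="1">
<section id="p1-b1" data-type="title" data-confidence="97" data-bbox="0.1,0.05,0.9,0.1">
# Invoice
</section>
<section id="p1-b2" data-type="table" data-confidence="42" data-bbox="0.1,0.3,0.9,0.6">
<table><tr><td data-cell-id="p1-b2-r0c0">Qty</td></tr></table>
</section>
</DOCUMENT>
"""

for section in low_confidence_sections(sample):
    print(section["id"], section["page"], section["type"], section["confidence"])
# prints: p1-b2 1 table 42.0
```

The markup is parsed rather than matched with regular expressions, so attribute order, quoting, and nested tables inside a section do not break the extraction.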
Tips
- Reuse one workflow. Create a single parse-only workflow and submit all documents to it — no need for separate workflows per document type.
- Tables are preserved: they are output as HTML `<table>` with cell IDs.
- Multi-page: each page gets its own `<DOCUMENT>` block with a `page` attribute. All pages are processed automatically.
- Use `parse.parse_confidence` for triage: filter low-confidence documents into a manual-review queue before downstream processing. Fall back to `parse.layout_confidence` when `parse_confidence` is null (e.g. agentic mode).
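The triage tip can be sketched as a small routing function. It has three parts:

- It prefers `parse_confidence` and falls back to `layout_confidence` when that is null.
- It applies a separate threshold to each field, because the two sit on different typical ranges (80–99 vs 30–60).
- It sends anything below threshold, or with no score at all, to manual review.

The result dict shape (`{"parse": {...}}`), the `id` key, and both threshold values are assumptions for illustration, not documented defaults.

```python
from typing import Iterable, Optional, Tuple


def triage_confidence(parse: dict) -> Tuple[Optional[float], str]:
    """Pick the document-level score to triage on, and report its source.

    Prefers parse_confidence; falls back to layout_confidence when it is
    null (common in agentic mode).
    """
    if parse.get("parse_confidence") is not None:
        return parse["parse_confidence"], "parse_confidence"
    return parse.get("layout_confidence"), "layout_confidence"


def route(results: Iterable[dict], parse_threshold: float = 90.0,
          layout_threshold: float = 40.0) -> Tuple[list, list]:
    """Split results into (accepted, manual_review) using hypothetical thresholds."""
    accepted, manual_review = [], []
    for result in results:
        score, source = triage_confidence(result["parse"])
        # Different scales need different cutoffs.
        threshold = parse_threshold if source == "parse_confidence" else layout_threshold
        if score is None or score < threshold:
            manual_review.append(result)
        else:
            accepted.append(result)
    return accepted, manual_review


results = [
    {"id": "a", "parse": {"parse_confidence": 96.0, "layout_confidence": 48.0}},
    {"id": "b", "parse": {"parse_confidence": 71.0, "layout_confidence": 55.0}},
    {"id": "c", "parse": {"parse_confidence": None, "layout_confidence": 45.0}},
    {"id": "d", "parse": {"parse_confidence": None, "layout_confidence": 32.0}},
]
accepted, review = route(results)
print([r["id"] for r in accepted], [r["id"] for r in review])  # prints ['a', 'c'] ['b', 'd']
```

Passing the layout threshold is weaker evidence than passing the parse threshold. `layout_confidence` measures how sure the model is about region types, not whether the parsed text is accurate. Consider tagging documents accepted via the fallback so they can be spot-checked.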
Next steps
- Agentic Parse to Markdown: the full agentic walkthrough with confidence details
- Quickstart: add extraction fields to get structured data
- Invoice Processing: a full extraction example with nested fields and line items
- Response formats: full schema of the results endpoint
