# Parse-Only Workflow

> Convert documents to structured markdown without extraction — useful for previewing, debugging, or building custom pipelines

## When to use this

* **Document preview** — see the markdown before writing extraction fields
* **Custom pipelines** — feed parsed markdown into your own LLM, search index, or RAG system
* **Debugging** — understand how a document is parsed (blocks, tables, reading order)
* **Lightweight integration** — you only need the text + per-block confidence

## End-to-end

A parse-only workflow has one `parse` node and no edges. For documents with mixed content (tables, figures, dense text), use **agentic mode** by setting `mode` to `"agentic"` — each block is routed through a typed strategy (text / table / figure) instead of a single batched call.

```bash
# Replace WORKFLOW_ID and COLLECTION_ID below with the values for your
# workflow and your submitted file.

# 1. Create the workflow (standard mode shown; for agentic, add "mode": "agentic")
curl -X POST 'https://api.anyformat.ai/v2/workflows/' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $ANYFORMAT_API_KEY" \
  -d '{
    "name": "Document Parser",
    "description": "Parse documents to markdown without extraction",
    "nodes": [{"id": "parse_1", "type": "parse"}],
    "edges": []
  }'

# 2. Submit a document
curl -X POST 'https://api.anyformat.ai/v2/workflows/WORKFLOW_ID/run/' \
  -H "Authorization: Bearer $ANYFORMAT_API_KEY" \
  -F 'file=@document.pdf'

# 3. Poll for results
while true; do
  # -w appends the HTTP status code on its own final line of the output
  RESPONSE=$(curl -s -w '\n%{http_code}' \
    -H "Authorization: Bearer $ANYFORMAT_API_KEY" \
    "https://api.anyformat.ai/v2/workflows/WORKFLOW_ID/files/COLLECTION_ID/results/")
  STATUS=$(echo "$RESPONSE" | tail -1)
  # On 200, print the body without the status line
  # (sed '$d' is the portable equivalent of the GNU-only `head -n -1`)
  [ "$STATUS" = "200" ] && echo "$RESPONSE" | sed '$d' && break
  # Keep polling on 412; stop on any other status
  [ "$STATUS" != "412" ] && echo "Error: $STATUS" && exit 1
  sleep 5
done
```

```typescript
import { Anyformat } from "@anyformat/sdk";

const af = new Anyformat({ apiKey: process.env.ANYFORMAT_API_KEY! });
const file: File = /* a File with .name set */;

// Standard mode. For agentic, pass { mode: "agentic" } to parse().
const workflow = await af
  .workflow("Document Parser", "Parse documents to markdown without extraction")
  .parse()
  .create();

const run = await workflow.run(file);
const result = await run.wait();

const markdown = result.parse?.markdown ?? "";
// parseConfidence may be null in some parse modes — fall back to layoutConfidence.
const confidence = result.parse?.parseConfidence ?? result.parse?.layoutConfidence;
console.log(`confidence: ${confidence}`);
console.log(markdown.slice(0, 500));
```

```python
import os

from anyformat.sdk import Client

client = Client(api_key=os.environ["ANYFORMAT_API_KEY"])

# Standard mode. For agentic, pass mode="agentic" to .parse().
workflow = (
    client.workflow("Document Parser")
    .parse()
    .create()
)

result = workflow.run("document.pdf").wait()

# parse_confidence may be None in some parse modes — fall back to layout_confidence.
# Explicit `is not None` check (not `or`) so a real 0.0 confidence still wins.
confidence = (
    result.parse.parse_confidence
    if result.parse.parse_confidence is not None
    else result.parse.layout_confidence
)
print(f"confidence: {confidence}")
print((result.parse.markdown or "")[:500])
```

For a parse-only workflow, the `extractions` array is empty in the response — only the `parse` section is populated:

```json
{
  "collection_id": "069dcc2c-e14c-7606-8000-2ee4fb17b4e1",
  "verification_url": "https://app.anyformat.ai/workflows/.../files/...",
  "parse": {
    "markdown": "...",
    "text": "...",
    "parse_confidence": 94.2,
    "layout_confidence": 87.4,
    "blocks": [/* … */]
  },
  "classifications": [],
  "splits": [],
  "extractions": []
}
```

## Confidence

The response carries two document-level confidence rollups plus a per-block attribute inside the rendered markdown.

| Field | Source | Range | Use for |
| ----- | ------ | ----- | ------- |
| `parse.parse_confidence` | Char-weighted mean of per-block LLM logprobs. `null` when no blocks have logprob-based confidence. | 80–99 typical | Triage — "is this doc trustworthy enough to extract from?" |
| `parse.layout_confidence` | Char-weighted mean of YOLO layout-segmentation scores. Present whenever blocks exist. | 30–60 typical | Fallback when `parse_confidence` is null. Measures "is this region a table?", not "is the parsed content accurate?" |
| `data-confidence` attribute on each block | Per-block — calibrated logprobs when available, YOLO fallback otherwise | 0–100 | UI highlight — "dim low-confidence regions" |

**Agentic mode caveat:** agentic strategies don't always populate per-block logprobs (e.g. `text-bytes-first` never calls an LLM), so `parse_confidence` is often `null` and callers fall back to `layout_confidence`. For calibrated parser confidence (apples-to-apples with extraction confidence), use `mode="standard"`.

## Example output

```markdown

# ACME CORPORATION

123 Business Ave, Suite 100 New York, NY 10001

Item	Quantity	Price
Widget A	10	$25.00

```

Each block element includes:

* **`id`** — block identifier (page and block number)
* **`data-type`** — semantic type: `title`, `text`, `table`, `picture`, `other`
* **`data-confidence`** — 0–100 confidence in this block (parser-calibrated when available, YOLO fallback otherwise)
* **`data-bbox`** — bounding-box coordinates (normalised 0–1)
* **`data-cell-id`** — table-cell identifiers for precise referencing

## Tips

* **Reuse one workflow.** Create a single parse-only workflow and submit all documents to it — no need for separate workflows per document type.
* **Tables are preserved** — output as HTML tables with cell IDs.
* **Multi-page** — each page gets its own block with a `page` attribute. All pages are processed automatically.
* **Use `parse.parse_confidence` for triage** — filter low-confidence documents into a manual-review queue before downstream processing. Fall back to `parse.layout_confidence` when `parse_confidence` is null (e.g. agentic mode).

## Next steps

* The full agentic walkthrough with confidence details
* Add extraction fields to get structured data
* A full extraction example with nested fields and line items
* Full schema of the results endpoint
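
The triage tip above can be sketched in a few lines of Python that operate on the results JSON as a plain dictionary. This is only an illustration, not part of the SDK: the helper names `document_confidence` and `needs_review` and both threshold values are hypothetical. The two scores get separate cut-offs because the Confidence table lists different typical ranges for them (80–99 for `parse_confidence`, 30–60 for `layout_confidence`).

```python
from typing import Optional

# Hypothetical cut-offs, chosen only for illustration; tune them on your own documents.
PARSE_THRESHOLD = 85.0
LAYOUT_THRESHOLD = 40.0


def document_confidence(result: dict) -> tuple[Optional[float], str]:
    """Return (score, source), preferring parse_confidence over layout_confidence."""
    parse = result.get("parse") or {}
    score = parse.get("parse_confidence")
    if score is not None:  # explicit None check so a real 0.0 is kept
        return score, "parse"
    return parse.get("layout_confidence"), "layout"


def needs_review(result: dict) -> bool:
    """True when the document should go to a manual-review queue."""
    score, source = document_confidence(result)
    if score is None:
        # No confidence at all (for example, no blocks): review by default.
        return True
    threshold = PARSE_THRESHOLD if source == "parse" else LAYOUT_THRESHOLD
    return score < threshold
```

Calling `needs_review(response_json)` on each results payload before downstream processing gives a simple review queue. Comparing a layout score against the parse threshold would flag nearly every agentic-mode document, which is why the sketch keeps the two thresholds apart.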