# Parse-Only Workflow

> Convert documents to structured markdown without extraction — useful for previewing, debugging, or building custom pipelines

## When to use this

* **Document preview** — see the markdown before writing extraction fields
* **Custom pipelines** — feed parsed markdown into your own LLM, search index, or RAG system
* **Debugging** — understand how a document is parsed (blocks, tables, reading order)
* **Lightweight integration** — you only need the text + per-block confidence

## End-to-end

A parse-only workflow has one `parse` node and no edges. For documents with mixed content (tables, figures, dense text), switch to **agentic mode** by setting `mode` to `"agentic"`. In agentic mode, each block is routed through a typed strategy (text / table / figure) instead of a single batched call.

<Tabs>
  <Tab title="curl" icon="terminal">
    ```bash theme={null}
    # 1. Create the workflow (standard mode shown; for agentic, add "mode": "agentic")
    curl -X POST 'https://api.anyformat.ai/v2/workflows/' \
      -H 'Content-Type: application/json' \
      -H "Authorization: Bearer $ANYFORMAT_API_KEY" \
      -d '{
        "name": "Document Parser",
        "description": "Parse documents to markdown without extraction",
        "nodes": [{"id": "parse_1", "type": "parse"}],
        "edges": []
      }'

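    # 2. Submit a document (replace WORKFLOW_ID with the ID of the workflow created in step 1)
    curl -X POST 'https://api.anyformat.ai/v2/workflows/WORKFLOW_ID/run/' \
      -H "Authorization: Bearer $ANYFORMAT_API_KEY" \
      -F 'file=@document.pdf'

    # 3. Poll for results (replace COLLECTION_ID with the collection ID of the file submitted in step 2).
    #    200 = results ready; 412 = not ready yet, keep polling; any other status is treated as an error.
    while true; do
      RESPONSE=$(curl -s -w '\n%{http_code}' \
        -H "Authorization: Bearer $ANYFORMAT_API_KEY" \
        "https://api.anyformat.ai/v2/workflows/WORKFLOW_ID/files/COLLECTION_ID/results/")
      STATUS=$(echo "$RESPONSE" | tail -n 1)
      # sed '$d' drops the trailing status line; unlike `head -n -1`, it also works on macOS/BSD.
      [ "$STATUS" = "200" ] && echo "$RESPONSE" | sed '$d' && break
      [ "$STATUS" != "412" ] && echo "Error: $STATUS" && exit 1
      sleep 5
    done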
    ```
  </Tab>

  <Tab title="TypeScript" icon="js" iconType="brands">
    ```typescript theme={null}
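    import { readFile } from "node:fs/promises";
    import { Anyformat } from "@anyformat/sdk";

    const af = new Anyformat({ apiKey: process.env.ANYFORMAT_API_KEY! });
    // Build a File with .name set (the global File class requires Node 20+).
    const bytes = new Uint8Array(await readFile("document.pdf"));
    const file = new File([bytes], "document.pdf", { type: "application/pdf" });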

    // Standard mode. For agentic, pass { mode: "agentic" } to parse().
    const workflow = await af
      .workflow("Document Parser", "Parse documents to markdown without extraction")
      .parse()
      .create();

    const run = await workflow.run(file);
    const result = await run.wait();

    const markdown = result.parse?.markdown ?? "";
    // parseConfidence may be null in some parse modes — fall back to layoutConfidence.
    const confidence = result.parse?.parseConfidence ?? result.parse?.layoutConfidence;
    console.log(`confidence: ${confidence}`);
    console.log(markdown.slice(0, 500));
    ```
  </Tab>

  <Tab title="Python" icon="python" iconType="brands">
    ```python theme={null}
    import os
    from anyformat.sdk import Client

    client = Client(api_key=os.environ["ANYFORMAT_API_KEY"])

    # Standard mode. For agentic, pass mode="agentic" to .parse().
    workflow = (
        client.workflow("Document Parser")
        .parse()
        .create()
    )

    result = workflow.run("document.pdf").wait()
    # parse_confidence may be None in some parse modes — fall back to layout_confidence.
    # Explicit `is not None` check (not `or`) so a real 0.0 confidence still wins.
    confidence = (
        result.parse.parse_confidence
        if result.parse.parse_confidence is not None
        else result.parse.layout_confidence
    )
    print(f"confidence: {confidence}")
    print((result.parse.markdown or "")[:500])
    ```
  </Tab>
</Tabs>
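
The polling loop in the curl tab runs until it sees a 200 and has no upper bound. A bounded version can be sketched in Python as follows. This is a minimal sketch, assuming (as the curl loop implies) that 200 means the results are ready and 412 means they are not ready yet. `poll_results`, `fetch`, and the attempt cap are hypothetical names introduced for illustration, not part of the Anyformat SDK; the HTTP call is passed in as a callable so the retry logic can be exercised without network access.

```python
import time
from typing import Callable, Tuple


def poll_results(
    fetch: Callable[[], Tuple[int, str]],
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() until it reports HTTP 200, then return the response body.

    fetch is a hypothetical callable returning (status_code, body), for
    example a thin wrapper around a GET to the results endpoint.
    """
    for _ in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body  # results are ready
        if status != 412:
            # Mirrors the curl loop: anything other than 200 or 412 is an error.
            raise RuntimeError(f"unexpected HTTP status {status}")
        sleep(interval_s)  # 412: not ready yet, wait and retry
    raise TimeoutError(f"results not ready after {max_attempts} attempts")
```

Capping the number of attempts means a stuck job surfaces as a `TimeoutError` rather than an endless loop; the default of 60 attempts at 5 seconds each is a placeholder to tune.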

For a parse-only workflow, the `extractions` array is empty in the response — only the `parse` section is populated:

```json theme={null}
{
  "collection_id": "069dcc2c-e14c-7606-8000-2ee4fb17b4e1",
  "verification_url": "https://app.anyformat.ai/workflows/.../files/...",
  "parse": {
    "markdown": "<DOCUMENT id=\"1\" page=\"1\">...",
    "text": "...",
    "parse_confidence": 94.2,
    "layout_confidence": 87.4,
    "blocks": [/* … */]
  },
  "classifications": [],
  "splits": [],
  "extractions": []
}
```

## Confidence

The response carries two document-level confidence rollups plus a per-block attribute inside the rendered markdown.

| Field                                           | Source                                                                                             | Range         | Use for                                                                                                             |
| ----------------------------------------------- | -------------------------------------------------------------------------------------------------- | ------------- | ------------------------------------------------------------------------------------------------------------------- |
| `parse.parse_confidence`                        | Char-weighted mean of per-block LLM logprobs. `null` when no blocks have logprob-based confidence. | 80–99 typical | Triage — "is this doc trustworthy enough to extract from?"                                                          |
| `parse.layout_confidence`                       | Char-weighted mean of YOLO layout-segmentation scores. Present whenever blocks exist.              | 30–60 typical | Fallback when `parse_confidence` is null. Measures "is this region a table?", not "is the parsed content accurate?" |
| `data-confidence` attribute on each `<section>` | Per-block — calibrated logprobs when available, YOLO fallback otherwise                            | 0–100         | UI highlight — "dim low-confidence regions"                                                                         |

<Note>
  **Agentic mode caveat:** agentic strategies don't always populate per-block logprobs (e.g. `text-bytes-first` never calls an LLM), so `parse_confidence` is often `null` and callers fall back to `layout_confidence`. For calibrated parser confidence (apples-to-apples with extraction confidence), use `mode="standard"`.
</Note>
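
Because the two document-level fields have different typical ranges, a triage rule probably needs a separate cutoff for each. The sketch below works on the `parse` object of the results JSON as a plain dict. `needs_review` and both thresholds are hypothetical: the defaults simply reuse the low ends of the typical ranges in the table above and should be tuned on your own documents.

```python
from typing import Optional


def needs_review(
    parse: dict,
    parse_threshold: float = 80.0,   # placeholder cutoff for parse_confidence
    layout_threshold: float = 30.0,  # placeholder cutoff for layout_confidence
) -> bool:
    """Return True when a parsed document should go to manual review."""
    parse_conf: Optional[float] = parse.get("parse_confidence")
    if parse_conf is not None:
        # Explicit None check so a real 0.0 is treated as a score, not as missing.
        return parse_conf < parse_threshold

    layout_conf: Optional[float] = parse.get("layout_confidence")
    if layout_conf is None:
        return True  # no confidence signal at all: be conservative
    return layout_conf < layout_threshold


print(needs_review({"parse_confidence": 94.2, "layout_confidence": 87.4}))  # False
print(needs_review({"parse_confidence": None, "layout_confidence": 25.0}))  # True
```

Comparing the fallback value against the `parse_confidence` cutoff would flag almost every agentic-mode document, which is why the function keeps the two thresholds apart.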

## Example output

```markdown theme={null}
<DOCUMENT id="1" page="1">
<section id="p1_b1" data-type="title" data-confidence="94.2" data-bbox="x0:0.034,y0:0.037,x1:0.436,y1:0.053">

# ACME CORPORATION

</section>

<section id="p1_b2" data-type="text" data-confidence="91.7" data-bbox="x0:0.031,y0:0.055,x1:0.304,y1:0.140">

123 Business Ave, Suite 100
New York, NY 10001

</section>

<section id="p1_b3" data-type="table" data-confidence="88.4" data-bbox="x0:0.025,y0:0.219,x1:0.976,y1:0.807">

<table>
<thead>
<tr><th data-cell-id="r0c0">Item</th><th data-cell-id="r0c1">Quantity</th><th data-cell-id="r0c2">Price</th></tr>
</thead>
<tbody>
<tr><td data-cell-id="r1c0">Widget A</td><td data-cell-id="r1c1">10</td><td data-cell-id="r1c2">$25.00</td></tr>
</tbody>
</table>

</section>
</DOCUMENT>
```

Each `<section>` includes:

* **`id`** — block identifier (page and block number)
* **`data-type`** — semantic type: `title`, `text`, `table`, `picture`, `other`
* **`data-confidence`** — 0–100 confidence in this block (parser-calibrated when available, YOLO fallback otherwise)
* **`data-bbox`** — bounding-box coordinates (normalised 0–1)
* **`data-cell-id`** — table-cell identifiers for precise referencing
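
The attributes above can be read back out of the markdown with the standard library. The following is a rough sketch, assuming every `<section>` carries all four attributes in the format shown in the example output and that sections are not nested. `Block` and `parse_blocks` are hypothetical helpers, not part of the SDK.

```python
import re
from dataclasses import dataclass

SECTION_RE = re.compile(r"<section\s+([^>]*)>(.*?)</section>", re.DOTALL)
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


@dataclass
class Block:
    id: str
    type: str
    confidence: float
    bbox: dict  # keys x0, y0, x1, y1 (normalised 0-1)
    content: str


def parse_blocks(markdown: str) -> list:
    """Extract one Block per <section> element in the parsed markdown."""
    blocks = []
    for attr_text, body in SECTION_RE.findall(markdown):
        attrs = dict(ATTR_RE.findall(attr_text))
        # "x0:0.034,y0:0.037,..." -> {"x0": 0.034, "y0": 0.037, ...}
        bbox = {
            key: float(value)
            for key, value in (pair.split(":") for pair in attrs["data-bbox"].split(","))
        }
        blocks.append(
            Block(
                id=attrs["id"],
                type=attrs["data-type"],
                confidence=float(attrs["data-confidence"]),
                bbox=bbox,
                content=body.strip(),
            )
        )
    return blocks


sample = """<DOCUMENT id="1" page="1">
<section id="p1_b1" data-type="title" data-confidence="94.2" data-bbox="x0:0.034,y0:0.037,x1:0.436,y1:0.053">

# ACME CORPORATION

</section>

<section id="p1_b2" data-type="text" data-confidence="91.7" data-bbox="x0:0.031,y0:0.055,x1:0.304,y1:0.140">

123 Business Ave, Suite 100
New York, NY 10001

</section>
</DOCUMENT>"""

blocks = parse_blocks(sample)
# Ids of blocks below an arbitrary cutoff of 92, e.g. to dim them in a UI.
print([b.id for b in blocks if b.confidence < 92])  # ['p1_b2']
```

A regular expression is enough here because the section tags are flat; a full HTML parser would be the safer choice if the format ever nests sections.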

## Tips

* **Reuse one workflow.** Create a single parse-only workflow and submit all documents to it — no need for separate workflows per document type.
* **Tables are preserved.** They are output as HTML `<table>` elements with cell IDs.
* **Multi-page documents are handled automatically.** Each page gets its own `<DOCUMENT>` block with a `page` attribute.
* **Use `parse.parse_confidence` for triage.** Route low-confidence documents into a manual-review queue before downstream processing. Fall back to `parse.layout_confidence` when `parse_confidence` is null (e.g. in agentic mode), keeping in mind that the two fields have different typical ranges.
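
Since tables arrive as HTML with cell IDs, they can be turned back into rows using only the standard library. The sketch below assumes cell IDs follow the `r<row>c<col>` pattern seen in the example output (`r0c0`, `r1c2`); that pattern is inferred from the example rather than taken from a specification. `CellCollector` and `table_rows` are hypothetical helpers.

```python
import re
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <th>/<td> that carries a data-cell-id."""

    def __init__(self):
        super().__init__()
        self.cells = {}       # cell id -> text
        self._current = None  # id of the cell currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._current = dict(attrs).get("data-cell-id")
            if self._current is not None:
                self.cells[self._current] = ""

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.cells[self._current] += data


def table_rows(table_html: str) -> list:
    """Arrange cells into rows, assuming ids of the form r<row>c<col>."""
    collector = CellCollector()
    collector.feed(table_html)
    collector.close()
    grid = {}
    for cell_id, text in collector.cells.items():
        match = re.fullmatch(r"r(\d+)c(\d+)", cell_id)
        if match:  # ids in any other format are skipped
            row, col = int(match.group(1)), int(match.group(2))
            grid.setdefault(row, {})[col] = text.strip()
    return [[grid[r][c] for c in sorted(grid[r])] for r in sorted(grid)]


table_html = """<table>
<thead>
<tr><th data-cell-id="r0c0">Item</th><th data-cell-id="r0c1">Quantity</th><th data-cell-id="r0c2">Price</th></tr>
</thead>
<tbody>
<tr><td data-cell-id="r1c0">Widget A</td><td data-cell-id="r1c1">10</td><td data-cell-id="r1c2">$25.00</td></tr>
</tbody>
</table>"""

rows = table_rows(table_html)
print(rows)  # [['Item', 'Quantity', 'Price'], ['Widget A', '10', '$25.00']]
```

Ordering cells by the numbers in their IDs, rather than by their position in the markup, keeps the rows correct even if cells are emitted out of order.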

## Next steps

<CardGroup cols={2}>
  <Card title="Agentic Parse to Markdown" icon="wand-magic-sparkles" href="/guides/recipes/agentic-parse-to-markdown">
    The full agentic walkthrough with confidence details
  </Card>

  <Card title="Quickstart" icon="route" href="/guides/quickstart">
    Add extraction fields to get structured data
  </Card>

  <Card title="Invoice Processing" icon="file-invoice-dollar" href="/guides/recipes/invoice-processing">
    A full extraction example with nested fields and line items
  </Card>

  <Card title="Response formats" icon="reply" href="/api-reference/response-formats">
    Full schema of the results endpoint
  </Card>
</CardGroup>
