Parse-Only Workflow

When to use this

Document preview — see the markdown before writing extraction fields
Custom pipelines — feed parsed markdown into your own LLM, search index, or RAG system
Debugging — understand how a document is parsed (blocks, tables, reading order)
Lightweight integration — you only need the text + per-block confidence

End-to-end

A parse-only workflow has one parse node and no edges. For documents with mixed content (tables, figures, dense text), switch the parse mode to "agentic": each block is then routed through a type-specific strategy (text, table, or figure) instead of being processed in a single batched call. The examples below use standard mode.

The same steps are shown below with curl, the TypeScript SDK, and the Python SDK, in that order.

```bash
# 1. Create the workflow (standard mode shown; for agentic, add "mode": "agentic")
curl -X POST 'https://api.anyformat.ai/v2/workflows/' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $ANYFORMAT_API_KEY" \
  -d '{
    "name": "Document Parser",
    "description": "Parse documents to markdown without extraction",
    "nodes": [{"id": "parse_1", "type": "parse"}],
    "edges": []
  }'

# 2. Submit a document (replace WORKFLOW_ID with the ID of the workflow created in step 1)
curl -X POST 'https://api.anyformat.ai/v2/workflows/WORKFLOW_ID/run/' \
  -H "Authorization: Bearer $ANYFORMAT_API_KEY" \
  -F 'file=@document.pdf'

# 3. Poll for results (replace COLLECTION_ID with the collection ID of the submitted file).
#    412 means results are not ready yet; 200 means they are.
while true; do
  RESPONSE=$(curl -s -w '\n%{http_code}' \
    -H "Authorization: Bearer $ANYFORMAT_API_KEY" \
    "https://api.anyformat.ai/v2/workflows/WORKFLOW_ID/files/COLLECTION_ID/results/")
  STATUS=$(echo "$RESPONSE" | tail -n 1)
  # sed '$d' drops the trailing status line; unlike `head -n -1` it also works on macOS/BSD.
  [ "$STATUS" = "200" ] && echo "$RESPONSE" | sed '$d' && break
  [ "$STATUS" != "412" ] && echo "Error: $STATUS" >&2 && exit 1
  sleep 5
done
```

```typescript
import { readFile } from "node:fs/promises";
import { Anyformat } from "@anyformat/sdk";

const af = new Anyformat({ apiKey: process.env.ANYFORMAT_API_KEY! });

// Any File with .name set works. Here one is built from a local PDF
// (File is a global in Node.js 20+).
const file = new File([await readFile("document.pdf")], "document.pdf", {
  type: "application/pdf",
});

// Standard mode. For agentic, pass { mode: "agentic" } to parse().
const workflow = await af
  .workflow("Document Parser", "Parse documents to markdown without extraction")
  .parse()
  .create();

const run = await workflow.run(file);
const result = await run.wait();

const markdown = result.parse?.markdown ?? "";
// parseConfidence may be null in some parse modes; fall back to layoutConfidence.
const confidence = result.parse?.parseConfidence ?? result.parse?.layoutConfidence;
console.log(`confidence: ${confidence}`);
console.log(markdown.slice(0, 500));
```

```python
import os
from anyformat.sdk import Client

client = Client(api_key=os.environ["ANYFORMAT_API_KEY"])

# Standard mode. For agentic, pass mode="agentic" to .parse().
workflow = (
    client.workflow("Document Parser")
    .parse()
    .create()
)

result = workflow.run("document.pdf").wait()
# parse_confidence may be None in some parse modes; fall back to layout_confidence.
# Explicit `is not None` check (not `or`) so a real 0.0 confidence still wins.
confidence = (
    result.parse.parse_confidence
    if result.parse.parse_confidence is not None
    else result.parse.layout_confidence
)
print(f"confidence: {confidence}")
print((result.parse.markdown or "")[:500])
```
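The curl loop treats HTTP 412 from the results endpoint as "not ready yet" and HTTP 200 as "done". If you call the REST API directly from Python rather than through the SDK, the same loop can be sketched as a small helper. This is a minimal sketch under stated assumptions: `poll_results`, the injected `fetch` callable (anything that performs one GET and returns a status code and body), and the `max_attempts` cap are all hypothetical, not part of the anyformat SDK.

```python
import time
from typing import Callable, Tuple


def poll_results(
    fetch: Callable[[], Tuple[int, str]],
    interval: float = 5.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the results endpoint stops answering 412.

    `fetch` is a hypothetical callable that performs one GET against
    .../workflows/WORKFLOW_ID/files/COLLECTION_ID/results/ and returns
    (status_code, body).
    """
    for _ in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body  # results are ready
        if status != 412:
            # Anything other than 200 or 412 is a hard error, as in the curl loop.
            raise RuntimeError(f"unexpected status {status}")
        sleep(interval)  # 412: not ready yet, wait and retry
    raise TimeoutError(f"results not ready after {max_attempts} attempts")


# Usage with a fake fetch that reports "not ready" twice, then succeeds.
responses = iter([(412, ""), (412, ""), (200, '{"extractions": []}')])
print(poll_results(lambda: next(responses), sleep=lambda s: None))
# prints {"extractions": []}
```

Injecting `fetch` and `sleep` keeps the retry logic independent of any particular HTTP client and makes it easy to test without network access.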

For a parse-only workflow, only the `parse` section of the response is populated; the `classifications`, `splits`, and `extractions` arrays are empty:

```json
{
  "collection_id": "069dcc2c-e14c-7606-8000-2ee4fb17b4e1",
  "verification_url": "https://app.anyformat.ai/workflows/.../files/...",
  "parse": {
    "markdown": "<DOCUMENT id=\"1\" page=\"1\">...",
    "text": "...",
    "parse_confidence": 94.2,
    "layout_confidence": 87.4,
    "blocks": [/* … */]
  },
  "classifications": [],
  "splits": [],
  "extractions": []
}
```

Confidence

The response carries two document-level confidence rollups, plus a per-block `data-confidence` attribute on each section of the rendered markdown.

| Field | Source | Range | Use for |
|---|---|---|---|
| `parse.parse_confidence` | Char-weighted mean of per-block LLM logprobs. `null` when no blocks have logprob-based confidence. | 80–99 typical | Triage: “is this doc trustworthy enough to extract from?” |
| `parse.layout_confidence` | Char-weighted mean of YOLO layout-segmentation scores. Present whenever blocks exist. | 30–60 typical | Fallback when `parse_confidence` is null. Measures “is this region a table?”, not “is the parsed content accurate?” |
| `data-confidence` attribute on each `<section>` | Per block: calibrated logprobs when available, otherwise the YOLO layout score | 0–100 | UI highlighting: “dim low-confidence regions” |

Agentic mode caveat: agentic strategies don’t always produce per-block logprobs (for example, text-bytes-first never calls an LLM), so `parse_confidence` is often null and callers have to fall back to `layout_confidence`. For calibrated parser confidence that is directly comparable with extraction confidence, use `mode="standard"`.
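Both document-level rollups are described as char-weighted means: each block's score counts in proportion to its length in characters, so a long paragraph outweighs a short title. A minimal sketch of that aggregation follows, assuming each block exposes its text and an optional logprob-based confidence. The `Block` type and its field names are hypothetical and are not the API's `blocks` schema; the service's exact computation may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Block:
    # Hypothetical shape for illustration; the real `blocks` schema may differ.
    text: str
    confidence: Optional[float]  # None when the block has no logprob-based score


def char_weighted_mean(blocks: Iterable[Block]) -> Optional[float]:
    """Average block confidences, weighting each block by its character count.

    Blocks without a confidence are skipped. Returns None when there are no
    scored characters, mirroring how `parse_confidence` is null in that case.
    """
    total_chars = 0
    weighted_sum = 0.0
    for block in blocks:
        if block.confidence is None:
            continue
        n = len(block.text)
        total_chars += n
        weighted_sum += n * block.confidence
    if total_chars == 0:
        return None
    return weighted_sum / total_chars


# A 16-char title at 94.2 and an 84-char paragraph at 90.0; the unscored caption is ignored.
blocks = [Block("ACME CORPORATION", 94.2), Block("x" * 84, 90.0), Block("caption", None)]
print(char_weighted_mean(blocks))  # about 90.672 = (16 * 94.2 + 84 * 90.0) / 100
```

Skipping unscored blocks rather than counting them as zero is what lets the rollup be null instead of misleadingly low when, as in agentic mode, no block carries a logprob-based score.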

Example output

```html
<DOCUMENT id="1" page="1">
<section id="p1_b1" data-type="title" data-confidence="94.2" data-bbox="x0:0.034,y0:0.037,x1:0.436,y1:0.053">

# ACME CORPORATION

</section>

<section id="p1_b2" data-type="text" data-confidence="91.7" data-bbox="x0:0.031,y0:0.055,x1:0.304,y1:0.140">

123 Business Ave, Suite 100
New York, NY 10001

</section>

<section id="p1_b3" data-type="table" data-confidence="88.4" data-bbox="x0:0.025,y0:0.219,x1:0.976,y1:0.807">

<table>
<thead>
<tr><th data-cell-id="r0c0">Item</th><th data-cell-id="r0c1">Quantity</th><th data-cell-id="r0c2">Price</th></tr>
</thead>
<tbody>
<tr><td data-cell-id="r1c0">Widget A</td><td data-cell-id="r1c1">10</td><td data-cell-id="r1c2">$25.00</td></tr>
</tbody>
</table>

</section>
</DOCUMENT>
```
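In the example above, each table cell carries a `data-cell-id` that appears to follow the pattern `r<row>c<col>` (`r1c2` is row 1, column 2, with the header row as row 0). Assuming that pattern holds, a table block can be turned into a list of rows with Python's standard-library `html.parser`. This is a sketch, not an official parser: `CellGridParser` and `table_to_rows` are hypothetical names, and row/column spans are not handled (missing positions are filled with empty strings).

```python
from __future__ import annotations

import re
from html.parser import HTMLParser

# Assumed cell-id format, inferred from the example output: r<row>c<col>.
CELL_ID = re.compile(r"r(\d+)c(\d+)$")


class CellGridParser(HTMLParser):
    """Collect <th>/<td> text keyed by the (row, col) encoded in data-cell-id."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: dict[tuple[int, int], str] = {}
        self._current: tuple[int, int] | None = None
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            match = CELL_ID.match(dict(attrs).get("data-cell-id") or "")
            if match:
                self._current = (int(match.group(1)), int(match.group(2)))
                self._buf = []

    def handle_data(self, data):
        # Only text inside a recognised cell is kept.
        if self._current is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._current is not None:
            self.cells[self._current] = "".join(self._buf).strip()
            self._current = None


def table_to_rows(html: str) -> list[list[str]]:
    parser = CellGridParser()
    parser.feed(html)
    if not parser.cells:
        return []
    n_rows = max(r for r, _ in parser.cells) + 1
    n_cols = max(c for _, c in parser.cells) + 1
    return [[parser.cells.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


table = """<table><thead><tr><th data-cell-id="r0c0">Item</th><th data-cell-id="r0c1">Quantity</th>
<th data-cell-id="r0c2">Price</th></tr></thead><tbody><tr><td data-cell-id="r1c0">Widget A</td>
<td data-cell-id="r1c1">10</td><td data-cell-id="r1c2">$25.00</td></tr></tbody></table>"""
print(table_to_rows(table))
# [['Item', 'Quantity', 'Price'], ['Widget A', '10', '$25.00']]
```

Keying cells by the IDs rather than by their order in the HTML means the grid comes out right even if the markup's whitespace or nesting changes.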

Each <section> includes:

id: block identifier encoding the page and block number (p1_b3 is page 1, block 3)
data-type: semantic type, one of title, text, table, picture, other
data-confidence: 0–100 confidence in this block (parser-calibrated when available, YOLO fallback otherwise)
data-bbox: bounding-box coordinates, normalised to 0–1

Cells inside table blocks additionally carry data-cell-id, an identifier (such as r1c2) for referencing individual cells precisely.
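Because the per-block metadata lives in plain HTML attributes, it can be read with the standard library alone, for example to find low-confidence regions to highlight in a UI. A minimal sketch, assuming `data-bbox` always uses the comma-separated `key:value` layout shown in the example (`x0:0.034,y0:0.037,...`). `SectionInfo`, `SectionParser`, `low_confidence_sections`, and the default threshold are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class SectionInfo:
    block_id: str
    block_type: str
    confidence: float | None
    bbox: dict[str, float]  # normalised 0-1 coordinates keyed x0, y0, x1, y1


def parse_bbox(raw: str) -> dict[str, float]:
    """Turn 'x0:0.034,y0:0.037,x1:0.436,y1:0.053' into {'x0': 0.034, ...}."""
    pairs = (item.split(":", 1) for item in raw.split(",") if ":" in item)
    return {key.strip(): float(value) for key, value in pairs}


class SectionParser(HTMLParser):
    """Collect the attributes of every <section> in the parsed markdown."""

    def __init__(self) -> None:
        super().__init__()
        self.sections: list[SectionInfo] = []

    def handle_starttag(self, tag, attrs):
        if tag != "section":
            return
        a = dict(attrs)
        conf = a.get("data-confidence")
        self.sections.append(
            SectionInfo(
                block_id=a.get("id") or "",
                block_type=a.get("data-type") or "",
                confidence=float(conf) if conf else None,
                bbox=parse_bbox(a.get("data-bbox") or ""),
            )
        )


def low_confidence_sections(markdown: str, threshold: float = 90.0) -> list[SectionInfo]:
    # The threshold is an arbitrary example value, not a recommended cut-off.
    parser = SectionParser()
    parser.feed(markdown)
    return [s for s in parser.sections if s.confidence is not None and s.confidence < threshold]


sample = (
    '<DOCUMENT id="1" page="1">\n'
    '<section id="p1_b1" data-type="title" data-confidence="94.2" '
    'data-bbox="x0:0.034,y0:0.037,x1:0.436,y1:0.053">\n\n# ACME CORPORATION\n\n</section>\n'
    '<section id="p1_b3" data-type="table" data-confidence="88.4" '
    'data-bbox="x0:0.025,y0:0.219,x1:0.976,y1:0.807">...</section>\n</DOCUMENT>'
)
for s in low_confidence_sections(sample):
    print(s.block_id, s.block_type, s.confidence, s.bbox)
# p1_b3 table 88.4 {'x0': 0.025, 'y0': 0.219, 'x1': 0.976, 'y1': 0.807}
```

Since the bbox values are normalised, multiplying them by a rendered page's width and height gives the pixel rectangle to dim or outline.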

Tips

Reuse one workflow. Create a single parse-only workflow and submit all documents to it — no need for separate workflows per document type.
Tables are preserved. They are output as HTML <table> elements with an ID on every cell (data-cell-id).
Multi-page documents are handled. Each page gets its own <DOCUMENT> block with a page attribute, and all pages are processed automatically.
Use parse.parse_confidence for triage. Route low-confidence documents into a manual-review queue before downstream processing, and fall back to parse.layout_confidence when parse_confidence is null (e.g. in agentic mode).
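The triage tip can be sketched against the raw JSON response, for example the body returned by the curl flow. This is a minimal sketch under stated assumptions: `document_confidence`, `triage`, the return labels, and both thresholds are hypothetical. The thresholds differ because the confidence table gives different typical ranges for the two scores (80–99 versus 30–60), so a single cut-off would not mean the same thing for both.

```python
from __future__ import annotations

from typing import Any

# Illustrative cut-offs only; tune them on your own documents.
PARSE_THRESHOLD = 90.0
LAYOUT_THRESHOLD = 40.0


def document_confidence(response: dict[str, Any]) -> tuple[float | None, str]:
    """Return (score, source), preferring parse_confidence over layout_confidence."""
    parse = response.get("parse") or {}
    # `is not None` rather than `or`, so a genuine 0.0 is not skipped.
    if parse.get("parse_confidence") is not None:
        return parse["parse_confidence"], "parse"
    return parse.get("layout_confidence"), "layout"


def triage(response: dict[str, Any]) -> str:
    score, source = document_confidence(response)
    if score is None:
        return "manual_review"  # no confidence signal at all
    threshold = PARSE_THRESHOLD if source == "parse" else LAYOUT_THRESHOLD
    return "auto" if score >= threshold else "manual_review"


print(triage({"parse": {"parse_confidence": 94.2, "layout_confidence": 87.4}}))  # auto
print(triage({"parse": {"parse_confidence": None, "layout_confidence": 35.0}}))  # manual_review
```

Returning the score's source alongside the value keeps the threshold choice explicit, which matters because the layout score measures region detection rather than content accuracy.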

Next steps

Agentic Parse to Markdown

The full agentic walkthrough with confidence details

Quickstart

Add extraction fields to get structured data

Invoice Processing

A full extraction example with nested fields and line items

Response formats

Full schema of the results endpoint
