Agentic Parse to Markdown

Each block in the document is routed through a typed strategy (text / table / figure / dense-table-agent) instead of a single batched call, which produces better tables and richer figure descriptions.

What is agentic parsing?

Standard parsing sends every block to one LLM call with one prompt. Agentic parsing first routes each block (using a fast LLM with the page image) to the right specialised strategy:

| Block type | Strategy | Why |
| --- | --- | --- |
| Plain text | `text-bytes-first` | Cheap and fast; bypasses the LLM entirely when source bytes are extractable |
| Tables | `dense-table-agent` | Tool-calling agent that handles spans, merged cells, multi-page tables |
| Figures / charts | `single-call vision` | LLM with image input for structured figure descriptions |
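To make the routing table concrete, the dispatch step can be pictured as a lookup from block type to strategy name. This is only a sketch: the `Block` class and the `route` function are hypothetical names and not part of the anyformat API, and the real router is described as using a fast LLM with the page image rather than a static map.

```python
from dataclasses import dataclass

# Strategy names are copied from the table; everything else here is hypothetical.
STRATEGY_BY_BLOCK_TYPE = {
    "text": "text-bytes-first",
    "table": "dense-table-agent",
    "figure": "single-call vision",
}


@dataclass
class Block:
    block_id: str
    block_type: str  # "text", "table", or "figure"


def route(block: Block) -> str:
    """Return the strategy name for a block, following the table."""
    try:
        return STRATEGY_BY_BLOCK_TYPE[block.block_type]
    except KeyError:
        raise ValueError(f"no strategy for block type {block.block_type!r}") from None


blocks = [Block("p1_b1", "text"), Block("p1_b2", "table"), Block("p1_b3", "figure")]
for block in blocks:
    print(block.block_id, "->", route(block))
```

The point of the per-block lookup is that a cheap path (plain text) and an expensive path (dense tables) can coexist in one document without paying the expensive cost everywhere.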

End-to-end

curl

# 1. Create an agentic parse-only workflow
curl -X POST 'https://api.anyformat.ai/v2/workflows/' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $ANYFORMAT_API_KEY" \
  -d '{
    "name": "Agentic parse-only",
    "nodes": [{"id": "parse_1", "type": "parse", "mode": "agentic"}],
    "edges": []
  }'

# 2. Submit a document
curl -X POST 'https://api.anyformat.ai/v2/workflows/WORKFLOW_ID/run/' \
  -H "Authorization: Bearer $ANYFORMAT_API_KEY" \
  -F 'file=@document.pdf'

# 3. Poll for results (agentic mode takes 30–90s for a typical 3-page document)
curl -H "Authorization: Bearer $ANYFORMAT_API_KEY" \
  'https://api.anyformat.ai/v2/workflows/WORKFLOW_ID/files/COLLECTION_ID/results/'

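TypeScript

import { readFile } from "node:fs/promises";
import { Anyformat } from "@anyformat/sdk";

const af = new Anyformat({ apiKey: process.env.ANYFORMAT_API_KEY! });

// A File with .name set. This builds one from disk (Node 20+ exposes File globally);
// in a browser, use a File from an <input type="file"> instead.
const bytes = new Uint8Array(await readFile("document.pdf"));
const file = new File([bytes], "document.pdf");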

const workflow = await af
  .workflow("Agentic parse-only")
  .parse({ mode: "agentic" })
  .create();

const run = await workflow.run(file);
const result = await run.wait({ timeoutMs: 180_000, pollMs: 3_000 });  // agentic runs are slower

// Agentic mode may leave parseConfidence null, so fall back to layoutConfidence.
const confidence = result.parse?.parseConfidence ?? result.parse?.layoutConfidence;
console.log(`document confidence: ${confidence}`);
console.log(result.parse?.markdown?.slice(0, 500) ?? "");

Python

import os
from anyformat.sdk import Client

client = Client(api_key=os.environ["ANYFORMAT_API_KEY"])

workflow = (
    client.workflow("Agentic parse-only")
    .parse(mode="agentic")
    .create()
)

result = workflow.run("document.pdf").wait(timeout=180)  # agentic runs are slower

# Agentic mode may leave parse_confidence null, so fall back to layout_confidence.
# Explicit `is not None` check (not `or`) so a real 0.0 confidence still wins.
confidence = (
    result.parse.parse_confidence
    if result.parse.parse_confidence is not None
    else result.parse.layout_confidence
)
print(f"document confidence: {confidence}")
print((result.parse.markdown or "")[:500])

Agentic mode takes longer than standard mode, typically 30–90s for a 3-page document, because each block is routed to and processed by its own strategy. For production workloads, use webhooks instead of polling.
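If you do poll (for example in a script or a test), a bounded loop with a deadline keeps a slow agentic run from hanging forever. The sketch below is generic Python and not SDK code: `poll_until` and the `fetch` callable are hypothetical, and how the results endpoint signals "not ready yet" is an assumption (here, `fetch` returns `None` until results exist). The 180s timeout and 3s interval mirror the values used in the TypeScript example.

```python
import time
from typing import Callable, Optional


def poll_until(
    fetch: Callable[[], Optional[dict]],
    timeout_s: float = 180.0,
    interval_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until it returns a result or the deadline passes.

    fetch is a hypothetical callable that returns None while the run is
    still in progress and a result dict once it has finished.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            return result
        # Stop if another wait would take us past the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError(f"no result after {timeout_s}s")
        sleep(interval_s)


# Demo with a fake fetch that becomes ready on the third call.
attempts = {"count": 0}


def fake_fetch() -> Optional[dict]:
    attempts["count"] += 1
    return {"parse": {"markdown": "..."}} if attempts["count"] >= 3 else None


result = poll_until(fake_fetch, timeout_s=10, interval_s=0, sleep=lambda _: None)
print(attempts["count"], sorted(result))
```

In a real integration, `fetch` would wrap the GET request from step 3 of the curl example.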

Sample response

{
  "collection_id": "069dcc2c-e14c-7606-8000-2ee4fb17b4e1",
  "verification_url": "https://app.anyformat.ai/workflows/.../files/...",
  "parse": {
    "markdown": "<DOCUMENT id=\"1\" page=\"1\">...",
    "text": "...",
    "parse_confidence": null,
    "layout_confidence": 47.8,
    "blocks": [/* … */]
  },
  "classifications": [],
  "splits": [],
  "extractions": []
}

Per-block vs. document confidence

The response carries two document-level rollups plus a per-block `data-confidence` attribute on each `<section>` inside the markdown.

<section id="p1_b1" data-type="text" data-confidence="94.2" data-bbox="x0:0.034,y0:0.037,x1:0.436,y1:0.053">

# ACME CORPORATION

</section>
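One way to use the per-block attributes is to pull them out of the markdown with a small parser and flag low-confidence blocks. The following is a minimal sketch that assumes every `<section>` tag follows the attribute layout in the example above. `parse_sections` and the `LOW_CONFIDENCE` cut-off are hypothetical and not part of the anyformat SDK.

```python
import re

# Matches the opening <section ...> tag and captures its attribute string.
SECTION_RE = re.compile(r"<section\s+([^>]*)>")
# Matches key="value" pairs such as data-confidence="94.2".
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_sections(markdown: str) -> list[dict]:
    """Extract id, type, confidence, and bbox from each <section> tag."""
    sections = []
    for match in SECTION_RE.finditer(markdown):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        # data-bbox looks like "x0:0.034,y0:0.037,..."; split it into floats.
        bbox = {}
        for part in attrs.get("data-bbox", "").split(","):
            if ":" in part:
                key, value = part.split(":", 1)
                bbox[key.strip()] = float(value)
        raw_conf = attrs.get("data-confidence")
        sections.append({
            "id": attrs.get("id"),
            "type": attrs.get("data-type"),
            "confidence": float(raw_conf) if raw_conf else None,
            "bbox": bbox,
        })
    return sections


LOW_CONFIDENCE = 80.0  # hypothetical cut-off; tune for your documents

sample = (
    '<section id="p1_b1" data-type="text" data-confidence="94.2" '
    'data-bbox="x0:0.034,y0:0.037,x1:0.436,y1:0.053">\n\n'
    "# ACME CORPORATION\n\n</section>"
)

for section in parse_sections(sample):
    conf = section["confidence"]
    flagged = conf is not None and conf < LOW_CONFIDENCE
    print(section["id"], conf, "FLAG" if flagged else "ok")
```

A regex is enough for this fixed attribute layout; a full HTML parser would be the safer choice if the markup varies.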

| Field | What it measures | Use for |
| --- | --- | --- |
| `parse.parse_confidence` | Char-weighted mean of per-block LLM logprobs (typical 80–99); `null` when no block had logprob-based confidence | Triage: “is this doc worth processing further?” |
| `parse.layout_confidence` | Char-weighted mean of YOLO layout-segmentation scores (typical 30–60); always present when blocks exist | Fallback when `parse_confidence` is null. Measures “is this region a table?”, not “is the parsed content accurate?” |
| `data-confidence` per `<section>` | Per-block confidence | UI highlighting: dim or flag low-confidence regions inline |

The document-level values are character-weighted means — a 500-char paragraph at 80% counts ~100× more than a 5-char header at 99%.
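The weighting can be checked with a few lines of arithmetic. This sketch assumes the weight of each block is simply the length of its text; exactly which characters the service counts (whitespace, markup) is not stated here, so treat `char_weighted_mean` as an illustration and not as the service's implementation.

```python
from typing import Optional


def char_weighted_mean(blocks: list[tuple[str, float]]) -> Optional[float]:
    """Mean confidence weighted by each block's character count.

    blocks is a list of (text, confidence) pairs.
    """
    total_chars = sum(len(text) for text, _ in blocks)
    if total_chars == 0:
        return None
    return sum(len(text) * conf for text, conf in blocks) / total_chars


paragraph = ("x" * 500, 80.0)  # 500-char paragraph at 80%
header = ("ACME!", 99.0)       # 5-char header at 99%

weighted = char_weighted_mean([paragraph, header])
unweighted = (paragraph[1] + header[1]) / 2

print(round(weighted, 2))  # 80.19: (500*80 + 5*99) / 505
print(unweighted)          # 89.5
```

The short header barely moves the weighted value, whereas a plain average would overstate how well the bulk of the text was parsed.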

Mode caveat: in agentic mode, per-block strategies don’t always populate parser logprobs (e.g. the fast `text-bytes-first` strategy never calls an LLM). When logprobs are absent, `parse_confidence` is `null` and callers fall back to `layout_confidence`. For calibrated parser confidence comparable to extraction confidence (80–99 range), use `mode="standard"`.
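Because the two rollups have different typical ranges (80–99 versus 30–60), a single threshold applied to whichever value is present would be misleading. A minimal sketch of scale-aware triage follows. The `triage` function and both threshold values are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical thresholds, one per metric, because the scales differ.
PARSE_THRESHOLD = 85.0   # parse_confidence typically falls in 80-99
LAYOUT_THRESHOLD = 40.0  # layout_confidence typically falls in 30-60


def triage(
    parse_confidence: Optional[float],
    layout_confidence: Optional[float],
) -> tuple[str, str]:
    """Return (metric used, verdict), preferring parse_confidence when set."""
    # Explicit None checks so a real 0.0 confidence is still used.
    if parse_confidence is not None:
        verdict = "ok" if parse_confidence >= PARSE_THRESHOLD else "review"
        return ("parse_confidence", verdict)
    if layout_confidence is not None:
        verdict = "ok" if layout_confidence >= LAYOUT_THRESHOLD else "review"
        return ("layout_confidence", verdict)
    return ("none", "review")


# Values from the sample response: parse_confidence is null, layout is 47.8.
print(triage(None, 47.8))
```

Returning the metric name alongside the verdict lets downstream code record which scale the decision was based on.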

Use with AI coding agents

Building an integration on top of anyformat? Install the anyformat Claude Code skill so Claude (or any compatible agent) knows the right endpoints, payloads, and gotchas out of the box. See Coding assistant for installation and example prompts.

Next steps

Create workflow reference

Full reference for the typed-graph endpoint

Parse-only workflow

Standard parse-only cookbook with the same endpoint

Add extraction

Add an extract node after the parse step to pull structured fields

Webhooks

Skip polling — receive extraction.completed events