Parse-Only Workflow

Convert any document (PDF, DOCX, images) to structured markdown without defining extraction fields. This is useful when you want the parsed content to feed into your own pipeline, or when you need to preview how anyformat sees a document before setting up extraction.

When to Use This

Document preview — See the markdown before writing extraction fields
Custom pipelines — Feed parsed markdown into your own LLM, search index, or RAG system
Debugging — Understand how a document is parsed (blocks, tables, reading order)
Lightweight integration — You only need the text, not structured extraction

Create a Parse-Only Workflow

A parse-only workflow has no extraction fields, but the API requires at least one field when you create a workflow. Pass a single placeholder field; it won't be used.

curl -X POST 'https://api.anyformat.ai/v2/workflows/' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -d '{
    "name": "Document Parser",
    "description": "Parse documents to markdown without extraction",
    "fields": [
      {"name": "_placeholder", "description": "unused", "data_type": "string"}
    ]
  }'

Save the workflow_id — you’ll reuse it for every parse request.

Remove the Extract Node

By default, new workflows include both a parse and an extract node. To skip extraction entirely, update the workflow graph in the anyformat platform to remove the extract node, leaving only the parse node.

Removing the extract node means the workflow will only convert documents to markdown. No structured data extraction will run, which makes processing faster and cheaper.

Submit a Document

curl -X POST 'https://api.anyformat.ai/v2/workflows/YOUR_WORKFLOW_ID/run/' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'file=@document.pdf'
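When submitting from a script, it can help to wrap the request above in a function that checks the file exists before uploading. A minimal sketch, assuming the API key and workflow ID are exported as ANYFORMAT_API_KEY and ANYFORMAT_WORKFLOW_ID (hypothetical variable names chosen for this example). submit_document is also a hypothetical helper, and it prints the raw response without interpreting it.

```shell
# Hypothetical helper: submit one file to a parse-only workflow.
# Assumes ANYFORMAT_API_KEY and ANYFORMAT_WORKFLOW_ID are set (names chosen for this sketch).
# Usage: submit_document PATH_TO_FILE
submit_document() {
  # Fail early with a clear message instead of letting curl upload nothing
  if [ ! -f "$1" ]; then
    echo "submit_document: no such file: $1" >&2
    return 1
  fi
  # Mirrors the documented run request; prints the raw JSON response to stdout
  curl -s -X POST "https://api.anyformat.ai/v2/workflows/${ANYFORMAT_WORKFLOW_ID}/run/" \
    -H "Authorization: Bearer ${ANYFORMAT_API_KEY}" \
    -F "file=@$1"
}
```

For example, `submit_document document.pdf > run_response.json` keeps the response so you can read the run ID from it afterwards.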

Retrieve the Parsed Markdown

Poll until processed, then fetch results. The results endpoint returns a unified JSON response with the parsed markdown for each file.

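# Replace these placeholders with your own values
ANYFORMAT_API_KEY='YOUR_API_KEY'
workflow_id='YOUR_WORKFLOW_ID'
run_id='YOUR_RUN_ID'  # ID from the response to the run request

# Poll until processed: 200 = ready, 412 = keep waiting, anything else = error
while true; do
  STATUS=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${ANYFORMAT_API_KEY}" \
    "https://api.anyformat.ai/v2/files/${run_id}/extraction/")
  [ "$STATUS" = "200" ] && break
  [ "$STATUS" != "412" ] && echo "Error: $STATUS" >&2 && exit 1
  sleep 5
done

# Get results
curl -s -H "Authorization: Bearer ${ANYFORMAT_API_KEY}" \
  "https://api.anyformat.ai/v2/workflows/${workflow_id}/results/"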

The response is a JSON object keyed by filename. Each file contains a results object with the output from each workflow node. Pass file_id to get results for a single file.

{
  "document.pdf": {
    "results": {
      "parse": {
        "markdown": "<DOCUMENT id=\"1\" page=\"1\">..."
      }
    }
  }
}

For workflows with extraction fields, the response also includes an extraction key:

{
  "invoice.pdf": {
    "results": {
      "parse": {
        "markdown": "<DOCUMENT ...>..."
      },
      "extraction": {
        "total": {"value": "1,234.56", "confidence": 95.2},
        "date": {"value": "2026-01-15", "confidence": 88.0}
      }
    }
  }
}
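The polling loop earlier in this section waits forever if a file never finishes processing, which is risky in automation. A minimal sketch of a bounded variant that keeps the same status handling (200 ready, 412 keep waiting, anything else an error) but gives up after a fixed number of attempts. The function name wait_for_file, its default limits, and the ANYFORMAT_API_KEY environment variable are assumptions for this example; the ID argument is the same value the loop uses as run_id.

```shell
# Hypothetical helper: poll the extraction endpoint until the file is ready
# (HTTP 200), giving up after a fixed number of attempts.
# Usage: wait_for_file ID [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# Returns 0 when ready, 1 on an unexpected status, 2 on timeout.
wait_for_file() {
  wff_id=$1
  wff_max=${2:-60}       # default: 60 attempts
  wff_interval=${3:-5}   # default: 5 seconds between attempts
  wff_attempt=1
  while [ "$wff_attempt" -le "$wff_max" ]; do
    wff_status=$(curl -s -o /dev/null -w '%{http_code}' \
      -H "Authorization: Bearer ${ANYFORMAT_API_KEY}" \
      "https://api.anyformat.ai/v2/files/${wff_id}/extraction/")
    case "$wff_status" in
      200) return 0 ;;                                   # processed
      412) ;;                                            # not ready; poll again
      *) echo "Error: HTTP $wff_status" >&2; return 1 ;;
    esac
    wff_attempt=$((wff_attempt + 1))
    # Only sleep if another attempt will follow
    [ "$wff_attempt" -le "$wff_max" ] && sleep "$wff_interval"
  done
  echo "Timed out after $wff_max attempts" >&2
  return 2
}
```

For example, `wait_for_file "$run_id" 120 5 && echo ready` waits roughly ten minutes before giving up.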

Example Output

The parsed markdown preserves document structure with semantic blocks:

<DOCUMENT id="1" page="1">
<section id="p1_b1" data-type="title" data-bbox="x0:0.034,y0:0.037,x1:0.436,y1:0.053">

# ACME CORPORATION

</section>

<section id="p1_b2" data-type="text" data-bbox="x0:0.031,y0:0.055,x1:0.304,y1:0.140">

123 Business Ave, Suite 100
New York, NY 10001

</section>

<section id="p1_b3" data-type="table" data-bbox="x0:0.025,y0:0.219,x1:0.976,y1:0.807">

<table>
<thead>
<tr>
<th data-cell-id="r0c0">Item</th>
<th data-cell-id="r0c1">Quantity</th>
<th data-cell-id="r0c2">Price</th>
</tr>
</thead>
<tbody>
<tr>
<td data-cell-id="r1c0">Widget A</td>
<td data-cell-id="r1c1">10</td>
<td data-cell-id="r1c2">$25.00</td>
</tr>
</tbody>
</table>

</section>
</DOCUMENT>

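Each <section> tag carries three attributes, and the cells of a table section carry a fourth:

id: block identifier encoding the page and block number (p1_b3 is page 1, block 3)
data-type: semantic type, one of title, text, table, or other (figures/images)
data-bbox: bounding box coordinates x0, y0, x1, y1, normalized to the 0-1 range
data-cell-id: identifier on each <th>/<td> cell giving its row and column (r1c2 is row 1, column 2), for precise cell referencing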
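Because these are plain HTML attributes inside the markdown string, standard text tools can pull them out. A minimal sketch, assuming the markdown has been saved to a local file (for example document.md), that each <section> opening tag sits on its own line with attributes in the order shown in the example output (id, data-type, data-bbox), and that each cell's text contains no nested tags. list_sections and get_cell are hypothetical helper names; an HTML parser is more robust for production use.

```shell
# Hypothetical helpers for inspecting parsed markdown saved to a file.

# Print "id type bbox" for every <section> tag that has all three attributes.
# Usage: list_sections FILE
list_sections() {
  sed -n 's/.*<section id="\([^"]*\)" data-type="\([^"]*\)" data-bbox="\([^"]*\)".*/\1 \2 \3/p' "$1"
}

# Print the text of every table cell whose data-cell-id matches (e.g. r1c2).
# Usage: get_cell FILE CELL_ID
get_cell() {
  sed -n "s/.*data-cell-id=\"$2\">\([^<]*\)<.*/\1/p" "$1"
}
```

With the example output saved as document.md, `get_cell document.md r1c2` would print `$25.00`. Since cell IDs are row/column positions, they may repeat across tables; get_cell prints one line per match.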

Markdown Content

The parse.markdown field contains parsed markdown with section tags, table structure, and embedded images. Figures and charts are included as base64-encoded <img> tags, which makes the payload larger for image-heavy documents.
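If your downstream pipeline only needs text (an LLM prompt or a search index, say), you can drop the inline images locally to shrink the payload. A minimal sketch with sed, assuming each <img ...> tag sits on a single line and has no ">" inside its attributes (base64 data never contains ">"). strip_images is a hypothetical helper name.

```shell
# Hypothetical helper: remove inline <img ...> tags, including any base64
# payload in their attributes, and write the result to stdout.
# Usage: strip_images FILE
strip_images() {
  sed 's/<img[^>]*>//g' "$1"
}
```

For example, `strip_images document.md > document-text.md` keeps the section tags and tables while discarding the embedded images.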

Parse Node Configuration

You can configure the parse node in the anyformat platform by clicking on the parse node in the workflow graph. Available options:

Setting	Description
Engine	`Fast` for quick analysis, `Performant` for higher accuracy
Figure Enhancement	When enabled, uses an LLM to extract structured data from charts and images (e.g., axis labels, data points). Off by default.
Prompt Hint	Optional text to guide the parser — useful for domain-specific documents (e.g., “This is a medical lab report, preserve all numeric values exactly”)

Figure Enhancement adds an extra LLM call per figure block, which increases processing time and cost. Only enable it if you need structured descriptions of charts and images.

Tips

Parse-only workflows skip the extraction LLM call entirely, making them faster and cheaper than full extraction workflows.

Reuse one workflow — Create a single parse-only workflow and submit all documents to it. No need for separate workflows per document type.
Tables are preserved — The parser detects tables and outputs them as HTML <table> elements with cell IDs for precise referencing.
Multi-page handling — Each page gets its own <DOCUMENT> block with page number. All pages are processed automatically.
Use visual for images — The raw variant strips images. If your documents contain figures, charts, or logos you need to preserve, use variant=visual.
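Since each page arrives in its own <DOCUMENT> block, a multi-page result can be split into per-page files with a short awk script. A minimal sketch, assuming the markdown is saved locally and every block opens with a line starting `<DOCUMENT ... page="N">` and closes with `</DOCUMENT>` at the start of a line, as in the example output. split_pages and the page-N.md file names are hypothetical.

```shell
# Hypothetical helper: write each <DOCUMENT ... page="N"> block of a parsed
# markdown file to OUTPUT_DIR/page-N.md.
# Usage: split_pages INPUT_FILE OUTPUT_DIR
split_pages() {
  mkdir -p "$2" || return 1
  awk -v outdir="$2" '
    /^<DOCUMENT / {
      # Extract N from page="N" on the opening tag
      page = $0
      sub(/.*page="/, "", page)
      sub(/".*/, "", page)
      out = outdir "/page-" page ".md"
    }
    # Copy every line inside a block, including the tags themselves
    out != "" { print > out }
    /^<\/DOCUMENT>/ { close(out); out = "" }
  ' "$1"
}
```

Under the same assumption, `grep -c '^<DOCUMENT ' document.md` counts the page blocks without splitting them.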

Next Steps

Complete Workflow Guide

Invoice Processing
