Parse-Only Workflow
Convert any document (PDF, DOCX, images) to structured markdown without defining extraction fields. This is useful when you want the parsed content to feed into your own pipeline, or when you need to preview how anyformat sees a document before setting up extraction.
When to Use This
- Document preview — See the markdown before writing extraction fields
- Custom pipelines — Feed parsed markdown into your own LLM, search index, or RAG system
- Debugging — Understand how a document is parsed (blocks, tables, reading order)
- Lightweight integration — You only need the text, not structured extraction
Create a Parse-Only Workflow
A parse-only workflow has no extraction fields. The API still requires at least one field, so pass a single placeholder field; it won't be used.
Save the returned workflow_id; you'll reuse it for every parse request.
Remove the Extract Node
By default, new workflows include both a parse and an extract node. To skip extraction entirely, update the workflow graph in the anyformat platform to remove the extract node, leaving only the parse node.
Removing the extract node means the workflow will only convert documents to markdown. No structured data extraction will run, which makes processing faster and cheaper.
Submit a Document
Retrieve the Parsed Markdown
Poll until the file is processed, then fetch the results. The results endpoint returns a unified JSON response with the parsed markdown for each file.
The response includes a results object with the output from each workflow node. Pass file_id to get results for a single file.
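The polling step can be sketched as a small helper. This is a minimal sketch, not the anyformat client: get_status is a hypothetical caller-supplied function wrapping whichever request reports the file's status, and the only status value taken from this guide is "processed". The interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable


def poll_until_processed(
    get_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call get_status() until it returns "processed"; return the number of calls.

    get_status is hypothetical: it should wrap your own status request and
    return the status string for the submitted file.
    """
    deadline = time.monotonic() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        status = get_status()
        if status == "processed":
            return attempts
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        # Wait before asking again so the API is not hammered.
        sleep(interval_s)
```

Once the helper returns, fetch the results as described above. If the API reports a failure status, you would likely want get_status to raise so the loop stops early instead of waiting for the timeout.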
extraction key:
Example Output
The parsed markdown preserves document structure with semantic blocks.
Each <section> includes:
- id: Block identifier (page and block number)
- data-type: Semantic type, one of title, text, table, other (figures/images)
- data-bbox: Bounding box coordinates (normalized 0-1)
- data-cell-id: Table cell identifiers for precise cell referencing
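If you feed the parsed markdown into your own pipeline, the section attributes listed above can be read with Python's standard html.parser. The sketch below is illustrative only: the sample markup, the id values, and the comma-separated bbox string are assumptions made for the example, so the bounding box is kept as a raw string instead of being interpreted.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class SectionCollector(HTMLParser):
    """Collect the attributes and text of each <section> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[Dict[str, Optional[str]]] = []
        self._current: Optional[Dict[str, Optional[str]]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            a = dict(attrs)
            self._current = {
                "id": a.get("id"),
                "type": a.get("data-type"),
                "bbox": a.get("data-bbox"),  # raw string; its format is not assumed
                "text": "",
            }

    def handle_data(self, data):
        # Only text inside a <section> is kept.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "section" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.blocks.append(self._current)
            self._current = None


def parse_sections(markdown: str) -> List[Dict[str, Optional[str]]]:
    parser = SectionCollector()
    parser.feed(markdown)
    parser.close()
    return parser.blocks


# Hypothetical sample; real ids and bbox values may look different.
sample = (
    '<section id="p1-b1" data-type="title" data-bbox="0.1,0.05,0.9,0.1">Quarterly Report</section>\n'
    '<section id="p1-b2" data-type="text" data-bbox="0.1,0.12,0.9,0.3">Revenue grew.</section>'
)
blocks = parse_sections(sample)
```

Each entry in blocks is a plain dict, which is convenient for filtering by type (for example, keeping only table blocks) before indexing or chunking.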
Markdown Content
The parse.markdown field contains parsed markdown with section tags, table structure, and embedded images. Figures and charts are included as base64-encoded <img> tags, which makes the payload larger for image-heavy documents.
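When you already hold image-bearing markdown and want a lighter copy for a search index or an LLM prompt, the embedded images can be dropped on the client side. The following is a rough sketch that assumes the images appear as <img> tags with a data:image/... src, as described above; the function name and placeholder text are made up for the example.

```python
import re

# Matches an <img> tag whose src attribute is a data: URI (an embedded image).
_DATA_IMG = re.compile(
    r"<img\b[^>]*\bsrc=(['\"])data:image/[^'\"]*\1[^>]*>",
    re.IGNORECASE,
)


def strip_embedded_images(markdown: str, placeholder: str = "[image omitted]") -> str:
    """Replace embedded base64 images with a short placeholder."""
    return _DATA_IMG.sub(placeholder, markdown)


doc = (
    'Intro <img src="data:image/png;base64,iVBORw0KGgo=" alt="chart"/> '
    'and <img src="https://example.com/a.png"> end'
)
light = strip_embedded_images(doc)
```

Images referenced by URL are left untouched; only data: URIs are replaced, since those are what inflate the payload.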
Parse Node Configuration
You can configure the parse node in the anyformat platform by clicking on the parse node in the workflow graph. Available options:

| Setting | Description |
|---|---|
| Engine | Fast for quick analysis, Performant for higher accuracy |
| Figure Enhancement | When enabled, uses an LLM to extract structured data from charts and images (e.g., axis labels, data points). Off by default. |
| Prompt Hint | Optional text to guide the parser — useful for domain-specific documents (e.g., “This is a medical lab report, preserve all numeric values exactly”) |
Figure Enhancement adds an extra LLM call per figure block, which increases processing time and cost. Only enable it if you need structured descriptions of charts and images.
Tips
Parse-only workflows skip the extraction LLM call entirely, making them faster and cheaper than full extraction workflows.
- Reuse one workflow: Create a single parse-only workflow and submit all documents to it. No need for separate workflows per document type.
- Tables are preserved: The parser detects tables and outputs them as HTML <table> elements with cell IDs for precise referencing.
- Multi-page handling: Each page gets its own <DOCUMENT> block with page number. All pages are processed automatically.
- Use visual for images: The raw variant strips images. If your documents contain figures, charts, or logos you need to preserve, use variant=visual.
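To illustrate the cell IDs mentioned in the tips above, here is a small sketch that maps each data-cell-id to its cell text. It assumes the attribute sits on <td> and <th> elements and that cells are not nested; the id values in the sample are invented for the example.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class CellCollector(HTMLParser):
    """Map each data-cell-id to the text of its table cell."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: Dict[str, str] = {}
        self._cell_id: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Assumption: the id is carried by the cell element itself.
        if tag in ("td", "th"):
            self._cell_id = dict(attrs).get("data-cell-id")
            if self._cell_id is not None:
                self.cells[self._cell_id] = ""

    def handle_data(self, data):
        if self._cell_id is not None:
            self.cells[self._cell_id] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell_id is not None:
            self.cells[self._cell_id] = self.cells[self._cell_id].strip()
            self._cell_id = None


def cells_by_id(html: str) -> Dict[str, str]:
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.cells


# Hypothetical table; real cell ids may follow a different scheme.
table_html = (
    "<table>"
    '<tr><th data-cell-id="t1-r0-c0">Item</th><th data-cell-id="t1-r0-c1">Total</th></tr>'
    '<tr><td data-cell-id="t1-r1-c0">Widget</td><td data-cell-id="t1-r1-c1"> 42.00 </td></tr>'
    "</table>"
)
cells = cells_by_id(table_html)
```

A lookup table like this makes it straightforward to cite a specific cell downstream, for instance when an LLM answer needs to point back to the value it used.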
Next Steps
- Complete Workflow Guide: Add extraction fields to get structured data from your documents
- Invoice Processing: See a full extraction example with nested fields and line items
