This recipe uses the object field type with nested_fields to capture tabular line-item data.

Workflow fields

We recommend creating this workflow in the anyformat platform, where you can test with sample invoices and iterate on field descriptions. Then copy the workflow ID to use with the API.
Field           Type    Description
invoice_number  string  The unique invoice identifier
vendor_name     string  Name of the company issuing the invoice
issue_date      date    Date the invoice was issued
due_date        date    Payment due date
subtotal        float   Amount before tax
tax_amount      float   Total tax applied
total_amount    float   Final amount due including tax
currency        enum    Currency code (USD, EUR, GBP, …)
line_items      object  Individual line items (description, quantity, unit_price, amount)

End-to-end

The Python package and class names are provisional. pip install anyformat-sdk and from anyformat.sdk import Client work today, but both are expected to change before the official launch, so pin the version you ship with.
Three-step flow: create the workflow, run a document, poll the results endpoint until it returns 200.
# 1. Create the workflow
curl -X POST 'https://api.anyformat.ai/v2/workflows/' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $ANYFORMAT_API_KEY" \
  -d '{
    "name": "Invoice Processor",
    "description": "Extract invoice header and line items",
    "nodes": [
      {"id": "parse_1", "type": "parse"},
      {
        "id": "extract_1",
        "type": "extract",
        "extraction_schema": {
          "fields": [
            {"name": "invoice_number", "description": "The unique invoice identifier", "data_type": "string"},
            {"name": "vendor_name",    "description": "Name of the company that issued the invoice", "data_type": "string"},
            {"name": "issue_date",     "description": "Date the invoice was issued", "data_type": "date"},
            {"name": "due_date",       "description": "Date by which payment is due", "data_type": "date"},
            {"name": "subtotal",       "description": "Amount before tax", "data_type": "float"},
            {"name": "tax_amount",     "description": "Total tax amount", "data_type": "float"},
            {"name": "total_amount",   "description": "Final total amount due including tax", "data_type": "float"},
            {
              "name": "currency",
              "description": "Currency of the invoice amounts",
              "data_type": "enum",
              "enum_options": [
                {"name": "USD", "description": "US Dollar"},
                {"name": "EUR", "description": "Euro"},
                {"name": "GBP", "description": "British Pound"}
              ]
            },
            {
              "name": "line_items",
              "description": "Individual line items listed on the invoice",
              "data_type": "object",
              "nested_fields": [
                {"name": "description", "description": "Description of the item or service", "data_type": "string"},
                {"name": "quantity",    "description": "Number of units",                    "data_type": "integer"},
                {"name": "unit_price",  "description": "Price per unit",                     "data_type": "float"},
                {"name": "amount",      "description": "Total amount for this line item",    "data_type": "float"}
              ]
            }
          ]
        }
      }
    ],
    "edges": [{"source": "parse_1", "target": "extract_1"}]
  }'
# → { "id": "550e8400-…", ... }   ← workflow_id

# 2. Run a document
curl -X POST 'https://api.anyformat.ai/v2/workflows/WORKFLOW_ID/run/' \
  -H "Authorization: Bearer $ANYFORMAT_API_KEY" \
  -F 'file=@invoice.pdf'
# → { "id": "069dcc2c-…", "status": "pending", ... }   ← collection_id

# 3. Poll for results (412 while processing, 200 when done)
curl -H "Authorization: Bearer $ANYFORMAT_API_KEY" \
  'https://api.anyformat.ai/v2/workflows/WORKFLOW_ID/files/COLLECTION_ID/results/'
For production integrations, use webhooks instead of polling — they deliver results immediately and don’t consume your rate limit.
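If you do poll, the loop in step 3 can be sketched in Python as follows. This is a minimal sketch using only the standard library rather than the provisional SDK. The names poll_results and make_fetch, the interval and timeout defaults, and the choice to treat any status other than 200 or 412 as an error are assumptions for illustration, not part of the documented API.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Tuple


def make_fetch(workflow_id: str, collection_id: str, api_key: str) -> Callable[[], Tuple[int, dict]]:
    """Build a zero-argument function that requests the results endpoint once."""
    url = (
        "https://api.anyformat.ai/v2/workflows/"
        f"{workflow_id}/files/{collection_id}/results/"
    )

    def fetch() -> Tuple[int, dict]:
        request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
        try:
            with urllib.request.urlopen(request) as response:
                return response.status, json.load(response)
        except urllib.error.HTTPError as err:
            # urllib raises on 4xx/5xx; report the code and let the caller decide.
            return err.code, {}

    return fetch


def poll_results(
    fetch: Callable[[], Tuple[int, dict]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until it returns 200, treating 412 as 'still processing'."""
    deadline = time.monotonic() + timeout_s
    while True:
        status, body = fetch()
        if status == 200:
            return body
        if status != 412:
            # Assumption: anything other than 200 or 412 is treated as a failure.
            raise RuntimeError(f"unexpected status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError("results were not ready before the timeout")
        sleep(interval_s)


# Usage (performs real HTTP requests, so it is left commented out):
# fetch = make_fetch("WORKFLOW_ID", "COLLECTION_ID", os.environ["ANYFORMAT_API_KEY"])
# envelope = poll_results(fetch)
```

Passing the fetch function in as an argument keeps the retry logic separate from the HTTP call, so the loop can be exercised with a stub before pointing it at the real endpoint.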

Example response

The full envelope is documented in Response formats. The top-level keys are collection_id, verification_url, parse, classifications, splits, and extractions. Each scalar field follows the ExtractedField shape (value, value_override, verification_status, confidence, evidence). The example below is abbreviated and shows three of the workflow's nine fields.
{
  "collection_id": "069dcc2c-e14c-7606-8000-2ee4fb17b4e1",
  "verification_url": "https://app.anyformat.ai/workflows/.../files/...",
  "parse": {"markdown": "<DOCUMENT id=\"1\" page=\"1\">...", "text": "...", "parse_confidence": 94.2, "layout_confidence": 87.4, "blocks": []},
  "classifications": [],
  "splits": [],
  "extractions": [
    {
      "split_name": null,
      "partition": null,
      "fields": {
        "invoice_number": {
          "value": "INV-2024-0847",
          "confidence": 97.0,
          "evidence": [{"text": "Invoice #INV-2024-0847", "page_number": 1}],
          "verification_status": "not_verified"
        },
        "total_amount": {
          "value": 4087.50,
          "confidence": 96.0,
          "evidence": [{"text": "Total: $4,087.50", "page_number": 2}],
          "verification_status": "not_verified"
        },
        "currency": {
          "value": "USD",
          "confidence": 98.0,
          "evidence": [{"text": "USD", "page_number": 1}],
          "verification_status": "not_verified"
        }
      }
    }
  ]
}
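Reading values out of this envelope can be sketched in Python as follows. The helper names scalar_values and needs_review and the 97.0 threshold are hypothetical, and the sketch assumes confidence is reported on the 0-100 scale seen in the example. It only handles entries that carry a value key, since the shape of object fields such as line_items is not shown on this page.

```python
from typing import Any, Dict, List


def scalar_values(envelope: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one {field_name: value} dict per entry in envelope['extractions']."""
    rows = []
    for extraction in envelope.get("extractions", []):
        fields = extraction.get("fields", {})
        rows.append(
            {
                name: field["value"]
                for name, field in fields.items()
                # Skip anything that does not look like a scalar ExtractedField.
                if isinstance(field, dict) and "value" in field
            }
        )
    return rows


def needs_review(envelope: Dict[str, Any], threshold: float = 97.0) -> List[str]:
    """List field names whose confidence is below the (assumed 0-100) threshold."""
    flagged = []
    for extraction in envelope.get("extractions", []):
        for name, field in extraction.get("fields", {}).items():
            if isinstance(field, dict) and field.get("confidence", 0.0) < threshold:
                flagged.append(name)
    return flagged


# A trimmed copy of the example response above.
envelope = {
    "extractions": [
        {
            "split_name": None,
            "partition": None,
            "fields": {
                "invoice_number": {"value": "INV-2024-0847", "confidence": 97.0},
                "total_amount": {"value": 4087.50, "confidence": 96.0},
                "currency": {"value": "USD", "confidence": 98.0},
            },
        }
    ]
}

print(scalar_values(envelope))
print(needs_review(envelope))
```

Fields flagged this way are candidates for manual checking through the verification_url in the envelope.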

Tips

  • Write specific field descriptions. “Total amount due including tax” extracts better than just “total”.
  • If your invoices use several currencies, add an enum_options entry for every currency you expect.
  • Multi-page invoices are handled automatically; no special configuration needed.
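The currency tip can be sketched as a small Python helper that builds the enum field from a mapping of codes to labels, so the list is easy to extend. The helper name currency_field and the extra JPY and CAD entries are illustrative assumptions. The field layout mirrors the currency field in the workflow definition above.

```python
import json
from typing import Dict


def currency_field(currencies: Dict[str, str]) -> dict:
    """Build the 'currency' enum field with one enum_options entry per currency."""
    return {
        "name": "currency",
        "description": "Currency of the invoice amounts",
        "data_type": "enum",
        "enum_options": [
            {"name": code, "description": label}
            for code, label in currencies.items()
        ],
    }


# The three currencies from the recipe plus two illustrative additions.
field = currency_field(
    {
        "USD": "US Dollar",
        "EUR": "Euro",
        "GBP": "British Pound",
        "JPY": "Japanese Yen",
        "CAD": "Canadian Dollar",
    }
)
print(json.dumps(field, indent=2))
```

The resulting dict can replace the hand-written currency entry in the fields array when the workflow payload is assembled in code.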

Next steps

  • Field types: Object, enum, and other complex field types
  • Response formats: Full schema of every section in the results envelope