The hand-written Python SDK for anyformat, built on httpx. It mirrors the TypeScript SDK and exposes a fluent builder over the same typed-graph workflow definition.
Package and class names are provisional. pip install anyformat-sdk and from anyformat.sdk import Client work today, but both are expected to change before the official launch, so pin the version you ship with.

Installation

Requires Python 3.13 (the SDK pins to >=3.13,<3.14 today; this is expected to loosen before the official launch).
pip install anyformat-sdk
The PyPI distribution is named anyformat-sdk; the import path uses the dotted namespace anyformat.sdk (transport) and anyformat.workflow (schema factories).
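Because the supported interpreter range is narrow today, a startup check can fail fast with a readable message instead of an opaque install or import error. The following is a minimal sketch using only the standard library. The helper name python_is_supported is hypothetical and not part of the SDK, and the bounds simply restate the >=3.13,<3.14 pin quoted above.

```python
import sys

# Bounds restate the pin quoted above; update them when the SDK loosens it.
MIN_VERSION = (3, 13)
MAX_VERSION_EXCLUSIVE = (3, 14)


def python_is_supported(version=None):
    """Return True if (major, minor) falls inside the pinned range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_VERSION <= (major, minor) < MAX_VERSION_EXCLUSIVE


if not python_is_supported():
    print("anyformat-sdk currently requires Python 3.13", file=sys.stderr)
```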

Authentication

Pass the API key to Client directly, or store it in the ANYFORMAT_API_KEY environment variable and read it with os.environ when you construct the client.
import os
from anyformat.sdk import Client

# Explicit
client = Client(api_key="af_...")

# Or from the environment
client = Client(api_key=os.environ["ANYFORMAT_API_KEY"])
See Authentication for how to mint an API key.
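Reading os.environ["ANYFORMAT_API_KEY"] directly raises a bare KeyError when the variable is missing. A small wrapper can turn that into a clearer message. This is a sketch, not part of the SDK: load_api_key is a hypothetical helper name, and trimming surrounding whitespace is an assumption about what you want.

```python
import os


def load_api_key(env=None, var="ANYFORMAT_API_KEY"):
    """Return the API key from the environment, or raise a readable error."""
    env = os.environ if env is None else env
    key = (env.get(var) or "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before creating the client")
    return key
```

With this in place, the client construction shown above becomes Client(api_key=load_api_key()).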

Basic usage

Full flow: build a workflow with the fluent builder, run a document, and wait for the typed result.
import os
from anyformat.sdk import Client
from anyformat.workflow import Schema

client = Client(api_key=os.environ["ANYFORMAT_API_KEY"])

# 1. Build & create the workflow (parse → extract)
workflow = (
    client.workflow("Invoice Processor")
    .parse()
    .extract([
        Schema.string("invoice_number", "The unique invoice identifier"),
        Schema.float("total_amount",    "Total invoice amount"),
        Schema.date("issue_date",       "Date when the invoice was issued"),
    ])
    .create()
)
print(f"Workflow id: {workflow.id}")

# 2. Submit a document — accepts Path, str (path), or bytes.
run = workflow.run("invoice.pdf")

# 3. Wait for results. Polls /results/ (412 → still processing, 200 → done).
result = run.wait()  # default: timeout=300s, poll_interval=3s

print(result.fields["invoice_number"].value)
print(result.fields["total_amount"].value)
The Result exposes typed scalar accessors via result.fields[name] for linear workflows (one untagged extraction) and the full envelope at result.raw for everything else (parse markdown, multiple extractions per split, classifications, splits).
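The polling behaviour noted in step 3 (412 means still processing, 200 means done) can be pictured as a simple loop. The sketch below is not the SDK's implementation. poll_until_done, PollTimeout, and the fetch callable are hypothetical names, and treating any other status as an error is an assumption.

```python
import time


class PollTimeout(Exception):
    """Stand-in for the SDK's local timeout error (hypothetical name)."""


def poll_until_done(fetch, timeout=300.0, poll_interval=3.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call fetch() until it reports HTTP 200, or raise after `timeout` seconds.

    `fetch` is any callable returning (status_code, body).
    """
    deadline = clock() + timeout
    while True:
        status, body = fetch()
        if status == 200:          # result is ready
            return body
        if status != 412:          # anything other than "still processing"
            raise RuntimeError(f"unexpected status {status}")
        if clock() + poll_interval > deadline:
            raise PollTimeout(f"no result after {timeout}s")
        sleep(poll_interval)
```

Injecting clock and sleep keeps the loop testable without real waiting. In application code you would call run.wait() rather than writing this yourself.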

Async usage

AsyncClient is the async sibling — every builder, handle, and result method has an async counterpart.
import asyncio
import os

from anyformat.sdk import AsyncClient
from anyformat.workflow import Schema

async def main():
    client = AsyncClient(api_key=os.environ["ANYFORMAT_API_KEY"])
    try:
        workflow = await (
            client.workflow("Invoice Processor")
            .parse()
            .extract([Schema.string("invoice_number", "...")])
            .create()
        )
        run = await workflow.run("invoice.pdf")
        result = await run.wait()
        print(result.fields["invoice_number"].value)
    finally:
        await client.aclose()

asyncio.run(main())
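A common reason to reach for the async client is processing many documents at once. The sketch below bounds concurrency with a semaphore, using only asyncio. run_bounded is a hypothetical helper, not an SDK function. Whether handles from a single AsyncClient may be awaited concurrently is an assumption the source does not confirm, so the helper takes any async callable.

```python
import asyncio


async def run_bounded(items, worker, limit=4):
    """Await worker(item) for every item, at most `limit` at a time.

    Results come back in input order. Exceptions are returned in place
    rather than raised, so one failed document does not cancel the rest.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(guarded(i) for i in items), return_exceptions=True)


# A worker following the pattern above might look like:
#     async def process(path):
#         run = await workflow.run(path)
#         return await run.wait()
```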

Builder methods

All node types in the typed graph are exposed as fluent methods. parse is required; the others are optional and can be chained in any topology the API allows.
| Method | What it adds | Notes |
| --- | --- | --- |
| .parse(*, mode='standard', prompt_hint=None, figure_enhancement=False) | The required parse node | All args keyword-only. mode='agentic' for the per-block agentic strategy. |
| .classify(*categories) | A classify node | Categories from ClassifyCategory(id=..., name=..., description=...) |
| .split(*rules, route_from=None) | A splitter node | Use route_from= after .classify() to wire which branch fans out |
| .extract(fields, *, branch=None, lookup_files=None) | An extract node | branch= is required after .classify() / .split(); Schema.* factories produce field objects |
| .validate(*rules, branch=None) | A validate node | Attaches to the most recent extract (or the one named by branch) |
| .create() | Persists the workflow | Returns a Workflow handle you can .run(...) on |
| .build() | Same shape without the network call | Returns a WorkflowDefinition, useful for tests and inspection |
Workflow.run(file=None, *, text=None) accepts either file: bytes | pathlib.Path | str (a file path) or text= (raw text — useful for emails / plain-text bodies). Exactly one must be set. Returns a Run handle; Run.wait(timeout=300, poll_interval=3) returns the Result.
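The "exactly one must be set" rule is easy to mirror in your own wrappers. The sketch below shows one way to enforce it and normalise the input to bytes. read_input is a hypothetical helper, not the SDK's internal code, and UTF-8 encoding for text= is an assumption.

```python
from pathlib import Path


def read_input(file=None, *, text=None):
    """Return the payload as bytes, enforcing that exactly one source is given."""
    if (file is None) == (text is None):
        raise ValueError("pass exactly one of file or text=")
    if text is not None:
        return text.encode("utf-8")
    if isinstance(file, bytes):
        return file
    # A str is treated as a file path, matching the description above.
    return Path(file).read_bytes()
```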

Reading results

result = run.wait()

# Scalar fields, for linear workflows (single untagged extraction)
inv = result.fields["invoice_number"]          # ExtractedField
print(inv.value, inv.confidence, inv.evidence)

# Parse output (markdown + per-block confidence)
parse_markdown = result.parse.markdown if result.parse else None

# Anything else — nested objects, split workflows, classifications:
# read result.raw, the validated wire envelope as a dict.
for extraction in result.raw["extractions"]:
    for name, field in extraction["fields"].items():
        print(name, field)
See Response formats for the full shape of every section.
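When a workflow produces several extractions, it can help to flatten result.raw into one stream of fields. The sketch below relies only on the "extractions" and "fields" keys used in the loop above. iter_fields is a hypothetical helper, and the sample envelope (including the "value" key) is made up for illustration rather than taken from the wire format.

```python
def iter_fields(raw):
    """Yield (extraction_index, field_name, field) for every extracted field."""
    for index, extraction in enumerate(raw.get("extractions", [])):
        for name, field in extraction.get("fields", {}).items():
            yield index, name, field


# Hypothetical envelope for illustration only; the real field payload is
# described in Response formats.
sample_raw = {
    "extractions": [
        {"fields": {"invoice_number": {"value": "INV-001"}}},
        {"fields": {"invoice_number": {"value": "INV-002"}}},
    ]
}

for index, name, field in iter_fields(sample_raw):
    print(index, name, field)
```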

Error handling

The SDK raises typed exceptions you can catch:
import time

from anyformat.sdk import (
    APIError,
    BadRequest,        # 400 — validation error, bad request body
    Unauthorized,      # 401 — invalid or missing API key
    Forbidden,         # 403 — disallowed by the server
    NotFound,          # 404 — workflow / file / webhook does not exist
    RateLimited,       # 429 — slow down; .retry_after has the suggested delay
    ServerError,       # 5xx — internal anyformat error
    SDKTimeout,        # local timeout — `.wait()` exceeded `timeout`
)

try:
    result = workflow.run("invoice.pdf").wait(timeout=60)
except RateLimited as e:
    time.sleep(e.retry_after or 5)
except NotFound:
    print("workflow or file is gone")
except APIError as e:
    print(f"HTTP {e.status_code}: {e.detail}")
except SDKTimeout:
    print("polling timed out — try webhooks for production")
APIError exposes .status_code, .error_code, and .detail. The wire-format error codes (e.g. EXTRACTION_FAILED, RATE_LIMITED) are documented at Errors.
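The handler above sleeps on RateLimited but does not retry. A bounded retry loop is one way to finish the job. The sketch below defines a local stand-in for the exception so it runs on its own; in real code you would import RateLimited from anyformat.sdk as shown above. call_with_retry, max_attempts, and default_delay are hypothetical names, and the retry policy is an assumption rather than SDK behaviour.

```python
import time


class RateLimited(Exception):
    """Local stand-in for anyformat.sdk.RateLimited so the sketch runs alone."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_retry(func, max_attempts=3, default_delay=5.0, sleep=time.sleep):
    """Call func(); on RateLimited, wait retry_after (or default_delay) and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimited as exc:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            sleep(exc.retry_after or default_delay)
```

A call such as call_with_retry(lambda: workflow.run("invoice.pdf").wait(timeout=60)) would then wrap the submission shown earlier.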

Webhooks (not on the SDK surface yet)

Client doesn’t expose webhook endpoints today. Until it does, register and delete webhooks with httpx directly. Registration looks like this:
import os

import httpx

response = httpx.post(
    "https://api.anyformat.ai/v2/webhooks/",
    headers={"Authorization": f"Bearer {os.environ['ANYFORMAT_API_KEY']}"},
    json={"url": "https://your-server.com/hook", "events": ["extraction.completed"]},
)
response.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
webhook = response.json()
# webhook["secret"] is only returned once, so store it.
See Webhooks for the payload shape and signature verification.