# Python SDK Guide

> Complete guide for the in-house anyformat Python SDK

This is the hand-written Python SDK for anyformat, built on `httpx`. It mirrors the [TypeScript SDK](/api-reference/sdks/typescript) and exposes a fluent builder over the same [typed-graph workflow definition](/concepts/workflows).

## Installation

Requires Python 3.13 (the SDK pins to `>=3.13,<3.14` today; this is expected to loosen before the official launch).

```bash
pip install anyformat
```

The PyPI distribution is named `anyformat`. Imports use two dotted paths: `anyformat.sdk` (transport) and `anyformat.workflow` (schema factories).

## Authentication

Pass the API key to `Client` directly, or store it in the `ANYFORMAT_API_KEY` environment variable and read it with `os.environ`.

```python
import os
from anyformat.sdk import Client

# Explicit
client = Client(api_key="af_...")

# Or from the environment
client = Client(api_key=os.environ["ANYFORMAT_API_KEY"])
```

See [Authentication](/api-reference/authentication) for how to mint an API key.
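
A bare `os.environ["ANYFORMAT_API_KEY"]` raises a `KeyError` when the variable is missing, which can be confusing in deployment logs. The helper below is a minimal sketch of a friendlier lookup; `load_api_key` is a hypothetical name and not part of the SDK.

```python
import os
from collections.abc import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the environment, or raise a descriptive error.

    Hypothetical helper for illustration; the SDK does not ship it.
    """
    key = env.get("ANYFORMAT_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANYFORMAT_API_KEY is not set; export it or pass api_key= explicitly"
        )
    return key
```

With the SDK installed, this would be used as `Client(api_key=load_api_key())`.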

## Basic usage

The full flow: build a workflow with the fluent builder, submit a document, and wait for the typed result.

```python
import os
from anyformat.sdk import Client
from anyformat.workflow import Schema

client = Client(api_key=os.environ["ANYFORMAT_API_KEY"])

# 1. Build & create the workflow (parse → extract)
workflow = (
    client.workflow("Invoice Processor")
    .parse()
    .extract([
        Schema.string("invoice_number", "The unique invoice identifier"),
        Schema.float("total_amount",    "Total invoice amount"),
        Schema.date("issue_date",       "Date when the invoice was issued"),
    ])
    .create()
)
print(f"Workflow id: {workflow.id}")

# 2. Submit a document — accepts Path, str (path), or bytes.
run = workflow.run("invoice.pdf")

# 3. Wait for results. Polls /results/ (412 → still processing, 200 → done).
result = run.wait()  # default: timeout=300s, poll_interval=3s

print(result.fields["invoice_number"].value)
print(result.fields["total_amount"].value)
```

The `Result` exposes typed scalar accessors via `result.fields[name]` for linear workflows (one untagged extraction) and the full envelope at `result.raw` for everything else (parse markdown, multiple extractions per split, classifications, splits).

## Async usage

`AsyncClient` is the async sibling — every builder, handle, and result method has an async counterpart.

```python
import asyncio
import os

from anyformat.sdk import AsyncClient
from anyformat.workflow import Schema

async def main():
    client = AsyncClient(api_key=os.environ["ANYFORMAT_API_KEY"])
    try:
        workflow = await (
            client.workflow("Invoice Processor")
            .parse()
            .extract([Schema.string("invoice_number", "...")])
            .create()
        )
        run = await workflow.run("invoice.pdf")
        result = await run.wait()
        print(result.fields["invoice_number"].value)
    finally:
        await client.aclose()

asyncio.run(main())
```
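
The async client is most useful when several documents are in flight at once. The sketch below shows one way to cap concurrency with `asyncio.Semaphore`. `process_all` and `process_one` are hypothetical names, and `process_one` stands in for whatever coroutine submits a document and waits for its result.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def process_all(
    paths: list[str],
    process_one: Callable[[str], Awaitable[object]],
    max_concurrency: int = 4,
) -> list[object]:
    """Run process_one over every path, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(path: str) -> object:
        # The semaphore blocks here once max_concurrency calls are running.
        async with semaphore:
            return await process_one(path)

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(guarded(path) for path in paths))
```

With the SDK, `process_one` could be a coroutine that runs `run = await workflow.run(path)` followed by `return await run.wait()`. The concurrency limit of 4 is an arbitrary default, not a documented API limit.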

## Builder methods

All node types in the [typed graph](/concepts/workflows) are exposed as fluent methods. `parse` is required; the others are optional and can be chained in any topology the API allows.

| Method                                                                   | What it adds                        | Notes                                                                                              |
| ------------------------------------------------------------------------ | ----------------------------------- | -------------------------------------------------------------------------------------------------- |
| `.parse(*, mode='standard', prompt_hint=None, figure_enhancement=False)` | The required parse node             | All args keyword-only. `mode='agentic'` for the per-block agentic strategy.                        |
| `.classify(*categories)`                                                 | A classify node                     | Categories from `ClassifyCategory(id=..., name=..., description=...)`                              |
| `.split(*rules, route_from=None)`                                        | A splitter node                     | Use `route_from=` after `.classify()` to wire which branch fans out                                |
| `.extract(fields, *, branch=None, lookup_files=None)`                    | An extract node                     | `branch=` is required after `.classify()` / `.split()`; `Schema.*` factories produce field objects |
| `.validate(*rules, branch=None)`                                         | A validate node                     | Attaches to the most recent extract (or the one named by `branch`)                                 |
| `.create()`                                                              | Persists the workflow               | Returns a `Workflow` handle you can `.run(...)` on                                                 |
| `.build()`                                                               | Same shape without the network call | Returns a `WorkflowDefinition` — useful for tests and inspection                                   |

`Workflow.run(file=None, *, text=None)` accepts either `file: bytes | pathlib.Path | str` (a file path) or `text=` (raw text — useful for emails / plain-text bodies). Exactly one must be set. Returns a `Run` handle; `Run.wait(timeout=300, poll_interval=3)` returns the `Result`.
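
To make the `timeout` and `poll_interval` semantics concrete, here is a minimal sketch of a polling loop with the same shape: keep asking while the server answers 412, return on 200, and give up once the deadline passes. It is not the SDK's implementation. `poll_until_done` and `fetch` are hypothetical, and `fetch` is assumed to return a `(status_code, body)` pair.

```python
import time
from collections.abc import Callable


def poll_until_done(
    fetch: Callable[[], tuple[int, object]],
    timeout: float = 300.0,
    poll_interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Call fetch() until it reports HTTP 200, or raise TimeoutError.

    fetch is a hypothetical callable returning (status_code, body).
    """
    deadline = time.monotonic() + timeout
    while True:
        status, body = fetch()
        if status == 200:  # done
            return body
        if status != 412:  # anything other than "still processing"
            raise RuntimeError(f"unexpected status {status}")
        # Stop if the next poll would land after the deadline.
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        sleep(poll_interval)
```

The `sleep` parameter exists only so the loop can be exercised in tests without real delays.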

## Managing workflows

The client lists and deletes the workflows on your account:

| Method                                                     | What it does               | Returns                                                           |
| ---------------------------------------------------------- | -------------------------- | ----------------------------------------------------------------- |
| `client.list_workflows(page=1, page_size=20, status=None)` | One page of your workflows | `list[Workflow]` — each is `.run(...)`-able (`page_size` max 100) |
| `client.delete_workflow(workflow_id)`                      | Soft-deletes a workflow    | The deleted `workflow_id` (a 404 raises `NotFound`)               |

```python
# List your workflows, then run the first one
workflows = client.list_workflows(page_size=50)
for wf in workflows:
    print(wf.id, wf.name)

result = workflows[0].run("invoice.pdf").wait()

# Delete by id
client.delete_workflow(workflows[0].id)
```

Both have async equivalents on `AsyncClient`: `await client.list_workflows(...)` returns `list[AsyncWorkflow]` and `await client.delete_workflow(id)` returns the id.
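
Because `list_workflows` returns one page at a time, collecting every workflow means walking the pages. The generator below is a sketch under one assumption: a page shorter than `page_size` is the last one (the source does not say how the end of the listing is signalled). `iter_all_pages` and `list_page` are hypothetical names.

```python
from collections.abc import Callable, Iterator


def iter_all_pages(
    list_page: Callable[[int, int], list],
    page_size: int = 100,
) -> Iterator[object]:
    """Yield items from successive pages until a short or empty page appears."""
    page = 1
    while True:
        items = list_page(page, page_size)
        yield from items
        if len(items) < page_size:  # assumption: a short page is the last one
            return
        page += 1
```

With the SDK this might be called as `iter_all_pages(lambda page, size: client.list_workflows(page=page, page_size=size))`.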

## Reading results

```python
result = run.wait()

# Scalar fields, for linear workflows (single untagged extraction)
inv = result.fields["invoice_number"]          # ExtractedField
print(inv.value, inv.confidence, inv.evidence)

# Parse output (markdown + per-block confidence)
parse_markdown = result.parse.markdown if result.parse else None

# Anything else — nested objects, split workflows, classifications:
# read result.raw, the validated wire envelope as a dict.
for extraction in result.raw["extractions"]:
    for name, field in extraction["fields"].items():
        print(name, field)
```

See [Response formats](/api-reference/response-formats) for the full shape of every section.
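
For non-linear workflows it can help to pull the per-extraction fields out of `result.raw` in one step. The sketch below relies only on the `raw["extractions"][i]["fields"]` nesting shown above. The sample envelope and the `{"value": ...}` payload shape are made up for illustration, and `fields_by_extraction` is a hypothetical helper.

```python
def fields_by_extraction(raw: dict) -> list[dict]:
    """Return one {field_name: field_payload} dict per extraction in the envelope."""
    return [
        dict(extraction.get("fields", {}))
        for extraction in raw.get("extractions", [])
    ]


# Hypothetical envelope, for illustration only; real payloads carry more keys.
sample_raw = {
    "extractions": [
        {"fields": {"invoice_number": {"value": "INV-001"}}},
        {"fields": {"total_amount": {"value": 12.5}}},
    ]
}

for index, fields in enumerate(fields_by_extraction(sample_raw)):
    for name, payload in fields.items():
        print(index, name, payload)
```

Missing keys are treated as empty rather than raising, which suits exploratory code; stricter callers may prefer to let a `KeyError` surface.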

## Error handling

The SDK raises typed exceptions you can catch:

```python
import time

from anyformat.sdk import (
    APIError,
    BadRequest,        # 400 — validation error, bad request body
    Unauthorized,      # 401 — invalid or missing API key
    Forbidden,         # 403 — disallowed by the server
    NotFound,          # 404 — workflow / file / webhook does not exist
    RateLimited,       # 429 — slow down; .retry_after has the suggested delay
    ServerError,       # 5xx — internal anyformat error
    SDKTimeout,        # local timeout — `.wait()` exceeded `timeout`
)

# Specific handlers first; the APIError catch-all goes last.
try:
    result = workflow.run("invoice.pdf").wait(timeout=60)
except RateLimited as e:
    # Wait for the suggested delay, then retry the call.
    time.sleep(e.retry_after or 5)
except NotFound:
    print("workflow or file is gone")
except SDKTimeout:
    print("polling timed out; consider webhooks for production")
except APIError as e:
    print(f"HTTP {e.status_code}: {e.detail}")
```

`APIError` exposes `.status_code`, `.error_code`, and `.detail`. The wire-format error codes (e.g. `EXTRACTION_FAILED`, `RATE_LIMITED`) are documented at [Errors](/api-reference/errors).

## Webhooks (not on the SDK surface yet)

`Client` doesn't expose webhook endpoints today. Until it does, register and delete webhooks with `httpx` directly:

```python
import os

import httpx

response = httpx.post(
    "https://api.anyformat.ai/v2/webhooks/",
    headers={"Authorization": f"Bearer {os.environ['ANYFORMAT_API_KEY']}"},
    json={"url": "https://your-server.com/hook", "events": ["extraction.completed"]},
)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
webhook = response.json()
# webhook["secret"] is only returned once, so store it.
```
```

See [Webhooks](/api-reference/webhooks/overview) for the payload shape and signature verification.
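
The stored secret is what a receiver uses to check that a delivery is genuine. The sketch below shows the general pattern only. It assumes a hex-encoded HMAC-SHA256 over the raw request body, which is a common scheme but is not confirmed by this page, so check the Webhooks reference for the actual algorithm and header before relying on it. `signature_matches` is a hypothetical helper.

```python
import hashlib
import hmac


def signature_matches(secret: str, body: bytes, received_signature: str) -> bool:
    """Check a hex HMAC-SHA256 signature over the raw request body.

    Assumed scheme for illustration; confirm the real one on the Webhooks page.
    """
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, so it does not leak
    # how many leading characters matched.
    return hmac.compare_digest(expected.encode(), received_signature.encode())
```

Whatever the real scheme is, verify against the raw bytes of the request rather than a re-serialized JSON object, since re-encoding can change whitespace and key order.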

## Links

* [PyPI](https://pypi.org/project/anyformat/) — `pip install anyformat`
* [TypeScript SDK](/api-reference/sdks/typescript) — the same fluent shape in TS
* [Coding assistant](/guides/coding-assistant) — let Claude drive anyformat from your editor
