> ## Documentation Index
> Fetch the complete documentation index at: https://docs.anyformat.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Changelog

> Stay up to date with the latest changes and improvements to the anyformat API.

## July 2026

### July 7, 2026

**\[New + API]** API v3 launched — document packets and runs as first-class resources

The public API's new stable major is live under `/v3/`, rebuilt around three resources: **workflows**, **[document packets](/concepts/document-packets)** (the runnable bundle of 1+ files — v2's "file collection", renamed and promoted), and **[runs](/concepts/runs-and-results)** (one execution attempt of a workflow against a packet, with its own id, status, and results). The full reference lives in the new [v3 API tree](/api-reference-v3/introduction); v2 users can follow the [path-by-path migration guide](/api-reference-v3/migrating-from-v2). Highlights:

* **Flat run reads, no more 412-polling** — [`GET /v3/runs/{run_id}/`](/api-reference-v3/runs/get) always returns `200` with a `status` field; `results` carries the (unchanged) results envelope inline once `status` reaches `processed`. The `/results/` sub-path is gone.
* **One upload shape** — [`POST .../upload/`](/api-reference-v3/workflows/upload) groups 1–10 files into a single packet atomically, [`.../upload/run/`](/api-reference-v3/workflows/upload-and-run) does upload + run in one call and returns a `run_id`, and [`.../upload/from-url/`](/api-reference-v3/workflows/upload-from-url) imports 1–10 HTTPS URLs all-or-nothing.
* **Re-run without re-uploading** — [`POST /v3/document-packets/{id}/run/`](/api-reference-v3/document-packets/run) creates a fresh run each call; earlier runs stay readable.
* **The workflow read is the definition** — [`GET /v3/workflows/{workflow_id}/`](/api-reference-v3/workflows/get) returns the complete typed graph with a `persistent_id` per field, and [`PATCH`](/api-reference-v3/workflows/update) echoes that shape back: echo the `persistent_id` and a renamed field keeps its identity (analytics, ground truth, and quality metrics stay attached). There is no separate `/definition/` endpoint and no `PUT` in v3.
* **Keyset pagination everywhere** — every list returns `{items, next_cursor}` with `?limit=` (max 100) and `?cursor=`; no totals or page numbers.
* **Idempotent submission** — the upload and run-trigger POSTs accept an [`Idempotency-Key` header](/api-reference-v3/introduction#idempotency); a retry with the same key replays the original response instead of creating a duplicate packet or billing a second extraction.
* **`X-API-Version: 3.0.0`** is stamped on every `/v3/` response.

v2 is unchanged and keeps serving traffic through its announced deprecation window (see the July 3 entry below). Authentication, the error envelope, the results envelope, webhooks, and rate limits are identical across both versions.

### July 6, 2026

**\[Changed + API]** Dataset membership moves to collection-keyed, idempotent endpoints

Test-set membership is being renamed to the product's **dataset** vocabulary and re-keyed onto the collection — the identity a document already carries everywhere else (the `id` in every document response). Membership now lives under the workflow: `PUT /api/v3/workflows/{workflow_id}/dataset/files/{collection_id}/` adds a document to the dataset (optionally snapshotting a chosen extraction as ground truth via `{ "extraction_id": … }`), `DELETE` the same path removes it, and `POST /api/v3/workflows/{workflow_id}/dataset/files/bulk/` adds a filtered grid selection. All three writes are **idempotent** — re-adding a member or removing a non-member is a `200`/`204` no-op instead of a `409`. The test-file list rows now include a `collection_id` so callers can address membership without a second lookup.

The previous file-scoped routes `POST /api/v3/files/{file_id}/test-set/` and `DELETE /api/v3/files/{file_id}/test-set/` still work but are **deprecated** and now idempotent too; migrate to the `dataset/` paths above.

### July 3, 2026

**\[Deprecated + API]** v2 enters its deprecation window — `Deprecation`/`Sunset` headers on every response

With v3 as the new stable major, `/v2/` is frozen: no new features, existing behavior untouched. Every `/v2/` response now carries the [RFC 8594](https://datatracker.ietf.org/doc/html/rfc8594) deprecation signals — `Deprecation: Fri, 03 Jul 2026 00:00:00 GMT` and `Sunset: Sat, 03 Oct 2026 00:00:00 GMT`. Per-route `Link: <...>; rel="successor-version"` headers pointing at the v3 equivalent follow as the mappings are wired up. `/v3/` responses carry none of these headers.

**Action recommended:** wire an alert on the `Sunset` header so migration work can't be missed — [Versioning & deprecation](/api-reference/introduction#versioning--deprecation) shows a five-line client middleware that does it. Then plan the move with the [v2 → v3 migration guide](/api-reference-v3/migrating-from-v2). No client changes are required today; v2 keeps working until the sunset date.

## June 2026

### June 29, 2026

**\[New + API]** Deterministic validation rules — instant, free checks alongside AI rules

A **Validate** step can now mix two kinds of rules. **AI** rules stay as they were: you describe the condition in plain language and a model judges it. **Deterministic** rules are new — structured checks that run in plain code with no model call, so they're instant and **free**. Seven check types ship: **number range**, **date range** (with a relative `today` bound for not-expired checks), **arithmetic** (`subtotal + tax = total`, with a tolerance), **comparison** (a field vs a value or another field), **one-of** (value in an allowed set), **pattern** (regex), and **required** (present and non-empty). Build them in Studio with the new **AI / Deterministic** toggle on each rule, or author them via the API and SDKs — deterministic rules reference fields by name and carry a `check` payload, AI rules carry a `description`. Pattern checks run on a linear-time regex engine, so a user-supplied pattern can't hang an extraction. Existing workflows are unchanged: rules default to `kind: "ai"`. See [Validation rules](/guides/studio/validate-rules).

### June 22, 2026

**\[New]** Self-serve Business subscriptions, recurring monthly credits, and a billing incident banner

Business is now a self-serve plan. From the dashboard, an organisation owner or admin can click **Upgrade to Business** to open Stripe Checkout, enter a card, and become a paying Business customer in a single round-trip — no support ticket required. Once active, the organisation receives a fresh **100,000-credit monthly grant** on each successful renewal, replacing the previous month's grant rather than stacking on top of it. Enterprise is unchanged in spirit (still sales-led) but rides the same plumbing: support agrees on a custom monthly grant and price, mints a dedicated Stripe price, and pins both to the org on a `BillingPlan` row. Top-ups still work on top of the recurring grant for one-off credit additions.

Subscription customers get a **Customer Portal** link from the **Usage** page to manage their payment method and cancel the subscription themselves. When Stripe reports a failed renewal, a **billing incident banner** appears on every authenticated page describing the current state (`PastDue` during the grace window, `Suspended` if grace expires, `Canceled` after cancellation) with a deep-link to the action that resolves it — update the card, reactivate, or restart the subscription. Late-payment recovery is exact: if a card is fixed after the grace window has zeroed the grant, the next successful invoice restores `cycle_monthly_grant − consumed_during_unpaid_cycle` credits (floored at zero), so a customer never pays for and then loses what they already used.

This rollout is gated behind `BILLING_SUBSCRIPTIONS_ENABLED`. Existing Business and Enterprise organisations whose tier was set by the legacy one-shot upgrade keep their tier through a fallback path until their first real subscription invoice clears.

### June 18, 2026

**\[New + API]** `GET /v2/workflows/{workflow_id}/definition/` — round-trippable workflow definition

A new sub-resource returns the workflow's typed graph in exactly the shape `PUT /v2/workflows/{workflow_id}/` accepts. SDK callers can now GET, mutate one field, and PUT the result back without re-authoring the rest of the schema. The PATCH side preserves `persistent_id` per `(node_id, field_name)` so an unchanged echo cuts no new version (`ChangeKind.NONE` modulo edge order), and genuine edits keep `persistent_id` stable for unchanged-by-id fields — analytics, ground-truth, and monitoring continuity all hold across edits.

**\[Changed + API]** Field `description` is no longer required to be non-empty

The typed surface (`AnyField.description`) previously enforced `min_length=1`, but Path A persistence (the Studio frontend's save path) never enforced it — so legacy workflows can carry fields with empty descriptions that the new GET-definition endpoint needs to round-trip cleanly. Empty descriptions are now accepted on `POST /v2/workflows/`, `PUT /v2/workflows/{id}/`, and the generated SDKs. SDKs and docs still encourage non-empty descriptions — an empty value degrades extraction quality, it isn't structurally invalid.

**\[New + API]** `POST /v2/workflows/{workflow_id}/files/from-url/` — upload by URL instead of multipart

A new route accepts an HTTPS URL and tells anyformat to fetch the bytes server-side, returning the same `CreateCollectionResponse` shape as the multipart `POST /v2/workflows/{workflow_id}/files/` upload. Use it when the source document already lives in object storage you can presign (S3, GCS, R2, …) or at a public HTTPS URL — no need to stream bytes through your own backend. The fetch is bounded by a 10-second timeout and the same 20 MB cap as multipart upload, and runs through the outbound-URL SSRF guard so URLs resolving to non-globally-routable IPs (loopback, RFC1918, link-local, cloud-metadata, IPv4-mapped IPv6) are refused before any connection opens. HTTPS-only at the gateway; non-2xx upstream, DNS failure, or timeout surfaces as `422`. Documented under [Upload File from URL](/api-reference/files/create-from-url) and in [Files](/concepts/files#how-to-provide-a-file).

### June 15, 2026

**\[Fixed + API]** SDK consumers can pull per-file results through `/v2/workflows/{wf}/results/?file_id=...` again

The v3 route `GET /api/v3/files/{file_id}/results/` declared its `workflow_version_id` query parameter as `int`, but the v2 gateway forwards the 10-character `WorkflowVersion.external_id` string returned by `/api/v2/.../latest_version/`. Ninja's int coercion rejected the string with `422`, breaking any SDK call that chained `latest_version` into the per-file-results lookup. The route now accepts `workflow_version_id` as a string, resolves the `external_id` to the internal PK in-route, and returns `404` (not `422`) for an unknown id — matching every other v3 endpoint's "the public `external_id` is the single public identifier" contract.

### June 11, 2026

<Warning>
  **\[Breaking + API]** Object fields support only one level of nesting

  Object fields have always been limited to **one level of nesting** — children of an object must be non-object types — but the API surface advertised the opposite: `ObjectField.nested_fields` was typed recursively, so the generated OpenAPI spec listed object-in-object schemas as valid. The contract is now enforced at the source: posting a schema with a nested object to `POST /v2/workflows/` returns `422`, and the same constraint is mirrored in the regenerated TypeScript SDK types and the Python SDK CLI (which now returns a clean "only one level of nesting is supported" error instead of a raw pydantic traceback).

  **Action required:** schemas that declared an object inside an object must be flattened before submission. Single-level object fields are unaffected.
</Warning>

<Warning>
  **\[Breaking + API]** `POST /api/v3/extractions/progress` removed; replaced by the workflow-progress digest

  The batch progress endpoint `POST /api/v3/extractions/progress` is gone. Callers that want per-extraction progress alongside per-status file counts for a workflow version should switch to the new `POST /api/v3/extractions/workflow-progress-digest` (see the \[New + API] entry below). Single-extraction progress via `GET /api/v3/extractions/{uuid}/progress` is unchanged.
</Warning>

**\[New + API]** `POST /api/v3/extractions/workflow-progress-digest` — one round-trip for workflow polling

A new digest endpoint returns per-extraction progress for a caller-supplied set of `extraction_uuids` AND unfiltered per-status file counts for the surrounding `workflow_version_id` in a single response. The dashboard now polls this endpoint once every 4 seconds while files are processing, replacing three separate polls (`/extractions/progress` at 2.5s, and the v2 `/files/count/` and `/files/` polls at 5s). Polling stops automatically when no extraction is queued or in-progress. Counts come back `null` for unknown or cross-organisation workflow versions; cross-organisation extraction UUIDs are silently dropped, matching the existing single-extraction-progress semantics.

**\[New + API]** Cell-level grounding on extraction evidence

Table-row and scalar evidence now carries true cell provenance end-to-end: `evidence` and `mapped_locations` entries on extraction responses gain optional `cell` and `row` keys identifying the exact source cell that fed each value. Evidence without cell-level grounding (non-table datapoints, or table rows the model couldn't ground precisely) stays byte-identical to before — the new keys are emitted only when present. On the reference invoice that motivated this work, table rows with precise grounding went from 0% to 100% (626/626) with zero wrong-row assignments, and scalars whose value lives in a header table (invoice ids, dates) now resolve to one precise evidence on an LLM-cited page instead of a page-backfilled guess.

**\[New + API]** Test-set membership for files

Files can now be designated as part of a curated, locked test set. A new boolean `is_test_file` is returned on file responses, and two v3 routes manage membership: `POST /api/v3/files/{file_id}/test-set` to add and `DELETE /api/v3/files/{file_id}/test-set` to remove. Files in the test set are **frozen from re-verification**: every verification write (verify, unverify, edit, discard at the datapoint level; verify-extraction; add/delete/swap/verify-row/verify-all at the datarow level, across both v2 and v3) now returns `409 Conflict` with code `ground_truth_locked`, so an extraction designated as ground truth cannot be mutated out from under a downstream accuracy scorer. Ground-truth values and accuracy reporting land in a follow-up release.

**\[New]** Download lookup files from the Extract node UI

Lookup files (master files attached to an Extract node) were upload-only — once added, you couldn't get them back. The **Manage files** dialog in the Extract node now has a download button next to each lookup file, backed by a new presigned-URL endpoint `GET /api/v2/saas_manager/workflows/{id}/master_file/download/?uri=<s3-uri>`. Org membership and the workflow-scoped S3 key prefix are both enforced; the presign is pinned to the canonical bucket so a crafted cross-bucket URI cannot redirect the signed GET.

**\[Improved]** Table extraction now always uses the refine engine

Table parsing previously branched upfront on a heuristic between a cheap `simple_table` call (one shot, no retry — silent data loss on empty/truncated output) and a much slower `dense_table` agent (\~10–20× the cost and latency). The classification was fuzzy and data integrity depended on getting that one-shot guess right, so the routing prompt was biased toward dense, over-escalating and burning cost. Every table now takes the same path — cheap parse → audit critic → in-place refinement with tools — with a best-draft-wins guarantee, so a misclassification can no longer lose data. On an 18-document Iberia benchmark refine was **27% cheaper** (cheaper on every doc) and substantially faster on the slow tail (worst single doc 1346s → 317s), with all four `full_content` quality metrics improving. As part of the change the routing prompt now requires ≥2 rows × ≥2 columns of tabular evidence before sending a block to the table pipeline, so label/value pairs, signatures and logos are no longer misrouted as tables.

**\[Improved + API]** Workflow analytics timeseries response is much smaller and faster

`GET /api/v3/workflows/{id}/analytics/timeseries` now serves the default monitoring view in **a few KB of gzipped JSON** instead of 1.7–3.9 MB. The "Overall" series across all fields is shipped pre-collapsed as a new `overall_rows` array (one entry per extraction), and an opt-in repeatable `field_ids` query parameter narrows the legacy `rows` payload to only the fields you want — present-but-empty (`?field_ids=`) returns counts and `overall_rows` only. Calls that omit `field_ids` still receive the full legacy rows, so existing integrations keep working unchanged. The cold-path recomputation now scopes the live aggregation to cold versions instead of the full workflow, self-heals to all-warm on the first full-universe read, and the response is gzip-compressed.

### June 10, 2026

**\[New]** Per-user timezone preference for API datetimes

User profiles now carry an IANA timezone preference, defaulting to `Europe/Madrid`. Datetime fields in API responses are serialized in the user's chosen zone, and the dashboard **Account** page gains a searchable timezone selector that persists through the existing profile-update flow. Storage stays UTC end-to-end — only the wire serialization (and transitively, what the dashboard renders) reflects the preference.

**\[Fixed + API]** Oversized request bodies now return `400` instead of `500`

Any v3 endpoint that reads a request body returned a `500` when the JSON payload exceeded the 20 MB limit (`DATA_UPLOAD_MAX_MEMORY_SIZE`): Django raised `RequestDataTooBig` while django-ninja read `request.body` during parameter resolution, before the view ever ran, so it could not be caught inside the endpoint. A global exception handler on the v3 API now returns `400` with `{"detail": "Request body is too large. Maximum allowed size is 20 MB."}`. The limit in the message is derived from the setting, so it stays accurate if the cap is tuned.

**\[API]** Rate limiting now enforced on the external API in production

The external API's rate limiter is now active in production. It had been silently disabled for roughly three months because `RATE_LIMIT_ENABLED` and `REDIS_URL` were added to the dev config but never propagated to prod. Callers exceeding the configured limits may now receive `429` responses.

<Warning>
  **\[Breaking + API]** Deprecated v2 datapoint and file-verification endpoints removed

  The following v2 endpoints are gone. They were already marked deprecated and verified unreachable from the dashboard, the external API, the SDKs, and the cronjobs before removal:

  * `PATCH /api/v2/.../datapoints/{id}/` and `POST` / `DELETE /api/v2/.../datapoints/{id}/validation/`
  * `POST /api/v2/.../files/{id}/verify/`

  **Action required:** integrations that still target these paths must switch to the v3 replacements: `POST /api/v3/extractions/{uuid}/datapoints/{verify,unverify}` for verification toggles, `POST /api/v3/extractions/{uuid}/datapoints/{edit,discard}` for value edits and discards, and `POST /api/v3/extractions/{uuid}/verify` for whole-extraction verification.
</Warning>

**\[Fixed]** Verify Document button now works on split + extract workflows

The header **Verify Document** button used to be a permanent no-op on workflows that combine a Split node with an Extract: it targeted the file-level extraction, but extracted data on a split lives on per-split child extractions, so verifying the parent verified nothing. The button now targets the **active split's child extraction** (and is labelled **Verify Extraction** on a split tab to match), and is disabled on parse, split, and validation tabs where there is nothing to verify. A split file is reported as verified only once every non-empty child extraction is verified — empty category tabs no longer wedge the rollup. Shift+Enter advances through unverified splits before moving on to the next file.

**\[Improved + API]** New `verifier` API surface; `validator` deprecated

The "human-validation → verification" rename now reaches the file-level API. A new endpoint `POST` / `PATCH /api/v2/.../files/{id}/verifier/` assigns and updates the verifier for a file, a `verifier__in` filter narrows file lists by assigned verifier, and `?include=verifier` returns the assigned verifier in file responses. The legacy `/validator` action, `validator__in` filter, and `?include=validator` option remain live for backward compatibility but are now flagged `deprecated` in the OpenAPI spec; integrations should plan the move.

**\[Fixed + API]** Non-member reads of workflow analytics now return `404`

`GET /api/v3/workflows/{id}/analytics/summary` and `GET /api/v3/workflows/{id}/analytics/timeseries` previously returned `403 Forbidden` to a caller outside the workflow's organisation, leaking the existence of the workflow. They now return `404 Not Found`, matching `GET /api/v3/workflows/{id}`. Members continue to receive analytics as before.

**\[Fixed]** New free accounts now see their signup credits on the Usage page

The **Usage** page counted only purchased and granted credits, so a brand-new free account — whose only balance is its signup bonus — showed a zero balance and the empty-state screen instead of the credits it could actually spend. Signup credits are now included in both the headline total and the per-source breakdown.

### June 9, 2026

**\[API]** New v3 endpoints for row-level edits on object fields

The v3 API now exposes positional row operations on object-field tables: add, delete, swap, verify-row, and verify-all. Rows are addressed positionally by `(extraction_uuid, field_persistent_id, index)` instead of an opaque datarow id, and the dense `[0, n_live]` invariant is enforced at the boundary (negative indexes now return `422` instead of a silent `404`). The corresponding v2 actions on `DatarowViewSet` are now marked deprecated but remain live; integrations should plan to migrate.

**\[Improved]** Document parsing now retries transient LLM failures more reliably

The parse pipeline now retries `LengthFinishReasonError` (model hit max-tokens, often non-deterministic) and detects silent parse failures where the LLM returned a 200 but the response could not be coerced into the schema. Both previously skipped straight to the fallback model on the first attempt; they now go through the configured retry budget on the primary model first. The per-page parsing token budget was also raised from 8K to 32K to reduce length-truncation failures, and `ParseFailedError` now carries the underlying per-page error detail.

**\[Fixed]** Lookup + Validate workflows now complete end-to-end

Two independent bugs that broke the same Lookup + Validate workflow are fixed:

* A `Validate` node downstream of an `Extract` node with lookup fields previously returned every rule as *inconclusive ("No extraction data available to validate.")*, because the validate step resolved its upstream to a synthetic lookup node and read a state key nothing ever wrote. It now resolves through smart-lookup hops back to the real producing extract.
* The Gemini page-sum verification schema was missing a top-level `title`, which made model-build raise outside the per-task `try/except` and aborted the whole extraction whenever Phase-1 sum verification failed and Gemini verification was attempted.

**\[Improved]** Faster admin pages, file listings, and date-filtered results

Several database hotspots flagged by APM are now indexed and the corresponding queries rewritten to be sargable:

* `Extraction.uuid` and `File.name` get btree indexes (built `CONCURRENTLY`), eliminating sequential scans on the hundreds of lookups per day they served.
* A new composite `FileCollection(workflow_id, created_at DESC)` index backs the per-workflow collection listing.
* `created_at` date filters across the admin dashboard, billing workflow-usage queries, and the public `ResultsFilter` are rewritten from `(created_at AT TIME ZONE 'UTC')::date` casts (which disabled the index, costing up to \~2s on a single filter path) to half-open `created_at` ranges. Behavior is unchanged because the deployment timezone is UTC.

<Warning>
  **\[Breaking + API]** Deprecated v2 datarow endpoints removed

  The v2 `DatarowViewSet` actions marked deprecated in the same release as the v3 datarow rollout — `create_empty`, `delete_by_position`, `swap`, `validate_row`, `validate_all` — and their five `/api/v2/.../files/{file_id}/datarows/` routes are now removed.

  **Action required:** integrations that still drive row-level edits through the v2 paths must migrate to the v3 surface: `POST /api/v3/extractions/{extraction_uuid}/datarows/{add,delete,swap,verify-row,verify-all}`.
</Warning>

**\[Improved]** Per-category grid restores split-granular column filtering

Workflows that include a Classify or Split node show their extracted data in the Per-Category grid in the data viewer. Column filters on that grid had been deliberately disabled because the previous file-paginated path compared against the wrong extraction. A new split-granular results endpoint backs the grid: a filter on an extracted field now narrows to exactly the matching split results, paginated and counted server-side, with filter and sort behavior matching the main file grid.

### June 5, 2026

**\[Improved]** Transient frontend request failures now auto-retry with backoff

The web app retries idempotent React Query requests that fail with transient connectivity errors or 5xx responses, using capped exponential backoff (up to \~10s) and up to 3 attempts. Deterministic 4xx responses and unknown errors still fail fast. Connectivity errors are now detected via the Fetch spec's `TypeError` instead of message matching, which cuts down on false-positive error reports during brief network blips.

**\[API]** Malformed `Authorization: Bearer` headers now return `401`

`Authorization` headers with malformed bearer tokens (for example, an empty token or whitespace inside the token value) now return `401 Unauthorized` instead of `500 Internal Server Error`. The successful auth path and well-formed token-rejection responses are unchanged.

**\[Fixed]** Newly created workflows lay out their nodes with wider default spacing

Newly created workflow nodes are placed with a larger default gap so the auto-layout reads more clearly out of the box and matches the spacing the rest of the editor assumes.

**\[Fixed]** Full-page loading spinner no longer jumps during initial load

The full-page spinner stays anchored in place while the app boots, instead of shifting position as the surrounding layout settles in.

**\[New]** How credits work — new pricing & credits docs page

A new Concepts → Account page, **How credits work**, documents per-operator credit costs (including Validate at 5 credits per rule per page), plan €/credit conversion, included credits, usage tracking, and tips for keeping cost down. The Usage & Billing page links to it in place of the previous "contact us for pricing" stub.

**\[Improved]** Monitoring over-time chart now plots on a true time axis

The accuracy and confidence over-time chart on the Monitoring tab positions each extraction at its real timestamp on a time-scaled axis, instead of forcing points onto an evenly-spaced grid. Time-bucketing now adapts to the visible span, so long, dense histories coarsen gracefully while short spans stay at full resolution. Version markers sit at their exact creation time and merge into a single ranged label (for example `v1.0–v1.2`) when several versions ship the same day, and the empty states now distinguish between no rows, no reviewed extractions, and no recorded confidence scores.

**\[Improved]** Large workflow versions no longer time out when loading file counts

Loading the file counts for a high-volume workflow version no longer tips over the request timeout. The count is now served by a single annotated aggregate backed by an index-only scan, replacing the per-file status scan that caused worker timeouts on workflows with many files.

**\[Fixed]** Filtering and sorting restored on smart-lookup columns

Columns backed by a smart-lookup field keep their sort and filter controls in the data viewer while still showing the lookup indicator. Previously the custom lookup header replaced the entire header cell, hiding the filter button on those columns even though filtering still worked underneath.

**\[Fixed]** Overlapping bounding-box highlights no longer wash out the PDF page

In the PDF visualizer, hovering a stack of overlapping bounding boxes now reads as a single translucent tint on both light and dark pages, instead of compounding toward opaque as more boxes overlapped. Boxes shared by many fields, such as table sections, also render more efficiently.

### June 2, 2026

<Warning>
  **\[Breaking + API]** Monitoring summary `total_extractions` now counts processed extractions only

  `GET /api/v3/workflows/{id}/analytics/summary` previously counted every queued extraction in `total_extractions`, including in-flight rows that had not yet finished processing. It now counts processed extractions only — matching the `verified`, `through`, and per-field counts already returned by the same endpoint, and matching what the cold path (no version filter) has always reported.

  **Action required:** if your integration reads `total_extractions` from the summary endpoint and expects in-flight rows to be included, switch to a non-summary query or add the in-flight count from your own tracking. The Studio monitoring UI is unaffected.
</Warning>

**\[Improved]** Monitoring page now reveals data progressively

Each card on the Monitoring tab paints its title and controls immediately and shows a skeleton only for the body that is still loading, off the data source that feeds it. A fast endpoint can paint without waiting on a slower sibling, instead of the whole page sitting behind a single page-level skeleton. The time-series chart also now keeps daily resolution on 30-day ranges instead of collapsing into \~5 weekly points.

**\[Improved]** Faster workflow version analytics summary

`GET /api/v3/workflows/{id}/analytics/summary` is significantly faster on high-volume workflow versions. Overall metrics, per-field metrics, verified counts, and throughput are now computed in two table scans instead of five, with no change to the response shape.

<Accordion title="Technical detail">
  The endpoint previously routed through the paginated read path and ran the per-extraction aggregate twice plus a discarded pagination subquery. It now serves the summary from a dedicated read that combines the by-field `GROUP BY` and a single per-extraction `COUNT(...) FILTER(...)` aggregate, with the paginated read path unchanged for the timeseries and detail endpoints.
</Accordion>

**\[Improved]** Faster workflow version analytics timeseries

`GET /api/v3/workflows/{id}/analytics/timeseries` is now significantly faster on high-volume workflow versions, applying the same optimization as the summary endpoint. The response shape is unchanged.

<Accordion title="Technical detail">
  The timeseries endpoint consumes only the per-extraction detail rows, but was still served by the full five-scan paginated read — computing the overall sum, per-field `GROUP BY`, and through-rate counts only to discard them. It now uses a detail-only read, and resolves the latest N extractions from the `Extraction` table (which owns `processed_at`) instead of grouping the multi-million-row analytics cache — roughly 34× faster on the top-N lookup, with existing indexes only and no migration.
</Accordion>

**\[API]** Deprecated combined `/analytics` endpoint removed

The deprecated `GET /api/v3/workflows/{id}/analytics` route (flagged for removal when the frontend migration completed) is gone. Use `GET /analytics/summary` for per-version overall and per-field metrics, and `GET /analytics/timeseries` for the cross-version series.

**\[Fixed]** File validation status stays in sync with its datapoints

When you confirm the last unverified datapoint on a file, the file badge now reliably flips to **Validated**, and when you unverify a previously-verified datapoint the file rolls back to **Processed**. Previously, the file-level validation status could drift from its datapoints — most commonly, the last cell confirmation would not flip the file badge.

**\[Improved]** Organization switcher now scales to many organizations

The sidebar organization switcher caps at 10 visible organizations by default with a **Show all** expander, and adds a search box that fuzzy-matches organization names across your full list. Organizations you're a direct member of are prioritized over support-access ones, and a per-user most-recently-used ordering surfaces the organizations you actually work in first. The current organization is always shown.

**\[Fixed]** Deep links to a file's results now switch to the workflow's organization

Opening a workflow file via a direct URL or in a new tab (middle-click) now elevates the active organization to the one the workflow belongs to, matching click-through navigation. Previously a cold-loaded file results page kept the UI on your own organization, hiding the support-access banner and showing the wrong organization in the sidebar.

**\[Fixed]** Organization selector rows stay a readable size on long lists

When you belong to many organizations, each row in the sidebar organization selector keeps a consistent, tappable height with visible spacing, instead of compressing to near edge-to-edge as the list grows.

### June 1, 2026

**\[New]** Workflow version selector

The workflow page header now has a version selector (on the Studio and Monitoring tabs) so you can view any saved version of a workflow. Selecting a non-latest version on Studio disables editing with an explainer — only the latest version can be edited — while Monitoring scopes its analytics to the version you pick.

**\[New]** Per-version monitoring history

The Monitoring tab now shows version history: the accuracy/confidence chart spans all versions with markers at each version change, a changelog panel lists what changed, and KPIs and by-field metrics can be scoped to a selected version.

<Accordion title="Technical detail">
  The analytics endpoint was split into purpose-built reads — `GET /analytics/summary?version_id=` (per-version overall + per-field means and through-rate counts) and `GET /analytics/timeseries?limit=` (cross-version series) — and analytics are now paginated by extraction rather than by (extraction, field) row, so wide schemas no longer collapse the window. The old `/analytics` endpoint is deprecated but retained until the frontend migration completes.
</Accordion>

**\[API]** Python SDK now installs as `anyformat`

The official Python SDK's PyPI distribution has been renamed from `anyformat-sdk` to `anyformat` — `pip install anyformat` — mirroring `@anyformat/sdk` on npm. The import path is unchanged (`from anyformat.sdk import Client`).

<Accordion title="Technical detail">
  The distribution starts at version 0.6.0 (the next free slot past the legacy `anyformat==0.5.0` on PyPI). The internal CLI binary that previously occupied the `anyformat` console-script slot was renamed to `af`; `from anyformat.cli import ...` still works.
</Accordion>

**\[Improved]** Workflow data sub-tabs keep their structural entry points

When a workflow has many data sub-tabs, the collapsed row keeps the leading tabs (Documents, Split/Classify, the first categories) and pins the selected tab to the end, with the overflow in a "+N" menu — instead of hiding everything but Documents and the active tab. Rows with five or fewer sub-tabs show all of them.

## May 2026

### May 28, 2026

**\[Changed]** Validator nodes now consume credits

`ValidateNode` previously ran for free. Each rule in a validator now costs 25 credits, billed once per extraction. The bill scales with the number of rules in the validator, not with the number of pages or files in the document packet.

<Accordion title="Technical detail">
  A new `validate` operator (`Credits(25)`) was added to `DEFAULT_OPERATOR_PRICES`. The internal billing dispatch was reshaped from `(item) -> list[OperatorName]` to `(item, page_count) -> list[LineItem]`, so non-page-priced operators like validators can ignore the page count and substitute `len(rules)` as the billed unit. The processor now issues a single `SpendCreditV4` call per extraction with all line items combined.
</Accordion>

**\[Changed]** Default operator prices realigned with the public pricing page

Default credit prices for several operators now match [anyformat.ai/pricing](https://anyformat.ai/pricing): Parse 25 (was 30), Extract 35 (was 20), Splitter 25 (was 10), Parse (Agentic) 100 (was 75), Extract (Agentic) 150 (was 75). Classify is now billed at 10 credits/page (previously a no-op — running a Classify node was free regardless of the published rate). Organizations with enterprise-specific overrides on `OperatorPriceTable` are unaffected.

**\[Improved]** Documentation restructured into Concepts / Guides / API Reference

[docs.anyformat.ai](https://docs.anyformat.ai) has been reorganized into three scoped tabs. **Concepts** holds definitional pages (Workflows, Schemas, Fields, etc.). **Guides** holds task-oriented walkthroughs (Quickstart, Build workflows, Recipes). **API Reference** holds the REST spec only. The four overlapping "create → run → results" walkthroughs are merged into one Quickstart with UI / curl / Python tabs at each step, and every extraction recipe now uses the canonical typed-graph workflow shape (`{name, nodes, edges}`) instead of the legacy `{fields:[...]}` shortcut. Old URLs redirect.

**\[Improved]** Post-signup `/welcome` page is now reachable only from the signup callback

Loading `/welcome` directly (typed URL, bookmark, fresh tab) now bounces to `/`. The page is only reachable after a brand-new Auth0 signup callback, which closes a path that could let the upstream GA4 / GTM conversion event fire for already-signed-up users.

**\[Improved]** `@anyformat/skill` synced to npm 0.2.2

The `@anyformat/skill` package version in this repo is now aligned with what's live on npm (`0.2.2`). The publish script uses token-based auth and bumps from `max(local, registry)` to prevent version collisions on the next release.

**\[Improved]** Release-PR generation now audits merged PRs upfront

The `/prod-release` Claude Code command (used to open this release PR) now treats the merged-PR list as the ground truth and the changelog as a derived artifact. The audit step surfaced the gap that produced the entries below — previously, PRs merged without a changelog entry could quietly drop out of the release notes.

### May 27, 2026

**\[New]** Attach smart lookup files to fields in the workflow SDK

The Python and JS SDKs can now mark a field as a smart lookup field and attach a reference file when creating a typed workflow. Flag the field with `lookup` and point it at a local CSV or text file; the SDK reads the file and uploads it inline, and extraction resolves that field's values against your reference data. Lookup files are validated as UTF-8 text before anything is written — base64 payloads are never stored or returned.

<Accordion title="Technical detail">
  `POST /api/v2/workflows/` accepts a per-field `lookup` flag (mapped to `source=smart_lookup`) and inline base64 `lookup_files`. Files are pre-validated as UTF-8 text/CSV (with cp1252→UTF-8 transcoding); binary or non-text uploads are rejected with a `400 UNSUPPORTED_LOOKUP_FILE` before any S3 writes. Only S3-backed lookup file references are persisted; base64 payloads and URIs are never stored or echoed back.
</Accordion>

**\[Improved]** Excel workbooks now convert to one Markdown page per sheet

Excel→Markdown conversion now emits a separate page for each worksheet, in workbook order, instead of concatenating every sheet into a single page. Multi-tab spreadsheets now classify and process per tab, keeping each sheet's content distinct downstream.

**\[New]** Official Python SDK now available on PyPI

`pip install anyformat-sdk` — the `anyformat-sdk` package is now live on PyPI. Use the fluent builder to author, create, run, and await workflows end-to-end without writing HTTP code; sync and async clients are included, with typed errors (`BadRequest`, `Unauthorized`, `Forbidden`, `NotFound`, `RateLimited`, `ServerError`, `SDKTimeout`) and a `.raw` escape hatch on every response. The Claude Code skill at `@anyformat/skill` now documents the Python SDK alongside the JS/TS one.

**\[New]** Post-signup `/welcome` landing page

Brand-new signups now land on a dedicated `/welcome` page (instead of going straight to `/`), giving marketing a stable URL for Google Ads conversion configuration via GA4 / GTM URL match. Returning users continue to land on `/` (or the originally-requested URL) as before.

**\[Fixed]** New signups for an already-registered email address are now rejected

Across every API write site that can create a User row (Auth0 just-in-time provisioning, the E2E test-auth endpoint, the `create_user` management command), case-insensitive duplicate emails are now refused before a second row can be created. Auth0 email rotation that would collide with an existing user is also refused. Previously, mixed-case variants of the same address could silently mint duplicate users.

**\[Fixed]** Resolved four frontend dependency vulnerabilities

`qs` (DoS in stringify), `tmp` (path traversal), `uuid` (missing buffer bounds check), and `js-cookie` (prototype hijack) advisories are now resolved via package overrides in `anyformat/services/frontend`. Upstream parents (cypress, exceljs, segment-analytics) hadn't yet adopted the patched versions, so overrides were the minimally-disruptive path.

**\[Improved]** Authenticated images are no longer refetched multiple times per page load

The sidebar mounts the organization logo from three places, which previously meant three identical requests to `/api/v3/iam/organizations/{id}/logo/` on every page load. The image hook now uses TanStack Query so concurrent subscribers share one in-flight request and the result is cached for 5 minutes.

**\[Improved]** Local development supports both `localhost` and `127.0.0.1` for OAuth

For developers running the stack locally, both `http://localhost:<port>/callback` and `http://127.0.0.1:<port>/callback` are now valid OAuth callback URIs and CORS / CSRF origins. Previously only one was registered per environment, which caused login failures if the dev frontend was opened at the other hostname.

**\[Improved]** JS SDK build toolchain refreshed

`esbuild` (0.21.5 → 0.27.7) and `vitest` (2.1.9 → 4.1.7) in the JS SDK package have been updated. No behavioral changes to the SDK itself.

### May 26, 2026

**\[Improved]** Higher monthly allowance on the Business plan

The Business plan's monthly credit grant has increased to 500,000 credits.

**\[Changed]** Free tier now gets a one-time 50,000-credit signup grant instead of a monthly allowance

Free organizations no longer participate in the monthly credit rotation. Each new free org now receives a single 50,000-credit signup grant that never expires — generous enough to evaluate the product, finite enough to make upgrading the natural next step. Paying tiers (Business, Enterprise) keep their recurring monthly allowance unchanged.

**\[Changed]** Stripe top-up pricing is now tier-aware

Top-up checkout rates depend on the organization's tier: free orgs pay €1.50 per 1,000 credits (€0.0015 / credit), Business and Enterprise orgs pay €1.00 per 1,000 credits (€0.001 / credit). Top-up amounts are now denominated in credits (minimum 30,000) instead of euros.

### May 25, 2026

**\[New]** Visualize parse layout with `result.parse.draw()`

The Python SDK can now render a document's parsed layout as an image overlay. Call `result.parse.draw()` to get the page with detected blocks drawn on top — useful for debugging extraction and verifying bounding boxes. A new cookbook example walks through it.

**\[Changed]** Simplified parse configuration

The `engine` (Fast/Performant), `effort`, and `visual_grounding` parse knobs have been removed from Studio and the workflow SDK. `engine` was a no-op (both options mapped to the same model), and the other two now use sensible built-in defaults. `figure_enhancement` and `mode` remain configurable. Existing workflows are unaffected — stored values for the removed knobs are ignored gracefully.

**\[New]** Install the anyformat Claude Code skill from npm

The anyformat Claude Code skill is now published on npm as `@anyformat/skill`. Run `npx @anyformat/skill` to install it — Claude Code can then author, run, and inspect anyformat workflows directly from your editor using your `ANYFORMAT_API_KEY`. Public docs include a new "Claude Code Skill" section under the SDKs page.

**\[Fixed]** Classify and split workflows authored via the Python SDK now route to extraction correctly

Workflows built with the Python SDK's `classify(...)` / `split(...)` builders no longer silently skip downstream extraction when a category or split rule is attached by name. The typed graph renderer now resolves branch ports by name (matching the schema's edge labels) instead of internal IDs, so SDK-authored classify/split workflows produce the same extractions as Studio-authored ones.

**\[Fixed]** OCR text containing NUL bytes no longer fails extraction

Extractions whose parse output contained a U+0000 (NUL) character inside a text value previously failed at the database write step, marking the entire batch as errored. The worker now strips NUL bytes from text at the parsing boundary so these documents complete normally.

**\[API]** Organization balance endpoint now returns a per-bucket breakdown

The new organization balance endpoint reports credits broken down by their source bucket — monthly grant, top-ups, and each redeemed voucher — along with the organization's tier and the monthly anchor day. Read-only by design: querying the endpoint never triggers a balance reset or recompute.

<Accordion title="Technical detail">
  `GET /api/v3/billing/organizations/{org_id}/balance/` returns `{ tier, monthly_anchor_at, monthly: {...}, topup: {...}, vouchers: [...], overdraft: {...} }`. Pre-migration organizations return zeroed buckets with a stable shape. Membership is enforced; any role can read.
</Accordion>

**\[New]** New organizations receive a monthly credit grant on signup

Newly created organizations now receive a monthly credit grant scoped to their tier, which resets on the organization's anchor day each month. This replaces the previous one-time welcome top-up and makes ongoing free usage predictable instead of front-loaded.

**\[Improved]** Voucher credits now expire after a configurable lifetime

Vouchers can now carry a `credit_lifetime_days` setting (default 90 days). When a voucher is redeemed, the granted credits expire after that many days, so promotional credits don't sit in a wallet indefinitely. Existing vouchers continue to behave as before until the new field is set.

**\[Fixed]** Signup voucher hint shows voucher value only, in your locale

The voucher preview on the create-organization and join-or-create pages no longer double-counts the welcome grant — a 5,000-credit voucher now renders as "5,000 credits" instead of being added on top of the standard signup grant. The number is also formatted using the browser's locale instead of being hardcoded to German formatting.

### May 22, 2026

**\[New]** Official Python SDK and `afx` CLI

The `anyformat-sdk` package wraps the workflow API with sync and async clients so you can author, create, run, and await a workflow end-to-end without writing HTTP code. Typed errors (`BadRequest`, `Unauthorized`, `Forbidden`, `NotFound`, `RateLimited`, `ServerError`, `SDKTimeout`) make integration code easier to write correctly, and a `.raw` escape hatch on every response exposes the underlying JSON when you need it.

<Accordion title="Technical detail">
  ```python theme={null}
  from pathlib import Path
  from anyformat.sdk import Client
  from anyformat.workflow import Schema

  client = Client(api_key="...")
  workflow = (
      client.workflow("Invoices")
          .parse(mode="agentic")
          .extract([
              Schema.string("vendor", "The company that issued the invoice"),
              Schema.string("total",  "The total amount due, including currency"),
          ])
          .create()
  )
  run = workflow.run(file=Path("invoice.pdf"))
  result = run.wait(timeout=300)

  print(result.fields["vendor"].value)
  ```

  `AsyncClient` mirrors the sync surface for async codebases.
</Accordion>

**\[New]** `afx` CLI for one-shot parse and extract

The SDK ships an `afx` CLI for quick command-line use. Run `afx parse --file invoice.pdf` to get markdown, or `afx extract --file invoice.pdf --field "vendor:vendor company" --field "total:total amount"` to extract structured fields. Pass `--schema fields.json` to load a richer schema, `--json` for machine-readable output, and `-o/--output FILE` to write results to disk.

<Accordion title="Technical detail">
  ```
  export ANYFORMAT_API_KEY=sk_...
  afx parse   --file invoice.pdf
  afx extract --file invoice.pdf --field "vendor:vendor company" --field "total:total amount"
  afx extract --file invoice.pdf --schema fields.json --json -o results.json
  ```

  Spinners, step breadcrumbs, and the Markdown/table renderers all detect TTYs and stay quiet when output is piped or `--json` is set, so the CLI is safe to script.
</Accordion>

**\[New]** Workflow versions are now semver-numbered

Workflow versions now display as `v1.0`, `v1.1`, `v2.0` across Studio, version history, and the save toast. Saving a workflow only mints a new version when it actually changes — major bump on graph topology or node config changes, minor bump on extraction-schema changes, and no bump at all when you only move nodes on the canvas or reorder columns in the data viewer.

**\[API]** Save layout and reorder fields without minting a new workflow version

New workflow-centric write endpoints separate cosmetic edits from material ones. Layout drags and column reorders no longer create new versions or break stable links to a specific version of your workflow. The legacy v2 write endpoints continue to work but now respond with `Deprecation: true`.

<Accordion title="Technical detail">
  * `PATCH /api/v3/workflows/{id}/config/` — save (may bump major or minor)
  * `PATCH /api/v3/workflows/{id}/layout/?version_id=X` — canvas position only (never bumps)
  * `PATCH /api/v3/workflows/{id}/field-order/?version_id=X` — column reorder only (never bumps)

  Concurrent edits surface a `409` with `latest_version_id` in the response body so your client can rebase without losing local edits. The deprecated v2 endpoints (`PUT /api/v2/.../workflows/{id}/config/` and `PATCH /api/v2/.../workflows/{id}/reorder-fields/`) still function during the transition window.
</Accordion>

**\[New]** `npx anyformat-skill` installs the anyformat Claude Code skill

If you use Claude Code, you can now install the anyformat skill with a single command: `npx anyformat-skill` drops it into `~/.claude/skills/anyformat`, or `npx anyformat-skill --project` installs it into the current project. The skill teaches Claude how to drive the typed-graph workflow API end to end — create a workflow with nodes and edges, upload a file, run extraction, and poll for results.

**\[Fixed]** Multi-tab login no longer fails with a code-verifier error

Logging in from two browser tabs at once could clobber the shared PKCE code verifier and surface as a generic "Login failed" page. The callback now silently retries once on this race and lands you on your original destination URL.

### May 21, 2026

**\[Improved]** Cleaner parsed markdown — no structural metadata wrappers

Parsed markdown saved to S3 and returned from the parse-result endpoints no longer carries `<DOCUMENT>` framing or `<section data-bbox=…>` wrappers. Block boundaries are now expressed as inert `<a id></a>` anchors that render as nothing in any markdown viewer, and page boundaries are joined with blank lines. Bounding boxes, page numbers, and confidences remain available via the structured results endpoints — they just stop leaking into the human-readable file body.

**\[New]** Modelo 200 workflow template (Spanish corporate income tax)

A new "Modelo 200" template lands in the workflow gallery, oriented around credit underwriting and portfolio early-warning. It covers balance-sheet and profit-and-loss extraction with 45 root fields (liquidity, leverage, profitability, cash conversion, coverage ratios) plus two table fields for directors and principal shareholders. The template is pre-configured to use agentic parsing with Balanced effort.

**\[Fixed]** Nested table columns no longer dropped by the smart-table extractor

Columns of a list-of-objects field (for example, items inside a product table) could previously be missed because the smart-table planner tried to extract them as standalone scalars. Nested fields are now correctly treated as per-row columns of their parent table and reach the final extraction every time.

**\[Fixed]** Editing object-table cells no longer loses focus mid-typing

Validating a value in an object-field table sometimes stole keyboard focus from the input before you finished typing, occasionally swallowing characters. Cell focus is now preserved across both the validation round-trip and the background refetch, and screen-reader row labels stay in sync after row swaps.

**\[Fixed]** Parse → split workflows now return their splits

Workflows that parse and split a document without a downstream extract were silently dropping the resulting splits. Terminal-splitter workflows now persist and return their splits as expected, surfaced via the file-list endpoint with `?include=splits`.

### May 20, 2026

**\[API]** `POST /v2/workflows/` now takes a typed graph; the old fields-only body is gone

`POST /v2/workflows/` now accepts the typed `nodes` + `edges` graph that previously lived at `POST /v2/workflows/with_graph/`. The legacy fields-only body and the `with_graph` path have been removed — there is one canonical workflow-create endpoint going forward.

<Accordion title="Technical detail">
  The `with_graph/` path no longer exists at `api.anyformat.ai/v2/workflows/`. Send the same payload (`{name, description?, nodes, edges}`) to `POST /v2/workflows/`. The legacy body `{name, fields}` is no longer accepted — wrap your fields in an explicit `parse → extract` graph instead. The legacy `multipart/form-data` upload path (creating a workflow from a downloaded `json_file`) has also been removed; rehydrate the workflow into the typed-graph body client-side.
</Accordion>

**\[Changed]** `POST /v2/workflows/{id}/run/` response `status` is now `"pending"` instead of `"success"`

The public run endpoint accepts requests asynchronously (`202 Accepted` — the extraction is enqueued, not complete). The previous `"success"` value conflated acceptance with completion; the new `"pending"` matches the vocabulary used by `WorkflowRunListItem.status` elsewhere on the v2 surface. Clients that polled for results regardless of this field are unaffected. Clients that branched on `"success"` to skip polling should now poll on `"pending"`; semantics match what was always intended.

**\[New]** Per-branch Validation tabs across workflow and file detail pages

The Validate node now gets the same per-branch UX as Extract. The workflow page renders one Validation tab per branch alongside its existing Category and Extraction tabs, and the file detail page mirrors the same layout. Split workflows include a subdocument selector with chevrons and a `1/N` counter, so you can step through per-subdocument validation results without leaving the tab.

**\[Improved]** Clear "Not validated" placeholders for rules added after extraction

When you add or edit a validation rule, files that were processed before the change now render explicit "Not validated" cards and cells with a hover-card explanation, instead of an ambiguous "N/A". This makes it obvious which results reflect the current rule set and which would need a re-run.

**\[Improved]** Redesigned Validate node configuration menu

The Validate node menu now uses a tabbed Config / Rules layout aligned with the Classify menu, with a dedicated rule editor that supports inline field chips inside rule descriptions. The previous template-import button has been removed in favor of the new layout.

**\[Fixed]** Validation verdicts now populate for split workflows

In workflows with a Split node, validation verdicts could come back empty in the file list and detail view because the API only inspected the parent extraction. Validation results now traverse parent and child extractions so per-subdocument verdicts surface correctly across the workflow page Validation tab, the file detail page, and the file results UI.

**\[Fixed]** CSV and master-file uploads decode reliably from non-UTF-8 encodings

Uploaded CSV files and master files saved in cp1252 (Word/Outlook smart quotes, Spanish accents), Shift-JIS, GB18030, KOI8, and similar non-UTF-8 encodings could previously either silently produce mojibake or fail extraction with a Unicode decode error. Uploads are now transcoded to UTF-8 at the API boundary, with binary payloads disguised as text rejected up front, so downstream parsing and lookups receive consistent, correctly-decoded bytes.

<Accordion title="Technical detail">
  The master-file upload endpoint and workflow CSV upload path now run inputs through a strict Western decoding ladder (`utf-8-sig` → `utf-8` → `cp1252`) before falling back to charset detection, and persist the re-encoded bytes to storage with a truthful `Content-Type: …; charset=utf-8` header. Allowed text extensions are `csv`, `txt`, `md`, `markdown`, and `rst`. Non-text payloads (PDF, ZIP, images, executables) are rejected at upload time rather than being mis-decoded.
</Accordion>

### May 19, 2026

**\[API]** Fetch parsed blocks with bounding boxes via the new parse-result endpoint

The new parse-result endpoint returns the structured page-nested blocks produced by parsing — including bounding boxes, type, confidence, reading order, and layout — along with a presigned URL for the raw markdown. Studio's PDF viewer and JSON view now use these blocks to render overlays consistently across tabs.

<Accordion title="Technical detail">
  `GET /api/v3/files/{file_id}/parse-result/` returns `{ blocks: [...], markdown_url: "..." }`. Blocks are grouped per page and carry a flat `bbox: { x0, y0, x1, y1 }`. The endpoint accepts an optional `execution_id` to scope results to a specific workflow execution.
</Accordion>

**\[API]** Slim split results without nested rows

You can now request slim per-split results that omit nested object rows entirely, and then load nested rows lazily per split via a dedicated endpoint. Each slim split carries a per-split version token so caches can invalidate at the split granularity instead of the whole file.

<Accordion title="Technical detail">
  Pass `?include=splits_lite` on the file results endpoint to receive slim per-split entries (mutually exclusive with `?include=splits`). Use `GET /workflow-versions/{version}/files/{file_id}/splits/{split_id}/object-results/{field_persistent_id}/` to fetch paginated object rows scoped to a single split.
</Accordion>

**\[Improved]** Faster Studio category tab on workflows with many splits

The Studio category tab now requests slim payloads and fetches nested object rows lazily per split, so opening a file with many splits no longer waits on full results materialization up front. Bounded prefetching when hovering across many cells also keeps the page responsive.

**\[Improved]** Parse JSON view shows the full block on expand

Expanding a block in the Parse → JSON tab now reveals the complete block object — id, bbox, confidence, content, hyperlinks, rows, and `image_base64` — rendered as inert JSON. Pages are grouped under collapsible sticky headers (collapsed by default) for easier navigation in multi-page documents, and the displayed shape mirrors the public parse-result response exactly.

**\[Improved]** Client-side markdown hydration with embedded images

The Markdown tab now hydrates images directly in the browser using each block's bounding box — including `figure` blocks — so the in-tab view and Markdown downloads stay consistent. The dedicated server-side hydration pipeline has been retired in favor of this lighter approach.

**\[New]** "Beta" badge on Studio nodes still being stabilized

Studio nodes can now display a BETA badge in the sidebar, on the canvas header, and in the sidebar hover card. The **Validate** node is the first to use it, marking it as still being stabilized for general production use.

**\[Fixed]** Workflows created via the typed graph endpoint now run extraction

Workflows created via the typed `with_graph` create endpoint sometimes failed to run extraction and surfaced as `EXTRACTION_FAILED (422)` because the extract node was persisted without an engine. New workflows now always include an engine and execute end-to-end as expected.

<Accordion title="Technical detail">
  `POST /v2/workflows/with_graph/` now persists a default engine on every extract node it creates.
</Accordion>

**\[Fixed]** Object rows in sibling extract branches no longer collide

When a workflow had sibling extract branches that defined fields with overlapping names, object rows from one branch could be written against the wrong field. Field schema resolution is now scoped per extract node, so each branch's rows land where they belong.

**\[Fixed]** Parse results reliably appear in the UI for new workflow versions

Some parsed files showed missing pages or stale `parsed_at` because parse callbacks resolved graph nodes by an identifier that wasn't unique across workflow versions. Callbacks now scope to the active workflow version, so parse results are persisted to the correct file every time.

**\[Improved]** Other improvements and fixes

Deleted object rows are now consistently excluded from both per-split and file-level row counts; the file results UI deduplicates parse-result requests when navigating between files; and extraction progress estimates fall back to organization-scoped percentiles instead of cross-tenant numbers.

### May 18, 2026

**\[New]** Handwritten text is now a first-class block type in agentic parsing

Agentic PDF-to-Markdown now recognizes handwritten text as its own block type, with a dedicated extraction strategy that combines a vision-language transcription with OCR-derived hints. Handwritten blocks are labeled "Handwritten text" in the Studio UI, distinct from machine-printed text, and can be tuned independently from regular text extraction.

**\[Improved]** More consistent agentic parsing when text-class blocks have multiple sources

When a text-class block has multiple candidate text sources, the agentic parser now writes the chosen source back into the block's elements and regenerates its segments before rendering. Downstream grounding (overlays, citations) now matches the markdown that's actually rendered.

### May 14, 2026

**\[New]** "Beta" badge on agentic parsing mode

The Parse node mode card now shows a "Beta" badge next to **Agentic parsing**, making it clear that this parse mode is still being stabilized for general production use.

**\[New]** Calibrated confidence scores on smart-table extractions

Both table cells and scalar fields produced by the smart-table extractor now carry a calibrated 0–100 confidence, so review UIs and downstream consumers can prioritize low-confidence values. The scores degrade gracefully when scoring is unavailable, so extractions still complete normally.

<Accordion title="Technical detail">
  Confidence is surfaced on each extracted value's `metadata.confidence` field as an integer in `[0, 100]`. Both judges flip together via the `smart_table_confidence_strategy` setting (`auto` / `judge` / `none`); the default is on.
</Accordion>

**\[New]** Review threshold filter for file results

File results now include a confidence-threshold slider that filters Fields and JSON to show only extractions at or below the selected confidence — useful for prioritizing low-confidence rows that need human review. The threshold is persisted across reloads and tabs, and an inline hover card explains how the filter works.

**\[Improved]** Image previews use the PDF viewer toolbar

Image uploads (JPEG, PNG, GIF, BMP, TIFF) now render through the same viewer as PDFs and inherit its zoom, rotate, and full-page controls. Multi-page TIFFs gain a page selector, and the viewer now reports specific errors for unsupported formats or images that exceed browser size and canvas limits instead of failing silently.

**\[Improved]** More reliable dense-table parsing in agentic mode

Agentic PDF-to-Markdown table parsing no longer stalls on dense tables. Reasoning-token budgets are now properly honored across the heal, audit, and sub-region passes; partially valid heal plans are applied operation-by-operation instead of being discarded on a single bad op; and the table-parsing token cap has been raised so large tables are no longer truncated mid-document. Workflows that previously stalled for over 15 minutes now finish in roughly 3.

**\[Improved]** Organization logos stay cached when unrelated settings change

The organization logo URL is now invalidated only when the logo bytes themselves change, not whenever any field on the organization is updated. Renaming the org, toggling support access, or any other non-logo edit no longer forces every client to re-download the logo.

**\[Fixed]** Bulk-download of selected files no longer 400s

`GET /api/v2/saas_manager/workflow-versions/{id}/results/` rejected the `file__id__in` and `file__id__not_in` query parameters with `Field 'id' expected a number but got '...'` whenever the frontend passed UUIDs (the v2 file API now returns file `id` as a UUID string). The filter was still joining against the legacy integer PK; it now joins on the file UUID field, matching the regular file filter. Every bulk-download flow that relies on row selection — and the "select all minus N" pattern — is unblocked.

**\[Fixed]** Stricter validation of logo and workflow file uploads

Logo and workflow file uploads are now validated against their actual bytes via magic-byte sniffing rather than the client-supplied `Content-Type` header, and only file types the parse pipeline can actually process are accepted. Uploading a renamed executable, ZIP, or arbitrary JSON as a workflow file is now rejected at the API boundary instead of silently failing later in the pipeline. Stored objects also carry the correct `Content-Type` on the storage layer instead of being labeled `binary/octet-stream`.

**\[Fixed]** Switching organizations on a resource page no longer reverts

Switching the active organization from a workflow detail page (or any other resource page deep-linked from a different org) could occasionally snap back to the original org after showing a "switched" toast. The selection is now preserved on the first click.

**\[Fixed]** Object-field row counts exclude deleted rows

The paginated object-field results endpoint and the slim workflow-results list both now exclude soft-deleted rows from page contents and the row-count total. Counts shown in the UI and returned by the API now match the rows you can actually see.

**\[Fixed]** Sub-table detail rows no longer clip at the bottom

Expanded sub-tables inside object-field result tables previously cut off the bottom row because the row-height calculation didn't account for the surrounding padding and border. The detail row now renders at full height regardless of how many sub-rows it contains.

**\[Fixed]** Classifier category tabs render again on workflow results

Category tabs on classifier workflow results were rendering empty because the slim results response had stopped including the classifier's `graph_node_id`. The field is back in the response (with a batched prefetch to avoid per-file queries), so files routed by a classifier now appear under their correct tab.

**\[Fixed]** Workflow list updates immediately after create, duplicate, or delete

The home page workflow list previously kept the stale list cached after creating, duplicating, or deleting a workflow, so the change only appeared after a refresh. All three mutations now invalidate the workflow list cache and the list reflects the change right away.

### May 13, 2026

<Warning>
  **\[Breaking + API]** Workflow creation no longer accepts file uploads

  `POST /api/v2/saas_manager/organizations/{org_id}/workflows/` previously accepted a `multipart/form-data` body to import a workflow from a JSON or XLSX template file. That entry point and its template downloads have been removed; the endpoint now accepts JSON request bodies only.

  **Action required:** if you were uploading a JSON or XLSX template to this endpoint, switch to sending the equivalent JSON body directly. The chat-driven and from-scratch flows in Studio are unaffected.
</Warning>

**\[New]** Studio is now generally available

The visual workflow editor (Studio) is now visible to every organization member from the workflow page and the new-workflow page. Write access is still controlled by your workflow role. The legacy schema editor at `/workflows/edit/...` has been removed — the Studio tab is now the single editing surface.

**\[New]** Row-level actions on object-field result tables

Object-field result tables now expose a per-row dropdown with **Validate row**, **Add row above**, **Add row below**, and **Remove row**, plus inline up/down chevrons for adjacent reordering. The trailing-column header has a new **Validate all rows** action that promotes every datapoint in the field to human-verified in a single click.

<Accordion title="Technical detail">
  New endpoints under `/api/v2/datapoints/object-fields/{field_id}/`:

  * `POST .../rows/` (with an `index` in `[0, n_live_rows]`) inserts an empty row and shifts live siblings up by one in the same transaction.
  * `POST .../rows/{a}/swap/{b}/` exchanges two adjacent live rows.
  * `POST .../rows/{position}/validate/` bulk-promotes every datapoint in a row to human-verified.
  * `POST .../validate-all/` does the same for every live row of the object field.

  All four endpoints require Member role.
</Accordion>

**\[New]** Signup vouchers now give live feedback during onboarding

When a user lands on the signup flow with a voucher code, the create-organization screen shows the total welcome credits they will receive (the standard signup grant plus the voucher's value, fetched live from the backend) if the voucher is valid, or an amber "voucher no longer valid" hint if it is expired or unknown. Previously the page was silent in both cases.

<Accordion title="Technical detail">
  `GET /api/v3/iam/signup-vouchers/{code}/` returns a read-only preview of the voucher (`{ code, additional_credits }`) and never consumes a redemption — actual redemption still happens atomically inside organization creation. Returns `404 voucher_not_found` for unknown codes and `410 voucher_expired` for expired or exhausted ones. Responses set `Cache-Control: no-store, private` so admin edits to a voucher are reflected on the next page load.
</Accordion>

**\[Improved]** Workflow page list is dramatically faster

The workflow page's file list now uses a slim response that omits per-cell datapoint detail until you actually expand an object row. On a workload of 50 files with 2,000 line-items each, the list now returns in \~240 ms instead of \~6 s and the response payload is roughly 770× smaller. Object cells pre-warm on hover so expanding feels instant; edits to a datapoint refresh the expanded sub-table without a poll cycle.

<Accordion title="Technical detail">
  `GET /api/v2/saas_manager/workflow-versions/{id}/files/?include=results_lite` returns scalar datapoints as `{ id, display, status }` and object fields as `{ row_count }`. Full sub-tables are fetched lazily through `GET /api/v2/saas_manager/workflow-versions/{id}/files/{file_id}/fields/{field_persistent_id}/object-results/`. The existing `include=results` (full) path is unchanged and still available for legacy integrators.
</Accordion>

**\[Improved]** Stripe top-up is now open to all users and confirms the purchased amount

The **Top up** button on the Usage page now opens the Stripe checkout dialog for every user — the previous fallback to a `mailto:sales@anyformat.ai` link is gone. After a successful payment, the success page now states how many credits were added and tells the user they can start running workflows immediately.

**\[Improved]** Cleaner extraction on scanned PDFs

The agentic PDF-to-markdown parser now honors the routing model's verdict on which text source to trust (the embedded PDF text layer, OCR, or vision) for every block type, including miscellaneous blocks that previously ignored the verdict. On scanned PDFs where the embedded text layer is corrupted, blocks now render the clean OCR text instead of garbled bytes.

**\[Fixed]** Returning from Stripe checkout no longer fails authentication

If you stayed on Stripe long enough for the short-lived access token to expire, returning to anyformat used to land on an authorization error screen — even though the payment had already succeeded on Stripe's side. The app now silently refreshes the token in the background (and replays the first request that hits a 401), so the top-up success page renders cleanly even after a long checkout.

**\[Fixed]** Cross-tab refresh storm and support-elevation isolation

Opening the app in multiple tabs no longer triggers refetch storms when you navigate inside one tab. As a separate fix, support agents who elevate into a customer organization stay elevated only in the current tab — new tabs default back to their own organization until they explicitly re-elevate.

**\[Fixed]** New users are immediately enabled after signup

Users who created their own organization or requested to join an existing one were previously left in a disabled state until an admin reviewed their request, which blocked the create-org path entirely. Both onboarding paths now enable the profile immediately. Join-request approval is still required to gain membership in the target org.

**\[Improved]** Other improvements and fixes

* Structured-output extractions on Amazon Nova models no longer occasionally fail with an empty-JSON parsing error.
* The Usage page no longer renders the "Page consumption" summary box at the top — the cleaner credit-balance card now stands on its own.
* Long organization lists on the join-or-create page no longer get clipped below the fold.

### May 12, 2026

<Warning>
  **\[Breaking + API]** v2 file responses now use UUID hex for file IDs

  Every v2 endpoint that previously returned a numeric file ID now emits a 32-character UUID hex string. The OpenAPI schema already declared these fields as UUID strings, so generated SDK clients are unaffected — but raw HTTP integrations that parsed the value as an integer, or that constructed URLs from cached numeric IDs, will need to update. Old integer-shaped URLs return `404`.

  **Action required:** treat v2 `file_id` values as opaque strings, and refresh any cached file URLs that embedded integer IDs.

  <Accordion title="Affected surfaces">
    * File detail, upload responses, and file-position (`prev_id` / `next_id`)
    * Extraction responses and the verification URL builder
    * File filters: `id__in`, `id__not_in`, `file_id__in`, `file_id__not_in` now resolve against UUID hex
  </Accordion>
</Warning>

**\[New]** Delete rows from object-field result tables

Object-field result tables now have a per-row trash action. Remaining rows are reindexed in place so curated positions stay stable and your verification carries over correctly when the file is re-extracted. Rows the model produced are soft-deleted (so accuracy metrics still reflect the model's miss); rows you added yourself are removed entirely.

<Accordion title="Technical detail">
  `DELETE /api/v2/datapoints/object-fields/{id}/rows/{position}/`. Deletion is transactional. A partial unique constraint allows deleted rows to retain their original index, and the result-building SQL filters soft-deleted rows.
</Accordion>

**\[API]** Workflow responses now include `organization_id`

Workflow and workflow-version responses (including the nested `versions` collection on workflow detail) now expose a normalized 32-character hex `organization_id`. Useful when a client needs to identify which org owns a workflow without an extra round-trip.

**\[Improved]** Tax ID and billing address now collected during Stripe Checkout

Top-up Checkout sessions now collect the customer's tax ID and billing address before invoices are finalized. This unblocks invoice correctness in EU/UK jurisdictions. No action required for existing customers — the values are persisted on the next Checkout.

**\[Improved]** Smoother markdown viewer on long documents

The "visual" markdown view is significantly snappier on long documents. Scroll-driven page detection is now frame-throttled and only fires on real page transitions, sections subscribe to fewer state changes, and highlight-scroll listeners share a single event bus instead of one per section.

**\[Improved]** Faster agentic PDF-to-markdown parsing

Agentic PDF→Markdown parsing now parallelizes work per page — single-call vision blocks fan out and agentic blocks run in a small per-page pool, so page latency tracks the slowest block instead of the serial sum. Layout-routing calls were also tuned for lower latency.

**\[Fixed]** Org switcher no longer goes blank for orgs without a logo

The organization switcher trigger now always renders the org's initials as the base layer and only overlays the logo image once it loads successfully — so switching to an org without a logo no longer flashes blank.

**\[Improved]** Parse confidence is now reported even when the model omits logprobs

Per-block parse confidence is now computed and stamped on the rendered markdown regardless of whether the parsing model returns token-level log probabilities. Downstream consumers can read `data-parse-confidence` consistently across all supported parsing providers.

### May 11, 2026

<Warning>
  **\[Breaking + API]** v2 SaaS Manager list, create, and member endpoints now require an organization in the URL

  The bare `/api/v2/saas_manager/...` list and create paths, and the legacy `current` organization shortcuts, have been removed. Workflow versions, file collections, member management, and suggestion routes must now include the org id in the URL. Suggestion endpoints additionally require Member role — Viewers get `403`.

  **Action required:** template the org id into the URL for every v2 SaaS Manager request that previously relied on a bare list/create path or the `current` shortcut.

  <Accordion title="Affected paths">
    * `GET / POST /api/v2/saas_manager/workflow-versions/` → `.../organizations/{org_id}/workflow-versions/`
    * `GET / POST /api/v2/saas_manager/file-collections/` → `.../organizations/{org_id}/file-collections/`
    * `PATCH /api/v2/saas_manager/organizations/current/` → `.../organizations/{id}/`
    * `PATCH /api/v2/saas_manager/organizations/current/logo/` → `.../organizations/{id}/logo/`
    * `GET / POST /api/v2/saas_manager/organizations/current/members/` → `.../organizations/{org_id}/members/`
    * `DELETE / PATCH /api/v2/saas_manager/organizations/current/members/{user_id}/[role/]` → `.../organizations/{org_id}/members/{user_id}/[role/]`
    * `POST /api/v2/suggestions/{fields,field-description,workflow-description,workflow-name,upload-sample}/` → `.../organizations/{org_id}/suggestions/{...}/`
  </Accordion>
</Warning>

**\[New]** Self-service invoice downloads from the usage page

Owners and admins can now open Stripe's hosted Customer Portal from the Usage page to download invoices for past top-ups. Members, reviewers, and viewers don't see the button — invoices are financial documents.

<Accordion title="Technical detail">
  `POST /api/v3/billing/organizations/{org_id}/portal-sessions/` mints a short-lived portal URL bound to the org's billing customer record. Orgs that have never run a top-up return `409 stripe_customer_not_provisioned`; the UI surfaces this as a "Top up first" toast. New environment variable `STRIPE_BILLING_PORTAL_RETURN_URL` falls through to `${FRONTEND_URL}/usage`. Configure the operator side (Invoice History toggle, ToS / privacy URLs, default return link) in the Stripe dashboard.
</Accordion>

**\[Improved]** Cross-tenant probes on v3 detail endpoints now return 404

v3 file results, file-collection extraction status, file-collection results, and webhook delete now derive the active organization from the resource itself. URLs are unchanged. Cross-organization requests get a `404` to mask resource existence; same-org callers are unaffected.

**\[Improved]** Hovering a stacked bounding box highlights every overlapping box

When several bounding boxes overlap, hovering reveals each one stacked under the cursor instead of just the topmost. Rapid mouse movement is collapsed to one diff per frame so the UI stays responsive.

**\[Improved]** Consistent extracted-object UI across Define, Refine, and File Results

Object and table fields now render the same way across the schema builder, refine step, and file-results panel — same headers, same skeletons, same status states. Refine correctly shows error / cancelled copy on failed extractions instead of an empty data view.

### May 9, 2026

<Warning>
  **\[Breaking + API]** Detail and per-resource v2 endpoints no longer read `X-Current-Org`

  Datapoints, manual datapoints, extraction results, workflow-version files, file collections, workflow versions, and workflow detail endpoints now derive the active organization from the resource itself. Non-resource list/create endpoints (API key CRUD, manual-datapoint create, workflow list/create) moved under `/api/v2/saas_manager/organizations/<org_id>/<resource>/`.

  **Action required:** stop sending `X-Current-Org` to v2 endpoints — derive the org from the resource (URLs unchanged for detail) or from the new org-scoped URL for list/create. Non-members on a same-resource probe now get `404`; in-org callers with insufficient role get `403`.
</Warning>

### May 8, 2026

**\[API]** Create workflows with full graph definition in a single call

You can now define a workflow's parse, classify, split, and extract steps in one request — the workflow, its initial version, and all extraction fields are created atomically. The existing workflow-create endpoint is also fully typed, so SDKs get auto-complete and field-level validation.

<Accordion title="Technical detail">
  `POST /v2/workflows/with_graph/` accepts the complete graph plus per-extract-node fields and creates everything in a single transaction. The original `POST /v2/workflows/` body is now declared with discriminated field types.
</Accordion>

**\[New]** Python builder for typed workflows

The Python SDK now ships a fluent `Workflow` builder: `.parse().classify(...).extract(...).split(...).build()` produces a fully validated request. `branch=` and `route_from=` accept either an id string or the typed category/rule you registered earlier, so typos fail at the call site instead of at build time.

**\[Improved]** Workflow graph validation catches more errors before run

Non-branching nodes with multiple outgoing edges, missing `route_from` on classify→split chains, and broken graph transitions are now rejected with clear validation errors instead of producing confusing runtime failures. The new typed graph payload also rolls back cleanly when persistence of one node fails partway.

**\[New]** Voucher codes on signup grant extra wallet credit

Organization creation now accepts an optional `voucher_code` that grants extra wallet credit to the new org. Vouchers (expiry, usage cap, amount) are managed in the admin panel; invalid or expired codes surface as a clear `400` instead of breaking signup.

**\[Improved]** Doc-viewer table polish — resize, unified sync toggle, safer persisted state

The object-results table is now vertically resizable with sensible bounds (header + one row min, all-rows-visible max). The PDF toolbar's separate auto-focus and auto-scroll preferences collapse into a single "Sync document and results" toggle, and the lookup-files dialog no longer overflows. Co-mounted views in the same tab stay in sync, and corrupt persisted preferences self-recover instead of breaking the UI on load.

### May 7, 2026

**\[Improved]** Org isolation on v2 detail endpoints is now resource-derived

Detail endpoints (datapoints, manual datapoints, extractions, workflow-version files, file collections, workflow versions, workflows) now authorize the caller's role against the resource's organization instead of a request header. Cross-tenant probes return `404`; in-tenant callers with insufficient role return `403`. URLs are unchanged.

**\[Improved]** More accurate extraction confidence via LLM judge

For extractions where the model doesn't return logprobs (or where you prefer the judge), a separate judge model now scores each extracted value against its source markdown and outputs a calibrated 0–100 confidence. Object and table cells are batched for cost efficiency. Choose the strategy per task with `extraction_confidence_strategy = none | auto | judge`.

**\[Fixed]** Login no longer spins on token-exchange failures

The post-login callback now exits cleanly when token exchange or profile fetch fails: a 15-second watchdog renders an error page with a logout option, OAuth errors surface explicitly, and tokens are purged on error.

**\[New]** Bounding boxes for split + extract workflows

The PDF viewer now renders bounding boxes for split + extract workflows by scoping mapped locations per panel — the matching split when navigating by partition, the file's extraction otherwise. Parse, split, classify, and validation panels paint no boxes, as intended.

**\[Fixed]** Renamed split categories no longer lose historical sub-documents

When you rename a split rule across versions (e.g. "Invoice" → "Bill"), historical sub-documents continue to surface under the renamed tab — the UI now filters by the stable split-node identifier instead of the display string.

### May 6, 2026

**\[New]** Top up credits with Stripe Checkout

You can now purchase wallet credit through Stripe Checkout from the Usage page. Sessions are idempotent (no duplicate charges on retry), and every successful top-up generates an invoice you can download later from the Customer Portal.

<Accordion title="Technical detail">
  `POST /api/v3/billing/organizations/{id}/checkout-sessions/` validates the requested credit count against per-session min/max bounds and returns a hosted-page URL. Inbound `checkout.session.completed` webhooks are HMAC-verified and serialized to guarantee exactly-once credit grant under contention.
</Accordion>

**\[New]** File results panel with explicit processing states

The per-file extraction panel now renders explicit Unprocessed, Processing, Error, and Processed states with run and retry actions. Heavy result queries only run once the file is actually processed, removing wasted requests and inconsistent loaders for unprocessed files.

**\[New]** Search inside the PDF viewer

The PDF viewer now has a toolbar search bar with case-sensitive/insensitive matching, prev/next navigation that wraps, a `Cmd/Ctrl-F` shortcut, and cross-page highlighting. Search stays responsive on large PDFs thanks to debounced lazy text extraction.

**\[API]** Workflow results now expose classifications, splits, and flat extractions

`GET /v2/workflows/{wid}/files/{cid}/results/` now returns `classifications`, detailed `splits` (with per-file pages and partitions), and a flat `extractions` list keyed by `(split_name, partition)` alongside the existing fields. The v3 file-collection results payload exposes the same keys in place — no new endpoint required.

**\[Improved]** Org switcher orders by ownership, membership, then support access

Organizations are now returned in three alphabetically-sorted buckets — owned, member, support access — replacing the prior database-discovery order. Helpful for accounts that belong to many orgs.

**\[Fixed]** Bulk extraction status no longer marks processed files as pending

Bulk status resolution now aligns identifier formats and queries root extractions only, so processed parents are correctly reported instead of stuck on "pending".

**\[Improved]** Other improvements and fixes

File status badge in the file-results header, PDF toolbar staying visible on multi-line wrap, clearer toasts when a dropped file's type isn't accepted, and a fix for crashes on empty drag-and-drop selections.

### May 5, 2026

**\[Improved]** Smart Lookup retries based on residuals

Smart Lookup now wraps the worker in an orchestrator + auditor loop that can iterate up to three times if residuals or tool-call signals suggest more work is possible. Search blends BM25 and Jaro-Winkler matching with a per-call query cap to improve recall while keeping runtime bounded.

**\[Improved]** Splits panel is now navigable

Clicking a split or partition now switches the extraction tab, persists the selected partition in a URL search param, and scrolls the PDF viewer to the matching page. Refresh-safe deep links to a specific partition just work.

**\[Fixed]** Duplicating workflows with reused field names no longer fails

Workflows that contain multiple extract nodes reusing the same top-level or nested field names can now be duplicated cleanly — field lookups are keyed per source node.

**\[Fixed]** Classify-only workflows handle unwired LLM categories

When the classifier returns a category with no downstream edge, the run now falls through to the end instead of crashing. Classify-only workflows also persist classification records so results are available afterwards.

**\[Improved]** Smart-table list-of-object rows are page-ordered

Smart-table list-of-object extraction rows are now sorted by source page and block, so output is stable and easy to align with the source document across runs. Rows with missing page info are placed last.

**\[Improved]** Cleaner empty states for splits and category grids

Workflows with no rows now render a clean overlay in the splits and category grids instead of stale placeholders.

### May 4, 2026

<Warning>
  **\[Breaking]** Audio uploads are no longer accepted

  The dropzone no longer accepts audio files (`.mp3`, `.wav`, etc.). Existing audio files already in your collections remain viewable and downloadable.

  **Action required:** if your integration uploaded audio for transcription, switch to a supported format. No deprecation window — audio uploads are rejected starting today.
</Warning>

**\[Fixed]** Authentication errors now return the correct status codes

Unauthenticated requests now return `401` (with a `WWW-Authenticate` header) instead of being coerced to `403`. Missing email-claim or unknown-key authentication paths return `AuthenticationFailed` cleanly instead of `500` errors.

**\[Improved]** Profile-fetch failures show a clear error page

Repeated `/profile` failures after login now surface an error page with a logout option instead of stranding users on a spinner. Global query errors are also reported with the failing endpoint for faster triage.

**\[Fixed]** Corrupted local preferences no longer break the UI on load

Persisted UI preferences (sync toggles, panel sizes, etc.) are now safely parsed; corrupt keys are dropped on initial load and reported to monitoring with the offending storage key.

## April 2026

### April 30, 2026

**\[New]** New organizations get 5,000 free credits

Every new organization is now automatically granted 5,000 credits on creation, recorded as a standard billing transaction.

### April 28, 2026

<Warning>
  **\[Breaking + API]** Split confidence is now a 0–100 integer

  `SplitRange.confidence` is now an integer in `[0, 100]` end-to-end (graph output, backend validation, database, and frontend rendering) instead of a 0–1 float. Existing values were migrated automatically.

  **Action required:** if your integration reads `confidence` from split results, expect integers (`87`) instead of floats (`0.87`). Confidence badges no longer render `0.87%` or `8700%`.
</Warning>

<Warning>
  **\[Breaking + API]** Workflow results response shape is now canonical

  `GET /v2/workflows/{wid}/files/{cid}/results/` now always returns `{collection_id, verification_url, parse, extraction}` — parse-only workflows return `extraction: null` instead of omitting the key. The internal `field_id` no longer leaks in responses.

  **Action required:** if your integration relied on `extraction` being absent for parse-only workflows, check for `null` instead. Cookbook examples and the API reference have been rewritten to match.
</Warning>

**\[New]** Classify nodes return evidence and confidence

Classify nodes now return a structured result with category, evidence, and confidence. The UI surfaces confidence and evidence in the category data viewer, with a dedicated Classify tab at workflow level and a per-file Classify tab with verdict cards. The new `classifications` include option exposes the same data via the file API.

**\[New]** HTML files render inside the document viewer

You can now upload `.html` and `.htm` files and view them in the document viewer. Files render inside a sandboxed, sanitized iframe to block script execution and same-origin access.

**\[API]** Better OpenAPI for SDKs and try-it surfaces

`POST /v2/workflows/` now declares its JSON request body, so SDKs and AI agents stop falling back to raw `extra_body` dicts. Bearer authentication is now declared as a proper API-key security scheme, so try-it panels render the lock icon and treat auth as required. Webhook responses now expose `url` as an `HttpUrl` and `created_at` as a `datetime`.

**\[Improved]** Smart-table search now reaches inside table cells

Smart-table search now scans both page prose and table cells, returning cell matches with `{table, row, column, value}` provenance. Scalar values inside tables are no longer dropped behind prose hits.

**\[Improved]** Per-category accuracy and confidence charts in monitoring

The monitoring tab now groups per-field accuracy and confidence charts into per-category sections for multi-schema workflows, with sensible fallbacks and an improved empty state.

**\[Improved]** Validate view now shows results for classify-routed files

Per-file validation now displays extracted data for files routed via Classify by synthesizing entries from the file's results when pinned to an extract node. The partition selector and sibling tabs are hidden when they don't apply.

**\[Fixed]** Improved processing of `.docx` and `.msg` uploads

`.docx`, `.msg`, and other non-PDF uploads now hydrate from their converted PDF instead of feeding the original binary into the PDF parser. Markdown for these formats is now correct and no longer triggers spurious "corrupt PDF" warnings.

**\[Improved]** Other improvements and fixes

Reliable PDF callback uploads via short-lived presigned URLs (no more "Request Too Big" errors on large rendered PDFs), file-detail columns in splits and per-category tables, `search_text` helper exposed in the smart-table sandbox, object/list extraction fields no longer dropped from v3 results, timeouts and retries on frontend list requests, lighter `/files` list payloads, and `eval` and XFA disabled in the PDF viewer for a smaller attack surface.

### April 27, 2026

**\[Fixed]** Smart-table extraction no longer drops scalar fields

Smart-table extraction plans now must assign every schema field to either a table or a `scalar_fields` list, rejecting incomplete plans so scalar fields can no longer be silently dropped.

**\[Fixed]** Master-file + extract workflows now run without error

Non-split workflows that combine a master-file node with an extract node now run cleanly — master-file nodes can augment an extract branch without triggering single-non-split-node validation.

### April 24, 2026

**\[Improved]** Tabbed data viewer on workflow detail

Reworked the workflow detail data viewer into a tabbed layout with shared grid skeletons and improved loading behavior. Added an inline workflow-name editor and tighter studio reload/loader behavior when workflows change.

**\[Fixed]** Empty splits tab now shows a clean overlay

The splits tab now uses a proper no-rows overlay (resolving the previous "Element type is invalid" error) and filters out splits with no results, so empty entries no longer appear.

**\[Improved]** More content captured during PDF parsing

Text missed by layout segmentation now reaches downstream parsing through orphan OCR detection and synthetic text-block injection. Orphans are clustered into lines, merged into nearby blocks where appropriate, or promoted to new blocks; redundant smaller blocks are absorbed to prevent duplicates.

**\[Fixed]** Date parsing in the smart-table sandbox no longer silently returns `None`

The sandbox now allows `datetime.strptime`'s runtime dependencies, so parsed dates are correctly returned instead of being silently converted to `None`.

**\[Fixed]** Smart-table no longer overwrites populated fields with empty values

Storing an empty list can no longer overwrite an already-populated list field (e.g. a transactions table). Column-statistics summaries now clearly label examples and indicate additional unique values so the worker doesn't mistake samples for full datasets.

**\[Improved]** Better table merging when headers differ

Same-title tables with differing headers are disambiguated to avoid dropping rows during merge, and the block-parsing prompt and markdown assembly capture text the model sees that isn't covered by any block id.

**\[Fixed]** Classifier results survive workflow amendments

Category queries continue to return populated classifications and datapoints after prompt adjustments — the version-compatible result builder now maps datapoints across amended versions by their stable persistent identifier.

### April 23, 2026

**\[Fixed]** Partitioned splits now flow into Validate and Smart Lookup

When a split rule declares a `partition_key` and the downstream extract is wired into a Validate or Smart Lookup, each partition is now processed independently — one validation pass per partition, one lookup pass per partition — and the results attach to the right per-partition child extraction. Previously the downstream node saw nothing.

**\[Improved]** Per-partition fan-out is now bounded

Per-partition fan-out is now capped (default `4` parallel branches) so a 100-partition document running through extract → validate → lookup doesn't saturate the model provider.

### April 20, 2026

<Warning>
  **\[Breaking]** Smart Lookup is now part of Extract

  The standalone Smart Lookup node has been removed — lookup behavior now lives on the Extract node. You manage lookup files directly from the Extract menu, enable lookups per-field via a checkbox in the field editor, and see inline badges when lookup files are missing.

  **Action required:** none for most users — existing workflows migrate automatically and inherit Extract's model settings. If you scripted Smart Lookup as a standalone node, switch to enabling lookup fields on Extract.
</Warning>

<Warning>
  **\[Breaking]** Smart Table is now an agentic mode on Extract

  The standalone Smart Table node has been removed — replaced by a Standard / Agentic mode selector on the Extract node.

  **Action required:** none for most users — existing Smart Table workflows migrate automatically to Extract with `mode="agentic"`.
</Warning>

**\[New]** Splits results tab in workflow results

Added a Split tab in the extraction results UI showing split-group cards and a collapsible per-page table, plus a `document_split_groups` include option on the workflow-version file endpoint with category, partition value, page range, and confidence.

**\[Improved]** Redesigned Classify and Split menus

Both nodes now share a Config / Options tab layout with validated card-based editors, uniqueness checks, reserved "Other" handling for Split, click-to-edit from the Config preview tables, and cleaner Save-from-header behavior that closes open forms first.

**\[API]** Split groups consolidated by category and partition value

`document_split_groups` responses now group rows by `(category, partition_value)` and return consolidated `page_ranges` arrays instead of per-row `page_start` / `page_end` fields.

**\[Improved]** Better figure parsing in PDFs

The figure-enhancement prompt now uses chart-type-specific strategies, handles multi-panel figures, gives statistical-annotation guidance, and applies column-header rules. Segment deduplication now preserves the largest picture blocks to avoid losing visual context.

**\[Improved]** Inline field-level errors in the field editor

Replaced the top-level field-form error banner with inline, field-adjacent errors, column-level highlighting, and deduplicated summary messages for enum options and nested fields.

**\[Improved]** Default extraction model upgraded to `gpt-5.2`

The default extraction model is now `gpt-5.2` across markdown and PDF extraction paths.

**\[Fixed]** OneDrive and Google Drive imports work reliably in production

Fixed the cloud-connector reliability issues that affected OneDrive and Google imports in staging and production: a stricter content-security policy now allows the required SDK scripts and frames, SDK load errors surface instead of failing silently, consumer-vs-business OneDrive accounts are detected correctly, and a popup-polling leak is gone.

**\[Improved]** Other improvements and fixes

Validation tab only appears when the workflow includes a Validate node and the file has an extraction; `latest_version` endpoint returns a clean `400` for invalid workflow UUIDs instead of a noisy `500`; OAuth callback token exchange no longer stalls; Smart Lookup handles nested object fields correctly; agentic-extraction description typo fixed; partition tools only exposed when split rules define a partition key; null figure-parsing responses no longer crash; markdown viewer no longer has an infinite scroll loop and the markdown download button moved to the top of the panel.

### April 14, 2026

**\[New]** Redesigned Workflow Studio

The Studio now ships redesigned Parse and Extract node menus, smart node placement with auto-connect, a way to add connected nodes from an output handle, empty-state guidance, an updated sidebar with hover cards, and save-from-header with per-node validation errors.

**\[New]** Validate node for cross-document rule validation

A new Validate node type lets you express cross-document rules that run after extraction.

**\[New]** XML file support

Added an XML-to-PDF converter for files with embedded base64 attachments, so XML files extract end-to-end.

**\[Improved]** API keys are hashed at rest, with multiple named keys per user

API keys are now hashed at rest, support naming, and you can hold multiple keys per user — better security and easier rotation.

**\[Improved]** Per-cell parse confidence with logprob breakdowns

Parse confidence now uses cell-level logprobs when available, with per-cell and per-row logprob breakdowns for table blocks.

**\[Improved]** Studio warns when leaving with unsaved changes

The Studio now blocks tab switches when configuration is dirty or an inline editor is open, and preserves in-progress work across tab navigation.

**\[Improved]** Faster, paginated billing usage

The billing usage table is now server-side paginated with summary cards, dramatically improving query performance on large datasets.

**\[Improved]** Visual grounding toggle for local-model compatibility

Added a toggle to disable visual grounding for environments where the local model doesn't support it.

**\[API]** Files nested under workflows; multi-node results; richer OpenAPI

File endpoints are now nested under workflows. A new multi-node results endpoint returns results across multiple graph nodes in one call. The OpenAPI spec is now enriched for better SDK documentation, and the billing API gained pagination, filtering, and ordering.

**\[Improved]** Other improvements and fixes

OCR confidence is now included in segment and table-cell metadata; the schema builder uses a local config store for faster field editing; toast colors, mobile sidebar, long descriptions in choice fields, lookup-file display overflow, and schema editor header sizing fixes; corrupt-PDF hydration errors are now downgraded to warnings.

### April 6, 2026

**\[New]** Agentic document splitter

The batch LLM splitter has been replaced by an agentic workflow. You can now define custom split rules (name, description, optional partition key for grouping repeating sections) directly in the Studio SplitPages node, with a built-in "Other" catch-all rule.

**\[Fixed]** Non-figure blocks no longer receive figure wrapping

Generic non-figure blocks are no longer treated as pictures, so only true figures receive screenshot crops and figure markup in PDF-to-markdown output.

**\[API]** Removed duplicate health check endpoint

A duplicate health check endpoint has been removed from the API surface.

### April 1, 2026

<Warning>
  **\[Breaking + API]** Workflow results endpoint always returns unified JSON

  `GET /v2/workflows/{id}/results/` now always returns unified JSON (parse markdown plus extraction data per file). The `output_format`, `as_lists`, and file-filter query parameters have been removed; filter to a single file with the new `file_id` query parameter.

  **Action required:** if your integration relied on `output_format`, `as_lists`, or the legacy file-filter param, remove them and use `file_id` instead.
</Warning>

**\[Improved]** Workflow usage endpoint supports pagination and ordering

The workflow usage endpoint now supports server-side pagination (`page`, `page_size`), ordering, and database-level filtering with the standard `results` response wrapper.

**\[API]** v1 routes restored with deprecation headers

v1 routes (workflows, jobs, uploads/results) are back with `Deprecation` and `Sunset` response headers and `X-API-Key` auth fallback for backward compatibility.

## March 2026

### March 30, 2026

**\[Improved]** Rate limits split between submission and general endpoints

Rate limits are now applied independently to file-submission endpoints (60 RPM) and general endpoints (600 RPM). Submission operations (run, upload, create file collection) have a stricter limit; read and management endpoints share a higher limit. The two tiers use independent counters.

### March 26, 2026

<Warning>
  **\[Breaking + API]** Removed deprecated `as_lists` parameter from the v2 extraction endpoint

  The `as_lists` query parameter on the v2 extraction endpoint has been removed.

  **Action required:** drop `as_lists` from any v2 extraction call. The response shape is the unified default introduced earlier this month.
</Warning>

**\[New]** Paste images into the workflow creation chat

Paste images directly into the workflow creation chat for more contextual AI suggestions (up to 3 images, 3 MB each).

**\[Improved]** Concurrent edit protection on workflow saves

Workflow configuration saves now detect concurrent edits and surface them, preventing accidental data loss when two users edit at once.

**\[Fixed]** Image uploads in workflow suggestions accept files within size limit

Image uploads in workflow suggestions are no longer rejected when they're within the allowed size limit.

**\[Improved]** Other improvements and fixes

Workflow-name loading indicator no longer gets stuck; buttons now show the pointer cursor on hover.

### March 25, 2026

<Warning>
  **\[Breaking + API]** v2 extraction is now collection-first

  Run extraction now returns a file-collection UUID; poll results via `GET /v2/files/{id}/extraction/` (`412` while pending, `200` when ready).

  **Action required:** if your integration parsed extraction results synchronously from the run-extraction response, switch to polling the new collection-status endpoint.
</Warning>

<Warning>
  **\[Breaking + API]** Upload endpoint no longer accepts `file_collection_id`

  `POST /v2/files/` no longer accepts a `file_collection_id` parameter — collections are immutable after creation.

  **Action required:** create a new collection per upload batch instead of appending to an existing one.
</Warning>

<Warning>
  **\[Breaking + API]** Deprecated `/jobs/` endpoints removed from the v2 API

  The deprecated `/jobs/` endpoints are gone.

  **Action required:** use `GET /v2/files/{id}/extraction/` to poll extraction state instead.
</Warning>

<Warning>
  **\[Breaking + API]** Simpler v2 pagination shape

  v2 pagination responses now contain `count`, `page`, `page_size`. The old `total_pages`, `next`, and `previous` fields are gone.

  **Action required:** compute next/previous page numbers from `count`, `page`, and `page_size` client-side, or rely on `page` boundaries (`1` and `ceil(count / page_size)`).
</Warning>

**\[Improved]** Upload multiple files in a single request

`POST /v2/files/` now accepts multiple files in a single request.

**\[API]** Customer-facing docs now show Bearer authentication

Customer-facing API docs now reflect the `Authorization: Bearer <token>` authentication method.

**\[API]** v2 documentation reflects path-based versioning

All API documentation has been updated for v2 endpoints with path-based versioning (`/v2/` prefix).

### March 19, 2026

**\[New]** EML and MSG email file support

You can now upload `.eml` and `.msg` (Outlook) email files for extraction. Embedded images in EML files are correctly inlined during conversion.

**\[API]** New `allow_extend` parameter to append rows on submit

Submit now supports an `allow_extend` parameter that lets agents append rows to existing submissions instead of replacing them.

**\[API]** v3 types and graceful v2-to-v3 delegation

Updated v3 types, v2 file endpoints now delegate to v3, and v3 errors are translated cleanly for v2 callers.

**\[Improved]** Better OCR text detection

OCR text detection now uses word-count and spatial-coverage heuristics instead of simple truthy checks, and the screenshot is always sent to the model for context.

**\[Improved]** Smart-table resubmit replaces rows instead of skipping

Smart-table resubmit now replaces the previous output instead of skipping, with scoped worker context and a consolidated briefing.

**\[Improved]** Per-user rate limits via API keys

Authentication and rate-limiting are now separated, with per-user rate limits derived from API key identity.

**\[Improved]** HTTP security headers added per pentest findings

Added HTTP security headers across the API per recent pentest recommendations.

**\[Fixed]** Bounding boxes align on pages with different dimensions

Bounding boxes no longer drift on documents that mix page sizes.

**\[Improved]** Other improvements and fixes

Table grounding improvements (HTML cell-id injection now gated and tuned), better login-page rendering with static assets loading correctly, cleaner full-page loader on the auth callback, and PDF viewer fixes for documents that mix page sizes (e.g. US Letter and A4).

### March 18, 2026

**\[New]** v2 API with versioned endpoints, pagination wrappers, and improved docs

A full v2 API surface ships: versioned endpoints, pagination wrappers, workflow routes, list-extractions, provenance API, and richer OpenAPI documentation.

**\[API]** Bearer authentication alongside API key

The API now supports the standard `Authorization: Bearer` header alongside the existing API key header.

**\[API]** API rate limiting

The API now enforces per-endpoint rate limits.

**\[Improved]** Smart-table extraction planning with table-scoped delegates

The smart-table extraction planner now uses table-scoped delegates, agent limits that scale with document complexity, and improved sandbox tools.

**\[Improved]** Better row-level grounding for tables

Row-level grounding for tables is now more robust via token-overlap pre-mapping and accuracy fixes to anchor finders.

**\[Improved]** Faster page rotation

A cascade decision engine with landscape support and optimized center-crop reuse makes page-rotation detection faster.

**\[Improved]** File metadata toggle and filtered counts

A new metadata visibility toggle on the table header persists locally. File counts now reflect filtered results instead of selected rows, and action buttons show loading state.

**\[Fixed]** File-collection creation works in production

A regression that returned `403` for all file-collection creation calls in production has been fixed.

**\[Fixed]** File download naming

Markdown downloads no longer end up with doubled file extensions.

**\[Improved]** Other improvements and fixes

Security update for vulnerable dependencies; a botched migration that assigned the same UUID to all files has been corrected.

### March 5, 2026

**\[New]** Raw JSON viewer alongside validated results

Raw JSON extraction results are now available in a dedicated tab alongside validated results, for easier inspection.

**\[Improved]** Better document layout analysis

Enhanced layout analysis for more accurate table and section detection.

**\[Improved]** Hover cards on simple field results

Simple field results now display hover cards with additional context.

**\[Improved]** Resizable workflow creation chat

The workflow creation chat panel is now resizable.

**\[Improved]** Refined validation result cards

Validation result cards have refined styling that makes validated datapoints easier to identify at a glance.

**\[Fixed]** Confidence handling for undefined values

Confidence scores no longer break when a value is undefined.

**\[Fixed]** Clipboard copy errors now show clear feedback

Clipboard copy errors are handled and surfaced with clear feedback instead of failing silently.

**\[Improved]** Other improvements and fixes

UI styling consistency improvements across result cards and the file viewer.

### March 4, 2026

**\[New]** Production-grade webhook system

You can now create webhook subscriptions, receive real-time event notifications, and configure automatic retries on failure.

**\[Improved]** Better error categorization

Error messages now use clearer categories, making it easier to understand and resolve extraction issues.

**\[Improved]** Schema editor auto-saves

Fields and schema changes now auto-save when switching tabs or editing nested fields, preventing accidental data loss.

**\[Improved]** Validation page with keyboard navigation

The validation page now has improved result cards with keyboard navigation and better loading states.

**\[Improved]** Faster OCR processing

OCR text recognition is now significantly faster on large documents thanks to GPU-accelerated batch processing.

**\[Fixed]** Race conditions in field-form submission

Race conditions in field-form submission and save-state management have been eliminated.

**\[Fixed]** Security vulnerability in a third-party dependency

Fixed a security vulnerability in a third-party dependency.

## February 2026

### February 27, 2026

**\[Improved]** Custom date range filter on the usage page

The usage page now lets you filter by a custom date range instead of a fixed year, giving more precise control over billing and usage reports.

**\[Fixed]** Reverting to the original value marks fields as verified

Editing a field value back to its originally extracted value now correctly marks it as human-verified rather than human-edited.

### February 25, 2026

**\[Improved]** Better OCR accuracy

Upgraded text recognition with better support for Latin languages.

**\[Improved]** LLM cost tracking in billing

Billing transactions now include per-extraction LLM token counts and costs for full cost transparency.

**\[Improved]** More accurate table extraction

Improved accuracy when extracting data from tables with closely spaced rows or surrounding text.

**\[Improved]** Faster processing during peak usage

Extraction now scales to handle high volumes, reducing wait times during busy periods.

**\[New]** Sample workflows on new accounts

New accounts are now prepopulated with sample workflows and documents for a guided onboarding experience.

**\[Fixed]** Smart-lookup crash on empty corpus

Smart lookup no longer crashes when the search corpus is empty.

**\[Improved]** Faster extraction callbacks and field-list endpoints

Performance improvements on extraction callbacks and field-list endpoints.

**\[Fixed]** Readable file download filenames

File download filenames are now readable by the frontend.

### February 21, 2026

**\[Improved]** Visual grounding 2.0

Completely revamped evidence highlighting with block-level and element-level precision, making it easier to trace extracted values back to their exact location in the source document.

**\[New]** Billing dashboard

New usage dashboard with remaining credits, weekly summaries, per-workflow usage breakdown, and CSV export.

**\[Improved]** Advanced document layout analysis enabled in production

The advanced document segmentation pipeline is now enabled in production for improved table and section detection.

**\[Fixed]** Bounding-box highlights no longer stuck after navigating fields

Highlights are now cleared correctly when you move between fields.

**\[Fixed]** Confidence calibration

Fixed calibration configuration for confidence scoring.

### February 14, 2026

**\[New]** Redesigned home page

A personalized home with recent workflows, at-a-glance file-status indicators, and a streamlined workflow creation experience with templates.

**\[New]** Redesigned schema builder

A new two-step Define & Refine flow with AI-powered field suggestions, inline file previews, and drag-and-drop uploads.

**\[Improved]** Faster page rotation

About 2× faster page-orientation detection per page.

**\[Improved]** More reliable AI field suggestions

Field suggestions now use an upgraded model for higher reliability.

**\[Fixed]** Smart-lookup settings saved on publish

Smart-lookup settings are now correctly saved when publishing a workflow version.

**\[Fixed]** Workflow studio sidebar overflow

Fixed sidebar overflow in the workflow studio.

### February 7, 2026

**\[Improved]** New dashboard visual design

Refreshed visual design across the platform with updated components, colors, and layout.

**\[Improved]** Faster, more reliable evidence highlighting

Evidence highlighting now uses an event-driven architecture, making interactions faster and more reliable across all views.

**\[Improved]** Files and organization logos served securely through the backend

Files and organization logos are now served securely through the backend, improving security and simplifying local development.

**\[New]** AI workflow-name suggestions

New AI-powered workflow-name suggestions based on your description and uploaded files.

**\[Improved]** File status counts in workflow listings

Workflow listings now show at-a-glance counts of files by processing status.

**\[Fixed]** Document-element classification

Fixed a classification regression that could cause incorrect layout detection in some documents.

### February 3, 2026

**\[New]** Smart Lookup

New feature to enrich extracted data with external reference files using AI-powered matching.

**\[Improved]** Field source tracking

Fields now track their source (extraction vs. smart lookup) for better traceability.

**\[Fixed]** Manual field creation processing

Field data processing now correctly handles manually created fields.

## January 2026

### January 27, 2026

**\[New]** Usage-based billing

Introduced billing for data extraction operations with transparent per-page pricing.

**\[Improved]** Faster, more reliable field suggestions

Field suggestions are now faster and more reliable when creating workflows.

**\[Fixed]** Field value upload formats

Resolved an issue where certain field value formats could cause upload errors.

### January 20, 2026

**\[Improved]** Faster OCR processing

Document processing speed improved with optimized text recognition.

**\[New]** Extraction review from the dashboard

Administrators can now review extractions directly from the dashboard.

**\[Improved]** Organization credits management

Added the ability to manage organization credits for usage billing.

### January 13, 2026

**\[Improved]** Faster document processing

Significantly improved processing speed for large documents.

**\[New]** Filter results by row count

Filter extraction results by the number of rows extracted.

**\[New]** Configurable per-workflow page rotation

Configure custom page-rotation strategies per workflow.

### January 6, 2026

**\[Improved]** Enhanced results filtering on download

Enhanced results download with additional filtering options.

## December 2025

### December 15, 2025

**\[Improved]** Improved reliability

Enhanced system stability and faster deployments.

### December 1, 2025

**\[Improved]** Faster results retrieval for workflows with many documents

Optimized results retrieval for workflows that span many documents.

**\[Improved]** Detailed usage tracking

Added more detailed usage tracking for extraction operations.

## November 2025

### November 24, 2025

**\[New]** Create data rows from specific extractions

You can now create new data rows directly from specific extractions.

### November 17, 2025

**\[Improved]** More accurate table extraction

Improved accuracy when extracting data from complex tables.

**\[API]** New `/user/is-enabled` endpoint

A new endpoint lets you check account status.

**\[Improved]** User info on file validation responses

File validation responses now include the user who performed the validation.

### November 10, 2025

**\[Improved]** Better handling of long documents

Better handling of documents with many pages via smart chunking.

**\[Improved]** Validation tracking

Track which values have been manually validated or edited.

**\[Improved]** More accurate extraction

Enhanced extraction accuracy with an updated default model.

### November 3, 2025

**\[Improved]** Better extraction on documents over the context limit

Improved extraction quality for documents exceeding context limits.

**\[Improved]** Highest-confidence merge

When merging extractions, the highest-confidence value is now preserved.

## October 2025

### October 27, 2025

**\[Improved]** Performance improvements across the platform

General performance improvements across the platform.

### October 13, 2025

**\[Fixed]** CSV export field ordering

Fixed field ordering in CSV exports to match workflow configuration.

**\[Improved]** Consistent results ordering

Extraction results now consistently respect field order.

### October 6, 2025

**\[API]** Endpoint to reorder fields within object-type fields

A new endpoint lets you reorder fields within object-type fields.

**\[Improved]** Verification URLs on job results

Job results now include verification URLs for easy access.

**\[Fixed]** Fields with similar names in nested structures

Fixed handling of fields with similar names in nested structures.

### September 30, 2025

**\[Improved]** Redesigned extraction pipeline

Redesigned extraction workflow for better reliability.