Changelog - anyformat

July 2026

July 7, 2026

[New + API] API v3 launched — document packets and runs as first-class resources The public API’s new stable major is live under /v3/, rebuilt around three resources: workflows, document packets (the runnable bundle of 1+ files — v2’s “file collection”, renamed and promoted), and runs (one execution attempt of a workflow against a packet, with its own id, status, and results). The full reference lives in the new v3 API tree; v2 users can follow the path-by-path migration guide. Highlights:

Flat run reads, no more 412-polling — GET /v3/runs/{run_id}/ always returns 200 with a status field; results carries the (unchanged) results envelope inline once status reaches processed. The /results/ sub-path is gone.
One upload shape — POST .../upload/ groups 1–10 files into a single packet atomically, .../upload/run/ does upload + run in one call and returns a run_id, and .../upload/from-url/ imports 1–10 HTTPS URLs all-or-nothing.
Re-run without re-uploading — POST /v3/document-packets/{id}/run/ creates a fresh run each call; earlier runs stay readable.
The workflow read is the definition — GET /v3/workflows/{workflow_id}/ returns the complete typed graph with a persistent_id per field, and PATCH echoes that shape back: echo the persistent_id and a renamed field keeps its identity (analytics, ground truth, and quality metrics stay attached). There is no separate /definition/ endpoint and no PUT in v3.
Keyset pagination everywhere — every list returns {items, next_cursor} with ?limit= (max 100) and ?cursor=; no totals or page numbers.
Idempotent submission — the upload and run-trigger POSTs accept an Idempotency-Key header; a retry with the same key replays the original response instead of creating a duplicate packet or billing a second extraction.
X-API-Version: 3.0.0 is stamped on every /v3/ response.

v2 is unchanged and keeps serving traffic through its announced deprecation window (see the July 3 entry below). Authentication, the error envelope, the results envelope, webhooks, and rate limits are identical across both versions.

July 6, 2026

[Changed + API] Dataset membership moves to collection-keyed, idempotent endpoints Test-set membership is being renamed to the product’s dataset vocabulary and re-keyed onto the collection — the identity a document already carries everywhere else (the id in every document response). Membership now lives under the workflow: PUT /api/v3/workflows/{workflow_id}/dataset/files/{collection_id}/ adds a document to the dataset (optionally snapshotting a chosen extraction as ground truth via { "extraction_id": … }), DELETE the same path removes it, and POST /api/v3/workflows/{workflow_id}/dataset/files/bulk/ adds a filtered grid selection. All three writes are idempotent — re-adding a member or removing a non-member is a 200/204 no-op instead of a 409. The test-file list rows now include a collection_id so callers can address membership without a second lookup. The previous file-scoped routes POST /api/v3/files/{file_id}/test-set/ and DELETE /api/v3/files/{file_id}/test-set/ still work but are deprecated and now idempotent too; migrate to the dataset/ paths above.

July 3, 2026

[Deprecated + API] v2 enters its deprecation window — Deprecation/Sunset headers on every response With v3 as the new stable major, /v2/ is frozen: no new features, existing behavior untouched. Every /v2/ response now carries the RFC 8594 deprecation signals — Deprecation: Fri, 03 Jul 2026 00:00:00 GMT and Sunset: Sat, 03 Oct 2026 00:00:00 GMT. Per-route Link: <...>; rel="successor-version" headers pointing at the v3 equivalent follow as the mappings are wired up. /v3/ responses carry none of these headers. Action recommended: wire an alert on the Sunset header so migration work can’t be missed — Versioning & deprecation shows a five-line client middleware that does it. Then plan the move with the v2 → v3 migration guide. No client changes are required today; v2 keeps working until the sunset date.

June 2026

June 29, 2026

[New + API] Deterministic validation rules — instant, free checks alongside AI rules A Validate step can now mix two kinds of rules. AI rules stay as they were: you describe the condition in plain language and a model judges it. Deterministic rules are new — structured checks that run in plain code with no model call, so they’re instant and free. Seven check types ship: number range, date range (with a relative today bound for not-expired checks), arithmetic (subtotal + tax = total, with a tolerance), comparison (a field vs a value or another field), one-of (value in an allowed set), pattern (regex), and required (present and non-empty). Build them in Studio with the new AI / Deterministic toggle on each rule, or author them via the API and SDKs — deterministic rules reference fields by name and carry a check payload, AI rules carry a description. Pattern checks run on a linear-time regex engine, so a user-supplied pattern can’t hang an extraction. Existing workflows are unchanged: rules default to kind: "ai". See Validation rules.

June 22, 2026

[New] Self-serve Business subscriptions, recurring monthly credits, and a billing incident banner Business is now a self-serve plan. From the dashboard, an organisation owner or admin can click Upgrade to Business to open Stripe Checkout, enter a card, and become a paying Business customer in a single round-trip — no support ticket required. Once active, the organisation receives a fresh 100,000-credit monthly grant on each successful renewal, replacing the previous month’s grant rather than stacking on top of it. Enterprise is unchanged in spirit (still sales-led) but rides the same plumbing: support agrees on a custom monthly grant and price, mints a dedicated Stripe price, and pins both to the org on a BillingPlan row. Top-ups still work on top of the recurring grant for one-off credit additions. Subscription customers get a Customer Portal link from the Usage page to manage their payment method and cancel the subscription themselves. When Stripe reports a failed renewal, a billing incident banner appears on every authenticated page describing the current state (PastDue during the grace window, Suspended if grace expires, Canceled after cancellation) with a deep-link to the action that resolves it — update the card, reactivate, or restart the subscription. Late-payment recovery is exact: if a card is fixed after the grace window has zeroed the grant, the next successful invoice restores cycle_monthly_grant − consumed_during_unpaid_cycle credits (floored at zero), so a customer never pays for and then loses what they already used. This rollout is gated behind BILLING_SUBSCRIPTIONS_ENABLED. Existing Business and Enterprise organisations whose tier was set by the legacy one-shot upgrade keep their tier through a fallback path until their first real subscription invoice clears.

June 18, 2026

[New + API] GET /v2/workflows/{workflow_id}/definition/ — round-trippable workflow definition A new sub-resource returns the workflow’s typed graph in exactly the shape PUT /v2/workflows/{workflow_id}/ accepts. SDK callers can now GET, mutate one field, and PUT the result back without re-authoring the rest of the schema. The PATCH side preserves persistent_id per (node_id, field_name) so an unchanged echo cuts no new version (ChangeKind.NONE modulo edge order), and genuine edits keep persistent_id stable for unchanged-by-id fields — analytics, ground-truth, and monitoring continuity all hold across edits. [Changed + API] Field description is no longer required to be non-empty The typed surface (AnyField.description) previously enforced min_length=1, but Path A persistence (the Studio frontend’s save path) never enforced it — so legacy workflows can carry fields with empty descriptions that the new GET-definition endpoint needs to round-trip cleanly. Empty descriptions are now accepted on POST /v2/workflows/, PUT /v2/workflows/{id}/, and the generated SDKs. SDKs and docs still encourage non-empty descriptions — an empty value degrades extraction quality, it isn’t structurally invalid. [New + API] POST /v2/workflows/{workflow_id}/files/from-url/ — upload by URL instead of multipart A new route accepts an HTTPS URL and tells anyformat to fetch the bytes server-side, returning the same CreateCollectionResponse shape as the multipart POST /v2/workflows/{workflow_id}/files/ upload. Use it when the source document already lives in object storage you can presign (S3, GCS, R2, …) or at a public HTTPS URL — no need to stream bytes through your own backend. The fetch is bounded by a 10-second timeout and the same 20 MB cap as multipart upload, and runs through the outbound-URL SSRF guard so URLs resolving to non-globally-routable IPs (loopback, RFC1918, link-local, cloud-metadata, IPv4-mapped IPv6) are refused before any connection opens. HTTPS-only at the gateway; non-2xx upstream, DNS failure, or timeout surfaces as 422. Documented under Upload File from URL and in Files.

June 15, 2026

[Fixed + API] SDK consumers can pull per-file results through /v2/workflows/{wf}/results/?file_id=... again The v3 route GET /api/v3/files/{file_id}/results/ declared its workflow_version_id query parameter as int, but the v2 gateway forwards the 10-character WorkflowVersion.external_id string returned by /api/v2/.../latest_version/. Ninja’s int coercion rejected the string with 422, breaking any SDK call that chained latest_version into the per-file-results lookup. The route now accepts workflow_version_id as a string, resolves the external_id to the internal PK in-route, and returns 404 (not 422) for an unknown id — matching every other v3 endpoint’s “the public external_id is the single public identifier” contract.

June 11, 2026

[Breaking + API] Object fields support only one level of nestingObject fields have always been limited to one level of nesting — children of an object must be non-object types — but the API surface advertised the opposite: ObjectField.nested_fields was typed recursively, so the generated OpenAPI spec listed object-in-object schemas as valid. The contract is now enforced at the source: posting a schema with a nested object to POST /v2/workflows/ returns 422, and the same constraint is mirrored in the regenerated TypeScript SDK types and the Python SDK CLI (which now returns a clean “only one level of nesting is supported” error instead of a raw pydantic traceback).Action required: schemas that declared an object inside an object must be flattened before submission. Single-level object fields are unaffected.

[Breaking + API] POST /api/v3/extractions/progress removed; replaced by the workflow-progress digestThe batch progress endpoint POST /api/v3/extractions/progress is gone. Callers that want per-extraction progress alongside per-status file counts for a workflow version should switch to the new POST /api/v3/extractions/workflow-progress-digest (see the [New + API] entry below). Single-extraction progress via GET /api/v3/extractions/{uuid}/progress is unchanged.

[New + API] POST /api/v3/extractions/workflow-progress-digest — one round-trip for workflow polling A new digest endpoint returns per-extraction progress for a caller-supplied set of extraction_uuids AND unfiltered per-status file counts for the surrounding workflow_version_id in a single response. The dashboard now polls this endpoint once every 4 seconds while files are processing, replacing three separate polls (/extractions/progress at 2.5s, and the v2 /files/count/ and /files/ polls at 5s). Polling stops automatically when no extraction is queued or in-progress. Counts come back null for unknown or cross-organisation workflow versions; cross-organisation extraction UUIDs are silently dropped, matching the existing single-extraction-progress semantics. [New + API] Cell-level grounding on extraction evidence Table-row and scalar evidence now carries true cell provenance end-to-end: evidence and mapped_locations entries on extraction responses gain optional cell and row keys identifying the exact source cell that fed each value. Evidence without cell-level grounding (non-table datapoints, or table rows the model couldn’t ground precisely) stays byte-identical to before — the new keys are emitted only when present. On the reference invoice that motivated this work, table rows with precise grounding went from 0% to 100% (626/626) with zero wrong-row assignments, and scalars whose value lives in a header table (invoice ids, dates) now resolve to one precise evidence on an LLM-cited page instead of a page-backfilled guess. [New + API] Test-set membership for files Files can now be designated as part of a curated, locked test set. A new boolean is_test_file is returned on file responses, and two v3 routes manage membership: POST /api/v3/files/{file_id}/test-set to add and DELETE /api/v3/files/{file_id}/test-set to remove. Files in the test set are frozen from re-verification: every verification write (verify, unverify, edit, discard at the datapoint level; verify-extraction; add/delete/swap/verify-row/verify-all at the datarow level, across both v2 and v3) now returns 409 Conflict with code ground_truth_locked, so an extraction designated as ground truth cannot be mutated out from under a downstream accuracy scorer. Ground-truth values and accuracy reporting land in a follow-up release. [New] Download lookup files from the Extract node UI Lookup files (master files attached to an Extract node) were upload-only — once added, you couldn’t get them back. The Manage files dialog in the Extract node now has a download button next to each lookup file, backed by a new presigned-URL endpoint GET /api/v2/saas_manager/workflows/{id}/master_file/download/?uri=<s3-uri>. Org membership and the workflow-scoped S3 key prefix are both enforced; the presign is pinned to the canonical bucket so a crafted cross-bucket URI cannot redirect the signed GET. [Improved] Table extraction now always uses the refine engine Table parsing previously branched upfront on a heuristic between a cheap simple_table call (one shot, no retry — silent data loss on empty/truncated output) and a much slower dense_table agent (~10–20× the cost and latency). The classification was fuzzy and data integrity depended on getting that one-shot guess right, so the routing prompt was biased toward dense, over-escalating and burning cost. Every table now takes the same path — cheap parse → audit critic → in-place refinement with tools — with a best-draft-wins guarantee, so a misclassification can no longer lose data. On an 18-document Iberia benchmark refine was 27% cheaper (cheaper on every doc) and substantially faster on the slow tail (worst single doc 1346s → 317s), with all four full_content quality metrics improving. As part of the change the routing prompt now requires ≥2 rows × ≥2 columns of tabular evidence before sending a block to the table pipeline, so label/value pairs, signatures and logos are no longer misrouted as tables. [Improved + API] Workflow analytics timeseries response is much smaller and faster GET /api/v3/workflows/{id}/analytics/timeseries now serves the default monitoring view in a few KB of gzipped JSON instead of 1.7–3.9 MB. The “Overall” series across all fields is shipped pre-collapsed as a new overall_rows array (one entry per extraction), and an opt-in repeatable field_ids query parameter narrows the legacy rows payload to only the fields you want — present-but-empty (?field_ids=) returns counts and overall_rows only. Calls that omit field_ids still receive the full legacy rows, so existing integrations keep working unchanged. The cold-path recomputation now scopes the live aggregation to cold versions instead of the full workflow, self-heals to all-warm on the first full-universe read, and the response is gzip-compressed.

June 10, 2026

[New] Per-user timezone preference for API datetimes User profiles now carry an IANA timezone preference, defaulting to Europe/Madrid. Datetime fields in API responses are serialized in the user’s chosen zone, and the dashboard Account page gains a searchable timezone selector that persists through the existing profile-update flow. Storage stays UTC end-to-end — only the wire serialization (and transitively, what the dashboard renders) reflects the preference. [Fixed + API] Oversized request bodies now return 400 instead of 500 Any v3 endpoint that reads a request body returned a 500 when the JSON payload exceeded the 20 MB limit (DATA_UPLOAD_MAX_MEMORY_SIZE): Django raised RequestDataTooBig while django-ninja read request.body during parameter resolution, before the view ever ran, so it could not be caught inside the endpoint. A global exception handler on the v3 API now returns 400 with {"detail": "Request body is too large. Maximum allowed size is 20 MB."}. The limit in the message is derived from the setting, so it stays accurate if the cap is tuned. [API] Rate limiting now enforced on the external API in production The external API’s rate limiter is now active in production. It had been silently disabled for roughly three months because RATE_LIMIT_ENABLED and REDIS_URL were added to the dev config but never propagated to prod. Callers exceeding the configured limits may now receive 429 responses.

[Breaking + API] Deprecated v2 datapoint and file-verification endpoints removedThe following v2 endpoints are gone. They were already marked deprecated and verified unreachable from the dashboard, the external API, the SDKs, and the cronjobs before removal:

PATCH /api/v2/.../datapoints/{id}/ and POST / DELETE /api/v2/.../datapoints/{id}/validation/
POST /api/v2/.../files/{id}/verify/

Action required: integrations that still target these paths must switch to the v3 replacements: POST /api/v3/extractions/{uuid}/datapoints/{verify,unverify} for verification toggles, POST /api/v3/extractions/{uuid}/datapoints/{edit,discard} for value edits and discards, and POST /api/v3/extractions/{uuid}/verify for whole-extraction verification.

[Fixed] Verify Document button now works on split + extract workflows The header Verify Document button used to be a permanent no-op on workflows that combine a Split node with an Extract: it targeted the file-level extraction, but extracted data on a split lives on per-split child extractions, so verifying the parent verified nothing. The button now targets the active split’s child extraction (and is labelled Verify Extraction on a split tab to match), and is disabled on parse, split, and validation tabs where there is nothing to verify. A split file is reported as verified only once every non-empty child extraction is verified — empty category tabs no longer wedge the rollup. Shift+Enter advances through unverified splits before moving on to the next file. [Improved + API] New verifier API surface; validator deprecated The “human-validation → verification” rename now reaches the file-level API. A new endpoint POST / PATCH /api/v2/.../files/{id}/verifier/ assigns and updates the verifier for a file, a verifier__in filter narrows file lists by assigned verifier, and ?include=verifier returns the assigned verifier in file responses. The legacy /validator action, validator__in filter, and ?include=validator option remain live for backward compatibility but are now flagged deprecated in the OpenAPI spec; integrations should plan the move. [Fixed + API] Non-member reads of workflow analytics now return 404 GET /api/v3/workflows/{id}/analytics/summary and GET /api/v3/workflows/{id}/analytics/timeseries previously returned 403 Forbidden to a caller outside the workflow’s organisation, leaking the existence of the workflow. They now return 404 Not Found, matching GET /api/v3/workflows/{id}. Members continue to receive analytics as before. [Fixed] New free accounts now see their signup credits on the Usage page The Usage page counted only purchased and granted credits, so a brand-new free account — whose only balance is its signup bonus — showed a zero balance and the empty-state screen instead of the credits it could actually spend. Signup credits are now included in both the headline total and the per-source breakdown.

June 9, 2026

[API] New v3 endpoints for row-level edits on object fields The v3 API now exposes positional row operations on object-field tables: add, delete, swap, verify-row, and verify-all. Rows are addressed positionally by (extraction_uuid, field_persistent_id, index) instead of an opaque datarow id, and the dense [0, n_live] invariant is enforced at the boundary (negative indexes now return 422 instead of a silent 404). The corresponding v2 actions on DatarowViewSet are now marked deprecated but remain live; integrations should plan to migrate. [Improved] Document parsing now retries transient LLM failures more reliably The parse pipeline now retries LengthFinishReasonError (model hit max-tokens, often non-deterministic) and detects silent parse failures where the LLM returned a 200 but the response could not be coerced into the schema. Both previously skipped straight to the fallback model on the first attempt; they now go through the configured retry budget on the primary model first. The per-page parsing token budget was also raised from 8K to 32K to reduce length-truncation failures, and ParseFailedError now carries the underlying per-page error detail. [Fixed] Lookup + Validate workflows now complete end-to-end Two independent bugs that broke the same Lookup + Validate workflow are fixed:

A Validate node downstream of an Extract node with lookup fields previously returned every rule as inconclusive (“No extraction data available to validate.”), because the validate step resolved its upstream to a synthetic lookup node and read a state key nothing ever wrote. It now resolves through smart-lookup hops back to the real producing extract.
The Gemini page-sum verification schema was missing a top-level title, which made model-build raise outside the per-task try/except and aborted the whole extraction whenever Phase-1 sum verification failed and Gemini verification was attempted.

[Improved] Faster admin pages, file listings, and date-filtered results Several database hotspots flagged by APM are now indexed and the corresponding queries rewritten to be sargable:

Extraction.uuid and File.name get btree indexes (built CONCURRENTLY), eliminating sequential scans on the hundreds of lookups per day they served.
A new composite FileCollection(workflow_id, created_at DESC) index backs the per-workflow collection listing.
created_at date filters across the admin dashboard, billing workflow-usage queries, and the public ResultsFilter are rewritten from (created_at AT TIME ZONE 'UTC')::date casts (which disabled the index, costing up to ~2s on a single filter path) to half-open created_at ranges. Behavior is unchanged because the deployment timezone is UTC.

[Breaking + API] Deprecated v2 datarow endpoints removedThe v2 DatarowViewSet actions marked deprecated in the same release as the v3 datarow rollout — create_empty, delete_by_position, swap, validate_row, validate_all — and their five /api/v2/.../files/{file_id}/datarows/ routes are now removed.Action required: integrations that still drive row-level edits through the v2 paths must migrate to the v3 surface: POST /api/v3/extractions/{extraction_uuid}/datarows/{add,delete,swap,verify-row,verify-all}.

[Improved] Per-category grid restores split-granular column filtering Workflows that include a Classify or Split node show their extracted data in the Per-Category grid in the data viewer. Column filters on that grid had been deliberately disabled because the previous file-paginated path compared against the wrong extraction. A new split-granular results endpoint backs the grid: a filter on an extracted field now narrows to exactly the matching split results, paginated and counted server-side, with filter and sort behavior matching the main file grid.

June 5, 2026

[Improved] Transient frontend request failures now auto-retry with backoff The web app retries idempotent React Query requests that fail with transient connectivity errors or 5xx responses, using capped exponential backoff (up to ~10s) and up to 3 attempts. Deterministic 4xx responses and unknown errors still fail fast. Connectivity errors are now detected via the Fetch spec’s TypeError instead of message matching, which cuts down on false-positive error reports during brief network blips. [API] Malformed Authorization: Bearer headers now return 401 Authorization headers with malformed bearer tokens (for example, an empty token or whitespace inside the token value) now return 401 Unauthorized instead of 500 Internal Server Error. The successful auth path and well-formed token-rejection responses are unchanged. [Fixed] Newly created workflows lay out their nodes with wider default spacing Newly created workflow nodes are placed with a larger default gap so the auto-layout reads more clearly out of the box and matches the spacing the rest of the editor assumes. [Fixed] Full-page loading spinner no longer jumps during initial load The full-page spinner stays anchored in place while the app boots, instead of shifting position as the surrounding layout settles in. [New] How credits work — new pricing & credits docs page A new Concepts → Account page, How credits work, documents per-operator credit costs (including Validate at 5 credits per rule per page), plan €/credit conversion, included credits, usage tracking, and tips for keeping cost down. The Usage & Billing page links to it in place of the previous “contact us for pricing” stub. [Improved] Monitoring over-time chart now plots on a true time axis The accuracy and confidence over-time chart on the Monitoring tab positions each extraction at its real timestamp on a time-scaled axis, instead of forcing points onto an evenly-spaced grid. Time-bucketing now adapts to the visible span, so long, dense histories coarsen gracefully while short spans stay at full resolution. Version markers sit at their exact creation time and merge into a single ranged label (for example v1.0–v1.2) when several versions ship the same day, and the empty states now distinguish between no rows, no reviewed extractions, and no recorded confidence scores. [Improved] Large workflow versions no longer time out when loading file counts Loading the file counts for a high-volume workflow version no longer tips over the request timeout. The count is now served by a single annotated aggregate backed by an index-only scan, replacing the per-file status scan that caused worker timeouts on workflows with many files. [Fixed] Filtering and sorting restored on smart-lookup columns Columns backed by a smart-lookup field keep their sort and filter controls in the data viewer while still showing the lookup indicator. Previously the custom lookup header replaced the entire header cell, hiding the filter button on those columns even though filtering still worked underneath. [Fixed] Overlapping bounding-box highlights no longer wash out the PDF page In the PDF visualizer, hovering a stack of overlapping bounding boxes now reads as a single translucent tint on both light and dark pages, instead of compounding toward opaque as more boxes overlapped. Boxes shared by many fields, such as table sections, also render more efficiently.

June 2, 2026

[Breaking + API] Monitoring summary total_extractions now counts processed extractions onlyGET /api/v3/workflows/{id}/analytics/summary previously counted every queued extraction in total_extractions, including in-flight rows that had not yet finished processing. It now counts processed extractions only — matching the verified, through, and per-field counts already returned by the same endpoint, and matching what the cold path (no version filter) has always reported.Action required: if your integration reads total_extractions from the summary endpoint and expects in-flight rows to be included, switch to a non-summary query or add the in-flight count from your own tracking. The Studio monitoring UI is unaffected.

[Improved] Monitoring page now reveals data progressively Each card on the Monitoring tab paints its title and controls immediately and shows a skeleton only for the body that is still loading, off the data source that feeds it. A fast endpoint can paint without waiting on a slower sibling, instead of the whole page sitting behind a single page-level skeleton. The time-series chart also now keeps daily resolution on 30-day ranges instead of collapsing into ~5 weekly points. [Improved] Faster workflow version analytics summary GET /api/v3/workflows/{id}/analytics/summary is significantly faster on high-volume workflow versions. Overall metrics, per-field metrics, verified counts, and throughput are now computed in two table scans instead of five, with no change to the response shape.

Technical detail

The endpoint previously routed through the paginated read path and ran the per-extraction aggregate twice plus a discarded pagination subquery. It now serves the summary from a dedicated read that combines the by-field GROUP BY and a single per-extraction COUNT(...) FILTER(...) aggregate, with the paginated read path unchanged for the timeseries and detail endpoints.

[Improved] Faster workflow version analytics timeseries GET /api/v3/workflows/{id}/analytics/timeseries is now significantly faster on high-volume workflow versions, applying the same optimization as the summary endpoint. The response shape is unchanged.

Technical detail

The timeseries endpoint consumes only the per-extraction detail rows, but was still served by the full five-scan paginated read — computing the overall sum, per-field GROUP BY, and through-rate counts only to discard them. It now uses a detail-only read, and resolves the latest N extractions from the Extraction table (which owns processed_at) instead of grouping the multi-million-row analytics cache — roughly 34× faster on the top-N lookup, with existing indexes only and no migration.

[API] Deprecated combined /analytics endpoint removed The deprecated GET /api/v3/workflows/{id}/analytics route (flagged for removal when the frontend migration completed) is gone. Use GET /analytics/summary for per-version overall and per-field metrics, and GET /analytics/timeseries for the cross-version series. [Fixed] File validation status stays in sync with its datapoints When you confirm the last unverified datapoint on a file, the file badge now reliably flips to Validated, and when you unverify a previously-verified datapoint the file rolls back to Processed. Previously, the file-level validation status could drift from its datapoints — most commonly, the last cell confirmation would not flip the file badge. [Improved] Organization switcher now scales to many organizations The sidebar organization switcher caps at 10 visible organizations by default with a Show all expander, and adds a search box that fuzzy-matches organization names across your full list. Organizations you’re a direct member of are prioritized over support-access ones, and a per-user most-recently-used ordering surfaces the organizations you actually work in first. The current organization is always shown. [Fixed] Deep links to a file’s results now switch to the workflow’s organization Opening a workflow file via a direct URL or in a new tab (middle-click) now elevates the active organization to the one the workflow belongs to, matching click-through navigation. Previously a cold-loaded file results page kept the UI on your own organization, hiding the support-access banner and showing the wrong organization in the sidebar. [Fixed] Organization selector rows stay a readable size on long lists When you belong to many organizations, each row in the sidebar organization selector keeps a consistent, tappable height with visible spacing, instead of compressing to near edge-to-edge as the list grows.

June 1, 2026

[New] Workflow version selector The workflow page header now has a version selector (on the Studio and Monitoring tabs) so you can view any saved version of a workflow. Selecting a non-latest version on Studio disables editing with an explainer — only the latest version can be edited — while Monitoring scopes its analytics to the version you pick. [New] Per-version monitoring history The Monitoring tab now shows version history: the accuracy/confidence chart spans all versions with markers at each version change, a changelog panel lists what changed, and KPIs and by-field metrics can be scoped to a selected version.

Technical detail

The analytics endpoint was split into purpose-built reads — GET /analytics/summary?version_id= (per-version overall + per-field means and through-rate counts) and GET /analytics/timeseries?limit= (cross-version series) — and analytics are now paginated by extraction rather than by (extraction, field) row, so wide schemas no longer collapse the window. The old /analytics endpoint is deprecated but retained until the frontend migration completes.

[API] Python SDK now installs as anyformat The official Python SDK’s PyPI distribution has been renamed from anyformat-sdk to anyformat — pip install anyformat — mirroring @anyformat/sdk on npm. The import path is unchanged (from anyformat.sdk import Client).

Technical detail

The distribution starts at version 0.6.0 (the next free slot past the legacy anyformat==0.5.0 on PyPI). The internal CLI binary that previously occupied the anyformat console-script slot was renamed to af; from anyformat.cli import ... still works.

[Improved] Workflow data sub-tabs keep their structural entry points When a workflow has many data sub-tabs, the collapsed row keeps the leading tabs (Documents, Split/Classify, the first categories) and pins the selected tab to the end, with the overflow in a “+N” menu — instead of hiding everything but Documents and the active tab. Rows with five or fewer sub-tabs show all of them.

May 2026

May 28, 2026

[Changed] Validator nodes now consume credits ValidateNode previously ran for free. Each rule in a validator now costs 25 credits, billed once per extraction. The bill scales with the number of rules in the validator, not with the number of pages or files in the document packet.

Technical detail

A new validate operator (Credits(25)) was added to DEFAULT_OPERATOR_PRICES. The internal billing dispatch was reshaped from (item) -> list[OperatorName] to (item, page_count) -> list[LineItem], so non-page-priced operators like validators can ignore the page count and substitute len(rules) as the billed unit. The processor now issues a single SpendCreditV4 call per extraction with all line items combined.

[Changed] Default operator prices realigned with the public pricing page Default credit prices for several operators now match anyformat.ai/pricing: Parse 25 (was 30), Extract 35 (was 20), Splitter 25 (was 10), Parse (Agentic) 100 (was 75), Extract (Agentic) 150 (was 75). Classify is now billed at 10 credits/page (previously a no-op — running a Classify node was free regardless of the published rate). Organizations with enterprise-specific overrides on OperatorPriceTable are unaffected. [Improved] Documentation restructured into Concepts / Guides / API Reference docs.anyformat.ai has been reorganized into three scoped tabs. Concepts holds definitional pages (Workflows, Schemas, Fields, etc.). Guides holds task-oriented walkthroughs (Quickstart, Build workflows, Recipes). API Reference holds the REST spec only. The four overlapping “create → run → results” walkthroughs are merged into one Quickstart with UI / curl / Python tabs at each step, and every extraction recipe now uses the canonical typed-graph workflow shape ({name, nodes, edges}) instead of the legacy {fields:[...]} shortcut. Old URLs redirect. [Improved] Post-signup /welcome page is now reachable only from the signup callback Loading /welcome directly (typed URL, bookmark, fresh tab) now bounces to /. The page is only reachable after a brand-new Auth0 signup callback, which closes a path that could let the upstream GA4 / GTM conversion event fire for already-signed-up users. [Improved] @anyformat/skill synced to npm 0.2.2 The @anyformat/skill package version in this repo is now aligned with what’s live on npm (0.2.2). The publish script uses token-based auth and bumps from max(local, registry) to prevent version collisions on the next release. [Improved] Release-PR generation now audits merged PRs upfront The /prod-release Claude Code command (used to open this release PR) now treats the merged-PR list as the ground truth and the changelog as a derived artifact. The audit step surfaced the gap that produced the entries below — previously, PRs merged without a changelog entry could quietly drop out of the release notes.

May 27, 2026

[New] Attach smart lookup files to fields in the workflow SDK The Python and JS SDKs can now mark a field as a smart lookup field and attach a reference file when creating a typed workflow. Flag the field with lookup and point it at a local CSV or text file; the SDK reads the file and uploads it inline, and extraction resolves that field’s values against your reference data. Lookup files are validated as UTF-8 text before anything is written — base64 payloads are never stored or returned.

Technical detail

POST /api/v2/workflows/ accepts a per-field lookup flag (mapped to source=smart_lookup) and inline base64 lookup_files. Files are pre-validated as UTF-8 text/CSV (with cp1252→UTF-8 transcoding); binary or non-text uploads are rejected with a 400 UNSUPPORTED_LOOKUP_FILE before any S3 writes. Only S3-backed lookup file references are persisted; base64 payloads and URIs are never stored or echoed back.

[Improved] Excel workbooks now convert to one Markdown page per sheet Excel→Markdown conversion now emits a separate page for each worksheet, in workbook order, instead of concatenating every sheet into a single page. Multi-tab spreadsheets now classify and process per tab, keeping each sheet’s content distinct downstream. [New] Official Python SDK now available on PyPI pip install anyformat-sdk — the anyformat-sdk package is now live on PyPI. Use the fluent builder to author, create, run, and await workflows end-to-end without writing HTTP code; sync and async clients are included, with typed errors (BadRequest, Unauthorized, Forbidden, NotFound, RateLimited, ServerError, SDKTimeout) and a .raw escape hatch on every response. The Claude Code skill at @anyformat/skill now documents the Python SDK alongside the JS/TS one. [New] Post-signup /welcome landing page Brand-new signups now land on a dedicated /welcome page (instead of going straight to /), giving marketing a stable URL for Google Ads conversion configuration via GA4 / GTM URL match. Returning users continue to land on / (or the originally-requested URL) as before. [Fixed] New signups for an already-registered email address are now rejected Across every API write site that can create a User row (Auth0 just-in-time provisioning, the E2E test-auth endpoint, the create_user management command), case-insensitive duplicate emails are now refused before a second row can be created. Auth0 email rotation that would collide with an existing user is also refused. Previously, mixed-case variants of the same address could silently mint duplicate users. [Fixed] Resolved four frontend dependency vulnerabilities qs (DoS in stringify), tmp (path traversal), uuid (missing buffer bounds check), and js-cookie (prototype hijack) advisories are now resolved via package overrides in anyformat/services/frontend. Upstream parents (cypress, exceljs, segment-analytics) hadn’t yet adopted the patched versions, so overrides were the minimally-disruptive path. [Improved] Authenticated images are no longer refetched multiple times per page load The sidebar mounts the organization logo from three places, which previously meant three identical requests to /api/v3/iam/organizations/{id}/logo/ on every page load. The image hook now uses TanStack Query so concurrent subscribers share one in-flight request and the result is cached for 5 minutes. [Improved] Local development supports both localhost and 127.0.0.1 for OAuth For developers running the stack locally, both http://localhost:<port>/callback and http://127.0.0.1:<port>/callback are now valid OAuth callback URIs and CORS / CSRF origins. Previously only one was registered per environment, which caused login failures if the dev frontend was opened at the other hostname. [Improved] JS SDK build toolchain refreshed esbuild (0.21.5 → 0.27.7) and vitest (2.1.9 → 4.1.7) in the JS SDK package have been updated. No behavioral changes to the SDK itself.

May 26, 2026

[Improved] Higher monthly allowance on the Business plan The Business plan’s monthly credit grant has increased to 500,000 credits. [Changed] Free tier now gets a one-time 50,000-credit signup grant instead of a monthly allowance Free organizations no longer participate in the monthly credit rotation. Each new free org now receives a single 50,000-credit signup grant that never expires — generous enough to evaluate the product, finite enough to make upgrading the natural next step. Paying tiers (Business, Enterprise) keep their recurring monthly allowance unchanged. [Changed] Stripe top-up pricing is now tier-aware Top-up checkout rates depend on the organization’s tier: free orgs pay €1.50 per 1,000 credits (€0.0015 / credit), Business and Enterprise orgs pay €1.00 per 1,000 credits (€0.001 / credit). Top-up amounts are now denominated in credits (minimum 30,000) instead of euros.

May 25, 2026

[New] Visualize parse layout with result.parse.draw() The Python SDK can now render a document’s parsed layout as an image overlay. Call result.parse.draw() to get the page with detected blocks drawn on top — useful for debugging extraction and verifying bounding boxes. A new cookbook example walks through it. [Changed] Simplified parse configuration The engine (Fast/Performant), effort, and visual_grounding parse knobs have been removed from Studio and the workflow SDK. engine was a no-op (both options mapped to the same model), and the other two now use sensible built-in defaults. figure_enhancement and mode remain configurable. Existing workflows are unaffected — stored values for the removed knobs are ignored gracefully. [New] Install the anyformat Claude Code skill from npm The anyformat Claude Code skill is now published on npm as @anyformat/skill. Run npx @anyformat/skill to install it — Claude Code can then author, run, and inspect anyformat workflows directly from your editor using your ANYFORMAT_API_KEY. Public docs include a new “Claude Code Skill” section under the SDKs page. [Fixed] Classify and split workflows authored via the Python SDK now route to extraction correctly Workflows built with the Python SDK’s classify(...) / split(...) builders no longer silently skip downstream extraction when a category or split rule is attached by name. The typed graph renderer now resolves branch ports by name (matching the schema’s edge labels) instead of internal IDs, so SDK-authored classify/split workflows produce the same extractions as Studio-authored ones. [Fixed] OCR text containing NUL bytes no longer fails extraction Extractions whose parse output contained a U+0000 (NUL) character inside a text value previously failed at the database write step, marking the entire batch as errored. The worker now strips NUL bytes from text at the parsing boundary so these documents complete normally. [API] Organization balance endpoint now returns a per-bucket breakdown The new organization balance endpoint reports credits broken down by their source bucket — monthly grant, top-ups, and each redeemed voucher — along with the organization’s tier and the monthly anchor day. Read-only by design: querying the endpoint never triggers a balance reset or recompute.

Technical detail

GET /api/v3/billing/organizations/{org_id}/balance/ returns { tier, monthly_anchor_at, monthly: {...}, topup: {...}, vouchers: [...], overdraft: {...} }. Pre-migration organizations return zeroed buckets with a stable shape. Membership is enforced; any role can read.

[New] New organizations receive a monthly credit grant on signup Newly created organizations now receive a monthly credit grant scoped to their tier, which resets on the organization’s anchor day each month. This replaces the previous one-time welcome top-up and makes ongoing free usage predictable instead of front-loaded. [Improved] Voucher credits now expire after a configurable lifetime Vouchers can now carry a credit_lifetime_days setting (default 90 days). When a voucher is redeemed, the granted credits expire after that many days, so promotional credits don’t sit in a wallet indefinitely. Existing vouchers continue to behave as before until the new field is set. [Fixed] Signup voucher hint shows voucher value only, in your locale The voucher preview on the create-organization and join-or-create pages no longer double-counts the welcome grant — a 5,000-credit voucher now renders as “5,000 credits” instead of being added on top of the standard signup grant. The number is also formatted using the browser’s locale instead of being hardcoded to German formatting.

May 22, 2026

[New] Official Python SDK and afx CLI The anyformat-sdk package wraps the workflow API with sync and async clients so you can author, create, run, and await a workflow end-to-end without writing HTTP code. Typed errors (BadRequest, Unauthorized, Forbidden, NotFound, RateLimited, ServerError, SDKTimeout) make integration code easier to write correctly, and a .raw escape hatch on every response exposes the underlying JSON when you need it.

Technical detail

from pathlib import Path
from anyformat.sdk import Client
from anyformat.workflow import Schema

client = Client(api_key="...")
workflow = (
    client.workflow("Invoices")
        .parse(mode="agentic")
        .extract([
            Schema.string("vendor", "The company that issued the invoice"),
            Schema.string("total",  "The total amount due, including currency"),
        ])
        .create()
)
run = workflow.run(file=Path("invoice.pdf"))
result = run.wait(timeout=300)

print(result.fields["vendor"].value)

AsyncClient mirrors the sync surface for async codebases.

[New] afx CLI for one-shot parse and extract The SDK ships an afx CLI for quick command-line use. Run afx parse --file invoice.pdf to get markdown, or afx extract --file invoice.pdf --field "vendor:vendor company" --field "total:total amount" to extract structured fields. Pass --schema fields.json to load a richer schema, --json for machine-readable output, and -o/--output FILE to write results to disk.

Technical detail

export ANYFORMAT_API_KEY=sk_...
afx parse   --file invoice.pdf
afx extract --file invoice.pdf --field "vendor:vendor company" --field "total:total amount"
afx extract --file invoice.pdf --schema fields.json --json -o results.json

Spinners, step breadcrumbs, and the Markdown/table renderers all detect TTYs and stay quiet when output is piped or --json is set, so the CLI is safe to script.

[New] Workflow versions are now semver-numbered Workflow versions now display as v1.0, v1.1, v2.0 across Studio, version history, and the save toast. Saving a workflow only mints a new version when it actually changes — major bump on graph topology or node config changes, minor bump on extraction-schema changes, and no bump at all when you only move nodes on the canvas or reorder columns in the data viewer. [API] Save layout and reorder fields without minting a new workflow version New workflow-centric write endpoints separate cosmetic edits from material ones. Layout drags and column reorders no longer create new versions or break stable links to a specific version of your workflow. The legacy v2 write endpoints continue to work but now respond with Deprecation: true.

Technical detail

PATCH /api/v3/workflows/{id}/config/ — save (may bump major or minor)
PATCH /api/v3/workflows/{id}/layout/?version_id=X — canvas position only (never bumps)
PATCH /api/v3/workflows/{id}/field-order/?version_id=X — column reorder only (never bumps)

Concurrent edits surface a 409 with latest_version_id in the response body so your client can rebase without losing local edits. The deprecated v2 endpoints (PUT /api/v2/.../workflows/{id}/config/ and PATCH /api/v2/.../workflows/{id}/reorder-fields/) still function during the transition window.

[New] npx anyformat-skill installs the anyformat Claude Code skill If you use Claude Code, you can now install the anyformat skill with a single command: npx anyformat-skill drops it into ~/.claude/skills/anyformat, or npx anyformat-skill --project installs it into the current project. The skill teaches Claude how to drive the typed-graph workflow API end to end — create a workflow with nodes and edges, upload a file, run extraction, and poll for results. [Fixed] Multi-tab login no longer fails with a code-verifier error Logging in from two browser tabs at once could clobber the shared PKCE code verifier and surface as a generic “Login failed” page. The callback now silently retries once on this race and lands you on your original destination URL.

May 21, 2026

[Improved] Cleaner parsed markdown — no structural metadata wrappers Parsed markdown saved to S3 and returned from the parse-result endpoints no longer carries <DOCUMENT> framing or <section data-bbox=…> wrappers. Block boundaries are now expressed as inert <a id></a> anchors that render as nothing in any markdown viewer, and page boundaries are joined with blank lines. Bounding boxes, page numbers, and confidences remain available via the structured results endpoints — they just stop leaking into the human-readable file body. [New] Modelo 200 workflow template (Spanish corporate income tax) A new “Modelo 200” template lands in the workflow gallery, oriented around credit underwriting and portfolio early-warning. It covers balance-sheet and profit-and-loss extraction with 45 root fields (liquidity, leverage, profitability, cash conversion, coverage ratios) plus two table fields for directors and principal shareholders. The template is pre-configured to use agentic parsing with Balanced effort. [Fixed] Nested table columns no longer dropped by the smart-table extractor Columns of a list-of-objects field (for example, items inside a product table) could previously be missed because the smart-table planner tried to extract them as standalone scalars. Nested fields are now correctly treated as per-row columns of their parent table and reach the final extraction every time. [Fixed] Editing object-table cells no longer loses focus mid-typing Validating a value in an object-field table sometimes stole keyboard focus from the input before you finished typing, occasionally swallowing characters. Cell focus is now preserved across both the validation round-trip and the background refetch, and screen-reader row labels stay in sync after row swaps. [Fixed] Parse → split workflows now return their splits Workflows that parse and split a document without a downstream extract were silently dropping the resulting splits. Terminal-splitter workflows now persist and return their splits as expected, surfaced via the file-list endpoint with ?include=splits.

May 20, 2026

[API] POST /v2/workflows/ now takes a typed graph; the old fields-only body is gone POST /v2/workflows/ now accepts the typed nodes + edges graph that previously lived at POST /v2/workflows/with_graph/. The legacy fields-only body and the with_graph path have been removed — there is one canonical workflow-create endpoint going forward.

Technical detail

The with_graph/ path no longer exists at api.anyformat.ai/v2/workflows/. Send the same payload ({name, description?, nodes, edges}) to POST /v2/workflows/. The legacy body {name, fields} is no longer accepted — wrap your fields in an explicit parse → extract graph instead. The legacy multipart/form-data upload path (creating a workflow from a downloaded json_file) has also been removed; rehydrate the workflow into the typed-graph body client-side.

[Changed] POST /v2/workflows/{id}/run/ response status is now "pending" instead of "success" The public run endpoint accepts requests asynchronously (202 Accepted — the extraction is enqueued, not complete). The previous "success" value conflated acceptance with completion; the new "pending" matches the vocabulary used by WorkflowRunListItem.status elsewhere on the v2 surface. Clients that polled for results regardless of this field are unaffected. Clients that branched on "success" to skip polling should now poll on "pending"; semantics match what was always intended. [New] Per-branch Validation tabs across workflow and file detail pages The Validate node now gets the same per-branch UX as Extract. The workflow page renders one Validation tab per branch alongside its existing Category and Extraction tabs, and the file detail page mirrors the same layout. Split workflows include a subdocument selector with chevrons and a 1/N counter, so you can step through per-subdocument validation results without leaving the tab. [Improved] Clear “Not validated” placeholders for rules added after extraction When you add or edit a validation rule, files that were processed before the change now render explicit “Not validated” cards and cells with a hover-card explanation, instead of an ambiguous “N/A”. This makes it obvious which results reflect the current rule set and which would need a re-run. [Improved] Redesigned Validate node configuration menu The Validate node menu now uses a tabbed Config / Rules layout aligned with the Classify menu, with a dedicated rule editor that supports inline field chips inside rule descriptions. The previous template-import button has been removed in favor of the new layout. [Fixed] Validation verdicts now populate for split workflows In workflows with a Split node, validation verdicts could come back empty in the file list and detail view because the API only inspected the parent extraction. Validation results now traverse parent and child extractions so per-subdocument verdicts surface correctly across the workflow page Validation tab, the file detail page, and the file results UI. [Fixed] CSV and master-file uploads decode reliably from non-UTF-8 encodings Uploaded CSV files and master files saved in cp1252 (Word/Outlook smart quotes, Spanish accents), Shift-JIS, GB18030, KOI8, and similar non-UTF-8 encodings could previously either silently produce mojibake or fail extraction with a Unicode decode error. Uploads are now transcoded to UTF-8 at the API boundary, with binary payloads disguised as text rejected up front, so downstream parsing and lookups receive consistent, correctly-decoded bytes.

Technical detail

The master-file upload endpoint and workflow CSV upload path now run inputs through a strict Western decoding ladder (utf-8-sig → utf-8 → cp1252) before falling back to charset detection, and persist the re-encoded bytes to storage with a truthful Content-Type: …; charset=utf-8 header. Allowed text extensions are csv, txt, md, markdown, and rst. Non-text payloads (PDF, ZIP, images, executables) are rejected at upload time rather than being mis-decoded.

May 19, 2026

[API] Fetch parsed blocks with bounding boxes via the new parse-result endpoint The new parse-result endpoint returns the structured page-nested blocks produced by parsing — including bounding boxes, type, confidence, reading order, and layout — along with a presigned URL for the raw markdown. Studio’s PDF viewer and JSON view now use these blocks to render overlays consistently across tabs.

Technical detail

GET /api/v3/files/{file_id}/parse-result/ returns { blocks: [...], markdown_url: "..." }. Blocks are grouped per page and carry a flat bbox: { x0, y0, x1, y1 }. The endpoint accepts an optional execution_id to scope results to a specific workflow execution.

[API] Slim split results without nested rows You can now request slim per-split results that omit nested object rows entirely, and then load nested rows lazily per split via a dedicated endpoint. Each slim split carries a per-split version token so caches can invalidate at the split granularity instead of the whole file.

Technical detail

Pass ?include=splits_lite on the file results endpoint to receive slim per-split entries (mutually exclusive with ?include=splits). Use GET /workflow-versions/{version}/files/{file_id}/splits/{split_id}/object-results/{field_persistent_id}/ to fetch paginated object rows scoped to a single split.

[Improved] Faster Studio category tab on workflows with many splits The Studio category tab now requests slim payloads and fetches nested object rows lazily per split, so opening a file with many splits no longer waits on full results materialization up front. Bounded prefetching when hovering across many cells also keeps the page responsive. [Improved] Parse JSON view shows the full block on expand Expanding a block in the Parse → JSON tab now reveals the complete block object — id, bbox, confidence, content, hyperlinks, rows, and image_base64 — rendered as inert JSON. Pages are grouped under collapsible sticky headers (collapsed by default) for easier navigation in multi-page documents, and the displayed shape mirrors the public parse-result response exactly. [Improved] Client-side markdown hydration with embedded images The Markdown tab now hydrates images directly in the browser using each block’s bounding box — including figure blocks — so the in-tab view and Markdown downloads stay consistent. The dedicated server-side hydration pipeline has been retired in favor of this lighter approach. [New] “Beta” badge on Studio nodes still being stabilized Studio nodes can now display a BETA badge in the sidebar, on the canvas header, and in the sidebar hover card. The Validate node is the first to use it, marking it as still being stabilized for general production use. [Fixed] Workflows created via the typed graph endpoint now run extraction Workflows created via the typed with_graph create endpoint sometimes failed to run extraction and surfaced as EXTRACTION_FAILED (422) because the extract node was persisted without an engine. New workflows now always include an engine and execute end-to-end as expected.

Technical detail

POST /v2/workflows/with_graph/ now persists a default engine on every extract node it creates.

[Fixed] Object rows in sibling extract branches no longer collide When a workflow had sibling extract branches that defined fields with overlapping names, object rows from one branch could be written against the wrong field. Field schema resolution is now scoped per extract node, so each branch’s rows land where they belong. [Fixed] Parse results reliably appear in the UI for new workflow versions Some parsed files showed missing pages or stale parsed_at because parse callbacks resolved graph nodes by an identifier that wasn’t unique across workflow versions. Callbacks now scope to the active workflow version, so parse results are persisted to the correct file every time. [Improved] Other improvements and fixes Deleted object rows are now consistently excluded from both per-split and file-level row counts; the file results UI deduplicates parse-result requests when navigating between files; and extraction progress estimates fall back to organization-scoped percentiles instead of cross-tenant numbers.

May 18, 2026

[New] Handwritten text is now a first-class block type in agentic parsing Agentic PDF-to-Markdown now recognizes handwritten text as its own block type, with a dedicated extraction strategy that combines a vision-language transcription with OCR-derived hints. Handwritten blocks are labeled “Handwritten text” in the Studio UI, distinct from machine-printed text, and can be tuned independently from regular text extraction. [Improved] More consistent agentic parsing when text-class blocks have multiple sources When a text-class block has multiple candidate text sources, the agentic parser now writes the chosen source back into the block’s elements and regenerates its segments before rendering. Downstream grounding (overlays, citations) now matches the markdown that’s actually rendered.

May 14, 2026

[New] “Beta” badge on agentic parsing mode The Parse node mode card now shows a “Beta” badge next to Agentic parsing, making it clear that this parse mode is still being stabilized for general production use. [New] Calibrated confidence scores on smart-table extractions Both table cells and scalar fields produced by the smart-table extractor now carry a calibrated 0–100 confidence, so review UIs and downstream consumers can prioritize low-confidence values. The scores degrade gracefully when scoring is unavailable, so extractions still complete normally.

Technical detail

Confidence is surfaced on each extracted value’s metadata.confidence field as an integer in [0, 100]. Both judges flip together via the smart_table_confidence_strategy setting (auto / judge / none); the default is on.

[New] Review threshold filter for file results File results now include a confidence-threshold slider that filters Fields and JSON to show only extractions at or below the selected confidence — useful for prioritizing low-confidence rows that need human review. The threshold is persisted across reloads and tabs, and an inline hover card explains how the filter works. [Improved] Image previews use the PDF viewer toolbar Image uploads (JPEG, PNG, GIF, BMP, TIFF) now render through the same viewer as PDFs and inherit its zoom, rotate, and full-page controls. Multi-page TIFFs gain a page selector, and the viewer now reports specific errors for unsupported formats or images that exceed browser size and canvas limits instead of failing silently. [Improved] More reliable dense-table parsing in agentic mode Agentic PDF-to-Markdown table parsing no longer stalls on dense tables. Reasoning-token budgets are now properly honored across the heal, audit, and sub-region passes; partially valid heal plans are applied operation-by-operation instead of being discarded on a single bad op; and the table-parsing token cap has been raised so large tables are no longer truncated mid-document. Workflows that previously stalled for over 15 minutes now finish in roughly 3. [Improved] Organization logos stay cached when unrelated settings change The organization logo URL is now invalidated only when the logo bytes themselves change, not whenever any field on the organization is updated. Renaming the org, toggling support access, or any other non-logo edit no longer forces every client to re-download the logo. [Fixed] Bulk-download of selected files no longer 400s GET /api/v2/saas_manager/workflow-versions/{id}/results/ rejected the file__id__in and file__id__not_in query parameters with Field 'id' expected a number but got '...' whenever the frontend passed UUIDs (the v2 file API now returns file id as a UUID string). The filter was still joining against the legacy integer PK; it now joins on the file UUID field, matching the regular file filter. Every bulk-download flow that relies on row selection — and the “select all minus N” pattern — is unblocked. [Fixed] Stricter validation of logo and workflow file uploads Logo and workflow file uploads are now validated against their actual bytes via magic-byte sniffing rather than the client-supplied Content-Type header, and only file types the parse pipeline can actually process are accepted. Uploading a renamed executable, ZIP, or arbitrary JSON as a workflow file is now rejected at the API boundary instead of silently failing later in the pipeline. Stored objects also carry the correct Content-Type on the storage layer instead of being labeled binary/octet-stream. [Fixed] Switching organizations on a resource page no longer reverts Switching the active organization from a workflow detail page (or any other resource page deep-linked from a different org) could occasionally snap back to the original org after showing a “switched” toast. The selection is now preserved on the first click. [Fixed] Object-field row counts exclude deleted rows The paginated object-field results endpoint and the slim workflow-results list both now exclude soft-deleted rows from page contents and the row-count total. Counts shown in the UI and returned by the API now match the rows you can actually see. [Fixed] Sub-table detail rows no longer clip at the bottom Expanded sub-tables inside object-field result tables previously cut off the bottom row because the row-height calculation didn’t account for the surrounding padding and border. The detail row now renders at full height regardless of how many sub-rows it contains. [Fixed] Classifier category tabs render again on workflow results Category tabs on classifier workflow results were rendering empty because the slim results response had stopped including the classifier’s graph_node_id. The field is back in the response (with a batched prefetch to avoid per-file queries), so files routed by a classifier now appear under their correct tab. [Fixed] Workflow list updates immediately after create, duplicate, or delete The home page workflow list previously kept the stale list cached after creating, duplicating, or deleting a workflow, so the change only appeared after a refresh. All three mutations now invalidate the workflow list cache and the list reflects the change right away.

May 13, 2026

[Breaking + API] Workflow creation no longer accepts file uploadsPOST /api/v2/saas_manager/organizations/{org_id}/workflows/ previously accepted a multipart/form-data body to import a workflow from a JSON or XLSX template file. That entry point and its template downloads have been removed; the endpoint now accepts JSON request bodies only.Action required: if you were uploading a JSON or XLSX template to this endpoint, switch to sending the equivalent JSON body directly. The chat-driven and from-scratch flows in Studio are unaffected.

[New] Studio is now generally available The visual workflow editor (Studio) is now visible to every organization member from the workflow page and the new-workflow page. Write access is still controlled by your workflow role. The legacy schema editor at /workflows/edit/... has been removed — the Studio tab is now the single editing surface. [New] Row-level actions on object-field result tables Object-field result tables now expose a per-row dropdown with Validate row, Add row above, Add row below, and Remove row, plus inline up/down chevrons for adjacent reordering. The trailing-column header has a new Validate all rows action that promotes every datapoint in the field to human-verified in a single click.

Technical detail

New endpoints under /api/v2/datapoints/object-fields/{field_id}/:

POST .../rows/ (with an index in [0, n_live_rows]) inserts an empty row and shifts live siblings up by one in the same transaction.
POST .../rows/{a}/swap/{b}/ exchanges two adjacent live rows.
POST .../rows/{position}/validate/ bulk-promotes every datapoint in a row to human-verified.
POST .../validate-all/ does the same for every live row of the object field.

All four endpoints require Member role.

[New] Signup vouchers now give live feedback during onboarding When a user lands on the signup flow with a voucher code, the create-organization screen shows the total welcome credits they will receive (the standard signup grant plus the voucher’s value, fetched live from the backend) if the voucher is valid, or an amber “voucher no longer valid” hint if it is expired or unknown. Previously the page was silent in both cases.

Technical detail

GET /api/v3/iam/signup-vouchers/{code}/ returns a read-only preview of the voucher ({ code, additional_credits }) and never consumes a redemption — actual redemption still happens atomically inside organization creation. Returns 404 voucher_not_found for unknown codes and 410 voucher_expired for expired or exhausted ones. Responses set Cache-Control: no-store, private so admin edits to a voucher are reflected on the next page load.

[Improved] Workflow page list is dramatically faster The workflow page’s file list now uses a slim response that omits per-cell datapoint detail until you actually expand an object row. On a workload of 50 files with 2,000 line-items each, the list now returns in ~240 ms instead of ~6 s and the response payload is roughly 770× smaller. Object cells pre-warm on hover so expanding feels instant; edits to a datapoint refresh the expanded sub-table without a poll cycle.

Technical detail

GET /api/v2/saas_manager/workflow-versions/{id}/files/?include=results_lite returns scalar datapoints as { id, display, status } and object fields as { row_count }. Full sub-tables are fetched lazily through GET /api/v2/saas_manager/workflow-versions/{id}/files/{file_id}/fields/{field_persistent_id}/object-results/. The existing include=results (full) path is unchanged and still available for legacy integrators.

[Improved] Stripe top-up is now open to all users and confirms the purchased amount The Top up button on the Usage page now opens the Stripe checkout dialog for every user — the previous fallback to a mailto:sales@anyformat.ai link is gone. After a successful payment, the success page now states how many credits were added and tells the user they can start running workflows immediately. [Improved] Cleaner extraction on scanned PDFs The agentic PDF-to-markdown parser now honors the routing model’s verdict on which text source to trust (the embedded PDF text layer, OCR, or vision) for every block type, including miscellaneous blocks that previously ignored the verdict. On scanned PDFs where the embedded text layer is corrupted, blocks now render the clean OCR text instead of garbled bytes. [Fixed] Returning from Stripe checkout no longer fails authentication If you stayed on Stripe long enough for the short-lived access token to expire, returning to anyformat used to land on an authorization error screen — even though the payment had already succeeded on Stripe’s side. The app now silently refreshes the token in the background (and replays the first request that hits a 401), so the top-up success page renders cleanly even after a long checkout. [Fixed] Cross-tab refresh storm and support-elevation isolation Opening the app in multiple tabs no longer triggers refetch storms when you navigate inside one tab. As a separate fix, support agents who elevate into a customer organization stay elevated only in the current tab — new tabs default back to their own organization until they explicitly re-elevate. [Fixed] New users are immediately enabled after signup Users who created their own organization or requested to join an existing one were previously left in a disabled state until an admin reviewed their request, which blocked the create-org path entirely. Both onboarding paths now enable the profile immediately. Join-request approval is still required to gain membership in the target org. [Improved] Other improvements and fixes

Structured-output extractions on Amazon Nova models no longer occasionally fail with an empty-JSON parsing error.
The Usage page no longer renders the “Page consumption” summary box at the top — the cleaner credit-balance card now stands on its own.
Long organization lists on the join-or-create page no longer get clipped below the fold.

May 12, 2026

[Breaking + API] v2 file responses now use UUID hex for file IDsEvery v2 endpoint that previously returned a numeric file ID now emits a 32-character UUID hex string. The OpenAPI schema already declared these fields as UUID strings, so generated SDK clients are unaffected — but raw HTTP integrations that parsed the value as an integer, or that constructed URLs from cached numeric IDs, will need to update. Old integer-shaped URLs return 404.Action required: treat v2 file_id values as opaque strings, and refresh any cached file URLs that embedded integer IDs.

Affected surfaces

File detail, upload responses, and file-position (prev_id / next_id)
Extraction responses and the verification URL builder
File filters: id__in, id__not_in, file_id__in, file_id__not_in now resolve against UUID hex

[New] Delete rows from object-field result tables Object-field result tables now have a per-row trash action. Remaining rows are reindexed in place so curated positions stay stable and your verification carries over correctly when the file is re-extracted. Rows the model produced are soft-deleted (so accuracy metrics still reflect the model’s miss); rows you added yourself are removed entirely.

Technical detail

DELETE /api/v2/datapoints/object-fields/{id}/rows/{position}/. Deletion is transactional. A partial unique constraint allows deleted rows to retain their original index, and the result-building SQL filters soft-deleted rows.

[API] Workflow responses now include organization_id Workflow and workflow-version responses (including the nested versions collection on workflow detail) now expose a normalized 32-character hex organization_id. Useful when a client needs to identify which org owns a workflow without an extra round-trip. [Improved] Tax ID and billing address now collected during Stripe Checkout Top-up Checkout sessions now collect the customer’s tax ID and billing address before invoices are finalized. This unblocks invoice correctness in EU/UK jurisdictions. No action required for existing customers — the values are persisted on the next Checkout. [Improved] Smoother markdown viewer on long documents The “visual” markdown view is significantly snappier on long documents. Scroll-driven page detection is now frame-throttled and only fires on real page transitions, sections subscribe to fewer state changes, and highlight-scroll listeners share a single event bus instead of one per section. [Improved] Faster agentic PDF-to-markdown parsing Agentic PDF→Markdown parsing now parallelizes work per page — single-call vision blocks fan out and agentic blocks run in a small per-page pool, so page latency tracks the slowest block instead of the serial sum. Layout-routing calls were also tuned for lower latency. [Fixed] Org switcher no longer goes blank for orgs without a logo The organization switcher trigger now always renders the org’s initials as the base layer and only overlays the logo image once it loads successfully — so switching to an org without a logo no longer flashes blank. [Improved] Parse confidence is now reported even when the model omits logprobs Per-block parse confidence is now computed and stamped on the rendered markdown regardless of whether the parsing model returns token-level log probabilities. Downstream consumers can read data-parse-confidence consistently across all supported parsing providers.

May 11, 2026

[Breaking + API] v2 SaaS Manager list, create, and member endpoints now require an organization in the URLThe bare /api/v2/saas_manager/... list and create paths, and the legacy current organization shortcuts, have been removed. Workflow versions, file collections, member management, and suggestion routes must now include the org id in the URL. Suggestion endpoints additionally require Member role — Viewers get 403.Action required: template the org id into the URL for every v2 SaaS Manager request that previously relied on a bare list/create path or the current shortcut.

Affected paths

GET / POST /api/v2/saas_manager/workflow-versions/ → .../organizations/{org_id}/workflow-versions/
GET / POST /api/v2/saas_manager/file-collections/ → .../organizations/{org_id}/file-collections/
PATCH /api/v2/saas_manager/organizations/current/ → .../organizations/{id}/
PATCH /api/v2/saas_manager/organizations/current/logo/ → .../organizations/{id}/logo/
GET / POST /api/v2/saas_manager/organizations/current/members/ → .../organizations/{org_id}/members/
DELETE / PATCH /api/v2/saas_manager/organizations/current/members/{user_id}/[role/] → .../organizations/{org_id}/members/{user_id}/[role/]
POST /api/v2/suggestions/{fields,field-description,workflow-description,workflow-name,upload-sample}/ → .../organizations/{org_id}/suggestions/{...}/

[New] Self-service invoice downloads from the usage page Owners and admins can now open Stripe’s hosted Customer Portal from the Usage page to download invoices for past top-ups. Members, reviewers, and viewers don’t see the button — invoices are financial documents.

Technical detail

POST /api/v3/billing/organizations/{org_id}/portal-sessions/ mints a short-lived portal URL bound to the org’s billing customer record. Orgs that have never run a top-up return 409 stripe_customer_not_provisioned; the UI surfaces this as a “Top up first” toast. New environment variable STRIPE_BILLING_PORTAL_RETURN_URL falls through to ${FRONTEND_URL}/usage. Configure the operator side (Invoice History toggle, ToS / privacy URLs, default return link) in the Stripe dashboard.

[Improved] Cross-tenant probes on v3 detail endpoints now return 404 v3 file results, file-collection extraction status, file-collection results, and webhook delete now derive the active organization from the resource itself. URLs are unchanged. Cross-organization requests get a 404 to mask resource existence; same-org callers are unaffected. [Improved] Hovering a stacked bounding box highlights every overlapping box When several bounding boxes overlap, hovering reveals each one stacked under the cursor instead of just the topmost. Rapid mouse movement is collapsed to one diff per frame so the UI stays responsive. [Improved] Consistent extracted-object UI across Define, Refine, and File Results Object and table fields now render the same way across the schema builder, refine step, and file-results panel — same headers, same skeletons, same status states. Refine correctly shows error / cancelled copy on failed extractions instead of an empty data view.

May 9, 2026

[Breaking + API] Detail and per-resource v2 endpoints no longer read X-Current-OrgDatapoints, manual datapoints, extraction results, workflow-version files, file collections, workflow versions, and workflow detail endpoints now derive the active organization from the resource itself. Non-resource list/create endpoints (API key CRUD, manual-datapoint create, workflow list/create) moved under /api/v2/saas_manager/organizations/<org_id>/<resource>/.Action required: stop sending X-Current-Org to v2 endpoints — derive the org from the resource (URLs unchanged for detail) or from the new org-scoped URL for list/create. Non-members on a same-resource probe now get 404; in-org callers with insufficient role get 403.

May 8, 2026

[API] Create workflows with full graph definition in a single call You can now define a workflow’s parse, classify, split, and extract steps in one request — the workflow, its initial version, and all extraction fields are created atomically. The existing workflow-create endpoint is also fully typed, so SDKs get auto-complete and field-level validation.

Technical detail

POST /v2/workflows/with_graph/ accepts the complete graph plus per-extract-node fields and creates everything in a single transaction. The original POST /v2/workflows/ body is now declared with discriminated field types.

[New] Python builder for typed workflows The Python SDK now ships a fluent Workflow builder: .parse().classify(...).extract(...).split(...).build() produces a fully validated request. branch= and route_from= accept either an id string or the typed category/rule you registered earlier, so typos fail at the call site instead of at build time. [Improved] Workflow graph validation catches more errors before run Non-branching nodes with multiple outgoing edges, missing route_from on classify→split chains, and broken graph transitions are now rejected with clear validation errors instead of producing confusing runtime failures. The new typed graph payload also rolls back cleanly when persistence of one node fails partway. [New] Voucher codes on signup grant extra wallet credit Organization creation now accepts an optional voucher_code that grants extra wallet credit to the new org. Vouchers (expiry, usage cap, amount) are managed in the admin panel; invalid or expired codes surface as a clear 400 instead of breaking signup. [Improved] Doc-viewer table polish — resize, unified sync toggle, safer persisted state The object-results table is now vertically resizable with sensible bounds (header + one row min, all-rows-visible max). The PDF toolbar’s separate auto-focus and auto-scroll preferences collapse into a single “Sync document and results” toggle, and the lookup-files dialog no longer overflows. Co-mounted views in the same tab stay in sync, and corrupt persisted preferences self-recover instead of breaking the UI on load.

May 7, 2026

[Improved] Org isolation on v2 detail endpoints is now resource-derived Detail endpoints (datapoints, manual datapoints, extractions, workflow-version files, file collections, workflow versions, workflows) now authorize the caller’s role against the resource’s organization instead of a request header. Cross-tenant probes return 404; in-tenant callers with insufficient role return 403. URLs are unchanged. [Improved] More accurate extraction confidence via LLM judge For extractions where the model doesn’t return logprobs (or where you prefer the judge), a separate judge model now scores each extracted value against its source markdown and outputs a calibrated 0–100 confidence. Object and table cells are batched for cost efficiency. Choose the strategy per task with extraction_confidence_strategy = none | auto | judge. [Fixed] Login no longer spins on token-exchange failures The post-login callback now exits cleanly when token exchange or profile fetch fails: a 15-second watchdog renders an error page with a logout option, OAuth errors surface explicitly, and tokens are purged on error. [New] Bounding boxes for split + extract workflows The PDF viewer now renders bounding boxes for split + extract workflows by scoping mapped locations per panel — the matching split when navigating by partition, the file’s extraction otherwise. Parse, split, classify, and validation panels paint no boxes, as intended. [Fixed] Renamed split categories no longer lose historical sub-documents When you rename a split rule across versions (e.g. “Invoice” → “Bill”), historical sub-documents continue to surface under the renamed tab — the UI now filters by the stable split-node identifier instead of the display string.

May 6, 2026

[New] Top up credits with Stripe Checkout You can now purchase wallet credit through Stripe Checkout from the Usage page. Sessions are idempotent (no duplicate charges on retry), and every successful top-up generates an invoice you can download later from the Customer Portal.

Technical detail

POST /api/v3/billing/organizations/{id}/checkout-sessions/ validates the requested credit count against per-session min/max bounds and returns a hosted-page URL. Inbound checkout.session.completed webhooks are HMAC-verified and serialized to guarantee exactly-once credit grant under contention.

[New] File results panel with explicit processing states The per-file extraction panel now renders explicit Unprocessed, Processing, Error, and Processed states with run and retry actions. Heavy result queries only run once the file is actually processed, removing wasted requests and inconsistent loaders for unprocessed files. [New] Search inside the PDF viewer The PDF viewer now has a toolbar search bar with case-sensitive/insensitive matching, prev/next navigation that wraps, a Cmd/Ctrl-F shortcut, and cross-page highlighting. Search stays responsive on large PDFs thanks to debounced lazy text extraction. [API] Workflow results now expose classifications, splits, and flat extractions GET /v2/workflows/{wid}/files/{cid}/results/ now returns classifications, detailed splits (with per-file pages and partitions), and a flat extractions list keyed by (split_name, partition) alongside the existing fields. The v3 file-collection results payload exposes the same keys in place — no new endpoint required. [Improved] Org switcher orders by ownership, membership, then support access Organizations are now returned in three alphabetically-sorted buckets — owned, member, support access — replacing the prior database-discovery order. Helpful for accounts that belong to many orgs. [Fixed] Bulk extraction status no longer marks processed files as pending Bulk status resolution now aligns identifier formats and queries root extractions only, so processed parents are correctly reported instead of stuck on “pending”. [Improved] Other improvements and fixes File status badge in the file-results header, PDF toolbar staying visible on multi-line wrap, clearer toasts when a dropped file’s type isn’t accepted, and a fix for crashes on empty drag-and-drop selections.

May 5, 2026

[Improved] Smart Lookup retries based on residuals Smart Lookup now wraps the worker in an orchestrator + auditor loop that can iterate up to three times if residuals or tool-call signals suggest more work is possible. Search blends BM25 and Jaro-Winkler matching with a per-call query cap to improve recall while keeping runtime bounded. [Improved] Splits panel is now navigable Clicking a split or partition now switches the extraction tab, persists the selected partition in a URL search param, and scrolls the PDF viewer to the matching page. Refresh-safe deep links to a specific partition just work. [Fixed] Duplicating workflows with reused field names no longer fails Workflows that contain multiple extract nodes reusing the same top-level or nested field names can now be duplicated cleanly — field lookups are keyed per source node. [Fixed] Classify-only workflows handle unwired LLM categories When the classifier returns a category with no downstream edge, the run now falls through to the end instead of crashing. Classify-only workflows also persist classification records so results are available afterwards. [Improved] Smart-table list-of-object rows are page-ordered Smart-table list-of-object extraction rows are now sorted by source page and block, so output is stable and easy to align with the source document across runs. Rows with missing page info are placed last. [Improved] Cleaner empty states for splits and category grids Workflows with no rows now render a clean overlay in the splits and category grids instead of stale placeholders.

May 4, 2026

[Breaking] Audio uploads are no longer acceptedThe dropzone no longer accepts audio files (.mp3, .wav, etc.). Existing audio files already in your collections remain viewable and downloadable.Action required: if your integration uploaded audio for transcription, switch to a supported format. No deprecation window — audio uploads are rejected starting today.

[Fixed] Authentication errors now return the correct status codes Unauthenticated requests now return 401 (with a WWW-Authenticate header) instead of being coerced to 403. Missing email-claim or unknown-key authentication paths return AuthenticationFailed cleanly instead of 500 errors. [Improved] Profile-fetch failures show a clear error page Repeated /profile failures after login now surface an error page with a logout option instead of stranding users on a spinner. Global query errors are also reported with the failing endpoint for faster triage. [Fixed] Corrupted local preferences no longer break the UI on load Persisted UI preferences (sync toggles, panel sizes, etc.) are now safely parsed; corrupt keys are dropped on initial load and reported to monitoring with the offending storage key.

April 2026

April 30, 2026

[New] New organizations get 5,000 free credits Every new organization is now automatically granted 5,000 credits on creation, recorded as a standard billing transaction.

April 28, 2026

[Breaking + API] Split confidence is now a 0–100 integerSplitRange.confidence is now an integer in [0, 100] end-to-end (graph output, backend validation, database, and frontend rendering) instead of a 0–1 float. Existing values were migrated automatically.Action required: if your integration reads confidence from split results, expect integers (87) instead of floats (0.87). Confidence badges no longer render 0.87% or 8700%.

[Breaking + API] Workflow results response shape is now canonicalGET /v2/workflows/{wid}/files/{cid}/results/ now always returns {collection_id, verification_url, parse, extraction} — parse-only workflows return extraction: null instead of omitting the key. The internal field_id no longer leaks in responses.Action required: if your integration relied on extraction being absent for parse-only workflows, check for null instead. Cookbook examples and the API reference have been rewritten to match.

[New] Classify nodes return evidence and confidence Classify nodes now return a structured result with category, evidence, and confidence. The UI surfaces confidence and evidence in the category data viewer, with a dedicated Classify tab at workflow level and a per-file Classify tab with verdict cards. The new classifications include option exposes the same data via the file API. [New] HTML files render inside the document viewer You can now upload .html and .htm files and view them in the document viewer. Files render inside a sandboxed, sanitized iframe to block script execution and same-origin access. [API] Better OpenAPI for SDKs and try-it surfaces POST /v2/workflows/ now declares its JSON request body, so SDKs and AI agents stop falling back to raw extra_body dicts. Bearer authentication is now declared as a proper API-key security scheme, so try-it panels render the lock icon and treat auth as required. Webhook responses now expose url as an HttpUrl and created_at as a datetime. [Improved] Smart-table search now reaches inside table cells Smart-table search now scans both page prose and table cells, returning cell matches with {table, row, column, value} provenance. Scalar values inside tables are no longer dropped behind prose hits. [Improved] Per-category accuracy and confidence charts in monitoring The monitoring tab now groups per-field accuracy and confidence charts into per-category sections for multi-schema workflows, with sensible fallbacks and an improved empty state. [Improved] Validate view now shows results for classify-routed files Per-file validation now displays extracted data for files routed via Classify by synthesizing entries from the file’s results when pinned to an extract node. The partition selector and sibling tabs are hidden when they don’t apply. [Fixed] Improved processing of .docx and .msg uploads .docx, .msg, and other non-PDF uploads now hydrate from their converted PDF instead of feeding the original binary into the PDF parser. Markdown for these formats is now correct and no longer triggers spurious “corrupt PDF” warnings. [Improved] Other improvements and fixes Reliable PDF callback uploads via short-lived presigned URLs (no more “Request Too Big” errors on large rendered PDFs), file-detail columns in splits and per-category tables, search_text helper exposed in the smart-table sandbox, object/list extraction fields no longer dropped from v3 results, timeouts and retries on frontend list requests, lighter /files list payloads, and eval and XFA disabled in the PDF viewer for a smaller attack surface.

April 27, 2026

[Fixed] Smart-table extraction no longer drops scalar fields Smart-table extraction plans now must assign every schema field to either a table or a scalar_fields list, rejecting incomplete plans so scalar fields can no longer be silently dropped. [Fixed] Master-file + extract workflows now run without error Non-split workflows that combine a master-file node with an extract node now run cleanly — master-file nodes can augment an extract branch without triggering single-non-split-node validation.

April 24, 2026

[Improved] Tabbed data viewer on workflow detail Reworked the workflow detail data viewer into a tabbed layout with shared grid skeletons and improved loading behavior. Added an inline workflow-name editor and tighter studio reload/loader behavior when workflows change. [Fixed] Empty splits tab now shows a clean overlay The splits tab now uses a proper no-rows overlay (resolving the previous “Element type is invalid” error) and filters out splits with no results, so empty entries no longer appear. [Improved] More content captured during PDF parsing Text missed by layout segmentation now reaches downstream parsing through orphan OCR detection and synthetic text-block injection. Orphans are clustered into lines, merged into nearby blocks where appropriate, or promoted to new blocks; redundant smaller blocks are absorbed to prevent duplicates. [Fixed] Date parsing in the smart-table sandbox no longer silently returns None The sandbox now allows datetime.strptime’s runtime dependencies, so parsed dates are correctly returned instead of being silently converted to None. [Fixed] Smart-table no longer overwrites populated fields with empty values Storing an empty list can no longer overwrite an already-populated list field (e.g. a transactions table). Column-statistics summaries now clearly label examples and indicate additional unique values so the worker doesn’t mistake samples for full datasets. [Improved] Better table merging when headers differ Same-title tables with differing headers are disambiguated to avoid dropping rows during merge, and the block-parsing prompt and markdown assembly capture text the model sees that isn’t covered by any block id. [Fixed] Classifier results survive workflow amendments Category queries continue to return populated classifications and datapoints after prompt adjustments — the version-compatible result builder now maps datapoints across amended versions by their stable persistent identifier.

April 23, 2026

[Fixed] Partitioned splits now flow into Validate and Smart Lookup When a split rule declares a partition_key and the downstream extract is wired into a Validate or Smart Lookup, each partition is now processed independently — one validation pass per partition, one lookup pass per partition — and the results attach to the right per-partition child extraction. Previously the downstream node saw nothing. [Improved] Per-partition fan-out is now bounded Per-partition fan-out is now capped (default 4 parallel branches) so a 100-partition document running through extract → validate → lookup doesn’t saturate the model provider.

April 20, 2026

[Breaking] Smart Lookup is now part of ExtractThe standalone Smart Lookup node has been removed — lookup behavior now lives on the Extract node. You manage lookup files directly from the Extract menu, enable lookups per-field via a checkbox in the field editor, and see inline badges when lookup files are missing.Action required: none for most users — existing workflows migrate automatically and inherit Extract’s model settings. If you scripted Smart Lookup as a standalone node, switch to enabling lookup fields on Extract.

[Breaking] Smart Table is now an agentic mode on ExtractThe standalone Smart Table node has been removed — replaced by a Standard / Agentic mode selector on the Extract node.Action required: none for most users — existing Smart Table workflows migrate automatically to Extract with mode="agentic".

[New] Splits results tab in workflow results Added a Split tab in the extraction results UI showing split-group cards and a collapsible per-page table, plus a document_split_groups include option on the workflow-version file endpoint with category, partition value, page range, and confidence. [Improved] Redesigned Classify and Split menus Both nodes now share a Config / Options tab layout with validated card-based editors, uniqueness checks, reserved “Other” handling for Split, click-to-edit from the Config preview tables, and cleaner Save-from-header behavior that closes open forms first. [API] Split groups consolidated by category and partition value document_split_groups responses now group rows by (category, partition_value) and return consolidated page_ranges arrays instead of per-row page_start / page_end fields. [Improved] Better figure parsing in PDFs The figure-enhancement prompt now uses chart-type-specific strategies, handles multi-panel figures, gives statistical-annotation guidance, and applies column-header rules. Segment deduplication now preserves the largest picture blocks to avoid losing visual context. [Improved] Inline field-level errors in the field editor Replaced the top-level field-form error banner with inline, field-adjacent errors, column-level highlighting, and deduplicated summary messages for enum options and nested fields. [Improved] Default extraction model upgraded to gpt-5.2 The default extraction model is now gpt-5.2 across markdown and PDF extraction paths. [Fixed] OneDrive and Google Drive imports work reliably in production Fixed the cloud-connector reliability issues that affected OneDrive and Google imports in staging and production: a stricter content-security policy now allows the required SDK scripts and frames, SDK load errors surface instead of failing silently, consumer-vs-business OneDrive accounts are detected correctly, and a popup-polling leak is gone. [Improved] Other improvements and fixes Validation tab only appears when the workflow includes a Validate node and the file has an extraction; latest_version endpoint returns a clean 400 for invalid workflow UUIDs instead of a noisy 500; OAuth callback token exchange no longer stalls; Smart Lookup handles nested object fields correctly; agentic-extraction description typo fixed; partition tools only exposed when split rules define a partition key; null figure-parsing responses no longer crash; markdown viewer no longer has an infinite scroll loop and the markdown download button moved to the top of the panel.

April 14, 2026

[New] Redesigned Workflow Studio The Studio now ships redesigned Parse and Extract node menus, smart node placement with auto-connect, a way to add connected nodes from an output handle, empty-state guidance, an updated sidebar with hover cards, and save-from-header with per-node validation errors. [New] Validate node for cross-document rule validation A new Validate node type lets you express cross-document rules that run after extraction. [New] XML file support Added an XML-to-PDF converter for files with embedded base64 attachments, so XML files extract end-to-end. [Improved] API keys are hashed at rest, with multiple named keys per user API keys are now hashed at rest, support naming, and you can hold multiple keys per user — better security and easier rotation. [Improved] Per-cell parse confidence with logprob breakdowns Parse confidence now uses cell-level logprobs when available, with per-cell and per-row logprob breakdowns for table blocks. [Improved] Studio warns when leaving with unsaved changes The Studio now blocks tab switches when configuration is dirty or an inline editor is open, and preserves in-progress work across tab navigation. [Improved] Faster, paginated billing usage The billing usage table is now server-side paginated with summary cards, dramatically improving query performance on large datasets. [Improved] Visual grounding toggle for local-model compatibility Added a toggle to disable visual grounding for environments where the local model doesn’t support it. [API] Files nested under workflows; multi-node results; richer OpenAPI File endpoints are now nested under workflows. A new multi-node results endpoint returns results across multiple graph nodes in one call. The OpenAPI spec is now enriched for better SDK documentation, and the billing API gained pagination, filtering, and ordering. [Improved] Other improvements and fixes OCR confidence is now included in segment and table-cell metadata; the schema builder uses a local config store for faster field editing; toast colors, mobile sidebar, long descriptions in choice fields, lookup-file display overflow, and schema editor header sizing fixes; corrupt-PDF hydration errors are now downgraded to warnings.

April 6, 2026

[New] Agentic document splitter The batch LLM splitter has been replaced by an agentic workflow. You can now define custom split rules (name, description, optional partition key for grouping repeating sections) directly in the Studio SplitPages node, with a built-in “Other” catch-all rule. [Fixed] Non-figure blocks no longer receive figure wrapping Generic non-figure blocks are no longer treated as pictures, so only true figures receive screenshot crops and figure markup in PDF-to-markdown output. [API] Removed duplicate health check endpoint A duplicate health check endpoint has been removed from the API surface.

April 1, 2026

[Breaking + API] Workflow results endpoint always returns unified JSONGET /v2/workflows/{id}/results/ now always returns unified JSON (parse markdown plus extraction data per file). The output_format, as_lists, and file-filter query parameters have been removed; filter to a single file with the new file_id query parameter.Action required: if your integration relied on output_format, as_lists, or the legacy file-filter param, remove them and use file_id instead.

[Improved] Workflow usage endpoint supports pagination and ordering The workflow usage endpoint now supports server-side pagination (page, page_size), ordering, and database-level filtering with the standard results response wrapper. [API] v1 routes restored with deprecation headers v1 routes (workflows, jobs, uploads/results) are back with Deprecation and Sunset response headers and X-API-Key auth fallback for backward compatibility.

March 2026

March 30, 2026

[Improved] Rate limits split between submission and general endpoints Rate limits are now applied independently to file-submission endpoints (60 RPM) and general endpoints (600 RPM). Submission operations (run, upload, create file collection) have a stricter limit; read and management endpoints share a higher limit. The two tiers use independent counters.

March 26, 2026

[Breaking + API] Removed deprecated as_lists parameter from the v2 extraction endpointThe as_lists query parameter on the v2 extraction endpoint has been removed.Action required: drop as_lists from any v2 extraction call. The response shape is the unified default introduced earlier this month.

[New] Paste images into the workflow creation chat Paste images directly into the workflow creation chat for more contextual AI suggestions (up to 3 images, 3 MB each). [Improved] Concurrent edit protection on workflow saves Workflow configuration saves now detect concurrent edits and surface them, preventing accidental data loss when two users edit at once. [Fixed] Image uploads in workflow suggestions accept files within size limit Image uploads in workflow suggestions are no longer rejected when they’re within the allowed size limit. [Improved] Other improvements and fixes Workflow-name loading indicator no longer gets stuck; buttons now show the pointer cursor on hover.

March 25, 2026

[Breaking + API] v2 extraction is now collection-firstRun extraction now returns a file-collection UUID; poll results via GET /v2/files/{id}/extraction/ (412 while pending, 200 when ready).Action required: if your integration parsed extraction results synchronously from the run-extraction response, switch to polling the new collection-status endpoint.

[Breaking + API] Upload endpoint no longer accepts file_collection_idPOST /v2/files/ no longer accepts a file_collection_id parameter — collections are immutable after creation.Action required: create a new collection per upload batch instead of appending to an existing one.

[Breaking + API] Deprecated /jobs/ endpoints removed from the v2 APIThe deprecated /jobs/ endpoints are gone.Action required: use GET /v2/files/{id}/extraction/ to poll extraction state instead.

[Breaking + API] Simpler v2 pagination shapev2 pagination responses now contain count, page, page_size. The old total_pages, next, and previous fields are gone.Action required: compute next/previous page numbers from count, page, and page_size client-side, or rely on page boundaries (1 and ceil(count / page_size)).

[Improved] Upload multiple files in a single request POST /v2/files/ now accepts multiple files in a single request. [API] Customer-facing docs now show Bearer authentication Customer-facing API docs now reflect the Authorization: Bearer <token> authentication method. [API] v2 documentation reflects path-based versioning All API documentation has been updated for v2 endpoints with path-based versioning (/v2/ prefix).

March 19, 2026

[New] EML and MSG email file support You can now upload .eml and .msg (Outlook) email files for extraction. Embedded images in EML files are correctly inlined during conversion. [API] New allow_extend parameter to append rows on submit Submit now supports an allow_extend parameter that lets agents append rows to existing submissions instead of replacing them. [API] v3 types and graceful v2-to-v3 delegation Updated v3 types, v2 file endpoints now delegate to v3, and v3 errors are translated cleanly for v2 callers. [Improved] Better OCR text detection OCR text detection now uses word-count and spatial-coverage heuristics instead of simple truthy checks, and the screenshot is always sent to the model for context. [Improved] Smart-table resubmit replaces rows instead of skipping Smart-table resubmit now replaces the previous output instead of skipping, with scoped worker context and a consolidated briefing. [Improved] Per-user rate limits via API keys Authentication and rate-limiting are now separated, with per-user rate limits derived from API key identity. [Improved] HTTP security headers added per pentest findings Added HTTP security headers across the API per recent pentest recommendations. [Fixed] Bounding boxes align on pages with different dimensions Bounding boxes no longer drift on documents that mix page sizes. [Improved] Other improvements and fixes Table grounding improvements (HTML cell-id injection now gated and tuned), better login-page rendering with static assets loading correctly, cleaner full-page loader on the auth callback, and PDF viewer fixes for documents that mix page sizes (e.g. US Letter and A4).

March 18, 2026

[New] v2 API with versioned endpoints, pagination wrappers, and improved docs A full v2 API surface ships: versioned endpoints, pagination wrappers, workflow routes, list-extractions, provenance API, and richer OpenAPI documentation. [API] Bearer authentication alongside API key The API now supports the standard Authorization: Bearer header alongside the existing API key header. [API] API rate limiting The API now enforces per-endpoint rate limits. [Improved] Smart-table extraction planning with table-scoped delegates The smart-table extraction planner now uses table-scoped delegates, agent limits that scale with document complexity, and improved sandbox tools. [Improved] Better row-level grounding for tables Row-level grounding for tables is now more robust via token-overlap pre-mapping and accuracy fixes to anchor finders. [Improved] Faster page rotation A cascade decision engine with landscape support and optimized center-crop reuse makes page-rotation detection faster. [Improved] File metadata toggle and filtered counts A new metadata visibility toggle on the table header persists locally. File counts now reflect filtered results instead of selected rows, and action buttons show loading state. [Fixed] File-collection creation works in production A regression that returned 403 for all file-collection creation calls in production has been fixed. [Fixed] File download naming Markdown downloads no longer end up with doubled file extensions. [Improved] Other improvements and fixes Security update for vulnerable dependencies; a botched migration that assigned the same UUID to all files has been corrected.

March 5, 2026

[New] Raw JSON viewer alongside validated results Raw JSON extraction results are now available in a dedicated tab alongside validated results, for easier inspection. [Improved] Better document layout analysis Enhanced layout analysis for more accurate table and section detection. [Improved] Hover cards on simple field results Simple field results now display hover cards with additional context. [Improved] Resizable workflow creation chat The workflow creation chat panel is now resizable. [Improved] Refined validation result cards Validation result cards have refined styling that makes validated datapoints easier to identify at a glance. [Fixed] Confidence handling for undefined values Confidence scores no longer break when a value is undefined. [Fixed] Clipboard copy errors now show clear feedback Clipboard copy errors are handled and surfaced with clear feedback instead of failing silently. [Improved] Other improvements and fixes UI styling consistency improvements across result cards and the file viewer.

March 4, 2026

[New] Production-grade webhook system You can now create webhook subscriptions, receive real-time event notifications, and configure automatic retries on failure. [Improved] Better error categorization Error messages now use clearer categories, making it easier to understand and resolve extraction issues. [Improved] Schema editor auto-saves Fields and schema changes now auto-save when switching tabs or editing nested fields, preventing accidental data loss. [Improved] Validation page with keyboard navigation The validation page now has improved result cards with keyboard navigation and better loading states. [Improved] Faster OCR processing OCR text recognition is now significantly faster on large documents thanks to GPU-accelerated batch processing. [Fixed] Race conditions in field-form submission Race conditions in field-form submission and save-state management have been eliminated. [Fixed] Security vulnerability in a third-party dependency Fixed a security vulnerability in a third-party dependency.

February 2026

February 27, 2026

[Improved] Custom date range filter on the usage page The usage page now lets you filter by a custom date range instead of a fixed year, giving more precise control over billing and usage reports. [Fixed] Reverting to the original value marks fields as verified Editing a field value back to its originally extracted value now correctly marks it as human-verified rather than human-edited.

February 25, 2026

[Improved] Better OCR accuracy Upgraded text recognition with better support for Latin languages. [Improved] LLM cost tracking in billing Billing transactions now include per-extraction LLM token counts and costs for full cost transparency. [Improved] More accurate table extraction Improved accuracy when extracting data from tables with closely spaced rows or surrounding text. [Improved] Faster processing during peak usage Extraction now scales to handle high volumes, reducing wait times during busy periods. [New] Sample workflows on new accounts New accounts are now prepopulated with sample workflows and documents for a guided onboarding experience. [Fixed] Smart-lookup crash on empty corpus Smart lookup no longer crashes when the search corpus is empty. [Improved] Faster extraction callbacks and field-list endpoints Performance improvements on extraction callbacks and field-list endpoints. [Fixed] Readable file download filenames File download filenames are now readable by the frontend.

February 21, 2026

[Improved] Visual grounding 2.0 Completely revamped evidence highlighting with block-level and element-level precision, making it easier to trace extracted values back to their exact location in the source document. [New] Billing dashboard New usage dashboard with remaining credits, weekly summaries, per-workflow usage breakdown, and CSV export. [Improved] Advanced document layout analysis enabled in production The advanced document segmentation pipeline is now enabled in production for improved table and section detection. [Fixed] Bounding-box highlights no longer stuck after navigating fields Highlights are now cleared correctly when you move between fields. [Fixed] Confidence calibration Fixed calibration configuration for confidence scoring.

February 14, 2026

[New] Redesigned home page A personalized home with recent workflows, at-a-glance file-status indicators, and a streamlined workflow creation experience with templates. [New] Redesigned schema builder A new two-step Define & Refine flow with AI-powered field suggestions, inline file previews, and drag-and-drop uploads. [Improved] Faster page rotation About 2× faster page-orientation detection per page. [Improved] More reliable AI field suggestions Field suggestions now use an upgraded model for higher reliability. [Fixed] Smart-lookup settings saved on publish Smart-lookup settings are now correctly saved when publishing a workflow version. [Fixed] Workflow studio sidebar overflow Fixed sidebar overflow in the workflow studio.

February 7, 2026

[Improved] New dashboard visual design Refreshed visual design across the platform with updated components, colors, and layout. [Improved] Faster, more reliable evidence highlighting Evidence highlighting now uses an event-driven architecture, making interactions faster and more reliable across all views. [Improved] Files and organization logos served securely through the backend Files and organization logos are now served securely through the backend, improving security and simplifying local development. [New] AI workflow-name suggestions New AI-powered workflow-name suggestions based on your description and uploaded files. [Improved] File status counts in workflow listings Workflow listings now show at-a-glance counts of files by processing status. [Fixed] Document-element classification Fixed a classification regression that could cause incorrect layout detection in some documents.

February 3, 2026

[New] Smart Lookup New feature to enrich extracted data with external reference files using AI-powered matching. [Improved] Field source tracking Fields now track their source (extraction vs. smart lookup) for better traceability. [Fixed] Manual field creation processing Field data processing now correctly handles manually created fields.

January 2026

January 27, 2026

[New] Usage-based billing Introduced billing for data extraction operations with transparent per-page pricing. [Improved] Faster, more reliable field suggestions Field suggestions are now faster and more reliable when creating workflows. [Fixed] Field value upload formats Resolved an issue where certain field value formats could cause upload errors.

January 20, 2026

[Improved] Faster OCR processing Document processing speed improved with optimized text recognition. [New] Extraction review from the dashboard Administrators can now review extractions directly from the dashboard. [Improved] Organization credits management Added the ability to manage organization credits for usage billing.

January 13, 2026

[Improved] Faster document processing Significantly improved processing speed for large documents. [New] Filter results by row count Filter extraction results by the number of rows extracted. [New] Configurable per-workflow page rotation Configure custom page-rotation strategies per workflow.

January 6, 2026

[Improved] Enhanced results filtering on download Enhanced results download with additional filtering options.

December 2025

December 15, 2025

[Improved] Improved reliability Enhanced system stability and faster deployments.

December 1, 2025

[Improved] Faster results retrieval for workflows with many documents Optimized results retrieval for workflows that span many documents. [Improved] Detailed usage tracking Added more detailed usage tracking for extraction operations.

November 2025

November 24, 2025

[New] Create data rows from specific extractions You can now create new data rows directly from specific extractions.

November 17, 2025

[Improved] More accurate table extraction Improved accuracy when extracting data from complex tables. [API] New /user/is-enabled endpoint A new endpoint lets you check account status. [Improved] User info on file validation responses File validation responses now include the user who performed the validation.

November 10, 2025

[Improved] Better handling of long documents Better handling of documents with many pages via smart chunking. [Improved] Validation tracking Track which values have been manually validated or edited. [Improved] More accurate extraction Enhanced extraction accuracy with an updated default model.

November 3, 2025

[Improved] Better extraction on documents over the context limit Improved extraction quality for documents exceeding context limits. [Improved] Highest-confidence merge When merging extractions, the highest-confidence value is now preserved.

October 2025

October 27, 2025

[Improved] Performance improvements across the platform General performance improvements across the platform.

October 13, 2025

[Fixed] CSV export field ordering Fixed field ordering in CSV exports to match workflow configuration. [Improved] Consistent results ordering Extraction results now consistently respect field order.

October 6, 2025

[API] Endpoint to reorder fields within object-type fields A new endpoint lets you reorder fields within object-type fields. [Improved] Verification URLs on job results Job results now include verification URLs for easy access. [Fixed] Fields with similar names in nested structures Fixed handling of fields with similar names in nested structures.

September 30, 2025

[Improved] Redesigned extraction pipeline Redesigned extraction workflow for better reliability.

​July 2026

​July 7, 2026

​July 6, 2026

​July 3, 2026

​June 2026

​June 29, 2026

​June 22, 2026

​June 18, 2026

​June 15, 2026

​June 11, 2026

​June 10, 2026

​June 9, 2026

​June 5, 2026

​June 2, 2026

​June 1, 2026

​May 2026

​May 28, 2026

​May 27, 2026

​May 26, 2026

​May 25, 2026

​May 22, 2026

​May 21, 2026

​May 20, 2026

​May 19, 2026

​May 18, 2026

​May 14, 2026

​May 13, 2026

​May 12, 2026

​May 11, 2026

​May 9, 2026

​May 8, 2026

​May 7, 2026

​May 6, 2026

​May 5, 2026

​May 4, 2026

​April 2026

​April 30, 2026

​April 28, 2026

​April 27, 2026

​April 24, 2026

​April 23, 2026

​April 20, 2026

​April 14, 2026

​April 6, 2026

​April 1, 2026

​March 2026

​March 30, 2026

​March 26, 2026

​March 25, 2026

​March 19, 2026

​March 18, 2026

​March 5, 2026

​March 4, 2026

​February 2026

​February 27, 2026

​February 25, 2026

​February 21, 2026

​February 14, 2026

​February 7, 2026

​February 3, 2026

​January 2026

​January 27, 2026

​January 20, 2026

​January 13, 2026

​January 6, 2026

​December 2025

​December 15, 2025

​December 1, 2025

​November 2025

​November 24, 2025

​November 17, 2025

​November 10, 2025

​November 3, 2025

​October 2025

​October 27, 2025

​October 13, 2025

​October 6, 2025

​September 30, 2025

July 2026

July 7, 2026

July 6, 2026

July 3, 2026

June 2026

June 29, 2026

June 22, 2026

June 18, 2026

June 15, 2026

June 11, 2026

June 10, 2026

June 9, 2026

June 5, 2026

June 2, 2026

June 1, 2026

May 2026

May 28, 2026

May 27, 2026

May 26, 2026

May 25, 2026

May 22, 2026

May 21, 2026

May 20, 2026

May 19, 2026

May 18, 2026

May 14, 2026

May 13, 2026

May 12, 2026

May 11, 2026

May 9, 2026

May 8, 2026

May 7, 2026

May 6, 2026

May 5, 2026

May 4, 2026

April 2026

April 30, 2026

April 28, 2026

April 27, 2026

April 24, 2026

April 23, 2026

April 20, 2026

April 14, 2026

April 6, 2026

April 1, 2026

March 2026

March 30, 2026

March 26, 2026

March 25, 2026

March 19, 2026

March 18, 2026

March 5, 2026

March 4, 2026

February 2026

February 27, 2026

February 25, 2026

February 21, 2026

February 14, 2026

February 7, 2026

February 3, 2026

January 2026

January 27, 2026

January 20, 2026

January 13, 2026

January 6, 2026

December 2025

December 15, 2025

December 1, 2025

November 2025

November 24, 2025

November 17, 2025

November 10, 2025

November 3, 2025

October 2025

October 27, 2025

October 13, 2025

October 6, 2025

September 30, 2025