Documentation Index
Fetch the complete documentation index at: https://docs.anyformat.ai/llms.txt
Use this file to discover all available pages before exploring further.
May 2026
May 28, 2026
[Changed] Validator nodes now consume creditsValidateNode previously ran for free. Each rule in a validator now costs 25 credits, billed once per extraction. The bill scales with the number of rules in the validator, not with the number of pages or files in the document packet.
Technical detail
Technical detail
A new
validate operator (Credits(25)) was added to DEFAULT_OPERATOR_PRICES. The internal billing dispatch was reshaped from (item) -> list[OperatorName] to (item, page_count) -> list[LineItem], so non-page-priced operators like validators can ignore the page count and substitute len(rules) as the billed unit. The processor now issues a single SpendCreditV4 call per extraction with all line items combined.OperatorPriceTable are unaffected.
[Improved] Documentation restructured into Concepts / Guides / API Reference
docs.anyformat.ai has been reorganized into three scoped tabs. Concepts holds definitional pages (Workflows, Schemas, Fields, etc.). Guides holds task-oriented walkthroughs (Quickstart, Build workflows, Recipes). API Reference holds the REST spec only. The four overlapping “create → run → results” walkthroughs are merged into one Quickstart with UI / curl / Python tabs at each step, and every extraction recipe now uses the canonical typed-graph workflow shape ({name, nodes, edges}) instead of the legacy {fields:[...]} shortcut. Old URLs redirect.
[Improved] Post-signup /welcome page is now reachable only from the signup callback
Loading /welcome directly (typed URL, bookmark, fresh tab) now bounces to /. The page is only reachable after a brand-new Auth0 signup callback, which closes a path that could let the upstream GA4 / GTM conversion event fire for already-signed-up users.
[Improved] @anyformat/skill synced to npm 0.2.2
The @anyformat/skill package version in this repo is now aligned with what’s live on npm (0.2.2). The publish script uses token-based auth and bumps from max(local, registry) to prevent version collisions on the next release.
[Improved] Release-PR generation now audits merged PRs upfront
The /prod-release Claude Code command (used to open this release PR) now treats the merged-PR list as the ground truth and the changelog as a derived artifact. The audit step surfaced the gap that produced the entries below — previously, PRs merged without a changelog entry could quietly drop out of the release notes.
May 27, 2026
[New] Attach smart lookup files to fields in the workflow SDK The Python and JS SDKs can now mark a field as a smart lookup field and attach a reference file when creating a typed workflow. Flag the field withlookup and point it at a local CSV or text file; the SDK reads the file and uploads it inline, and extraction resolves that field’s values against your reference data. Lookup files are validated as UTF-8 text before anything is written — base64 payloads are never stored or returned.
Technical detail
Technical detail
POST /api/v2/workflows/ accepts a per-field lookup flag (mapped to source=smart_lookup) and inline base64 lookup_files. Files are pre-validated as UTF-8 text/CSV (with cp1252→UTF-8 transcoding); binary or non-text uploads are rejected with a 400 UNSUPPORTED_LOOKUP_FILE before any S3 writes. Only S3-backed lookup file references are persisted; base64 payloads and URIs are never stored or echoed back.pip install anyformat-sdk — the anyformat-sdk package is now live on PyPI. Use the fluent builder to author, create, run, and await workflows end-to-end without writing HTTP code; sync and async clients are included, with typed errors (BadRequest, Unauthorized, Forbidden, NotFound, RateLimited, ServerError, SDKTimeout) and a .raw escape hatch on every response. The Claude Code skill at @anyformat/skill now documents the Python SDK alongside the JS/TS one.
[New] Post-signup /welcome landing page
Brand-new signups now land on a dedicated /welcome page (instead of going straight to /), giving marketing a stable URL for Google Ads conversion configuration via GA4 / GTM URL match. Returning users continue to land on / (or the originally-requested URL) as before.
[Fixed] New signups for an already-registered email address are now rejected
Across every API write site that can create a User row (Auth0 just-in-time provisioning, the E2E test-auth endpoint, the create_user management command), case-insensitive duplicate emails are now refused before a second row can be created. Auth0 email rotation that would collide with an existing user is also refused. Previously, mixed-case variants of the same address could silently mint duplicate users.
[Fixed] Resolved four frontend dependency vulnerabilities
qs (DoS in stringify), tmp (path traversal), uuid (missing buffer bounds check), and js-cookie (prototype hijack) advisories are now resolved via package overrides in anyformat/services/frontend. Upstream parents (cypress, exceljs, segment-analytics) hadn’t yet adopted the patched versions, so overrides were the minimally-disruptive path.
[Improved] Authenticated images are no longer refetched multiple times per page load
The sidebar mounts the organization logo from three places, which previously meant three identical requests to /api/v3/iam/organizations/{id}/logo/ on every page load. The image hook now uses TanStack Query so concurrent subscribers share one in-flight request and the result is cached for 5 minutes.
[Improved] Local development supports both localhost and 127.0.0.1 for OAuth
For developers running the stack locally, both http://localhost:<port>/callback and http://127.0.0.1:<port>/callback are now valid OAuth callback URIs and CORS / CSRF origins. Previously only one was registered per environment, which caused login failures if the dev frontend was opened at the other hostname.
[Improved] JS SDK build toolchain refreshed
esbuild (0.21.5 → 0.27.7) and vitest (2.1.9 → 4.1.7) in the JS SDK package have been updated. No behavioral changes to the SDK itself.
May 26, 2026
[Improved] Higher monthly allowance on the Business plan The Business plan’s monthly credit grant has increased to 500,000 credits. [Changed] Free tier now gets a one-time 50,000-credit signup grant instead of a monthly allowance Free organizations no longer participate in the monthly credit rotation. Each new free org now receives a single 50,000-credit signup grant that never expires — generous enough to evaluate the product, finite enough to make upgrading the natural next step. Paying tiers (Business, Enterprise) keep their recurring monthly allowance unchanged. [Changed] Stripe top-up pricing is now tier-aware Top-up checkout rates depend on the organization’s tier: free orgs pay €1.50 per 1,000 credits (€0.0015 / credit), Business and Enterprise orgs pay €1.00 per 1,000 credits (€0.001 / credit). Top-up amounts are now denominated in credits (minimum 30,000) instead of euros.May 25, 2026
[New] Visualize parse layout withresult.parse.draw()
The Python SDK can now render a document’s parsed layout as an image overlay. Call result.parse.draw() to get the page with detected blocks drawn on top — useful for debugging extraction and verifying bounding boxes. A new cookbook example walks through it.
[Changed] Simplified parse configuration
The engine (Fast/Performant), effort, and visual_grounding parse knobs have been removed from Studio and the workflow SDK. engine was a no-op (both options mapped to the same model), and the other two now use sensible built-in defaults. figure_enhancement and mode remain configurable. Existing workflows are unaffected — stored values for the removed knobs are ignored gracefully.
[New] Install the anyformat Claude Code skill from npm
The anyformat Claude Code skill is now published on npm as @anyformat/skill. Run npx @anyformat/skill to install it — Claude Code can then author, run, and inspect anyformat workflows directly from your editor using your ANYFORMAT_API_KEY. Public docs include a new “Claude Code Skill” section under the SDKs page.
[Fixed] Classify and split workflows authored via the Python SDK now route to extraction correctly
Workflows built with the Python SDK’s classify(...) / split(...) builders no longer silently skip downstream extraction when a category or split rule is attached by name. The typed graph renderer now resolves branch ports by name (matching the schema’s edge labels) instead of internal IDs, so SDK-authored classify/split workflows produce the same extractions as Studio-authored ones.
[Fixed] OCR text containing NUL bytes no longer fails extraction
Extractions whose parse output contained a U+0000 (NUL) character inside a text value previously failed at the database write step, marking the entire batch as errored. The worker now strips NUL bytes from text at the parsing boundary so these documents complete normally.
[API] Organization balance endpoint now returns a per-bucket breakdown
The new organization balance endpoint reports credits broken down by their source bucket — monthly grant, top-ups, and each redeemed voucher — along with the organization’s tier and the monthly anchor day. Read-only by design: querying the endpoint never triggers a balance reset or recompute.
Technical detail
Technical detail
GET /api/v3/billing/organizations/{org_id}/balance/ returns { tier, monthly_anchor_at, monthly: {...}, topup: {...}, vouchers: [...], overdraft: {...} }. Pre-migration organizations return zeroed buckets with a stable shape. Membership is enforced; any role can read.credit_lifetime_days setting (default 90 days). When a voucher is redeemed, the granted credits expire after that many days, so promotional credits don’t sit in a wallet indefinitely. Existing vouchers continue to behave as before until the new field is set.
[Fixed] Signup voucher hint shows voucher value only, in your locale
The voucher preview on the create-organization and join-or-create pages no longer double-counts the welcome grant — a 5,000-credit voucher now renders as “5,000 credits” instead of being added on top of the standard signup grant. The number is also formatted using the browser’s locale instead of being hardcoded to German formatting.
May 22, 2026
[New] Official Python SDK andafx CLI
The anyformat-sdk package wraps the workflow API with sync and async clients so you can author, create, run, and await a workflow end-to-end without writing HTTP code. Typed errors (BadRequest, Unauthorized, Forbidden, NotFound, RateLimited, ServerError, SDKTimeout) make integration code easier to write correctly, and a .raw escape hatch on every response exposes the underlying JSON when you need it.
Technical detail
Technical detail
AsyncClient mirrors the sync surface for async codebases.afx CLI for one-shot parse and extract
The SDK ships an afx CLI for quick command-line use. Run afx parse --file invoice.pdf to get markdown, or afx extract --file invoice.pdf --field "vendor:vendor company" --field "total:total amount" to extract structured fields. Pass --schema fields.json to load a richer schema, --json for machine-readable output, and -o/--output FILE to write results to disk.
Technical detail
Technical detail
--json is set, so the CLI is safe to script.v1.0, v1.1, v2.0 across Studio, version history, and the save toast. Saving a workflow only mints a new version when it actually changes — major bump on graph topology or node config changes, minor bump on extraction-schema changes, and no bump at all when you only move nodes on the canvas or reorder columns in the data viewer.
[API] Save layout and reorder fields without minting a new workflow version
New workflow-centric write endpoints separate cosmetic edits from material ones. Layout drags and column reorders no longer create new versions or break stable links to a specific version of your workflow. The legacy v2 write endpoints continue to work but now respond with Deprecation: true.
Technical detail
Technical detail
PATCH /api/v3/workflows/{id}/config/— save (may bump major or minor)PATCH /api/v3/workflows/{id}/layout/?version_id=X— canvas position only (never bumps)PATCH /api/v3/workflows/{id}/field-order/?version_id=X— column reorder only (never bumps)
409 with latest_version_id in the response body so your client can rebase without losing local edits. The deprecated v2 endpoints (PUT /api/v2/.../workflows/{id}/config/ and PATCH /api/v2/.../workflows/{id}/reorder-fields/) still function during the transition window.npx anyformat-skill installs the anyformat Claude Code skill
If you use Claude Code, you can now install the anyformat skill with a single command: npx anyformat-skill drops it into ~/.claude/skills/anyformat, or npx anyformat-skill --project installs it into the current project. The skill teaches Claude how to drive the typed-graph workflow API end to end — create a workflow with nodes and edges, upload a file, run extraction, and poll for results.
[Fixed] Multi-tab login no longer fails with a code-verifier error
Logging in from two browser tabs at once could clobber the shared PKCE code verifier and surface as a generic “Login failed” page. The callback now silently retries once on this race and lands you on your original destination URL.
May 21, 2026
[Improved] Cleaner parsed markdown — no structural metadata wrappers Parsed markdown saved to S3 and returned from the parse-result endpoints no longer carries<DOCUMENT> framing or <section data-bbox=…> wrappers. Block boundaries are now expressed as inert <a id></a> anchors that render as nothing in any markdown viewer, and page boundaries are joined with blank lines. Bounding boxes, page numbers, and confidences remain available via the structured results endpoints — they just stop leaking into the human-readable file body.
[New] Modelo 200 workflow template (Spanish corporate income tax)
A new “Modelo 200” template lands in the workflow gallery, oriented around credit underwriting and portfolio early-warning. It covers balance-sheet and profit-and-loss extraction with 45 root fields (liquidity, leverage, profitability, cash conversion, coverage ratios) plus two table fields for directors and principal shareholders. The template is pre-configured to use agentic parsing with Balanced effort.
[Fixed] Nested table columns no longer dropped by the smart-table extractor
Columns of a list-of-objects field (for example, items inside a product table) could previously be missed because the smart-table planner tried to extract them as standalone scalars. Nested fields are now correctly treated as per-row columns of their parent table and reach the final extraction every time.
[Fixed] Editing object-table cells no longer loses focus mid-typing
Validating a value in an object-field table sometimes stole keyboard focus from the input before you finished typing, occasionally swallowing characters. Cell focus is now preserved across both the validation round-trip and the background refetch, and screen-reader row labels stay in sync after row swaps.
[Fixed] Parse → split workflows now return their splits
Workflows that parse and split a document without a downstream extract were silently dropping the resulting splits. Terminal-splitter workflows now persist and return their splits as expected, surfaced via the file-list endpoint with ?include=splits.
May 20, 2026
[API]POST /v2/workflows/ now takes a typed graph; the old fields-only body is gone
POST /v2/workflows/ now accepts the typed nodes + edges graph that previously lived at POST /v2/workflows/with_graph/. The legacy fields-only body and the with_graph path have been removed — there is one canonical workflow-create endpoint going forward.
Technical detail
Technical detail
The
with_graph/ path no longer exists at api.anyformat.ai/v2/workflows/. Send the same payload ({name, description?, nodes, edges}) to POST /v2/workflows/. The legacy body {name, fields} is no longer accepted — wrap your fields in an explicit parse → extract graph instead. The legacy multipart/form-data upload path (creating a workflow from a downloaded json_file) has also been removed; rehydrate the workflow into the typed-graph body client-side.POST /v2/workflows/{id}/run/ response status is now "pending" instead of "success"
The public run endpoint accepts requests asynchronously (202 Accepted — the extraction is enqueued, not complete). The previous "success" value conflated acceptance with completion; the new "pending" matches the vocabulary used by WorkflowRunListItem.status elsewhere on the v2 surface. Clients that polled for results regardless of this field are unaffected. Clients that branched on "success" to skip polling should now poll on "pending"; semantics match what was always intended.
[New] Per-branch Validation tabs across workflow and file detail pages
The Validate node now gets the same per-branch UX as Extract. The workflow page renders one Validation tab per branch alongside its existing Category and Extraction tabs, and the file detail page mirrors the same layout. Split workflows include a subdocument selector with chevrons and a 1/N counter, so you can step through per-subdocument validation results without leaving the tab.
[Improved] Clear “Not validated” placeholders for rules added after extraction
When you add or edit a validation rule, files that were processed before the change now render explicit “Not validated” cards and cells with a hover-card explanation, instead of an ambiguous “N/A”. This makes it obvious which results reflect the current rule set and which would need a re-run.
[Improved] Redesigned Validate node configuration menu
The Validate node menu now uses a tabbed Config / Rules layout aligned with the Classify menu, with a dedicated rule editor that supports inline field chips inside rule descriptions. The previous template-import button has been removed in favor of the new layout.
[Fixed] Validation verdicts now populate for split workflows
In workflows with a Split node, validation verdicts could come back empty in the file list and detail view because the API only inspected the parent extraction. Validation results now traverse parent and child extractions so per-subdocument verdicts surface correctly across the workflow page Validation tab, the file detail page, and the file results UI.
[Fixed] CSV and master-file uploads decode reliably from non-UTF-8 encodings
Uploaded CSV files and master files saved in cp1252 (Word/Outlook smart quotes, Spanish accents), Shift-JIS, GB18030, KOI8, and similar non-UTF-8 encodings could previously either silently produce mojibake or fail extraction with a Unicode decode error. Uploads are now transcoded to UTF-8 at the API boundary, with binary payloads disguised as text rejected up front, so downstream parsing and lookups receive consistent, correctly-decoded bytes.
Technical detail
Technical detail
The master-file upload endpoint and workflow CSV upload path now run inputs through a strict Western decoding ladder (
utf-8-sig → utf-8 → cp1252) before falling back to charset detection, and persist the re-encoded bytes to storage with a truthful Content-Type: …; charset=utf-8 header. Allowed text extensions are csv, txt, md, markdown, and rst. Non-text payloads (PDF, ZIP, images, executables) are rejected at upload time rather than being mis-decoded.May 19, 2026
[API] Fetch parsed blocks with bounding boxes via the new parse-result endpoint The new parse-result endpoint returns the structured page-nested blocks produced by parsing — including bounding boxes, type, confidence, reading order, and layout — along with a presigned URL for the raw markdown. Studio’s PDF viewer and JSON view now use these blocks to render overlays consistently across tabs.Technical detail
Technical detail
GET /api/v3/files/{file_id}/parse-result/ returns { blocks: [...], markdown_url: "..." }. Blocks are grouped per page and carry a flat bbox: { x0, y0, x1, y1 }. The endpoint accepts an optional execution_id to scope results to a specific workflow execution.Technical detail
Technical detail
Pass
?include=splits_lite on the file results endpoint to receive slim per-split entries (mutually exclusive with ?include=splits). Use GET /workflow-versions/{version}/files/{file_id}/splits/{split_id}/object-results/{field_persistent_id}/ to fetch paginated object rows scoped to a single split.image_base64 — rendered as inert JSON. Pages are grouped under collapsible sticky headers (collapsed by default) for easier navigation in multi-page documents, and the displayed shape mirrors the public parse-result response exactly.
[Improved] Client-side markdown hydration with embedded images
The Markdown tab now hydrates images directly in the browser using each block’s bounding box — including figure blocks — so the in-tab view and Markdown downloads stay consistent. The dedicated server-side hydration pipeline has been retired in favor of this lighter approach.
[New] “Beta” badge on Studio nodes still being stabilized
Studio nodes can now display a BETA badge in the sidebar, on the canvas header, and in the sidebar hover card. The Validate node is the first to use it, marking it as still being stabilized for general production use.
[Fixed] Workflows created via the typed graph endpoint now run extraction
Workflows created via the typed with_graph create endpoint sometimes failed to run extraction and surfaced as EXTRACTION_FAILED (422) because the extract node was persisted without an engine. New workflows now always include an engine and execute end-to-end as expected.
Technical detail
Technical detail
POST /v2/workflows/with_graph/ now persists a default engine on every extract node it creates.parsed_at because parse callbacks resolved graph nodes by an identifier that wasn’t unique across workflow versions. Callbacks now scope to the active workflow version, so parse results are persisted to the correct file every time.
[Improved] Other improvements and fixes
Deleted object rows are now consistently excluded from both per-split and file-level row counts; the file results UI deduplicates parse-result requests when navigating between files; and extraction progress estimates fall back to organization-scoped percentiles instead of cross-tenant numbers.
May 18, 2026
[New] Handwritten text is now a first-class block type in agentic parsing Agentic PDF-to-Markdown now recognizes handwritten text as its own block type, with a dedicated extraction strategy that combines a vision-language transcription with OCR-derived hints. Handwritten blocks are labeled “Handwritten text” in the Studio UI, distinct from machine-printed text, and can be tuned independently from regular text extraction. [Improved] More consistent agentic parsing when text-class blocks have multiple sources When a text-class block has multiple candidate text sources, the agentic parser now writes the chosen source back into the block’s elements and regenerates its segments before rendering. Downstream grounding (overlays, citations) now matches the markdown that’s actually rendered.May 14, 2026
[New] “Beta” badge on agentic parsing mode The Parse node mode card now shows a “Beta” badge next to Agentic parsing, making it clear that this parse mode is still being stabilized for general production use. [New] Calibrated confidence scores on smart-table extractions Both table cells and scalar fields produced by the smart-table extractor now carry a calibrated 0–100 confidence, so review UIs and downstream consumers can prioritize low-confidence values. The scores degrade gracefully when scoring is unavailable, so extractions still complete normally.Technical detail
Technical detail
Confidence is surfaced on each extracted value’s
metadata.confidence field as an integer in [0, 100]. Both judges flip together via the smart_table_confidence_strategy setting (auto / judge / none); the default is on.GET /api/v2/saas_manager/workflow-versions/{id}/results/ rejected the file__id__in and file__id__not_in query parameters with Field 'id' expected a number but got '...' whenever the frontend passed UUIDs (the v2 file API now returns file id as a UUID string). The filter was still joining against the legacy integer PK; it now joins on the file UUID field, matching the regular file filter. Every bulk-download flow that relies on row selection — and the “select all minus N” pattern — is unblocked.
[Fixed] Stricter validation of logo and workflow file uploads
Logo and workflow file uploads are now validated against their actual bytes via magic-byte sniffing rather than the client-supplied Content-Type header, and only file types the parse pipeline can actually process are accepted. Uploading a renamed executable, ZIP, or arbitrary JSON as a workflow file is now rejected at the API boundary instead of silently failing later in the pipeline. Stored objects also carry the correct Content-Type on the storage layer instead of being labeled binary/octet-stream.
[Fixed] Switching organizations on a resource page no longer reverts
Switching the active organization from a workflow detail page (or any other resource page deep-linked from a different org) could occasionally snap back to the original org after showing a “switched” toast. The selection is now preserved on the first click.
[Fixed] Object-field row counts exclude deleted rows
The paginated object-field results endpoint and the slim workflow-results list both now exclude soft-deleted rows from page contents and the row-count total. Counts shown in the UI and returned by the API now match the rows you can actually see.
[Fixed] Sub-table detail rows no longer clip at the bottom
Expanded sub-tables inside object-field result tables previously cut off the bottom row because the row-height calculation didn’t account for the surrounding padding and border. The detail row now renders at full height regardless of how many sub-rows it contains.
[Fixed] Classifier category tabs render again on workflow results
Category tabs on classifier workflow results were rendering empty because the slim results response had stopped including the classifier’s graph_node_id. The field is back in the response (with a batched prefetch to avoid per-file queries), so files routed by a classifier now appear under their correct tab.
[Fixed] Workflow list updates immediately after create, duplicate, or delete
The home page workflow list previously kept the stale list cached after creating, duplicating, or deleting a workflow, so the change only appeared after a refresh. All three mutations now invalidate the workflow list cache and the list reflects the change right away.
May 13, 2026
[New] Studio is now generally available The visual workflow editor (Studio) is now visible to every organization member from the workflow page and the new-workflow page. Write access is still controlled by your workflow role. The legacy schema editor at/workflows/edit/... has been removed — the Studio tab is now the single editing surface.
[New] Row-level actions on object-field result tables
Object-field result tables now expose a per-row dropdown with Validate row, Add row above, Add row below, and Remove row, plus inline up/down chevrons for adjacent reordering. The trailing-column header has a new Validate all rows action that promotes every datapoint in the field to human-verified in a single click.
Technical detail
Technical detail
New endpoints under
/api/v2/datapoints/object-fields/{field_id}/:POST .../rows/(with anindexin[0, n_live_rows]) inserts an empty row and shifts live siblings up by one in the same transaction.POST .../rows/{a}/swap/{b}/exchanges two adjacent live rows.POST .../rows/{position}/validate/bulk-promotes every datapoint in a row to human-verified.POST .../validate-all/does the same for every live row of the object field.
Technical detail
Technical detail
GET /api/v3/iam/signup-vouchers/{code}/ returns a read-only preview of the voucher ({ code, additional_credits }) and never consumes a redemption — actual redemption still happens atomically inside organization creation. Returns 404 voucher_not_found for unknown codes and 410 voucher_expired for expired or exhausted ones. Responses set Cache-Control: no-store, private so admin edits to a voucher are reflected on the next page load.Technical detail
Technical detail
GET /api/v2/saas_manager/workflow-versions/{id}/files/?include=results_lite returns scalar datapoints as { id, display, status } and object fields as { row_count }. Full sub-tables are fetched lazily through GET /api/v2/saas_manager/workflow-versions/{id}/files/{file_id}/fields/{field_persistent_id}/object-results/. The existing include=results (full) path is unchanged and still available for legacy integrators.mailto:sales@anyformat.ai link is gone. After a successful payment, the success page now states how many credits were added and tells the user they can start running workflows immediately.
[Improved] Cleaner extraction on scanned PDFs
The agentic PDF-to-markdown parser now honors the routing model’s verdict on which text source to trust (the embedded PDF text layer, OCR, or vision) for every block type, including miscellaneous blocks that previously ignored the verdict. On scanned PDFs where the embedded text layer is corrupted, blocks now render the clean OCR text instead of garbled bytes.
[Fixed] Returning from Stripe checkout no longer fails authentication
If you stayed on Stripe long enough for the short-lived access token to expire, returning to anyformat used to land on an authorization error screen — even though the payment had already succeeded on Stripe’s side. The app now silently refreshes the token in the background (and replays the first request that hits a 401), so the top-up success page renders cleanly even after a long checkout.
[Fixed] Cross-tab refresh storm and support-elevation isolation
Opening the app in multiple tabs no longer triggers refetch storms when you navigate inside one tab. As a separate fix, support agents who elevate into a customer organization stay elevated only in the current tab — new tabs default back to their own organization until they explicitly re-elevate.
[Fixed] New users are immediately enabled after signup
Users who created their own organization or requested to join an existing one were previously left in a disabled state until an admin reviewed their request, which blocked the create-org path entirely. Both onboarding paths now enable the profile immediately. Join-request approval is still required to gain membership in the target org.
[Improved] Other improvements and fixes
- Structured-output extractions on Amazon Nova models no longer occasionally fail with an empty-JSON parsing error.
- The Usage page no longer renders the “Page consumption” summary box at the top — the cleaner credit-balance card now stands on its own.
- Long organization lists on the join-or-create page no longer get clipped below the fold.
May 12, 2026
[New] Delete rows from object-field result tables Object-field result tables now have a per-row trash action. Remaining rows are reindexed in place so curated positions stay stable and your verification carries over correctly when the file is re-extracted. Rows the model produced are soft-deleted (so accuracy metrics still reflect the model’s miss); rows you added yourself are removed entirely.Technical detail
Technical detail
DELETE /api/v2/datapoints/object-fields/{id}/rows/{position}/. Deletion is transactional. A partial unique constraint allows deleted rows to retain their original index, and the result-building SQL filters soft-deleted rows.organization_id
Workflow and workflow-version responses (including the nested versions collection on workflow detail) now expose a normalized 32-character hex organization_id. Useful when a client needs to identify which org owns a workflow without an extra round-trip.
[Improved] Tax ID and billing address now collected during Stripe Checkout
Top-up Checkout sessions now collect the customer’s tax ID and billing address before invoices are finalized. This unblocks invoice correctness in EU/UK jurisdictions. No action required for existing customers — the values are persisted on the next Checkout.
[Improved] Smoother markdown viewer on long documents
The “visual” markdown view is significantly snappier on long documents. Scroll-driven page detection is now frame-throttled and only fires on real page transitions, sections subscribe to fewer state changes, and highlight-scroll listeners share a single event bus instead of one per section.
[Improved] Faster agentic PDF-to-markdown parsing
Agentic PDF→Markdown parsing now parallelizes work per page — single-call vision blocks fan out and agentic blocks run in a small per-page pool, so page latency tracks the slowest block instead of the serial sum. Layout-routing calls were also tuned for lower latency.
[Fixed] Org switcher no longer goes blank for orgs without a logo
The organization switcher trigger now always renders the org’s initials as the base layer and only overlays the logo image once it loads successfully — so switching to an org without a logo no longer flashes blank.
[Improved] Parse confidence is now reported even when the model omits logprobs
Per-block parse confidence is now computed and stamped on the rendered markdown regardless of whether the parsing model returns token-level log probabilities. Downstream consumers can read data-parse-confidence consistently across all supported parsing providers.
May 11, 2026
[New] Self-service invoice downloads from the usage page Owners and admins can now open Stripe’s hosted Customer Portal from the Usage page to download invoices for past top-ups. Members, reviewers, and viewers don’t see the button — invoices are financial documents.Technical detail
Technical detail
POST /api/v3/billing/organizations/{org_id}/portal-sessions/ mints a short-lived portal URL bound to the org’s billing customer record. Orgs that have never run a top-up return 409 stripe_customer_not_provisioned; the UI surfaces this as a “Top up first” toast. New environment variable STRIPE_BILLING_PORTAL_RETURN_URL falls through to ${FRONTEND_URL}/usage. Configure the operator side (Invoice History toggle, ToS / privacy URLs, default return link) in the Stripe dashboard.404 to mask resource existence; same-org callers are unaffected.
[Improved] Hovering a stacked bounding box highlights every overlapping box
When several bounding boxes overlap, hovering reveals each one stacked under the cursor instead of just the topmost. Rapid mouse movement is collapsed to one diff per frame so the UI stays responsive.
[Improved] Consistent extracted-object UI across Define, Refine, and File Results
Object and table fields now render the same way across the schema builder, refine step, and file-results panel — same headers, same skeletons, same status states. Refine correctly shows error / cancelled copy on failed extractions instead of an empty data view.
May 9, 2026
May 8, 2026
[API] Create workflows with full graph definition in a single call You can now define a workflow’s parse, classify, split, and extract steps in one request — the workflow, its initial version, and all extraction fields are created atomically. The existing workflow-create endpoint is also fully typed, so SDKs get auto-complete and field-level validation.Technical detail
Technical detail
POST /v2/workflows/with_graph/ accepts the complete graph plus per-extract-node fields and creates everything in a single transaction. The original POST /v2/workflows/ body is now declared with discriminated field types.Workflow builder: .parse().classify(...).extract(...).split(...).build() produces a fully validated request. branch= and route_from= accept either an id string or the typed category/rule you registered earlier, so typos fail at the call site instead of at build time.
[Improved] Workflow graph validation catches more errors before run
Non-branching nodes with multiple outgoing edges, missing route_from on classify→split chains, and broken graph transitions are now rejected with clear validation errors instead of producing confusing runtime failures. The new typed graph payload also rolls back cleanly when persistence of one node fails partway.
[New] Voucher codes on signup grant extra wallet credit
Organization creation now accepts an optional voucher_code that grants extra wallet credit to the new org. Vouchers (expiry, usage cap, amount) are managed in the admin panel; invalid or expired codes surface as a clear 400 instead of breaking signup.
[Improved] Doc-viewer table polish — resize, unified sync toggle, safer persisted state
The object-results table is now vertically resizable with sensible bounds (header + one row min, all-rows-visible max). The PDF toolbar’s separate auto-focus and auto-scroll preferences collapse into a single “Sync document and results” toggle, and the lookup-files dialog no longer overflows. Co-mounted views in the same tab stay in sync, and corrupt persisted preferences self-recover instead of breaking the UI on load.
May 7, 2026
[Improved] Org isolation on v2 detail endpoints is now resource-derived Detail endpoints (datapoints, manual datapoints, extractions, workflow-version files, file collections, workflow versions, workflows) now authorize the caller’s role against the resource’s organization instead of a request header. Cross-tenant probes return404; in-tenant callers with insufficient role return 403. URLs are unchanged.
[Improved] More accurate extraction confidence via LLM judge
For extractions where the model doesn’t return logprobs (or where you prefer the judge), a separate judge model now scores each extracted value against its source markdown and outputs a calibrated 0–100 confidence. Object and table cells are batched for cost efficiency. Choose the strategy per task with extraction_confidence_strategy = none | auto | judge.
[Fixed] Login no longer spins on token-exchange failures
The post-login callback now exits cleanly when token exchange or profile fetch fails: a 15-second watchdog renders an error page with a logout option, OAuth errors surface explicitly, and tokens are purged on error.
[New] Bounding boxes for split + extract workflows
The PDF viewer now renders bounding boxes for split + extract workflows by scoping mapped locations per panel — the matching split when navigating by partition, the file’s extraction otherwise. Parse, split, classify, and validation panels paint no boxes, as intended.
[Fixed] Renamed split categories no longer lose historical sub-documents
When you rename a split rule across versions (e.g. “Invoice” → “Bill”), historical sub-documents continue to surface under the renamed tab — the UI now filters by the stable split-node identifier instead of the display string.
May 6, 2026
[New] Top up credits with Stripe Checkout You can now purchase wallet credit through Stripe Checkout from the Usage page. Sessions are idempotent (no duplicate charges on retry), and every successful top-up generates an invoice you can download later from the Customer Portal.Technical detail
Technical detail
POST /api/v3/billing/organizations/{id}/checkout-sessions/ validates the requested credit count against per-session min/max bounds and returns a hosted-page URL. Inbound checkout.session.completed webhooks are HMAC-verified and serialized to guarantee exactly-once credit grant under contention.Cmd/Ctrl-F shortcut, and cross-page highlighting. Search stays responsive on large PDFs thanks to debounced lazy text extraction.
[API] Workflow results now expose classifications, splits, and flat extractions
GET /v2/workflows/{wid}/files/{cid}/results/ now returns classifications, detailed splits (with per-file pages and partitions), and a flat extractions list keyed by (split_name, partition) alongside the existing fields. The v3 file-collection results payload exposes the same keys in place — no new endpoint required.
[Improved] Org switcher orders by ownership, membership, then support access
Organizations are now returned in three alphabetically-sorted buckets — owned, member, support access — replacing the prior database-discovery order. Helpful for accounts that belong to many orgs.
[Fixed] Bulk extraction status no longer marks processed files as pending
Bulk status resolution now aligns identifier formats and queries root extractions only, so processed parents are correctly reported instead of stuck on “pending”.
[Improved] Other improvements and fixes
File status badge in the file-results header, PDF toolbar staying visible on multi-line wrap, clearer toasts when a dropped file’s type isn’t accepted, and a fix for crashes on empty drag-and-drop selections.
May 5, 2026
[Improved] Smart Lookup retries based on residuals Smart Lookup now wraps the worker in an orchestrator + auditor loop that can iterate up to three times if residuals or tool-call signals suggest more work is possible. Search blends BM25 and Jaro-Winkler matching with a per-call query cap to improve recall while keeping runtime bounded. [Improved] Splits panel is now navigable Clicking a split or partition now switches the extraction tab, persists the selected partition in a URL search param, and scrolls the PDF viewer to the matching page. Refresh-safe deep links to a specific partition just work. [Fixed] Duplicating workflows with reused field names no longer fails Workflows that contain multiple extract nodes reusing the same top-level or nested field names can now be duplicated cleanly — field lookups are keyed per source node. [Fixed] Classify-only workflows handle unwired LLM categories When the classifier returns a category with no downstream edge, the run now falls through to the end instead of crashing. Classify-only workflows also persist classification records so results are available afterwards. [Improved] Smart-table list-of-object rows are page-ordered Smart-table list-of-object extraction rows are now sorted by source page and block, so output is stable and easy to align with the source document across runs. Rows with missing page info are placed last. [Improved] Cleaner empty states for splits and category grids Workflows with no rows now render a clean overlay in the splits and category grids instead of stale placeholders.May 4, 2026
[Fixed] Authentication errors now return the correct status codes Unauthenticated requests now return401 (with a WWW-Authenticate header) instead of being coerced to 403. Missing email-claim or unknown-key authentication paths return AuthenticationFailed cleanly instead of 500 errors.
[Improved] Profile-fetch failures show a clear error page
Repeated /profile failures after login now surface an error page with a logout option instead of stranding users on a spinner. Global query errors are also reported with the failing endpoint for faster triage.
[Fixed] Corrupted local preferences no longer break the UI on load
Persisted UI preferences (sync toggles, panel sizes, etc.) are now safely parsed; corrupt keys are dropped on initial load and reported to monitoring with the offending storage key.
April 2026
April 30, 2026
[New] New organizations get 5,000 free credits Every new organization is now automatically granted 5,000 credits on creation, recorded as a standard billing transaction.April 28, 2026
[New] Classify nodes return evidence and confidence Classify nodes now return a structured result with category, evidence, and confidence. The UI surfaces confidence and evidence in the category data viewer, with a dedicated Classify tab at workflow level and a per-file Classify tab with verdict cards. The newclassifications include option exposes the same data via the file API.
[New] HTML files render inside the document viewer
You can now upload .html and .htm files and view them in the document viewer. Files render inside a sandboxed, sanitized iframe to block script execution and same-origin access.
[API] Better OpenAPI for SDKs and try-it surfaces
POST /v2/workflows/ now declares its JSON request body, so SDKs and AI agents stop falling back to raw extra_body dicts. Bearer authentication is now declared as a proper API-key security scheme, so try-it panels render the lock icon and treat auth as required. Webhook responses now expose url as an HttpUrl and created_at as a datetime.
[Improved] Smart-table search now reaches inside table cells
Smart-table search now scans both page prose and table cells, returning cell matches with {table, row, column, value} provenance. Scalar values inside tables are no longer dropped behind prose hits.
[Improved] Per-category accuracy and confidence charts in monitoring
The monitoring tab now groups per-field accuracy and confidence charts into per-category sections for multi-schema workflows, with sensible fallbacks and an improved empty state.
[Improved] Validate view now shows results for classify-routed files
Per-file validation now displays extracted data for files routed via Classify by synthesizing entries from the file’s results when pinned to an extract node. The partition selector and sibling tabs are hidden when they don’t apply.
[Fixed] Improved processing of .docx and .msg uploads
.docx, .msg, and other non-PDF uploads now hydrate from their converted PDF instead of feeding the original binary into the PDF parser. Markdown for these formats is now correct and no longer triggers spurious “corrupt PDF” warnings.
[Improved] Other improvements and fixes
Reliable PDF callback uploads via short-lived presigned URLs (no more “Request Too Big” errors on large rendered PDFs), file-detail columns in splits and per-category tables, search_text helper exposed in the smart-table sandbox, object/list extraction fields no longer dropped from v3 results, timeouts and retries on frontend list requests, lighter /files list payloads, and eval and XFA disabled in the PDF viewer for a smaller attack surface.
April 27, 2026
[Fixed] Smart-table extraction no longer drops scalar fields Smart-table extraction plans now must assign every schema field to either a table or ascalar_fields list, rejecting incomplete plans so scalar fields can no longer be silently dropped.
[Fixed] Master-file + extract workflows now run without error
Non-split workflows that combine a master-file node with an extract node now run cleanly — master-file nodes can augment an extract branch without triggering single-non-split-node validation.
April 24, 2026
[Improved] Tabbed data viewer on workflow detail Reworked the workflow detail data viewer into a tabbed layout with shared grid skeletons and improved loading behavior. Added an inline workflow-name editor and tighter studio reload/loader behavior when workflows change. [Fixed] Empty splits tab now shows a clean overlay The splits tab now uses a proper no-rows overlay (resolving the previous “Element type is invalid” error) and filters out splits with no results, so empty entries no longer appear. [Improved] More content captured during PDF parsing Text missed by layout segmentation now reaches downstream parsing through orphan OCR detection and synthetic text-block injection. Orphans are clustered into lines, merged into nearby blocks where appropriate, or promoted to new blocks; redundant smaller blocks are absorbed to prevent duplicates. [Fixed] Date parsing in the smart-table sandbox no longer silently returnsNone
The sandbox now allows datetime.strptime’s runtime dependencies, so parsed dates are correctly returned instead of being silently converted to None.
[Fixed] Smart-table no longer overwrites populated fields with empty values
Storing an empty list can no longer overwrite an already-populated list field (e.g. a transactions table). Column-statistics summaries now clearly label examples and indicate additional unique values so the worker doesn’t mistake samples for full datasets.
[Improved] Better table merging when headers differ
Same-title tables with differing headers are disambiguated to avoid dropping rows during merge, and the block-parsing prompt and markdown assembly capture text the model sees that isn’t covered by any block id.
[Fixed] Classifier results survive workflow amendments
Category queries continue to return populated classifications and datapoints after prompt adjustments — the version-compatible result builder now maps datapoints across amended versions by their stable persistent identifier.
April 23, 2026
[Fixed] Partitioned splits now flow into Validate and Smart Lookup When a split rule declares apartition_key and the downstream extract is wired into a Validate or Smart Lookup, each partition is now processed independently — one validation pass per partition, one lookup pass per partition — and the results attach to the right per-partition child extraction. Previously the downstream node saw nothing.
[Improved] Per-partition fan-out is now bounded
Per-partition fan-out is now capped (default 4 parallel branches) so a 100-partition document running through extract → validate → lookup doesn’t saturate the model provider.
April 20, 2026
[New] Splits results tab in workflow results Added a Split tab in the extraction results UI showing split-group cards and a collapsible per-page table, plus adocument_split_groups include option on the workflow-version file endpoint with category, partition value, page range, and confidence.
[Improved] Redesigned Classify and Split menus
Both nodes now share a Config / Options tab layout with validated card-based editors, uniqueness checks, reserved “Other” handling for Split, click-to-edit from the Config preview tables, and cleaner Save-from-header behavior that closes open forms first.
[API] Split groups consolidated by category and partition value
document_split_groups responses now group rows by (category, partition_value) and return consolidated page_ranges arrays instead of per-row page_start / page_end fields.
[Improved] Better figure parsing in PDFs
The figure-enhancement prompt now uses chart-type-specific strategies, handles multi-panel figures, gives statistical-annotation guidance, and applies column-header rules. Segment deduplication now preserves the largest picture blocks to avoid losing visual context.
[Improved] Inline field-level errors in the field editor
Replaced the top-level field-form error banner with inline, field-adjacent errors, column-level highlighting, and deduplicated summary messages for enum options and nested fields.
[Improved] Default extraction model upgraded to gpt-5.2
The default extraction model is now gpt-5.2 across markdown and PDF extraction paths.
[Fixed] OneDrive and Google Drive imports work reliably in production
Fixed the cloud-connector reliability issues that affected OneDrive and Google imports in staging and production: a stricter content-security policy now allows the required SDK scripts and frames, SDK load errors surface instead of failing silently, consumer-vs-business OneDrive accounts are detected correctly, and a popup-polling leak is gone.
[Improved] Other improvements and fixes
Validation tab only appears when the workflow includes a Validate node and the file has an extraction; latest_version endpoint returns a clean 400 for invalid workflow UUIDs instead of a noisy 500; OAuth callback token exchange no longer stalls; Smart Lookup handles nested object fields correctly; agentic-extraction description typo fixed; partition tools only exposed when split rules define a partition key; null figure-parsing responses no longer crash; markdown viewer no longer has an infinite scroll loop and the markdown download button moved to the top of the panel.
April 14, 2026
[New] Redesigned Workflow Studio The Studio now ships redesigned Parse and Extract node menus, smart node placement with auto-connect, a way to add connected nodes from an output handle, empty-state guidance, an updated sidebar with hover cards, and save-from-header with per-node validation errors. [New] Validate node for cross-document rule validation A new Validate node type lets you express cross-document rules that run after extraction. [New] XML file support Added an XML-to-PDF converter for files with embedded base64 attachments, so XML files extract end-to-end. [Improved] API keys are hashed at rest, with multiple named keys per user API keys are now hashed at rest, support naming, and you can hold multiple keys per user — better security and easier rotation. [Improved] Per-cell parse confidence with logprob breakdowns Parse confidence now uses cell-level logprobs when available, with per-cell and per-row logprob breakdowns for table blocks. [Improved] Studio warns when leaving with unsaved changes The Studio now blocks tab switches when configuration is dirty or an inline editor is open, and preserves in-progress work across tab navigation. [Improved] Faster, paginated billing usage The billing usage table is now server-side paginated with summary cards, dramatically improving query performance on large datasets. [Improved] Visual grounding toggle for local-model compatibility Added a toggle to disable visual grounding for environments where the local model doesn’t support it. [API] Files nested under workflows; multi-node results; richer OpenAPI File endpoints are now nested under workflows. A new multi-node results endpoint returns results across multiple graph nodes in one call. The OpenAPI spec is now enriched for better SDK documentation, and the billing API gained pagination, filtering, and ordering. [Improved] Other improvements and fixes OCR confidence is now included in segment and table-cell metadata; the schema builder uses a local config store for faster field editing; toast colors, mobile sidebar, long descriptions in choice fields, lookup-file display overflow, and schema editor header sizing fixes; corrupt-PDF hydration errors are now downgraded to warnings.April 6, 2026
[New] Agentic document splitter The batch LLM splitter has been replaced by an agentic workflow. You can now define custom split rules (name, description, optional partition key for grouping repeating sections) directly in the Studio SplitPages node, with a built-in “Other” catch-all rule. [Fixed] Non-figure blocks no longer receive figure wrapping Generic non-figure blocks are no longer treated as pictures, so only true figures receive screenshot crops and figure markup in PDF-to-markdown output. [API] Removed duplicate health check endpoint A duplicate health check endpoint has been removed from the API surface.April 1, 2026
[Improved] Workflow usage endpoint supports pagination and ordering The workflow usage endpoint now supports server-side pagination (page, page_size), ordering, and database-level filtering with the standard results response wrapper.
[API] v1 routes restored with deprecation headers
v1 routes (workflows, jobs, uploads/results) are back with Deprecation and Sunset response headers and X-API-Key auth fallback for backward compatibility.
March 2026
March 30, 2026
[Improved] Rate limits split between submission and general endpoints Rate limits are now applied independently to file-submission endpoints (60 RPM) and general endpoints (600 RPM). Submission operations (run, upload, create file collection) have a stricter limit; read and management endpoints share a higher limit. The two tiers use independent counters.March 26, 2026
[New] Paste images into the workflow creation chat Paste images directly into the workflow creation chat for more contextual AI suggestions (up to 3 images, 3 MB each). [Improved] Concurrent edit protection on workflow saves Workflow configuration saves now detect concurrent edits and surface them, preventing accidental data loss when two users edit at once. [Fixed] Image uploads in workflow suggestions accept files within size limit Image uploads in workflow suggestions are no longer rejected when they’re within the allowed size limit. [Improved] Other improvements and fixes Workflow-name loading indicator no longer gets stuck; buttons now show the pointer cursor on hover.March 25, 2026
[Improved] Upload multiple files in a single requestPOST /v2/files/ now accepts multiple files in a single request.
[API] Customer-facing docs now show Bearer authentication
Customer-facing API docs now reflect the Authorization: Bearer <token> authentication method.
[API] v2 documentation reflects path-based versioning
All API documentation has been updated for v2 endpoints with path-based versioning (/v2/ prefix).
March 19, 2026
[New] EML and MSG email file support You can now upload.eml and .msg (Outlook) email files for extraction. Embedded images in EML files are correctly inlined during conversion.
[API] New allow_extend parameter to append rows on submit
Submit now supports an allow_extend parameter that lets agents append rows to existing submissions instead of replacing them.
[API] v3 types and graceful v2-to-v3 delegation
Updated v3 types, v2 file endpoints now delegate to v3, and v3 errors are translated cleanly for v2 callers.
[Improved] Better OCR text detection
OCR text detection now uses word-count and spatial-coverage heuristics instead of simple truthy checks, and the screenshot is always sent to the model for context.
[Improved] Smart-table resubmit replaces rows instead of skipping
Smart-table resubmit now replaces the previous output instead of skipping, with scoped worker context and a consolidated briefing.
[Improved] Per-user rate limits via API keys
Authentication and rate-limiting are now separated, with per-user rate limits derived from API key identity.
[Improved] HTTP security headers added per pentest findings
Added HTTP security headers across the API per recent pentest recommendations.
[Fixed] Bounding boxes align on pages with different dimensions
Bounding boxes no longer drift on documents that mix page sizes.
[Improved] Other improvements and fixes
Table grounding improvements (HTML cell-id injection now gated and tuned), better login-page rendering with static assets loading correctly, cleaner full-page loader on the auth callback, and PDF viewer fixes for documents that mix page sizes (e.g. US Letter and A4).
March 18, 2026
[New] v2 API with versioned endpoints, pagination wrappers, and improved docs A full v2 API surface ships: versioned endpoints, pagination wrappers, workflow routes, list-extractions, provenance API, and richer OpenAPI documentation. [API] Bearer authentication alongside API key The API now supports the standardAuthorization: Bearer header alongside the existing API key header.
[API] API rate limiting
The API now enforces per-endpoint rate limits.
[Improved] Smart-table extraction planning with table-scoped delegates
The smart-table extraction planner now uses table-scoped delegates, agent limits that scale with document complexity, and improved sandbox tools.
[Improved] Better row-level grounding for tables
Row-level grounding for tables is now more robust via token-overlap pre-mapping and accuracy fixes to anchor finders.
[Improved] Faster page rotation
A cascade decision engine with landscape support and optimized center-crop reuse makes page-rotation detection faster.
[Improved] File metadata toggle and filtered counts
A new metadata visibility toggle on the table header persists locally. File counts now reflect filtered results instead of selected rows, and action buttons show loading state.
[Fixed] File-collection creation works in production
A regression that returned 403 for all file-collection creation calls in production has been fixed.
[Fixed] File download naming
Markdown downloads no longer end up with doubled file extensions.
[Improved] Other improvements and fixes
Security update for vulnerable dependencies; a botched migration that assigned the same UUID to all files has been corrected.
March 5, 2026
[New] Raw JSON viewer alongside validated results Raw JSON extraction results are now available in a dedicated tab alongside validated results, for easier inspection. [Improved] Better document layout analysis Enhanced layout analysis for more accurate table and section detection. [Improved] Hover cards on simple field results Simple field results now display hover cards with additional context. [Improved] Resizable workflow creation chat The workflow creation chat panel is now resizable. [Improved] Refined validation result cards Validation result cards have refined styling that makes validated datapoints easier to identify at a glance. [Fixed] Confidence handling for undefined values Confidence scores no longer break when a value is undefined. [Fixed] Clipboard copy errors now show clear feedback Clipboard copy errors are handled and surfaced with clear feedback instead of failing silently. [Improved] Other improvements and fixes UI styling consistency improvements across result cards and the file viewer.March 4, 2026
[New] Production-grade webhook system You can now create webhook subscriptions, receive real-time event notifications, and configure automatic retries on failure. [Improved] Better error categorization Error messages now use clearer categories, making it easier to understand and resolve extraction issues. [Improved] Schema editor auto-saves Fields and schema changes now auto-save when switching tabs or editing nested fields, preventing accidental data loss. [Improved] Validation page with keyboard navigation The validation page now has improved result cards with keyboard navigation and better loading states. [Improved] Faster OCR processing OCR text recognition is now significantly faster on large documents thanks to GPU-accelerated batch processing. [Fixed] Race conditions in field-form submission Race conditions in field-form submission and save-state management have been eliminated. [Fixed] Security vulnerability in a third-party dependency Fixed a security vulnerability in a third-party dependency.February 2026
February 27, 2026
[Improved] Custom date range filter on the usage page The usage page now lets you filter by a custom date range instead of a fixed year, giving more precise control over billing and usage reports. [Fixed] Reverting to the original value marks fields as verified Editing a field value back to its originally extracted value now correctly marks it as human-verified rather than human-edited.February 25, 2026
[Improved] Better OCR accuracy Upgraded text recognition with better support for Latin languages. [Improved] LLM cost tracking in billing Billing transactions now include per-extraction LLM token counts and costs for full cost transparency. [Improved] More accurate table extraction Improved accuracy when extracting data from tables with closely spaced rows or surrounding text. [Improved] Faster processing during peak usage Extraction now scales to handle high volumes, reducing wait times during busy periods. [New] Sample workflows on new accounts New accounts are now prepopulated with sample workflows and documents for a guided onboarding experience. [Fixed] Smart-lookup crash on empty corpus Smart lookup no longer crashes when the search corpus is empty. [Improved] Faster extraction callbacks and field-list endpoints Performance improvements on extraction callbacks and field-list endpoints. [Fixed] Readable file download filenames File download filenames are now readable by the frontend.February 21, 2026
[Improved] Visual grounding 2.0 Completely revamped evidence highlighting with block-level and element-level precision, making it easier to trace extracted values back to their exact location in the source document. [New] Billing dashboard New usage dashboard with remaining credits, weekly summaries, per-workflow usage breakdown, and CSV export. [Improved] Advanced document layout analysis enabled in production The advanced document segmentation pipeline is now enabled in production for improved table and section detection. [Fixed] Bounding-box highlights no longer stuck after navigating fields Highlights are now cleared correctly when you move between fields. [Fixed] Confidence calibration Fixed calibration configuration for confidence scoring.February 14, 2026
[New] Redesigned home page A personalized home with recent workflows, at-a-glance file-status indicators, and a streamlined workflow creation experience with templates. [New] Redesigned schema builder A new two-step Define & Refine flow with AI-powered field suggestions, inline file previews, and drag-and-drop uploads. [Improved] Faster page rotation About 2× faster page-orientation detection per page. [Improved] More reliable AI field suggestions Field suggestions now use an upgraded model for higher reliability. [Fixed] Smart-lookup settings saved on publish Smart-lookup settings are now correctly saved when publishing a workflow version. [Fixed] Workflow studio sidebar overflow Fixed sidebar overflow in the workflow studio.February 7, 2026
[Improved] New dashboard visual design Refreshed visual design across the platform with updated components, colors, and layout. [Improved] Faster, more reliable evidence highlighting Evidence highlighting now uses an event-driven architecture, making interactions faster and more reliable across all views. [Improved] Files and organization logos served securely through the backend Files and organization logos are now served securely through the backend, improving security and simplifying local development. [New] AI workflow-name suggestions New AI-powered workflow-name suggestions based on your description and uploaded files. [Improved] File status counts in workflow listings Workflow listings now show at-a-glance counts of files by processing status. [Fixed] Document-element classification Fixed a classification regression that could cause incorrect layout detection in some documents.February 3, 2026
[New] Smart Lookup New feature to enrich extracted data with external reference files using AI-powered matching. [Improved] Field source tracking Fields now track their source (extraction vs. smart lookup) for better traceability. [Fixed] Manual field creation processing Field data processing now correctly handles manually created fields.January 2026
January 27, 2026
[New] Usage-based billing Introduced billing for data extraction operations with transparent per-page pricing. [Improved] Faster, more reliable field suggestions Field suggestions are now faster and more reliable when creating workflows. [Fixed] Field value upload formats Resolved an issue where certain field value formats could cause upload errors.January 20, 2026
[Improved] Faster OCR processing Document processing speed improved with optimized text recognition. [New] Extraction review from the dashboard Administrators can now review extractions directly from the dashboard. [Improved] Organization credits management Added the ability to manage organization credits for usage billing.January 13, 2026
[Improved] Faster document processing Significantly improved processing speed for large documents. [New] Filter results by row count Filter extraction results by the number of rows extracted. [New] Configurable per-workflow page rotation Configure custom page-rotation strategies per workflow.January 6, 2026
[Improved] Enhanced results filtering on download Enhanced results download with additional filtering options.December 2025
December 15, 2025
[Improved] Improved reliability Enhanced system stability and faster deployments.December 1, 2025
[Improved] Faster results retrieval for workflows with many documents Optimized results retrieval for workflows that span many documents. [Improved] Detailed usage tracking Added more detailed usage tracking for extraction operations.November 2025
November 24, 2025
[New] Create data rows from specific extractions You can now create new data rows directly from specific extractions.November 17, 2025
[Improved] More accurate table extraction Improved accuracy when extracting data from complex tables. [API] New/user/is-enabled endpoint
A new endpoint lets you check account status.
[Improved] User info on file validation responses
File validation responses now include the user who performed the validation.
