Core Concepts

This page explains key concepts and terminology used across the anyformat API.

Workflows

A Workflow defines what information should be extracted from your documents. Think of it as a template or schema for document processing. Each workflow contains:
  • A unique identifier (UUID)
  • A name and description
  • A set of fields that define what data to extract

Fields

Fields define specific data points to extract from your documents. Each field has:
  • A name (e.g., “invoice_number”)
  • A description that helps the AI understand what to extract
  • A data type (string, integer, float, date, etc.) - see Field Types for details
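As a rough mental model, a workflow and its fields can be pictured as two small record types. The sketch below is illustrative only: the class names `WorkflowField` and `Workflow` and their attribute names are assumptions made for this example, not the actual anyformat request or response schema.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass
class WorkflowField:
    """One data point to extract (hypothetical shape, not the real schema)."""
    name: str         # e.g. "invoice_number"
    description: str  # helps the AI understand what to extract
    data_type: str    # e.g. "string", "integer", "float", "date"


@dataclass
class Workflow:
    """A template describing what to extract from documents."""
    name: str
    description: str
    fields: list[WorkflowField] = field(default_factory=list)
    # Generated locally here for illustration; a real workflow has its own UUID.
    id: UUID = field(default_factory=uuid4)


invoice_workflow = Workflow(
    name="Invoices",
    description="Extract key values from supplier invoices",
    fields=[
        WorkflowField("invoice_number", "The invoice's unique number", "string"),
        WorkflowField("issue_date", "Date the invoice was issued", "date"),
    ],
)
```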

Runs

A Run is the process and result of applying a workflow to a document. When you submit a document for processing with a workflow via POST /v2/workflows/{id}/run/, a file is created. The file contains:
  • A file UUID (the id in the response)
  • A status tracking the processing lifecycle:
      Status        Description
      pending       File created, processing not yet started
      queued        Waiting for an available processing slot
      in_progress   Processing is actively running
      processed     Processing complete, results available
      error         Processing failed
      cancelled     Processing was cancelled (terminal state, stop polling)
  • Extracted data points (available when status is processed)
Poll for results via GET /v2/files/{file_id}/extraction/, which returns 412 while the file is still processing and 200 when results are ready.
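The polling behaviour described above (412 while processing, 200 when ready) can be sketched as a small loop. This is a minimal sketch under stated assumptions: `poll_extraction` and `make_fetcher` are hypothetical helper names, the base URL and authentication headers are placeholders you would supply yourself, and the interval and attempt limit are arbitrary. Any status other than 200 or 412 is treated as a failure here, which is a simplification.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Tuple


def make_fetcher(base_url: str, file_id: str, headers: dict) -> Callable[[], Tuple[int, dict]]:
    """Build a callable that performs one GET on the extraction endpoint.

    base_url and headers are assumptions: use your own API host and
    whatever authentication headers your account requires.
    """
    url = f"{base_url}/v2/files/{file_id}/extraction/"

    def fetch() -> Tuple[int, dict]:
        request = urllib.request.Request(url, headers=headers)
        try:
            with urllib.request.urlopen(request) as response:
                return response.status, json.load(response)
        except urllib.error.HTTPError as err:
            # urllib raises for non-2xx codes, including the 412 "not ready" case.
            return err.code, {}

    return fetch


def poll_extraction(
    fetch: Callable[[], Tuple[int, dict]],
    interval_s: float = 2.0,   # arbitrary default, tune to your workload
    max_attempts: int = 30,    # arbitrary default
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until it reports 200, waiting between 412 responses."""
    for _ in range(max_attempts):
        status_code, body = fetch()
        if status_code == 200:
            return body
        if status_code != 412:
            raise RuntimeError(f"Unexpected HTTP status {status_code}")
        sleep(interval_s)
    raise TimeoutError("Extraction was not ready after the maximum number of attempts")
```

Passing the fetcher in as a callable keeps the loop independent of any particular HTTP client, so the same function works with a stub in tests or with a different client library in production.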

Results

Results are the structured data extracted from your documents. Each data point includes:
  • The field name
  • The extracted value
  • A confidence score (0-100)
  • Evidence information (location in the document where the data was found)

Evidence

Evidence is an array of metadata objects that indicate where in the document each piece of information was found. It is an array because some values are inferred rather than extracted directly from a single concrete location, so more than one passage may support a value. This makes evidence especially useful for validating results. Each evidence object provides:
  • The actual snippet of text from which the data was extracted
  • The page number in the document
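To make the relationship between a data point and its evidence concrete, here is a hedged sketch of how the two might be modelled on the client side. The class and attribute names (`Evidence`, `DataPoint`, `text`, `page`, and so on) are assumptions chosen for illustration; consult the actual response payload for the real key names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Evidence:
    text: str  # snippet of text from which the data was extracted
    page: int  # page number in the document


@dataclass
class DataPoint:
    field_name: str
    value: Any
    confidence: int            # 0-100
    evidence: list[Evidence]   # may hold several entries for inferred values


def pages_cited(point: DataPoint) -> list[int]:
    """Return the distinct pages that support a data point, in ascending order."""
    return sorted({item.page for item in point.evidence})


total = DataPoint(
    field_name="total_amount",
    value=1250.0,
    confidence=92,
    evidence=[
        Evidence(text="Subtotal: 1000.00", page=3),
        Evidence(text="Tax: 250.00", page=3),
        Evidence(text="Amount due on receipt", page=1),
    ],
)
```

A helper like `pages_cited` is one way a reviewer could jump straight to the pages that back a value when validating it.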

Confidence

The Confidence score (0-100) indicates how certain the system is about an extracted value. Higher scores indicate greater confidence in the accuracy.
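One common way to use the score is to route low-confidence values to manual review. The sketch below assumes each data point is a dictionary with hypothetical `field`, `value`, and `confidence` keys, and the threshold of 80 is an arbitrary example rather than a recommended setting.

```python
def split_by_confidence(points: list[dict], threshold: int = 80) -> tuple[list[dict], list[dict]]:
    """Partition data points into (accepted, needs_review) by confidence score.

    Points scoring at or above the threshold are accepted; the rest are
    flagged for review. Key names are assumptions for this example.
    """
    accepted, needs_review = [], []
    for point in points:
        target = accepted if point["confidence"] >= threshold else needs_review
        target.append(point)
    return accepted, needs_review


sample_points = [
    {"field": "invoice_number", "value": "INV-001", "confidence": 97},
    {"field": "issue_date", "value": "2024-01-15", "confidence": 80},
    {"field": "total_amount", "value": 1250.0, "confidence": 41},
]
accepted, needs_review = split_by_confidence(sample_points)
```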

Additional Resources

For detailed information about specific topics, see: