Core Concepts
This page explains key concepts and terminology used across the Anyformat API.
Workflows
A Workflow defines what information should be extracted from your documents. Think of it as a template or schema for data extraction.
Each workflow contains:
- A unique identifier
- A name and description
- A set of fields that define what data to extract
- Optional manual fields for user-provided information
Fields
Fields define specific data points to extract from your documents. Each field has:
- A name (e.g., “invoice_number”)
- A description that helps the AI understand what to extract
- A data type (string, integer, float, date, etc.) - see Field Types for details
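To make the relationship between workflows and fields concrete, here is a minimal sketch in Python. The class names (`FieldDef`, `Workflow`) and attribute names are illustrative assumptions, not the shapes the Anyformat API actually returns.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldDef:
    """One data point to extract (hypothetical shape)."""
    name: str          # e.g. "invoice_number"
    description: str   # helps the AI understand what to extract
    data_type: str     # e.g. "string", "integer", "float", "date"


@dataclass
class Workflow:
    """A template for data extraction (hypothetical shape)."""
    workflow_id: str
    name: str
    description: str
    fields: List[FieldDef] = field(default_factory=list)
    # Optional fields whose values are provided by the user, not extracted.
    manual_fields: List[FieldDef] = field(default_factory=list)


invoice_workflow = Workflow(
    workflow_id="wf_example",  # made-up identifier
    name="Invoices",
    description="Extract key data from supplier invoices",
    fields=[
        FieldDef("invoice_number", "The invoice's unique number", "string"),
        FieldDef("issue_date", "Date the invoice was issued", "date"),
    ],
)
```

Writing a precise description for each field matters in this model, since the description is what guides the extraction.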
Extractions
An Extraction is the process and result of applying a workflow to a document. When you submit a document for processing with a workflow, an extraction job is created. The job contains:
- A unique job ID
- The workflow ID used for extraction
- Status information (pending, in_progress, processed, error)
- Extracted data points (when processing is complete)
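Because a job moves through several statuses before its data is available, a client typically waits until the job reaches a terminal status. The sketch below assumes that `processed` and `error` are terminal and that a job is represented as a dictionary with a `status` key. The `fetch_job` callable is a hypothetical stand-in for whatever call retrieves the job. None of these names come from the Anyformat API.

```python
import time
from typing import Any, Callable, Dict

# Assumption: these two statuses mean the job will not change any further.
TERMINAL_STATUSES = {"processed", "error"}


def wait_for_job(
    fetch_job: Callable[[], Dict[str, Any]],
    interval_s: float = 1.0,
    max_polls: int = 60,
) -> Dict[str, Any]:
    """Poll fetch_job() until the job is processed or has errored."""
    for _ in range(max_polls):
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        time.sleep(interval_s)
    raise TimeoutError("job did not reach a terminal status")


# Simulated job lifecycle, used here instead of a real API call.
_fake_responses = iter([
    {"status": "pending"},
    {"status": "in_progress"},
    {"status": "processed", "job_id": "job_example"},
])
final_job = wait_for_job(lambda: next(_fake_responses), interval_s=0.0)
```

Passing the fetcher in as a callable keeps the polling logic independent of any particular HTTP client.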
Results
Results are the structured data extracted from your documents. Each extracted data point includes:
- The field name
- The extracted value
- A confidence score (0-100)
- Evidence information (location in the document where the data was found)
Evidence
Evidence is an array of metadata objects that indicate where in the document a piece of information was found. It is an array because a value is not always taken from a single location: depending on the kind of information requested, it may be inferred from several passages rather than extracted directly from one. This makes the evidence array especially useful for validating results.
Each evidence object provides:
- The actual snippet of text from which the data was extracted
- The page number in the document
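One way to use evidence for validation is to check whether an extracted value appears verbatim in any of its snippets. The sketch below assumes each evidence object is a dictionary with `text` and `page` keys. Those key names and the sample data are made up for illustration.

```python
from typing import Any, Dict, List


def supporting_pages(value: str, evidence: List[Dict[str, Any]]) -> List[int]:
    """Return the pages whose snippet contains the value verbatim.

    An empty result suggests the value was inferred rather than
    copied directly from the document.
    """
    needle = value.strip().lower()
    if not needle:
        return []
    return [e["page"] for e in evidence if needle in e["text"].lower()]


# Made-up evidence for an "invoice_number" field.
sample_evidence = [
    {"text": "Invoice No. INV-2041", "page": 1},
    {"text": "Payment reference: see invoice INV-2041", "page": 3},
]

pages = supporting_pages("INV-2041", sample_evidence)
```

The check is a case-insensitive substring match, so it is only a first-pass filter. Values that were reformatted during extraction (dates, for example) would need a looser comparison.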
Confidence
The Confidence score (0-100) indicates how certain the system is about an extracted value. Higher scores indicate greater confidence in the extraction accuracy.
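A common use of the confidence score is to route low-confidence values to manual review. The sketch below assumes each result is a dictionary with `field`, `value`, and `confidence` keys. The key names, the sample data, and the threshold of 80 are illustrative choices, not values recommended by the API.

```python
from typing import Any, Dict, List, Tuple

Result = Dict[str, Any]


def split_by_confidence(
    results: List[Result], threshold: int = 80
) -> Tuple[List[Result], List[Result]]:
    """Split results into (accepted, needs_review) by confidence score."""
    accepted = [r for r in results if r["confidence"] >= threshold]
    needs_review = [r for r in results if r["confidence"] < threshold]
    return accepted, needs_review


# Made-up results for illustration.
sample_results = [
    {"field": "invoice_number", "value": "INV-2041", "confidence": 97},
    {"field": "issue_date", "value": "2024-03-01", "confidence": 62},
]

accepted, needs_review = split_by_confidence(sample_results)
```

A suitable threshold depends on how costly a wrong value is for the use case, so it is worth tuning against documents whose correct values are already known.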
Additional Resources
For detailed information about specific topics, see: