Invoice Processing
Extract structured data from PDF invoices including vendor details, dates, totals, and individual line items. This recipe uses the object field type with nested_fields to capture tabular line item data.
Workflow Fields
We recommend creating this workflow in the anyformat platform, where you can test with sample invoices and iterate on field descriptions. Then copy the workflow ID to use with the API.
Field            Type     Description
invoice_number   string   The unique invoice identifier
vendor_name      string   Name of the company issuing the invoice
issue_date       date     Date the invoice was issued
due_date         date     Payment due date
subtotal         float    Amount before tax
tax_amount       float    Total tax applied
total_amount     float    Final amount due including tax
currency         enum     Currency code (USD, EUR, GBP, CAD, AUD)
line_items       object   Individual items on the invoice
Field Configuration
{
  "fields": [
    { "name": "invoice_number", "description": "The unique invoice identifier or number", "data_type": "string" },
    { "name": "vendor_name", "description": "Name of the company that issued the invoice", "data_type": "string" },
    { "name": "issue_date", "description": "Date the invoice was issued", "data_type": "date" },
    { "name": "due_date", "description": "Date by which payment is due", "data_type": "date" },
    { "name": "subtotal", "description": "Amount before tax", "data_type": "float" },
    { "name": "tax_amount", "description": "Total tax amount", "data_type": "float" },
    { "name": "total_amount", "description": "Final total amount due including tax", "data_type": "float" },
    {
      "name": "currency",
      "description": "Currency of the invoice amounts",
      "data_type": "enum",
      "enum_options": [
        { "name": "USD", "description": "US Dollar" },
        { "name": "EUR", "description": "Euro" },
        { "name": "GBP", "description": "British Pound" },
        { "name": "CAD", "description": "Canadian Dollar" },
        { "name": "AUD", "description": "Australian Dollar" }
      ]
    },
    {
      "name": "line_items",
      "description": "Individual line items listed on the invoice",
      "data_type": "object",
      "nested_fields": [
        { "name": "description", "description": "Description of the item or service", "data_type": "string" },
        { "name": "quantity", "description": "Number of units", "data_type": "integer" },
        { "name": "unit_price", "description": "Price per unit", "data_type": "float" },
        { "name": "amount", "description": "Total amount for this line item", "data_type": "float" }
      ]
    }
  ]
}
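Because the configuration is plain JSON, it can also be assembled or adjusted in code before you create the workflow. The sketch below is illustrative only: `add_currency_options` is a hypothetical helper, not part of any anyformat SDK, and it assumes the field layout shown above. It appends currencies to the `currency` field's `enum_options` and skips codes that are already present.

```python
import copy
import json


def add_currency_options(config, currencies):
    """Return a copy of `config` with extra options on the `currency` enum field.

    `currencies` maps a currency code to a human-readable description.
    Codes already present in `enum_options` are left untouched.
    """
    updated = copy.deepcopy(config)
    for field in updated["fields"]:
        if field["name"] == "currency" and field["data_type"] == "enum":
            existing = {option["name"] for option in field["enum_options"]}
            for code, description in currencies.items():
                if code not in existing:
                    field["enum_options"].append(
                        {"name": code, "description": description}
                    )
    return updated


# Trimmed-down version of the configuration above (hypothetical subset).
config = {
    "fields": [
        {
            "name": "invoice_number",
            "description": "The unique invoice identifier or number",
            "data_type": "string",
        },
        {
            "name": "currency",
            "description": "Currency of the invoice amounts",
            "data_type": "enum",
            "enum_options": [
                {"name": "USD", "description": "US Dollar"},
                {"name": "EUR", "description": "Euro"},
            ],
        },
    ]
}

# EUR is already present and is skipped; JPY is appended.
extended = add_currency_options(config, {"EUR": "Euro", "JPY": "Japanese Yen"})
print(json.dumps(extended["fields"][1]["enum_options"], indent=2))
```

The helper works on a deep copy, so the original configuration stays unchanged and can be reused for other workflows.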
Process a Document
curl -X POST 'https://api.anyformat.ai/v2/workflows/YOUR_WORKFLOW_ID/run/' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-F 'file=@invoice.pdf'
Get Results
import time

import requests

headers = {"Authorization": "Bearer YOUR_API_KEY"}
file_id = "YOUR_FILE_ID"  # placeholder: ID of the file submitted for processing

# Poll for results with backoff
max_attempts = 60
base_delay = 5  # seconds

for attempt in range(max_attempts):
    response = requests.get(
        f"https://api.anyformat.ai/v2/files/{file_id}/extraction/",
        headers=headers,
    )
    if response.status_code == 200:
        results = response.json()
        break
    elif response.status_code == 412:
        # 412 is treated as "still processing": wait, then retry.
        # The delay grows 1.5x per attempt and is capped at 30 seconds.
        delay = min(base_delay * (1.5 ** min(attempt, 5)), 30)
        time.sleep(delay)
    else:
        raise Exception(f"Error: {response.json()['detail']}")
else:
    # The loop ended without a break: no result after max_attempts polls
    raise TimeoutError("Processing timed out")

# Use extracted data
print(f"Invoice #{results['invoice_number']['value']}")
print(f"Vendor: {results['vendor_name']['value']}")
print(f"Total: {results['currency']['value']} {results['total_amount']['value']}")
for item in results["line_items"]:
    print(f"  - {item['description']['value']}: {item['amount']['value']}")
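To see how long the polling loop can wait in the worst case, the delay formula can be evaluated on its own. This is a small worked example using the same constants as the polling code; `backoff_delay` is a name introduced here for illustration.

```python
def backoff_delay(attempt, base_delay=5, factor=1.5, cap=30):
    """Delay in seconds before the next poll, using the polling loop's formula."""
    return min(base_delay * (factor ** min(attempt, 5)), cap)


max_attempts = 60
delays = [backoff_delay(attempt) for attempt in range(max_attempts)]

print(delays[:7])   # the delay grows by 1.5x per attempt, then hits the cap
print(sum(delays))  # upper bound on total sleep time, in seconds
```

With these constants the delays are 5, 7.5, 11.25, 16.875, and 25.3125 seconds, then 30 seconds for every later attempt. If all 60 polls return 412, the loop sleeps for about 1716 seconds in total (just under 29 minutes, not counting request time) before raising TimeoutError.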
Example Response
{
  "invoice_number": { "value": "INV-2024-0847", "confidence": 97 },
  "vendor_name": { "value": "Acme Consulting LLC", "confidence": 95 },
  "issue_date": { "value": "2024-03-15", "confidence": 93 },
  "due_date": { "value": "2024-04-14", "confidence": 91 },
  "subtotal": { "value": 3750.00, "confidence": 94 },
  "tax_amount": { "value": 337.50, "confidence": 92 },
  "total_amount": { "value": 4087.50, "confidence": 96 },
  "currency": { "value": "USD", "confidence": 98 },
  "line_items": [
    {
      "description": { "value": "Strategy consulting - March", "confidence": 90 },
      "quantity": { "value": 40, "confidence": 88 },
      "unit_price": { "value": 75.00, "confidence": 91 },
      "amount": { "value": 3000.00, "confidence": 93 }
    },
    {
      "description": { "value": "Travel expenses", "confidence": 92 },
      "quantity": { "value": 1, "confidence": 95 },
      "unit_price": { "value": 750.00, "confidence": 89 },
      "amount": { "value": 750.00, "confidence": 94 }
    }
  ]
}
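Each extracted value in the response carries a `confidence` score, which can be used to route uncertain fields to manual review. The following is a minimal sketch, assuming the response shape shown above. The threshold of 90 and the helper name `low_confidence_fields` are arbitrary choices for illustration, not platform recommendations.

```python
def low_confidence_fields(results, threshold=90):
    """Return (path, value, confidence) for every value scored below `threshold`."""
    flagged = []
    for name, field in results.items():
        if isinstance(field, list):
            # Object fields such as line_items arrive as a list of rows.
            for index, row in enumerate(field):
                for sub_name, sub_field in row.items():
                    if sub_field["confidence"] < threshold:
                        path = f"{name}[{index}].{sub_name}"
                        flagged.append(
                            (path, sub_field["value"], sub_field["confidence"])
                        )
        elif field["confidence"] < threshold:
            flagged.append((name, field["value"], field["confidence"]))
    return flagged


# Subset of the example response above.
results = {
    "invoice_number": {"value": "INV-2024-0847", "confidence": 97},
    "due_date": {"value": "2024-04-14", "confidence": 91},
    "line_items": [
        {
            "description": {"value": "Strategy consulting - March", "confidence": 90},
            "quantity": {"value": 40, "confidence": 88},
        },
        {
            "description": {"value": "Travel expenses", "confidence": 92},
            "unit_price": {"value": 750.00, "confidence": 89},
        },
    ],
}

for path, value, confidence in low_confidence_fields(results):
    print(f"Review {path}: {value!r} (confidence {confidence})")
```

On this sample, only the first row's quantity (88) and the second row's unit price (89) fall below the threshold. A value scored exactly 90 is not flagged because the comparison is strict.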
Tips
Use float for all monetary amounts, not string. This gives you numeric values you can sum and compare without parsing.
The object type with nested_fields captures repeating tabular data like line items. Each row becomes an object in the array.
Write specific field descriptions. “Total amount due including tax” extracts better than just “total”.
If your invoices come in multiple currencies, add an enum_options entry for every currency you expect.
Multi-page invoices need no special handling: all pages are processed automatically.
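As an illustration of the first tip, float amounts can be cross-checked directly without any parsing. The sketch below assumes the response shape from the example response. The helper name `reconcile` and the one-cent tolerance are choices made here for illustration.

```python
import math


def reconcile(results, tolerance=0.01):
    """Check that the extracted amounts are internally consistent.

    Returns a list of human-readable problems; an empty list means
    every check passed.
    """
    problems = []
    subtotal = results["subtotal"]["value"]
    tax = results["tax_amount"]["value"]
    total = results["total_amount"]["value"]

    # Line item amounts should add up to the subtotal.
    line_sum = sum(item["amount"]["value"] for item in results["line_items"])
    if not math.isclose(line_sum, subtotal, abs_tol=tolerance):
        problems.append(
            f"line items sum to {line_sum:.2f}, subtotal is {subtotal:.2f}"
        )

    # Subtotal plus tax should equal the total.
    if not math.isclose(subtotal + tax, total, abs_tol=tolerance):
        problems.append(
            f"subtotal + tax is {subtotal + tax:.2f}, total is {total:.2f}"
        )
    return problems


# Amounts taken from the example response.
results = {
    "subtotal": {"value": 3750.00, "confidence": 94},
    "tax_amount": {"value": 337.50, "confidence": 92},
    "total_amount": {"value": 4087.50, "confidence": 96},
    "line_items": [
        {"amount": {"value": 3000.00, "confidence": 93}},
        {"amount": {"value": 750.00, "confidence": 94}},
    ],
}

print(reconcile(results) or "All amounts reconcile")
```

Comparing with a small absolute tolerance instead of `==` avoids false alarms from floating-point rounding. For exact accounting arithmetic, converting the values to `decimal.Decimal` is a common alternative.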
Next Steps
Field Types: Learn about object, enum, and other complex field types
Response Formats: Export results as CSV or JSONL for downstream systems