Field Types
Fields define what information is extracted from your documents. Extraction accuracy depends on well-defined fields and data types: the clearer the description of what you want to extract, the better the results.
Basic Data Types
| Type | Description | Example Value |
|---|---|---|
| string | Text values | "INV-001" |
| integer | Whole numbers | 42 |
| float | Decimal numbers | 1250.99 |
| date | Date values (YYYY-MM-DD) | "2024-03-15" |
| datetime | Date and time values | "2024-03-15T10:30:00Z" |
| boolean | True/false values | true |
| list | Array of values | ["item1", "item2"] |
| object | Nested object structure | See below |
| enum | Set of predefined choices | See below |
| multi_select | Multiple choices from predefined options | See below |
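To make the table concrete, here is a minimal Python sketch that checks whether a returned value has the shape the table describes for each basic type. The helper name `matches_type` is hypothetical and not part of the anyformat API, and the mapping to Python types is an assumption based on the example values above. The object, enum, and multi_select types are left out because they depend on extra configuration.

```python
from datetime import datetime


def matches_type(value, data_type):
    """Return True if `value` looks like the given basic data type.

    Hypothetical client-side helper; type names come from the table above.
    """
    if data_type == "string":
        return isinstance(value, str)
    if data_type == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if data_type == "float":
        # Assumption: whole numbers are acceptable where a decimal is expected.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if data_type == "boolean":
        return isinstance(value, bool)
    if data_type == "list":
        return isinstance(value, list)
    if data_type == "date":
        if not isinstance(value, str):
            return False
        try:
            datetime.strptime(value, "%Y-%m-%d")
            return True
        except ValueError:
            return False
    if data_type == "datetime":
        if not isinstance(value, str):
            return False
        try:
            # Replace a trailing "Z" so older Python versions can parse it.
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return True
        except ValueError:
            return False
    raise ValueError(f"unsupported data type: {data_type}")
```

A check like this is most useful as a sanity test on extraction results before they are passed to downstream systems.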
UI to API Type Mapping
If you’re familiar with the anyformat UI, here’s how the field type names map to API types:
| UI Name | API Type |
|---|---|
| Text | string |
| Decimal number | float |
| Integer number | integer |
| Date | date |
| Date & time | datetime |
| Yes / No | boolean |
| Select | enum |
| Multiselect | multi_select |
| Object (Subtable) | object |
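If you are translating a schema designed in the UI into API calls, the mapping above can be kept as a simple lookup. The following is a small sketch; the names `UI_TO_API_TYPE` and `api_type_for` are hypothetical and exist only for illustration.

```python
# UI label -> API data_type, copied from the mapping table above.
UI_TO_API_TYPE = {
    "Text": "string",
    "Decimal number": "float",
    "Integer number": "integer",
    "Date": "date",
    "Date & time": "datetime",
    "Yes / No": "boolean",
    "Select": "enum",
    "Multiselect": "multi_select",
    "Object (Subtable)": "object",
}


def api_type_for(ui_name):
    """Return the API data_type for a UI field type label."""
    try:
        return UI_TO_API_TYPE[ui_name]
    except KeyError:
        raise ValueError(f"unknown UI field type: {ui_name!r}") from None
```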
Field Definition
Each field requires these properties:
{
  "name": "field_name",
  "description": "What this field represents",
  "data_type": "string"
}
- name: Unique identifier for the field (use snake_case)
- description: Clear explanation of what to extract (helps AI accuracy)
- data_type: One of the types listed above
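The three required properties and the naming rules can be checked locally before a definition is sent. This is a minimal sketch, assuming the rules are exactly as listed above; `check_field` and `duplicate_names` are hypothetical helpers, and the snake_case pattern is one reasonable interpretation of that convention rather than a rule published by the API.

```python
import re

REQUIRED_PROPERTIES = ("name", "description", "data_type")
# Assumption: snake_case means lowercase words joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def check_field(field):
    """Return a list of problems found in one field definition (empty if none)."""
    problems = []
    for prop in REQUIRED_PROPERTIES:
        if not field.get(prop):
            problems.append(f"missing property: {prop}")
    name = field.get("name", "")
    if name and not SNAKE_CASE.match(name):
        problems.append(f"name is not snake_case: {name}")
    return problems


def duplicate_names(fields):
    """Return field names that appear more than once, in first-repeat order."""
    seen, duplicates = set(), []
    for field in fields:
        name = field.get("name")
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates
```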
Object Fields
Use the object type to extract structured data with multiple nested properties. Object fields require a nested_fields array:
{
  "name": "shipping_address",
  "description": "Customer shipping address details",
  "data_type": "object",
  "nested_fields": [
    {
      "name": "street",
      "data_type": "string",
      "description": "Street address including number"
    },
    {
      "name": "city",
      "data_type": "string",
      "description": "City name"
    },
    {
      "name": "postal_code",
      "data_type": "string",
      "description": "ZIP or postal code"
    },
    {
      "name": "country",
      "data_type": "string",
      "description": "Country name"
    }
  ]
}
Complex Object Example
For documents like insurance policies with multiple coverage types:
{
  "name": "coverage_details",
  "description": "Insurance coverage information",
  "data_type": "object",
  "nested_fields": [
    {
      "name": "theft_coverage",
      "data_type": "object",
      "description": "Conditions of theft coverage",
      "nested_fields": [
        {
          "name": "exclusions",
          "data_type": "string",
          "description": "Situations where theft is not covered"
        },
        {
          "name": "coverage_limit",
          "data_type": "float",
          "description": "Maximum coverage amount for theft claims"
        },
        {
          "name": "deductible",
          "data_type": "float",
          "description": "Deductible amount for theft claims"
        }
      ]
    },
    {
      "name": "fire_coverage",
      "data_type": "object",
      "description": "Conditions of fire damage coverage",
      "nested_fields": [
        {
          "name": "exclusions",
          "data_type": "string",
          "description": "Situations where fire damage is not covered"
        },
        {
          "name": "coverage_limit",
          "data_type": "float",
          "description": "Maximum coverage amount for fire claims"
        }
      ]
    }
  ]
}
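Because object fields can nest inside other object fields, it often helps to walk a definition recursively. The sketch below lists every non-object field under an object definition as a dotted path. The function name `leaf_paths` and the dotted-path notation are assumptions made for illustration, not something the API returns, and the sample definition is trimmed (descriptions omitted) to keep it short.

```python
def leaf_paths(field, prefix=""):
    """Yield (dotted_path, data_type) for every non-object field under `field`."""
    path = f"{prefix}.{field['name']}" if prefix else field["name"]
    if field["data_type"] == "object":
        # Recurse into nested_fields; an object itself is not a leaf.
        for nested in field.get("nested_fields", []):
            yield from leaf_paths(nested, path)
    else:
        yield path, field["data_type"]


# Trimmed version of the coverage_details definition shown above.
coverage_details = {
    "name": "coverage_details",
    "data_type": "object",
    "nested_fields": [
        {
            "name": "theft_coverage",
            "data_type": "object",
            "nested_fields": [
                {"name": "exclusions", "data_type": "string"},
                {"name": "coverage_limit", "data_type": "float"},
            ],
        },
    ],
}

for path, data_type in leaf_paths(coverage_details):
    print(path, data_type)
# coverage_details.theft_coverage.exclusions string
# coverage_details.theft_coverage.coverage_limit float
```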
Enum Fields
Use the enum type when the extracted value should be one of a predefined set of options. Enum fields require an enum_options array:
{
  "name": "payment_status",
  "description": "The current payment status of the invoice",
  "data_type": "enum",
  "enum_options": [
    {
      "name": "pending",
      "description": "Payment has not been received"
    },
    {
      "name": "paid",
      "description": "Payment has been received in full"
    },
    {
      "name": "partial",
      "description": "Partial payment has been received"
    },
    {
      "name": "overdue",
      "description": "Payment is past the due date"
    }
  ]
}
If the document content matches one of the enum options, that value is returned. If no match is found, the field value will be null.
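The "match or null" contract described above can be mirrored on the client side, for example when normalizing values that come from another source. This is only a sketch: the service decides matches using the option descriptions, whereas the hypothetical `resolve_enum` helper below compares names directly, and its case-insensitive comparison is an assumption.

```python
def resolve_enum(raw_value, enum_options):
    """Return the matching option name, or None when nothing matches."""
    if raw_value is None:
        return None
    wanted = raw_value.strip().lower()
    for option in enum_options:
        if option["name"].lower() == wanted:
            return option["name"]
    # No option matched: mirror the documented null result.
    return None


payment_options = [
    {"name": "pending", "description": "Payment has not been received"},
    {"name": "paid", "description": "Payment has been received in full"},
    {"name": "partial", "description": "Partial payment has been received"},
    {"name": "overdue", "description": "Payment is past the due date"},
]
```

With these options, `resolve_enum(" Paid ", payment_options)` returns `"paid"`, while an unlisted value such as `"refunded"` returns `None`.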
Enum Best Practices
- Provide clear descriptions for each option to help the AI match correctly
- Keep options distinct - avoid overlapping definitions
- Use meaningful names that reflect the actual document terminology
Multi-Select Fields
Use the multi_select type when the extracted value can include several options from a predefined set. Like enum, it requires an enum_options array, but it returns an array of matched values instead of a single value:
{
  "name": "document_tags",
  "description": "Categories that apply to this document",
  "data_type": "multi_select",
  "enum_options": [
    {
      "name": "urgent",
      "description": "Requires immediate attention"
    },
    {
      "name": "confidential",
      "description": "Contains sensitive information"
    },
    {
      "name": "reviewed",
      "description": "Has been reviewed by a team member"
    },
    {
      "name": "pending_approval",
      "description": "Awaiting approval from management"
    }
  ]
}
Multi-Select vs Enum
| Feature | enum | multi_select |
|---|---|---|
| Selection | Single value | Multiple values |
| Return type | string or null | array of strings |
| Use case | Mutually exclusive options | Non-exclusive categories |
Multi-Select Response Example
{
  "document_tags": ["urgent", "confidential"]
}
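The difference in return types between enum and multi_select matters when you consume results. The following sketch checks a returned value against its field definition, using the return types from the comparison table above. The helper name `check_choice_value` is hypothetical.

```python
def check_choice_value(field, value):
    """Check a returned value against an enum or multi_select field definition."""
    allowed = {option["name"] for option in field["enum_options"]}
    if field["data_type"] == "enum":
        # enum: a single option name, or None when no option matched.
        return value is None or (isinstance(value, str) and value in allowed)
    if field["data_type"] == "multi_select":
        # multi_select: a list whose items are all declared option names.
        return isinstance(value, list) and all(
            isinstance(item, str) and item in allowed for item in value
        )
    raise ValueError(f"not a choice field: {field['data_type']}")


document_tags = {
    "name": "document_tags",
    "data_type": "multi_select",
    "enum_options": [
        {"name": "urgent"},
        {"name": "confidential"},
        {"name": "reviewed"},
        {"name": "pending_approval"},
    ],
}
```

For the response shown above, `check_choice_value(document_tags, ["urgent", "confidential"])` returns `True`; a bare string such as `"urgent"` fails because a multi_select value is expected to be a list.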
Manual Fields
Manual fields are user-provided values that are not extracted from the document but are included in the results. They’re useful for adding context or metadata to extractions.
{
  "manual_fields": [
    {
      "name": "department",
      "description": "Department that submitted this document",
      "data_type": "string"
    },
    {
      "name": "batch_id",
      "description": "Processing batch identifier",
      "data_type": "string"
    }
  ]
}
Manual fields support all basic data types (string, integer, float, date, datetime, boolean, list) but not object or enum types.
When running an extraction, provide manual field values:
curl -X POST 'https://api.anyformat.ai/workflows/{id}/run/' \
  -H 'x-api-key: YOUR_API_KEY' \
  -F 'file=@document.pdf' \
  -F 'manual_field_values={"department": "Finance", "batch_id": "BATCH-2024-03"}'
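When the request is built from code rather than typed by hand, the manual_field_values form field is easiest to produce by serializing a dictionary. The sketch below builds that JSON string and performs two local checks first. `build_manual_field_values` is a hypothetical helper, and rejecting values for undeclared fields is a local safety check, not documented API behavior.

```python
import json

# Types the text above says manual fields do not support.
UNSUPPORTED_MANUAL_TYPES = {"object", "enum"}


def build_manual_field_values(manual_fields, values):
    """Serialize manual field values to the JSON string sent as a form field."""
    declared = set()
    for field in manual_fields:
        if field["data_type"] in UNSUPPORTED_MANUAL_TYPES:
            raise ValueError(
                f"manual field {field['name']!r} uses unsupported type "
                f"{field['data_type']!r}"
            )
        declared.add(field["name"])
    unknown = sorted(set(values) - declared)
    if unknown:
        raise ValueError(f"values given for undeclared manual fields: {unknown}")
    return json.dumps(values)


manual_fields = [
    {"name": "department", "description": "Department that submitted this document", "data_type": "string"},
    {"name": "batch_id", "description": "Processing batch identifier", "data_type": "string"},
]
payload = build_manual_field_values(
    manual_fields, {"department": "Finance", "batch_id": "BATCH-2024-03"}
)
```

The resulting `payload` string is what would go after `manual_field_values=` in the curl command above.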
Complete Workflow Example
Here’s a complete workflow definition with various field types:
{
  "name": "Invoice Processing",
  "description": "Extract invoice data with line items",
  "fields": [
    {
      "name": "invoice_number",
      "description": "Unique invoice identifier",
      "data_type": "string"
    },
    {
      "name": "issue_date",
      "description": "Date when the invoice was issued",
      "data_type": "date"
    },
    {
      "name": "total_amount",
      "description": "Total invoice amount including tax",
      "data_type": "float"
    },
    {
      "name": "is_paid",
      "description": "Whether the invoice has been paid",
      "data_type": "boolean"
    },
    {
      "name": "payment_status",
      "description": "Current payment status",
      "data_type": "enum",
      "enum_options": [
        {"name": "pending", "description": "Awaiting payment"},
        {"name": "paid", "description": "Fully paid"},
        {"name": "overdue", "description": "Past due date"}
      ]
    },
    {
      "name": "vendor",
      "description": "Vendor information",
      "data_type": "object",
      "nested_fields": [
        {"name": "name", "data_type": "string", "description": "Vendor company name"},
        {"name": "address", "data_type": "string", "description": "Vendor address"}
      ]
    }
  ],
  "manual_fields": [
    {
      "name": "reviewed_by",
      "description": "Name of person who reviewed this invoice",
      "data_type": "string"
    }
  ]
}
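A workflow definition like the one above combines several type-specific requirements: object fields need nested_fields, enum and multi_select fields need enum_options, and manual fields cannot be object or enum. The following sketch checks those rules locally. The function name `check_workflow` and the wording of its messages are hypothetical, and the service may enforce additional rules not covered here.

```python
def check_workflow(workflow):
    """Return a list of structural problems in a workflow definition."""
    problems = []

    def visit(field, path):
        data_type = field.get("data_type")
        if data_type == "object":
            nested = field.get("nested_fields")
            if not nested:
                problems.append(f"{path}: object field needs nested_fields")
            # Nested objects are checked with the same rules.
            for child in nested or []:
                visit(child, f"{path}.{child.get('name', '?')}")
        elif data_type in ("enum", "multi_select"):
            if not field.get("enum_options"):
                problems.append(f"{path}: {data_type} field needs enum_options")

    for field in workflow.get("fields", []):
        visit(field, field.get("name", "?"))
    for field in workflow.get("manual_fields", []):
        if field.get("data_type") in ("object", "enum"):
            problems.append(
                f"{field.get('name', '?')}: manual fields cannot be {field['data_type']}"
            )
    return problems
```

An empty list means none of these structural rules were violated; it does not guarantee that the service will accept the definition.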
Best Practices
- Be specific in descriptions - “The invoice number, usually starting with INV-” is better than “Invoice number”
- Use appropriate types - Use float for amounts and date for dates, not string
- Keep field names consistent - Use snake_case naming convention
- Describe the location when helpful - “Total amount shown at the bottom right of the invoice”