Field Types

Fields define what information is extracted from your documents. Defining each field and its data type carefully is crucial for extraction accuracy: the clearer the description of what you want to extract, the better the results.

Basic Data Types

Type	Description	Example Value
`string`	Text values	`"INV-001"`
`integer`	Whole numbers	`42`
`float`	Decimal numbers	`1250.99`
`date`	Date values (YYYY-MM-DD)	`"2024-03-15"`
`datetime`	Date and time values	`"2024-03-15T10:30:00Z"`
`boolean`	True/false values	`true`
`list`	Array of values	`["item1", "item2"]`
`object`	Nested object structure	See below
`enum`	Set of predefined choices	See below
`multi_select`	Multiple choices from predefined options	See below

UI to API Type Mapping

If you’re familiar with the anyformat UI, here’s how the field type names map to API types:

UI Name	API Type
Text	`string`
Decimal number	`float`
Integer number	`integer`
Date	`date`
Date & time	`datetime`
Yes / No	`boolean`
Select	`enum`
Multiselect	`multi_select`
Object (Subtable)	`object`

Field Definition

Each field requires these properties:

{
  "name": "field_name",
  "description": "What this field represents",
  "data_type": "string"
}

name: Unique identifier for the field (use snake_case)
description: Clear explanation of what to extract (helps AI accuracy)
data_type: One of the types listed above
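
The three required properties can be checked locally before a definition is sent anywhere. The following is a minimal sketch, not part of the anyformat API: the helper name `check_field` is hypothetical, and the set of valid types is copied from the Basic Data Types table above.

```python
# Hypothetical local helper; not part of the anyformat API.
VALID_TYPES = {
    "string", "integer", "float", "date", "datetime",
    "boolean", "list", "object", "enum", "multi_select",
}
REQUIRED_KEYS = ("name", "description", "data_type")


def check_field(field: dict) -> list[str]:
    """Return a list of problems found in one field definition."""
    problems = []
    # Every field needs a non-empty name, description, and data_type.
    for key in REQUIRED_KEYS:
        if not field.get(key):
            problems.append(f"missing required property: {key}")
    # The data_type must be one of the documented type names.
    data_type = field.get("data_type")
    if data_type and data_type not in VALID_TYPES:
        problems.append(f"unknown data_type: {data_type}")
    return problems


if __name__ == "__main__":
    # "text" is the UI name; the API type is "string".
    print(check_field({"name": "invoice_number", "data_type": "text"}))
```

An empty list means the definition passed these basic checks; it says nothing about how the API itself validates requests.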

Object Fields

Use object type to extract structured data with multiple nested properties. Object fields require a nested_fields array:

{
  "name": "shipping_address",
  "description": "Customer shipping address details",
  "data_type": "object",
  "nested_fields": [
    {
      "name": "street",
      "data_type": "string",
      "description": "Street address including number"
    },
    {
      "name": "city",
      "data_type": "string",
      "description": "City name"
    },
    {
      "name": "postal_code",
      "data_type": "string",
      "description": "ZIP or postal code"
    },
    {
      "name": "country",
      "data_type": "string",
      "description": "Country name"
    }
  ]
}

Complex Object Example

For documents like insurance policies with multiple coverage types:

{
  "name": "coverage_details",
  "description": "Insurance coverage information",
  "data_type": "object",
  "nested_fields": [
    {
      "name": "theft_coverage",
      "data_type": "object",
      "description": "Conditions of theft coverage",
      "nested_fields": [
        {
          "name": "exclusions",
          "data_type": "string",
          "description": "Situations where theft is not covered"
        },
        {
          "name": "coverage_limit",
          "data_type": "float",
          "description": "Maximum coverage amount for theft claims"
        },
        {
          "name": "deductible",
          "data_type": "float",
          "description": "Deductible amount for theft claims"
        }
      ]
    },
    {
      "name": "fire_coverage",
      "data_type": "object",
      "description": "Conditions of fire damage coverage",
      "nested_fields": [
        {
          "name": "exclusions",
          "data_type": "string",
          "description": "Situations where fire damage is not covered"
        },
        {
          "name": "coverage_limit",
          "data_type": "float",
          "description": "Maximum coverage amount for fire claims"
        }
      ]
    }
  ]
}
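
Because `nested_fields` can itself contain object fields, a definition like the one above forms a tree. As an illustration, the sketch below walks that tree recursively and lists every leaf field. The helper name `field_paths` and the dotted path notation are assumptions made for this example, not something the API defines.

```python
def field_paths(field: dict, prefix: str = "") -> list[str]:
    """List the leaf fields of a definition as dotted paths (illustrative notation)."""
    path = f"{prefix}{field['name']}"
    children = field.get("nested_fields", [])
    if not children:
        # A field without nested_fields is a leaf.
        return [path]
    paths = []
    for child in children:
        # Recurse into each nested field, extending the path.
        paths.extend(field_paths(child, prefix=f"{path}."))
    return paths


if __name__ == "__main__":
    coverage = {
        "name": "coverage_details",
        "data_type": "object",
        "nested_fields": [
            {
                "name": "theft_coverage",
                "data_type": "object",
                "nested_fields": [
                    {"name": "exclusions", "data_type": "string"},
                    {"name": "deductible", "data_type": "float"},
                ],
            },
        ],
    }
    for p in field_paths(coverage):
        print(p)
```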

Enum Fields

Use enum type when the extracted value should be one of a predefined set of options. Enum fields require an enum_options array:

{
  "name": "payment_status",
  "description": "The current payment status of the invoice",
  "data_type": "enum",
  "enum_options": [
    {
      "name": "pending",
      "description": "Payment has not been received"
    },
    {
      "name": "paid",
      "description": "Payment has been received in full"
    },
    {
      "name": "partial",
      "description": "Partial payment has been received"
    },
    {
      "name": "overdue",
      "description": "Payment is past the due date"
    }
  ]
}

If the document content matches one of the enum options, that value is returned. If no match is found, the field value will be null.

Enum Best Practices

Provide clear descriptions for each option to help the AI match correctly
Keep options distinct - avoid overlapping definitions
Use meaningful names that reflect the actual document terminology
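
Some of these practices can be checked mechanically. The sketch below is a hypothetical local helper (`check_enum_options` is not an API function) that flags duplicate option names and options without a description. It cannot judge whether two descriptions overlap in meaning, so that still needs a human review.

```python
from collections import Counter


def check_enum_options(field: dict) -> list[str]:
    """Flag structural problems in an enum or multi_select field's options."""
    problems = []
    options = field.get("enum_options", [])
    if not options:
        problems.append("enum_options is missing or empty")
    # Option names should be distinct.
    counts = Counter(opt.get("name") for opt in options)
    for name, count in counts.items():
        if count > 1:
            problems.append(f"duplicate option name: {name}")
    # Each option should carry a description to guide matching.
    for opt in options:
        if not opt.get("description"):
            problems.append(f"option without description: {opt.get('name')}")
    return problems


if __name__ == "__main__":
    field = {
        "name": "payment_status",
        "data_type": "enum",
        "enum_options": [
            {"name": "paid", "description": "Fully paid"},
            {"name": "paid", "description": "Payment received"},
            {"name": "overdue"},
        ],
    }
    print(check_enum_options(field))
```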

Multi-Select Fields

Use multi_select type when the extracted value can include several options from a predefined set. Like enum, it requires an enum_options array, but it returns an array of matched values instead of a single value:

{
  "name": "document_tags",
  "description": "Categories that apply to this document",
  "data_type": "multi_select",
  "enum_options": [
    {
      "name": "urgent",
      "description": "Requires immediate attention"
    },
    {
      "name": "confidential",
      "description": "Contains sensitive information"
    },
    {
      "name": "reviewed",
      "description": "Has been reviewed by a team member"
    },
    {
      "name": "pending_approval",
      "description": "Awaiting approval from management"
    }
  ]
}

Multi-Select vs Enum

Feature	`enum`	`multi_select`
Selection	Single value	Multiple values
Return type	`string` or `null`	`array` of strings
Use case	Mutually exclusive options	Non-exclusive categories

Multi-Select Response Example

{
  "document_tags": ["urgent", "confidential"]
}
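
Client code that consumes both field types may find it convenient to treat them uniformly. The sketch below normalizes either result to a list of option names, based on the return types described above. The helper name `selected_options` is hypothetical. This page does not say what `multi_select` returns when nothing matches, so the function accepts both `null` and an empty array as a precaution.

```python
from typing import Any


def selected_options(value: Any) -> list[str]:
    """Normalize an enum or multi_select result to a list of option names."""
    if value is None:
        # enum with no match (null); also tolerated for multi_select.
        return []
    if isinstance(value, str):
        # enum with a match: a single option name.
        return [value]
    # multi_select: an array of option names.
    return list(value)


if __name__ == "__main__":
    result = {"payment_status": None, "document_tags": ["urgent", "confidential"]}
    for name, value in result.items():
        print(name, selected_options(value))
```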

Manual Fields

Manual fields hold values that you provide yourself. They are not extracted from the document but are included in the results. They’re useful for adding context or metadata to extractions.

{
  "manual_fields": [
    {
      "name": "department",
      "description": "Department that submitted this document",
      "data_type": "string"
    },
    {
      "name": "batch_id",
      "description": "Processing batch identifier",
      "data_type": "string"
    }
  ]
}

Manual fields support all basic data types (string, integer, float, date, datetime, boolean, list) but not object or enum types.

When running an extraction, provide manual field values:

curl -X POST 'https://api.anyformat.ai/workflows/{id}/run/' \
  -H 'x-api-key: YOUR_API_KEY' \
  -F 'file=@document.pdf' \
  -F 'manual_field_values={"department": "Finance", "batch_id": "BATCH-2024-03"}'
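
In the request above, `manual_field_values` is a single form field whose value is a JSON-encoded object. A minimal sketch of building that string in Python follows. The helper name `build_manual_field_values` is hypothetical, and rejecting undeclared names is a local safety check chosen for this example, not documented API behavior.

```python
import json


def build_manual_field_values(declared: list[dict], values: dict) -> str:
    """Return the JSON string to send as the manual_field_values form field."""
    declared_names = {field["name"] for field in declared}
    # Local check (an assumption, not API behavior): refuse values for
    # names that the workflow does not declare as manual fields.
    unknown = sorted(set(values) - declared_names)
    if unknown:
        raise ValueError(f"values for undeclared manual fields: {unknown}")
    return json.dumps(values)


if __name__ == "__main__":
    manual_fields = [
        {"name": "department", "description": "Submitting department", "data_type": "string"},
        {"name": "batch_id", "description": "Processing batch identifier", "data_type": "string"},
    ]
    payload = build_manual_field_values(
        manual_fields, {"department": "Finance", "batch_id": "BATCH-2024-03"}
    )
    print(payload)
```

The resulting string can be passed as the `manual_field_values` form value alongside the file upload, in the same way the curl command does.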

Complete Workflow Example

Here’s a complete workflow definition with various field types:

{
  "name": "Invoice Processing",
  "description": "Extract invoice data with line items",
  "fields": [
    {
      "name": "invoice_number",
      "description": "Unique invoice identifier",
      "data_type": "string"
    },
    {
      "name": "issue_date",
      "description": "Date when the invoice was issued",
      "data_type": "date"
    },
    {
      "name": "total_amount",
      "description": "Total invoice amount including tax",
      "data_type": "float"
    },
    {
      "name": "is_paid",
      "description": "Whether the invoice has been paid",
      "data_type": "boolean"
    },
    {
      "name": "payment_status",
      "description": "Current payment status",
      "data_type": "enum",
      "enum_options": [
        {"name": "pending", "description": "Awaiting payment"},
        {"name": "paid", "description": "Fully paid"},
        {"name": "overdue", "description": "Past due date"}
      ]
    },
    {
      "name": "vendor",
      "description": "Vendor information",
      "data_type": "object",
      "nested_fields": [
        {"name": "name", "data_type": "string", "description": "Vendor company name"},
        {"name": "address", "data_type": "string", "description": "Vendor address"}
      ]
    }
  ],
  "manual_fields": [
    {
      "name": "reviewed_by",
      "description": "Name of person who reviewed this invoice",
      "data_type": "string"
    }
  ]
}

Tips for Better Extraction

Be specific in descriptions - “The invoice number, usually starting with INV-” is better than “Invoice number”
Use appropriate types - Use float for amounts and date for dates rather than string
Keep field names consistent - Use snake_case naming convention
Describe the location when helpful - “Total amount shown at the bottom right of the invoice”
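
Two of these tips lend themselves to a quick automated check. The sketch below is a hypothetical lint (`lint_field` is not an API function) that warns about names that are not snake_case and about very short descriptions. The three-word threshold is an arbitrary heuristic chosen for illustration.

```python
import re

# Lowercase words made of letters and digits, joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
MIN_DESCRIPTION_WORDS = 3  # arbitrary threshold for this sketch


def lint_field(field: dict) -> list[str]:
    """Return style warnings for one field definition."""
    warnings = []
    name = field.get("name", "")
    if not SNAKE_CASE.match(name):
        warnings.append(f"{name!r} is not snake_case")
    # A very short description rarely tells the extractor what to look for.
    words = field.get("description", "").split()
    if len(words) < MIN_DESCRIPTION_WORDS:
        warnings.append(f"{name!r} has a very short description")
    return warnings


if __name__ == "__main__":
    print(lint_field({"name": "InvoiceNumber", "description": "Invoice number"}))
    print(lint_field({
        "name": "invoice_number",
        "description": "The invoice number, usually starting with INV-",
    }))
```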
