Field Types

Fields define what information is extracted from your documents. Well-defined fields and data types are crucial for extraction accuracy: the clearer the description of what you want to extract, the better the results.

Basic Data Types

| Type | Description | Example Value |
| --- | --- | --- |
| string | Text values | "INV-001" |
| integer | Whole numbers | 42 |
| float | Decimal numbers | 1250.99 |
| date | Date values (YYYY-MM-DD) | "2024-03-15" |
| datetime | Date and time values | "2024-03-15T10:30:00Z" |
| boolean | True/false values | true |
| list | Array of values | ["item1", "item2"] |
| object | Nested object structure | See below |
| enum | Set of predefined choices | See below |
| multi_select | Multiple choices from predefined options | See below |
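
As a minimal sketch of how the example values above could be turned into native Python values on the client side: the helper name `parse_value` is hypothetical, and the code assumes that dates and datetimes arrive as strings in exactly the formats shown in the table.

```python
from datetime import date, datetime


def parse_value(data_type: str, raw):
    """Convert a JSON-decoded value into a Python value for a few basic types."""
    if raw is None:
        return None
    if data_type == "date":
        # Assumes the YYYY-MM-DD format shown in the table.
        return date.fromisoformat(raw)
    if data_type == "datetime":
        # Older Python versions reject a trailing "Z", so normalise it first.
        return datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if data_type == "integer":
        return int(raw)
    if data_type == "float":
        return float(raw)
    # string, boolean, list and the remaining types are returned unchanged.
    return raw


issued = parse_value("date", "2024-03-15")
created = parse_value("datetime", "2024-03-15T10:30:00Z")
```

Here `issued` is a `datetime.date` and `created` is a timezone-aware `datetime.datetime` in UTC.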

UI to API Type Mapping

If you’re familiar with the anyformat UI, here’s how the field type names map to API types:
| UI Name | API Type |
| --- | --- |
| Text | string |
| Decimal number | float |
| Integer number | integer |
| Date | date |
| Date & time | datetime |
| Yes / No | boolean |
| Select | enum |
| Multiselect | multi_select |
| Object (Subtable) | object |

Field Definition

Each field requires these properties:
{
  "name": "field_name",
  "description": "What this field represents",
  "data_type": "string"
}
  • name: Unique identifier for the field (use snake_case)
  • description: Clear explanation of what to extract (helps AI accuracy)
  • data_type: One of the types listed above
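
The three properties above can be checked locally before a definition is sent anywhere. The following is only a sketch: `check_field` is a hypothetical helper, and the snake_case pattern is one reasonable reading of the naming convention rather than a rule the API is known to enforce.

```python
import re

# Data types listed in the Basic Data Types table.
DATA_TYPES = {
    "string", "integer", "float", "date", "datetime",
    "boolean", "list", "object", "enum", "multi_select",
}
# Assumed snake_case shape: lowercase words separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def check_field(field: dict) -> list[str]:
    """Return a list of problems found in one field definition (empty if none)."""
    problems = []
    for key in ("name", "description", "data_type"):
        if not field.get(key):
            problems.append(f"missing {key}")
    name = field.get("name", "")
    if name and not SNAKE_CASE.match(name):
        problems.append(f"name {name!r} is not snake_case")
    data_type = field.get("data_type")
    if data_type and data_type not in DATA_TYPES:
        problems.append(f"unknown data_type {data_type!r}")
    return problems


print(check_field({"name": "InvoiceNo", "description": "Invoice number", "data_type": "text"}))
```

Returning a list of problems, rather than raising on the first one, makes it easy to report every issue in a large field list at once.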

Object Fields

Use object type to extract structured data with multiple nested properties. Object fields require a nested_fields array:
{
  "name": "shipping_address",
  "description": "Customer shipping address details",
  "data_type": "object",
  "nested_fields": [
    {
      "name": "street",
      "data_type": "string",
      "description": "Street address including number"
    },
    {
      "name": "city",
      "data_type": "string",
      "description": "City name"
    },
    {
      "name": "postal_code",
      "data_type": "string",
      "description": "ZIP or postal code"
    },
    {
      "name": "country",
      "data_type": "string",
      "description": "Country name"
    }
  ]
}

Complex Object Example

For documents like insurance policies with multiple coverage types:
{
  "name": "coverage_details",
  "description": "Insurance coverage information",
  "data_type": "object",
  "nested_fields": [
    {
      "name": "theft_coverage",
      "data_type": "object",
      "description": "Conditions of theft coverage",
      "nested_fields": [
        {
          "name": "exclusions",
          "data_type": "string",
          "description": "Situations where theft is not covered"
        },
        {
          "name": "coverage_limit",
          "data_type": "float",
          "description": "Maximum coverage amount for theft claims"
        },
        {
          "name": "deductible",
          "data_type": "float",
          "description": "Deductible amount for theft claims"
        }
      ]
    },
    {
      "name": "fire_coverage",
      "data_type": "object",
      "description": "Conditions of fire damage coverage",
      "nested_fields": [
        {
          "name": "exclusions",
          "data_type": "string",
          "description": "Situations where fire damage is not covered"
        },
        {
          "name": "coverage_limit",
          "data_type": "float",
          "description": "Maximum coverage amount for fire claims"
        }
      ]
    }
  ]
}
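
Because objects can nest inside objects, it can help to flatten a definition into a list of leaf fields when reviewing it. The sketch below walks `nested_fields` recursively; the dotted path notation and the name `field_paths` are illustrative choices, not part of the API.

```python
def field_paths(fields, prefix=""):
    """Yield (dotted_path, data_type) for every non-object field."""
    for field in fields:
        path = f"{prefix}{field['name']}"
        if field["data_type"] == "object":
            # Descend into the nested definition, extending the path.
            yield from field_paths(field.get("nested_fields", []), path + ".")
        else:
            yield path, field["data_type"]


# Trimmed version of the coverage example (descriptions omitted for brevity).
example = [{
    "name": "coverage_details",
    "data_type": "object",
    "nested_fields": [{
        "name": "theft_coverage",
        "data_type": "object",
        "nested_fields": [
            {"name": "exclusions", "data_type": "string"},
            {"name": "deductible", "data_type": "float"},
        ],
    }],
}]

for path, data_type in field_paths(example):
    print(path, data_type)
```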

Enum Fields

Use enum type when the extracted value should be one of a predefined set of options. Enum fields require an enum_options array:
{
  "name": "payment_status",
  "description": "The current payment status of the invoice",
  "data_type": "enum",
  "enum_options": [
    {
      "name": "pending",
      "description": "Payment has not been received"
    },
    {
      "name": "paid",
      "description": "Payment has been received in full"
    },
    {
      "name": "partial",
      "description": "Partial payment has been received"
    },
    {
      "name": "overdue",
      "description": "Payment is past the due date"
    }
  ]
}
If the document content matches one of the enum options, that value is returned. If no match is found, the field value will be null.
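
Since an enum field can come back as null, client code usually needs an explicit branch for the "no match" case. A minimal sketch, assuming the extraction result is a JSON object keyed by field name; `read_enum` is a hypothetical helper:

```python
def read_enum(field: dict, result: dict, default=None):
    """Return the enum value for `field`, or `default` when no option matched."""
    allowed = {option["name"] for option in field["enum_options"]}
    value = result.get(field["name"])
    if value is None:
        # No option matched the document content.
        return default
    if value not in allowed:
        # Defensive check; not expected if the result follows the definition.
        raise ValueError(f"unexpected value {value!r} for {field['name']}")
    return value


payment_status = {
    "name": "payment_status",
    "enum_options": [{"name": "pending"}, {"name": "paid"}],
}
status = read_enum(payment_status, {"payment_status": None}, default="unknown")
```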

Enum Best Practices

  1. Provide clear descriptions for each option to help the AI match correctly
  2. Keep options distinct - avoid overlapping definitions
  3. Use meaningful names that reflect the actual document terminology

Multi-Select Fields

Use multi_select type when more than one option from a predefined set can apply to the same document. Like enum, it requires an enum_options array, but it returns an array of matched values instead of a single value:
{
  "name": "document_tags",
  "description": "Categories that apply to this document",
  "data_type": "multi_select",
  "enum_options": [
    {
      "name": "urgent",
      "description": "Requires immediate attention"
    },
    {
      "name": "confidential",
      "description": "Contains sensitive information"
    },
    {
      "name": "reviewed",
      "description": "Has been reviewed by a team member"
    },
    {
      "name": "pending_approval",
      "description": "Awaiting approval from management"
    }
  ]
}

Multi-Select vs Enum

| Feature | enum | multi_select |
| --- | --- | --- |
| Selection | Single value | Multiple values |
| Return type | string or null | array of strings |
| Use case | Mutually exclusive options | Non-exclusive categories |

Multi-Select Response Example

{
  "document_tags": ["urgent", "confidential"]
}
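
Downstream code sometimes wants to treat enum and multi_select results uniformly. The sketch below normalises both shapes to a list; `selected_options` is a hypothetical helper, and it accepts null as well as an empty array because this page does not state which one multi_select returns when nothing matches.

```python
import json


def selected_options(result: dict, field_name: str) -> list[str]:
    """Return the chosen option names as a list for enum or multi_select fields."""
    value = result.get(field_name)
    if value is None:
        return []            # enum with no match, or field absent
    if isinstance(value, str):
        return [value]       # enum: single value
    return list(value)       # multi_select: array of values


response = json.loads('{"document_tags": ["urgent", "confidential"]}')
tags = selected_options(response, "document_tags")
is_urgent = "urgent" in tags
```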

Manual Fields

Manual fields are user-provided values that are not extracted from the document but are included in the results. They’re useful for adding context or metadata to extractions.
{
  "manual_fields": [
    {
      "name": "department",
      "description": "Department that submitted this document",
      "data_type": "string"
    },
    {
      "name": "batch_id",
      "description": "Processing batch identifier",
      "data_type": "string"
    }
  ]
}
Manual fields support all basic data types (string, integer, float, date, datetime, boolean, list) but not object or enum types.
When running an extraction, provide manual field values:
curl -X POST 'https://api.anyformat.ai/workflows/{id}/run/' \
  -H 'x-api-key: YOUR_API_KEY' \
  -F 'file=@document.pdf' \
  -F 'manual_field_values={"department": "Finance", "batch_id": "BATCH-2024-03"}'
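
In the curl command, manual_field_values is a single form field whose content is a JSON object. The sketch below builds that string in Python and checks it against the manual field definitions first. The helper name is hypothetical, and the checks are client-side conveniences based on the type restriction stated above, not documented API behaviour.

```python
import json

# Types this page says manual fields support.
MANUAL_TYPES = {"string", "integer", "float", "date", "datetime", "boolean", "list"}


def build_manual_field_values(definitions: list[dict], values: dict) -> str:
    """Validate values against manual field definitions and return a JSON string."""
    by_name = {definition["name"]: definition for definition in definitions}
    for definition in definitions:
        if definition["data_type"] not in MANUAL_TYPES:
            raise ValueError(
                f"{definition['name']}: {definition['data_type']} is not allowed for manual fields"
            )
    for name in values:
        if name not in by_name:
            raise KeyError(f"no manual field named {name!r}")
    return json.dumps(values)


definitions = [
    {"name": "department", "description": "Department that submitted this document", "data_type": "string"},
    {"name": "batch_id", "description": "Processing batch identifier", "data_type": "string"},
]
payload = build_manual_field_values(
    definitions, {"department": "Finance", "batch_id": "BATCH-2024-03"}
)
```

The resulting `payload` string is what would be sent as the manual_field_values form field alongside the file.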

Complete Workflow Example

Here’s a complete workflow definition with various field types:
{
  "name": "Invoice Processing",
  "description": "Extract invoice data with line items",
  "fields": [
    {
      "name": "invoice_number",
      "description": "Unique invoice identifier",
      "data_type": "string"
    },
    {
      "name": "issue_date",
      "description": "Date when the invoice was issued",
      "data_type": "date"
    },
    {
      "name": "total_amount",
      "description": "Total invoice amount including tax",
      "data_type": "float"
    },
    {
      "name": "is_paid",
      "description": "Whether the invoice has been paid",
      "data_type": "boolean"
    },
    {
      "name": "payment_status",
      "description": "Current payment status",
      "data_type": "enum",
      "enum_options": [
        {"name": "pending", "description": "Awaiting payment"},
        {"name": "paid", "description": "Fully paid"},
        {"name": "overdue", "description": "Past due date"}
      ]
    },
    {
      "name": "vendor",
      "description": "Vendor information",
      "data_type": "object",
      "nested_fields": [
        {"name": "name", "data_type": "string", "description": "Vendor company name"},
        {"name": "address", "data_type": "string", "description": "Vendor address"}
      ]
    }
  ],
  "manual_fields": [
    {
      "name": "reviewed_by",
      "description": "Name of person who reviewed this invoice",
      "data_type": "string"
    }
  ]
}

Tips for Better Extraction

  1. Be specific in descriptions - “The invoice number, usually starting with INV-” is better than “Invoice number”
  2. Use appropriate types - Use float for amounts and date for dates rather than string
  3. Keep field names consistent - Use snake_case naming convention
  4. Describe the location when helpful - “Total amount shown at the bottom right of the invoice”