The type tells anyformat what kind of value to expect, and shapes how it’s validated and stored. This page is the single source of truth for field types — it lists every type, its name in the UI and in the API, and how to use it well.

All field types at a glance

| UI name | API data_type | Example value |
|---|---|---|
| Text | string | "INV-001" |
| Decimal number | float | 1250.99 |
| Integer number | integer | 42 |
| Date | date | "2024-03-15" |
| Date & time | datetime | "2024-03-15T10:30:00Z" |
| Yes / No | boolean | true |
| Select | enum | One of a predefined list |
| Multiselect | multi_select | A list of values from a predefined list |
| Object (Subtable) | object | Repeating structured rows |
| List | list | ["item1", "item2"] |

Field definition

Every field requires at least:
{
  "name": "field_name",
  "description": "What this field represents",
  "data_type": "string"
}
  • name — Unique identifier (snake_case).
  • description — Clear explanation of what to extract. Used by the AI as guidance.
  • data_type — One of the types listed above.
Complex types (object, enum, multi_select) also take extra properties documented below.
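
As a rough illustration of the three required properties, a definition can be checked locally before it is used. This is a minimal sketch: the helper name `check_field` and the snake_case pattern are assumptions made for this example, not part of anyformat's API.

```python
import re

# The three required keys named in the list above.
REQUIRED_KEYS = ("name", "description", "data_type")
# Assumed pattern for snake_case: lowercase words joined by underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def check_field(field: dict) -> list[str]:
    """Return a list of problems found in a field definition (empty if none)."""
    problems = [f"missing '{key}'" for key in REQUIRED_KEYS if not field.get(key)]
    name = field.get("name", "")
    if name and not SNAKE_CASE.match(name):
        problems.append(f"name '{name}' is not snake_case")
    return problems


good = {"name": "invoice_number", "description": "Unique invoice identifier", "data_type": "string"}
bad = {"name": "InvoiceNumber", "data_type": "string"}  # no description, wrong casing

print(check_field(good))  # no problems reported
print(check_field(bad))   # reports the missing description and the name style
```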

Simple types

Text (string):
  • Best for: Values that vary widely (company names, addresses).
  • Watch out for: Don’t use it when values come from a known list; use Select instead.
Date (date):
  • Best for: Calendar dates when time isn’t relevant. Returned as YYYY-MM-DD.
  • Watch out for: If a time appears in the document, prefer Date & time.
Date & time (datetime):
  • Best for: Precise timestamps when documents include a time. Returned in ISO 8601.
  • Watch out for: Ambiguous timezones and format variations.
Decimal number (float):
  • Best for: Money, percentages, measurements.
  • Watch out for: Avoid it if values should be whole numbers.
Integer number (integer):
  • Best for: Counts, quantities, item numbers.
  • Watch out for: Not suitable for currency.
Yes / No (boolean):
  • Best for: Clear true/false questions (Is paid? Is signed?).
  • Watch out for: Avoid vague cues like “maybe” or “often”. Make instructions explicit.
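
To show how the simple types might be consumed downstream, the sketch below converts the date and datetime strings into Python objects. The values come from the overview table; the field names (`item_count`, `received_at`, and so on) and the flat result shape are assumptions for illustration only.

```python
from datetime import date, datetime

# Hypothetical extraction result; values are the examples from the overview table.
result = {
    "invoice_number": "INV-001",            # string
    "total_amount": 1250.99,                # float
    "item_count": 42,                       # integer
    "issue_date": "2024-03-15",             # date, YYYY-MM-DD
    "received_at": "2024-03-15T10:30:00Z",  # datetime, ISO 8601
    "is_paid": True,                        # boolean
}

issue_date = date.fromisoformat(result["issue_date"])
# Older Python versions reject a trailing "Z", so normalise it to an explicit offset.
received_at = datetime.fromisoformat(result["received_at"].replace("Z", "+00:00"))

print(issue_date)               # 2024-03-15
print(received_at.isoformat())  # 2024-03-15T10:30:00+00:00
```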

Select (enum)

Use enum when the extracted value must be one of a predefined set of options. The field requires an enum_options array.
{
  "name": "payment_status",
  "description": "The current payment status of the invoice",
  "data_type": "enum",
  "enum_options": [
    {"name": "pending",  "description": "Payment has not been received"},
    {"name": "paid",     "description": "Payment has been received in full"},
    {"name": "partial",  "description": "Partial payment has been received"},
    {"name": "overdue",  "description": "Payment is past the due date"}
  ]
}
If no option matches the document, the field value is null.
Why use Select:
  • Enforces consistency
  • Avoids spelling variations
  • Makes analytics and filtering reliable
Best practices:
  • Keep options short and unambiguous
  • Provide clear descriptions for each option
  • Avoid overlapping meanings
  • Prefer Select over Text when values repeat
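
A small sketch of how a returned Select value could be checked against its options, including the null case described above. The helper `is_valid_select` is hypothetical, and option descriptions are left out of the definition only to keep the example short.

```python
# Trimmed copy of the payment_status definition (descriptions omitted for brevity).
payment_status_field = {
    "name": "payment_status",
    "data_type": "enum",
    "enum_options": [
        {"name": "pending"},
        {"name": "paid"},
        {"name": "partial"},
        {"name": "overdue"},
    ],
}


def is_valid_select(field: dict, value) -> bool:
    """True if value is None (no option matched) or one of the option names."""
    allowed = {option["name"] for option in field["enum_options"]}
    return value is None or value in allowed


print(is_valid_select(payment_status_field, "paid"))      # True
print(is_valid_select(payment_status_field, None))        # True
print(is_valid_select(payment_status_field, "refunded"))  # False
```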

Multiselect (multi_select)

Same shape as enum, but the field can return multiple matched options as an array.
{
  "name": "document_tags",
  "description": "Categories that apply to this document",
  "data_type": "multi_select",
  "enum_options": [
    {"name": "urgent",           "description": "Requires immediate attention"},
    {"name": "confidential",     "description": "Contains sensitive information"},
    {"name": "reviewed",         "description": "Has been reviewed by a team member"},
    {"name": "pending_approval", "description": "Awaiting approval from management"}
  ]
}
Returns an array of strings:
{ "document_tags": ["urgent", "confidential"] }

Select vs. Multiselect

| Feature | Select (enum) | Multiselect (multi_select) |
|---|---|---|
| Selection | Single value | Multiple values |
| Return type | string or null | array of strings |
| Use case | Mutually exclusive options | Non-exclusive categories |
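
Because the two types return different shapes, client code may want to normalise them. The sketch below turns either result into a list; `selected_options` is a hypothetical helper, and treating a null Multiselect value as an empty list is an assumption, since the page does not say what Multiselect returns when nothing matches.

```python
def selected_options(data_type: str, value) -> list[str]:
    """Normalise a Select or Multiselect result into a list of option names."""
    if value is None:
        return []  # Select with no match; assumed to be safe for Multiselect too
    if data_type == "enum":
        return [value]  # single string becomes a one-item list
    if data_type == "multi_select":
        return list(value)  # already an array of strings
    raise ValueError(f"not a select type: {data_type}")


print(selected_options("enum", "paid"))                            # ['paid']
print(selected_options("enum", None))                              # []
print(selected_options("multi_select", ["urgent", "confidential"]))  # ['urgent', 'confidential']
```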

Object / Subtable (object)

Use object to extract a structured group of properties. Object fields require a nested_fields array.

Single nested object

{
  "name": "shipping_address",
  "description": "Customer shipping address details",
  "data_type": "object",
  "nested_fields": [
    {"name": "street",      "data_type": "string", "description": "Street address including number"},
    {"name": "city",        "data_type": "string", "description": "City name"},
    {"name": "postal_code", "data_type": "string", "description": "ZIP or postal code"},
    {"name": "country",     "data_type": "string", "description": "Country name"}
  ]
}

Repeating rows (Subtable)

Object fields also capture repeating tabular data — invoice line items, transaction rows, anything that’s a list of “things with the same structure”.
line_items (object)
  ├── description (string)
  ├── quantity    (integer)
  ├── unit_price  (float)
  └── line_total  (float)
Each row in the document becomes one object in the resulting array.
Use Object when:
  • You see the same set of fields repeating
  • The document contains a list/table where each row is one “item”
  • You need a structured value per row (not a blob of text)
Best practices:
  • Keep subtable fields minimal at first (3–5 columns)
  • Use clear row-level instructions: “Extract one row per item. Ignore headers and totals.”
  • Add shared fields like currency at the top level, not inside each row
  • Start with the most reliable columns first (description + amount), then expand
Common mistakes:
  • Using Object for a single nested group (like a “vendor address”) without a reason. A non-repeating object is valid, but consider whether top-level fields would be simpler.
  • Making the subtable too wide too early (10+ columns increases ambiguity).
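
Since each row arrives as one object in an array, row-level sanity checks are straightforward. The sketch below flags rows where quantity times unit price does not match the line total. The sample rows are invented for illustration, and `inconsistent_rows` is a hypothetical helper rather than an anyformat feature.

```python
import math

# Invented sample data shaped like the line_items subtable above.
line_items = [
    {"description": "Widget", "quantity": 2, "unit_price": 10.50, "line_total": 21.00},
    {"description": "Gadget", "quantity": 1, "unit_price": 99.99, "line_total": 99.99},
    {"description": "Cable", "quantity": 3, "unit_price": 4.00, "line_total": 13.00},
]


def inconsistent_rows(rows: list[dict], tolerance: float = 0.01) -> list[int]:
    """Return indexes of rows where quantity * unit_price differs from line_total."""
    return [
        index
        for index, row in enumerate(rows)
        if not math.isclose(
            row["quantity"] * row["unit_price"], row["line_total"], abs_tol=tolerance
        )
    ]


print(inconsistent_rows(line_items))  # [2], the "Cable" row should total 12.00
```

A check like this is one way to find columns that extract unreliably before widening the subtable.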

Nested objects

Objects can contain objects. Useful for documents like insurance policies with multiple coverage types:
{
  "name": "coverage_details",
  "description": "Insurance coverage information",
  "data_type": "object",
  "nested_fields": [
    {
      "name": "theft_coverage",
      "data_type": "object",
      "description": "Conditions of theft coverage",
      "nested_fields": [
        {"name": "exclusions",      "data_type": "string", "description": "Situations where theft is not covered"},
        {"name": "coverage_limit",  "data_type": "float",  "description": "Maximum coverage amount for theft claims"},
        {"name": "deductible",      "data_type": "float",  "description": "Deductible amount for theft claims"}
      ]
    },
    {
      "name": "fire_coverage",
      "data_type": "object",
      "description": "Conditions of fire damage coverage",
      "nested_fields": [
        {"name": "exclusions",     "data_type": "string", "description": "Situations where fire damage is not covered"},
        {"name": "coverage_limit", "data_type": "float",  "description": "Maximum coverage amount for fire claims"}
      ]
    }
  ]
}
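
When objects nest, it can help to list every leaf field with its full path. The sketch below walks `nested_fields` recursively on a trimmed copy of the definition above. The dotted path notation and the `leaf_paths` helper are assumptions for this example, not something anyformat defines.

```python
def leaf_paths(field: dict, prefix: str = "") -> list[tuple[str, str]]:
    """Return (dotted_path, data_type) for every non-object field under `field`."""
    path = f"{prefix}.{field['name']}" if prefix else field["name"]
    children = field.get("nested_fields")
    if not children:
        return [(path, field["data_type"])]
    paths = []
    for child in children:
        paths.extend(leaf_paths(child, path))
    return paths


# Trimmed copy of coverage_details (descriptions omitted for brevity).
coverage_details = {
    "name": "coverage_details",
    "data_type": "object",
    "nested_fields": [
        {
            "name": "theft_coverage",
            "data_type": "object",
            "nested_fields": [
                {"name": "exclusions", "data_type": "string"},
                {"name": "coverage_limit", "data_type": "float"},
            ],
        },
        {
            "name": "fire_coverage",
            "data_type": "object",
            "nested_fields": [{"name": "exclusions", "data_type": "string"}],
        },
    ],
}

for path, data_type in leaf_paths(coverage_details):
    print(path, data_type)
```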

Complete example

A workflow definition that uses several field types together:
{
  "name": "Invoice Processing",
  "description": "Extract invoice data with line items",
  "fields": [
    {"name": "invoice_number", "description": "Unique invoice identifier",                  "data_type": "string"},
    {"name": "issue_date",     "description": "Date when the invoice was issued",            "data_type": "date"},
    {"name": "total_amount",   "description": "Total invoice amount including tax",          "data_type": "float"},
    {"name": "is_paid",        "description": "Whether the invoice has been paid",           "data_type": "boolean"},
    {
      "name": "payment_status",
      "description": "Current payment status",
      "data_type": "enum",
      "enum_options": [
        {"name": "pending", "description": "Awaiting payment"},
        {"name": "paid",    "description": "Fully paid"},
        {"name": "overdue", "description": "Past due date"}
      ]
    },
    {
      "name": "vendor",
      "description": "Vendor information",
      "data_type": "object",
      "nested_fields": [
        {"name": "name",    "data_type": "string", "description": "Vendor company name"},
        {"name": "address", "data_type": "string", "description": "Vendor address"}
      ]
    }
  ]
}
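
Before submitting a definition like this, one might check that every complex type carries its extra property (`enum_options` for enum and multi_select, `nested_fields` for object). The sketch below is one way to do that recursively. `missing_extras` is a hypothetical helper, and the workflow is a shortened copy with `enum_options` removed on purpose to trigger a finding.

```python
# Extra property each complex type needs, per the sections above.
EXTRA_PROPERTY = {
    "enum": "enum_options",
    "multi_select": "enum_options",
    "object": "nested_fields",
}


def missing_extras(fields: list[dict], prefix: str = "") -> list[str]:
    """Recursively report complex fields that lack their required extra property."""
    problems = []
    for field in fields:
        path = prefix + field["name"]
        extra = EXTRA_PROPERTY.get(field["data_type"])
        if extra and not field.get(extra):
            problems.append(f"{path}: '{field['data_type']}' needs '{extra}'")
        problems.extend(missing_extras(field.get("nested_fields", []), path + "."))
    return problems


workflow = {
    "name": "Invoice Processing",
    "fields": [
        {"name": "invoice_number", "description": "Unique invoice identifier", "data_type": "string"},
        # enum_options deliberately left out to show a reported problem
        {"name": "payment_status", "description": "Current payment status", "data_type": "enum"},
        {
            "name": "vendor",
            "description": "Vendor information",
            "data_type": "object",
            "nested_fields": [
                {"name": "name", "data_type": "string", "description": "Vendor company name"},
            ],
        },
    ],
}

for problem in missing_extras(workflow["fields"]):
    print(problem)  # payment_status: 'enum' needs 'enum_options'
```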

Tips for better results

  1. Be specific in descriptions. “The invoice number, usually starting with INV-” beats “Invoice number”.
  2. Use appropriate types. float for amounts, date for dates — not string.
  3. Keep field names consistent. snake_case throughout.
  4. Describe location only when helpful. “Total amount shown at the bottom right” can disambiguate, but the AI usually doesn’t need it.

What’s next?

  • Fields: The three properties every field has
  • Instructions: Write better extraction instructions