What is a field?

A field represents one piece of information inside a schema. Examples:
  • Invoice number
  • Issue date
  • Total amount
  • Vendor name
Field definitions matter more than anything else for data quality. Clearly defined fields produce more accurate extractions and need less manual review.

Field properties

Every field in anyformat has at least three properties:
  • Name: a human-readable label that still makes sense outside anyformat. Examples: invoice_number, issue_date, total_amount
  • Type: tells anyformat what kind of value to expect (text, date, number, etc.). This improves consistency, validation, and output quality.
  • Instructions: plain-English guidance for how to extract the value. Example: “Extract the final total including taxes. Ignore subtotals.”
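As a way to picture these three properties together, here is a minimal Python sketch. It is not anyformat's API: the `Field` class, the `FieldType` enum, and the attribute names `name`, `type`, and `instructions` are hypothetical and chosen only for illustration. The type labels are the ones listed in the field types overview.

```python
from dataclasses import dataclass
from enum import Enum


class FieldType(Enum):
    # Hypothetical enum; the labels mirror the field types overview.
    TEXT = "Text"
    DATE = "Date"
    DATE_TIME = "Date & time"
    DECIMAL_NUMBER = "Decimal number"
    INTEGER_NUMBER = "Integer number"
    YES_NO = "Yes / No (Boolean)"
    SELECT = "Select"
    MULTISELECT = "Multiselect"
    OBJECT = "Object (Subtable)"


@dataclass(frozen=True)
class Field:
    # Hypothetical container for the three properties described above.
    name: str            # label that still makes sense outside the tool
    type: FieldType      # what kind of value to expect
    instructions: str    # plain-English extraction guidance


total_amount = Field(
    name="total_amount",
    type=FieldType.DECIMAL_NUMBER,
    instructions="Extract the final total including taxes. Ignore subtotals.",
)
```

Keeping the instructions next to the name and type makes it easy to review all three at once when a field produces unexpected values.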

Field types overview

Each field has a type. Field types help anyformat understand how to interpret, validate, and output values.
Field type         | When to use                 | Example
Text               | Free-form information       | Company name, address
Date               | Calendar dates              | Invoice date
Date & time        | Precise timestamps          | Transaction time
Decimal number     | Numbers with decimals       | Total amount, tax
Integer number     | Whole numbers only          | Quantity, item count
Yes / No (Boolean) | True / false                | Is paid?
Select             | One value from a list       | Status: Paid/Unpaid
Multiselect        | Multiple values from a list | Tags/categories
Object (Subtable)  | Repeating structured items  | Line items in an invoice

Field type best practices

Text
  • Best for: Values that vary widely (company names, addresses)
  • Watch out for: Don’t use Text if values come from a known list. Use Select instead.
Date
  • Best for: Calendar dates when time isn’t relevant
  • Watch out for: If a time appears in the document, prefer Date & time
Date & time
  • Best for: Precise timestamps when documents include a time
  • Watch out for: Ambiguity with timezones and format variations
Decimal number
  • Best for: Money, percentages, measurements
  • Watch out for: Avoid if values should be whole numbers
Integer number
  • Best for: Counts, quantities, item numbers
  • Watch out for: Not suitable for currency
Yes / No (Boolean)
  • Best for: Clear true/false questions (Is paid? Is signed?)
  • Watch out for: Avoid vague cues like “maybe” or “often”. Make instructions explicit.
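To see why the choice of type matters downstream, the sketch below converts raw extracted strings into Python values according to the field type. It is an illustration only, not how anyformat processes values: `parse_value` is a hypothetical helper, and it assumes dates and timestamps arrive in ISO 8601 form and numbers use a dot as the decimal separator.

```python
from datetime import date, datetime
from decimal import Decimal


def parse_value(field_type: str, raw: str):
    """Convert a raw string to a Python value (hypothetical helper)."""
    raw = raw.strip()
    if field_type == "Text":
        return raw
    if field_type == "Date":
        # Assumes ISO format, e.g. 2024-03-15
        return date.fromisoformat(raw)
    if field_type == "Date & time":
        # Assumes ISO format with an explicit offset to avoid timezone ambiguity
        return datetime.fromisoformat(raw)
    if field_type == "Decimal number":
        # Decimal keeps money values exact, unlike binary floats
        return Decimal(raw)
    if field_type == "Integer number":
        # int() rejects strings such as "3.5", so fractional values fail loudly
        return int(raw)
    if field_type == "Yes / No (Boolean)":
        lowered = raw.lower()
        if lowered in {"yes", "true"}:
            return True
        if lowered in {"no", "false"}:
            return False
        raise ValueError(f"not a clear yes/no value: {raw!r}")
    raise ValueError(f"unsupported field type: {field_type!r}")


total = parse_value("Decimal number", "1250.50")
quantity = parse_value("Integer number", "3")
issued = parse_value("Date", "2024-03-15")
```

A currency amount passed to an Integer number field fails here with a `ValueError`, which matches the guidance that integers are not suitable for currency.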

Complex field types

Select fields

Select fields are used when a value must be one of several predefined options. Example:
  • Status: Paid, Unpaid, Overdue
Why use Select?
  • Enforces consistency
  • Avoids spelling variations
  • Makes analytics and filtering reliable
Best practices:
  • Keep options short and unambiguous
  • Avoid overlapping meanings
  • Prefer Select over Text when values repeat
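The consistency benefit can be sketched in a few lines of Python. This is an illustration, not anyformat's implementation: `validate_select` is a hypothetical helper, and matching case-insensitively after trimming whitespace is an assumption made for the example.

```python
from typing import Sequence

STATUS_OPTIONS = ["Paid", "Unpaid", "Overdue"]


def validate_select(value: str, options: Sequence[str]) -> str:
    """Return the canonical option matching value, or raise ValueError."""
    # Map a case-folded form of each option back to its canonical spelling.
    lookup = {option.casefold(): option for option in options}
    key = value.strip().casefold()
    if key not in lookup:
        raise ValueError(f"{value!r} is not one of {list(options)}")
    return lookup[key]


status = validate_select(" paid ", STATUS_OPTIONS)
```

Because every accepted value is mapped back to one canonical spelling, filters and reports never have to account for variants such as "paid" and "PAID".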

Multiselect fields

Multiselect fields allow multiple options to be selected. Example:
  • Document categories: Invoice, Receipt, Contract
Use Multiselect when:
  • More than one value can apply
  • Order does not matter
Best practices:
  • Limit the number of options
  • Avoid using Multiselect for free-form tagging
  • If values are mutually exclusive, use Select instead
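A Multiselect value behaves like a set drawn from the allowed options. The sketch below illustrates that idea with a hypothetical `validate_multiselect` helper. It is not part of anyformat, and collapsing duplicates is an assumption made for the example.

```python
from typing import FrozenSet, Iterable, Sequence

CATEGORY_OPTIONS = ["Invoice", "Receipt", "Contract"]


def validate_multiselect(values: Iterable[str], options: Sequence[str]) -> FrozenSet[str]:
    """Return the chosen options as a set, rejecting anything not in the list."""
    chosen = list(values)
    allowed = set(options)
    unknown = [value for value in chosen if value not in allowed]
    if unknown:
        # Free-form tags are rejected rather than silently accepted.
        raise ValueError(f"unknown options: {unknown}")
    # A frozenset ignores order and collapses duplicates.
    return frozenset(chosen)


first = validate_multiselect(["Receipt", "Invoice"], CATEGORY_OPTIONS)
second = validate_multiselect(["Invoice", "Receipt", "Invoice"], CATEGORY_OPTIONS)
```

Here `first` and `second` compare equal, which reflects the rule that order does not matter for Multiselect values.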

Object (Subtable)

Object (Subtable) is for repeating groups of fields — basically a table of rows. Think:
  • An invoice has multiple line items
  • Each line item has the same structure (description, quantity, unit price, total)
Example schema pattern:
line_items (Object/Subtable)
  ├── description (Text)
  ├── quantity (Integer number)
  ├── unit_price (Decimal number)
  └── line_total (Decimal number)
Use Object/Subtable when:
  • You see the same set of fields repeating
  • The document contains a list/table where each row is one “item”
  • You need structured output per row (not a blob of text)
Best practices:
  • Keep subtable fields minimal at first (3-5 columns)
  • Use clear row-level instructions: “Extract one row per item. Ignore headers and totals.”
  • Add shared fields like currency at the top level (not inside each row)
  • Start with the most reliable columns first (description + amount), then expand
Common mistakes:
  • Using Object/Subtable for a single nested object (like a “vendor address” group). If it’s not repeating, it usually shouldn’t be a subtable.
  • Making the subtable too wide too early (10+ columns increases ambiguity)
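The line_items pattern above can be pictured as a list of identically shaped rows, with shared values such as currency kept outside the rows. The Python sketch below is illustrative only: the `LineItem` and `Invoice` classes, the `inconsistent_rows` check, and the sample values are all made up for this example and do not come from anyformat.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List


@dataclass
class LineItem:
    # One subtable row; every row has the same four columns.
    description: str
    quantity: int
    unit_price: Decimal
    line_total: Decimal


@dataclass
class Invoice:
    currency: str               # shared field, kept at the top level
    line_items: List[LineItem]  # the repeating group


def inconsistent_rows(invoice: Invoice) -> List[int]:
    """Return indexes of rows where quantity * unit_price != line_total."""
    return [
        index
        for index, row in enumerate(invoice.line_items)
        if row.quantity * row.unit_price != row.line_total
    ]


# Made-up sample data for illustration.
invoice = Invoice(
    currency="EUR",
    line_items=[
        LineItem("Consulting", 2, Decimal("40.00"), Decimal("80.00")),
        LineItem("Shipping", 1, Decimal("15.50"), Decimal("15.00")),
    ],
)
rows_to_review = inconsistent_rows(invoice)
```

Structured rows make per-row checks like this possible, which a single blob of text would not allow. A narrow subtable also keeps such checks simple.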
