Overview

Workflows in Anyformat allow you to define and manage your data extraction processes. Each workflow represents a template for extracting specific information from documents. Once a workflow is created, you can use it to process files by submitting them to the File Processing endpoint and specifying the workflow ID.

Create a workflow

To create a new workflow, send a POST request to the /workflows/ endpoint with the workflow's configuration as a JSON body:
curl -X POST 'https://api.anyformat.ai/workflows/' \
-H 'Content-Type: application/json' \
-H 'x-api-key: YOUR_API_KEY' \
-d '{
  "name": "Sample Invoice Workflow",
  "description": "Extract data from invoice documents",
  "fields": [
    {
      "name": "invoice_number",
      "description": "The unique invoice identifier",
      "type": "string"
    },
    {
      "name": "issue_date",
      "description": "Date when the invoice was issued",
      "type": "date"
    },
    {
      "name": "total_amount",
      "description": "Total invoice amount including tax",
      "type": "float"
    }
  ]
}'
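The same request can be sketched in Python using only the standard library. The helper name build_create_request is hypothetical. The URL, headers, and payload mirror the curl example above. The request is built but not sent, so the sketch runs without a real API key.

```python
import json
import urllib.request

API_BASE = "https://api.anyformat.ai"  # base URL taken from the curl example


def build_create_request(api_key: str, workflow: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a workflow.

    Hypothetical helper: it only assembles the URL, headers, and JSON body.
    """
    return urllib.request.Request(
        url=f"{API_BASE}/workflows/",
        data=json.dumps(workflow).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Same payload as the curl example.
workflow = {
    "name": "Sample Invoice Workflow",
    "description": "Extract data from invoice documents",
    "fields": [
        {"name": "invoice_number", "description": "The unique invoice identifier", "type": "string"},
        {"name": "issue_date", "description": "Date when the invoice was issued", "type": "date"},
        {"name": "total_amount", "description": "Total invoice amount including tax", "type": "float"},
    ],
}

request = build_create_request("YOUR_API_KEY", workflow)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.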
A workflow can also include manual fields: fields that are not extracted from the document, but whose values are provided by the user. Manual fields must be defined in the manual_fields array when the workflow is created. For each processed file, their values are appended to the extracted data, so you can enrich the results with additional information and, for example, perform more complex or precise operations on them. Manual fields are defined as follows:
{
  "manual_fields": [
    {
      "name": "invoice_number",
      "description": "The unique invoice identifier",
      "type": "string"
    }
  ]
}
An example of a workflow that includes a manual field is as follows:
{
  "name": "Sample Invoice Workflow",
  "description": "Extract data from invoice documents",
  "fields": [
    {
      "name": "invoice_number",
      "description": "The unique invoice identifier",
      "type": "string"
    }
  ],
  "manual_fields": [
    {
      "name": "manual_field",
      "description": "Field whose value is provided by the user",
      "type": "string"
    }
  ]
}

Workflow creation fields

  • name: Name of the workflow
  • description: Description of what the workflow does
  • fields: Array of fields to extract, each with:
    • name: Field identifier
    • description: What this field represents
    • type: The data type of the field. Must be one of:
      • string: Text values
      • integer: Whole numbers
      • float: Decimal numbers
      • date: Date values (YYYY-MM-DD)
      • datetime: Date and time values
      • boolean: True/false values
      • list: Array of values
      • object: Nested object structure
      • enum: Set of values to be used as choices
  • manual_fields: Array of fields to be provided by the user, each with:
    • name: Field identifier
    • description: What this field represents
    • type: The data type of the field (same values as above, except object and enum, which are not supported for manual fields)
Fields define what information is extracted from your documents, so defining them and their data types carefully is crucial. The clearer the description of what you want to extract, the more precisely our AI can extract it.
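The type rules listed above can be checked on the client before a workflow is submitted. The following is a minimal sketch, not part of the Anyformat API: the function find_type_errors and both constants are hypothetical names, and the check covers only the type lists given on this page.

```python
# Types listed above for extracted fields.
FIELD_TYPES = {
    "string", "integer", "float", "date", "datetime",
    "boolean", "list", "object", "enum",
}
# Manual fields accept the same types except object and enum.
MANUAL_FIELD_TYPES = FIELD_TYPES - {"object", "enum"}


def find_type_errors(workflow: dict) -> list:
    """Return one message per field whose type is not in the allowed list."""
    errors = []
    rules = (("fields", FIELD_TYPES), ("manual_fields", MANUAL_FIELD_TYPES))
    for key, allowed in rules:
        for field in workflow.get(key, []):
            # The examples on this page use "type" in some payloads and
            # "data_type" in others; this sketch accepts either key without
            # claiming which one the API requires.
            field_type = field.get("type", field.get("data_type"))
            if field_type not in allowed:
                errors.append(
                    f"{key}.{field.get('name')}: unsupported type {field_type!r}"
                )
    return errors


# An enum is valid for an extracted field but not for a manual field.
candidate = {
    "fields": [{"name": "status", "type": "enum"}],
    "manual_fields": [{"name": "status", "type": "enum"}],
}
problems = find_type_errors(candidate)
```

Here problems contains a single message, for the manual field, because enum is only unsupported in manual_fields.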

Defining an object field

{
  "name": "Insurance_Policy",
  "description": "Policy details",
  "fields": [
    {
      "name": "Theft_Coverage",
      "data_type": "object",
      "description": "Conditions of theft coverage",
      "nested_fields": [
        {
          "name": "Theft_Exclusions",
          "data_type": "string",
          "description": "Situations where theft is not covered"
        },
        {
          "name": "Theft_Limits",
          "data_type": "string",
          "description": "Limitations on the number of theft claims per year or monetary limits"
        },
        {
          "name": "Content_Covered?",
          "data_type": "string",
          "description": "Is the content covered in case of theft?"
        },
        {
          "name": "Building_Covered?",
          "data_type": "string",
          "description": "Is the building covered in case of theft?"
        }
      ]
    },
    {
      "name": "Water_Coverage",
      "data_type": "object",
      "description": "Conditions of water damage coverage",
      "nested_fields": [
        {
          "name": "Water_Exclusions",
          "data_type": "string",
          "description": "Situations where water damage is not covered"
        },
        {
          "name": "Content_Covered?",
          "data_type": "string",
          "description": "Is the content covered in case of water damage?"
        },
        {
          "name": "Water_Limits",
          "data_type": "string",
          "description": "Limitations on the number of water damage claims per year or monetary limits"
        },
        {
          "name": "Building_Covered?",
          "data_type": "string",
          "description": "Is the building covered in case of water damage?"
        }
      ]
    },
    {
      "name": "Fire_Coverage",
      "data_type": "object",
      "description": "Conditions of fire coverage",
      "nested_fields": [
        {
          "name": "Fire_Exclusions",
          "data_type": "string",
          "description": "Situations where fire damage is not covered"
        },
        {
          "name": "Fire_Limits",
          "data_type": "string",
          "description": "Limitations on the number of fire claims per year or monetary limits"
        },
        {
          "name": "Content_Covered?",
          "data_type": "string",
          "description": "Is the content covered in case of fire?"
        },
        {
          "name": "Building_Covered?",
          "data_type": "string",
          "description": "Is the building covered in case of fire?"
        }
      ]
    }
  ]
}
With this field definition, if the relevant information is found in the processed document, the nested fields of each object will be extracted.
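Because the three coverage objects above share one shape, a payload like this can be generated rather than written by hand. The sketch below assumes a hypothetical helper, coverage_field. It reproduces the key names from the example (data_type, nested_fields), but the description strings are only approximations of the originals.

```python
def coverage_field(prefix: str, peril: str) -> dict:
    """Build one object field with four nested string fields.

    Hypothetical helper: `prefix` is used in field names (e.g. "Theft"),
    `peril` in the human-readable descriptions (e.g. "theft").
    """
    nested = [
        (f"{prefix}_Exclusions", f"Situations where {peril} is not covered"),
        (f"{prefix}_Limits",
         f"Limitations on the number of {peril} claims per year or monetary limits"),
        ("Content_Covered?", f"Is the content covered in case of {peril}?"),
        ("Building_Covered?", f"Is the building covered in case of {peril}?"),
    ]
    return {
        "name": f"{prefix}_Coverage",
        "data_type": "object",
        "description": f"Conditions of {peril} coverage",
        "nested_fields": [
            {"name": name, "data_type": "string", "description": description}
            for name, description in nested
        ],
    }


insurance_workflow = {
    "name": "Insurance_Policy",
    "description": "Policy details",
    "fields": [
        coverage_field("Theft", "theft"),
        coverage_field("Water", "water damage"),
        coverage_field("Fire", "fire"),
    ],
}
```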

Defining an enum field

{
  "name": "invoice_status",
  "description": "The status of the invoice",
  "fields": [
    {
      "name": "status",
      "data_type": "enum",
      "description": "The current status of the invoice",
      "enum_options": [
        {
          "name": "pending",
          "description": "Invoice is pending approval"
        },
        {
          "name": "approved",
          "description": "Invoice has been approved"
        },
        {
          "name": "rejected",
          "description": "Invoice has been rejected"
        },
        {
          "name": "paid",
          "description": "Invoice has been paid"
        }
      ]
    }
  ]
}
With this field definition, if the relevant information is found in the processed document, the field will take one of the values defined in the enum_options array. Otherwise, the field will be null.
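On the consuming side, this rule can be turned into a small sanity check. The sketch below is illustrative only: is_valid_enum_value is a hypothetical helper, and it assumes the extracted value arrives as either a string or null (None in Python).

```python
from typing import Optional

# The enum field from the example above.
STATUS_FIELD = {
    "name": "status",
    "data_type": "enum",
    "description": "The current status of the invoice",
    "enum_options": [
        {"name": "pending", "description": "Invoice is pending approval"},
        {"name": "approved", "description": "Invoice has been approved"},
        {"name": "rejected", "description": "Invoice has been rejected"},
        {"name": "paid", "description": "Invoice has been paid"},
    ],
}


def is_valid_enum_value(field: dict, value: Optional[str]) -> bool:
    """Return True if `value` is None or one of the field's option names."""
    allowed = {option["name"] for option in field["enum_options"]}
    return value is None or value in allowed


ok_paid = is_valid_enum_value(STATUS_FIELD, "paid")
ok_null = is_valid_enum_value(STATUS_FIELD, None)
ok_other = is_valid_enum_value(STATUS_FIELD, "overdue")
```

In this sketch ok_paid and ok_null are True, while ok_other is False because "overdue" is not among the defined options.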
Workflows created via the API are visible and manageable in the Anyformat app too.

List all workflows

Retrieve a list of all your workflows:
curl -X GET 'https://api.anyformat.ai/workflows/' \
-H 'x-api-key: YOUR_API_KEY'
The response will be an array of all the workflow objects associated with your account (not just the ones created via the API):
[
  {
    "id": "123",
    "name": "Sample Invoice Workflow",
    "created_at": "2024-03-24T12:00:00Z",
    "updated_at": "2024-03-24T12:00:00Z"
  },
  {
    "id": "wf_124",
    "name": "Receipt Workflow",
    "created_at": "2024-03-24T13:00:00Z",
    "updated_at": "2024-03-24T13:00:00Z"
  }
]
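A response shaped like the one above can be handled with the standard library alone. The following sketch parses a copy of that sample (no network call is made), builds a name-to-ID lookup, and finds the most recently updated workflow. The helper parse_timestamp is a hypothetical name.

```python
import json
from datetime import datetime

# Copy of the sample response shown above.
sample_response = """
[
  {"id": "123", "name": "Sample Invoice Workflow",
   "created_at": "2024-03-24T12:00:00Z", "updated_at": "2024-03-24T12:00:00Z"},
  {"id": "wf_124", "name": "Receipt Workflow",
   "created_at": "2024-03-24T13:00:00Z", "updated_at": "2024-03-24T13:00:00Z"}
]
"""


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2024-03-24T12:00:00Z into an aware datetime.

    Python versions before 3.11 do not accept the trailing "Z" in
    datetime.fromisoformat, so it is rewritten as an explicit UTC offset.
    """
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


workflows = json.loads(sample_response)

# Map workflow names to IDs, e.g. to find the ID to pass to file processing.
id_by_name = {wf["name"]: wf["id"] for wf in workflows}

# Pick the workflow with the latest updated_at value.
latest = max(workflows, key=lambda wf: parse_timestamp(wf["updated_at"]))
```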

Retrieve a workflow

Basic information about a specific workflow can be retrieved by querying the /workflows/:id/ endpoint:
curl -X GET 'https://api.anyformat.ai/workflows/123/' \
-H 'x-api-key: YOUR_API_KEY'
As with the endpoint for listing workflows, the response includes the workflow's name and its creation and last-update timestamps:
{
  "id": "123",
  "name": "Sample Invoice Workflow",
  "created_at": "2024-03-24T12:00:00Z",
  "updated_at": "2024-03-24T12:00:00Z"
}
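For completeness, here is a minimal Python sketch of the retrieval request. The helper build_retrieve_request is a hypothetical name. The URL pattern and header come from the curl example above, and the request is only constructed, not sent.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.anyformat.ai"  # base URL taken from the curl example


def build_retrieve_request(api_key: str, workflow_id: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a single workflow."""
    # Percent-encode the ID so unexpected characters cannot alter the path.
    safe_id = urllib.parse.quote(workflow_id, safe="")
    return urllib.request.Request(
        url=f"{API_BASE}/workflows/{safe_id}/",
        headers={"x-api-key": api_key},
        method="GET",
    )


retrieve_request = build_retrieve_request("YOUR_API_KEY", "123")
# To actually send it: urllib.request.urlopen(retrieve_request)
```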