This guide walks through the complete process of creating a workflow, submitting documents for processing, and retrieving results using the AnyFormat API.

Creating a Workflow

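First, create a workflow by sending a POST request to /workflows/. The request body defines the fields you want to extract; each field has a name, a description, and a type. The example defines two fields; add further entries to the fields array as needed.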
curl -X POST 'https://api.anyformat.ai/workflows/' \
-H 'Content-Type: application/json' \
-H 'x-api-key: YOUR_API_KEY' \
-d '{
  "name": "Invoice Processing Workflow",
  "description": "Extracts key information from invoices",
  "webhook": "https://your-webhook-url.com/endpoint",
  "fields": [
    {
      "name": "invoice_number",
      "description": "The unique invoice identifier",
      "type": "string"
    },
    {
      "name": "issue_date",
      "description": "Date when the invoice was issued",
      "type": "date"
    }
  ]
}'
The API responds with the basic workflow details, including the id you need to run the workflow:
{
  "id": "123",
  "name": "Invoice Processing Workflow",
  "created_at": "2024-01-01T00:00:00.000Z",
  "updated_at": "2024-01-01T00:00:00.000Z"
}
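The same request can be sketched in Python using only the standard library. This is a minimal illustration rather than an official client: the helper names `build_workflow_payload` and `create_workflow` are hypothetical, and treating `webhook` as optional is an assumption that the guide does not confirm.

```python
import json
import urllib.request

API_BASE = "https://api.anyformat.ai"  # base URL taken from the curl example


def build_workflow_payload(name, description, fields, webhook=None):
    """Assemble the JSON body for POST /workflows/.

    `fields` is a list of (name, description, type) tuples.
    """
    payload = {
        "name": name,
        "description": description,
        "fields": [
            {"name": n, "description": d, "type": t} for n, d, t in fields
        ],
    }
    # Assumption: the webhook key can be left out when no callback is wanted.
    if webhook is not None:
        payload["webhook"] = webhook
    return payload


def create_workflow(api_key, payload):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{API_BASE}/workflows/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the payload in a separate function keeps the field definitions easy to inspect and test before anything is sent over the network.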

Processing a Document

Once the workflow exists, submit a document for processing by replacing {workflow_id} with the id returned when the workflow was created:
curl -X POST 'https://api.anyformat.ai/workflows/{workflow_id}/run/' \
-H 'x-api-key: YOUR_API_KEY' \
-F 'file=@/path/to/your/document.pdf'
The API returns the ID of the extraction job in the extraction_id field:
{
  "status": "success",
  "extraction_id": 456,
  "workflow_id": 123
}
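The curl `-F` flag sends the file as multipart/form-data. As a rough Python equivalent using only the standard library, the sketch below builds that body by hand. The helper names `encode_file_upload` and `run_workflow` are hypothetical, and the per-part content type is guessed from the file extension, which is an assumption about what the server accepts.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_BASE = "https://api.anyformat.ai"  # base URL taken from the curl example


def encode_file_upload(filename, content, boundary):
    """Build a multipart/form-data body with a single part named `file`."""
    # Assumption: a guessed MIME type (or a generic fallback) is acceptable.
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail


def run_workflow(api_key, workflow_id, path):
    """Upload the document at `path` and return the decoded JSON response."""
    path = Path(path)
    boundary = uuid.uuid4().hex  # random marker separating the body parts
    body = encode_file_upload(path.name, path.read_bytes(), boundary)
    request = urllib.request.Request(
        f"{API_BASE}/workflows/{workflow_id}/run/",
        data=body,
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "x-api-key": api_key,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `run_workflow("YOUR_API_KEY", "123", "/path/to/your/document.pdf")` would then return the JSON shown above, with `extraction_id` identifying the job.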

Checking Job Status and Results

You can check the status of your extraction job and retrieve its results by replacing {job_id} with the extraction_id returned when you submitted the document:
curl -X GET 'https://api.anyformat.ai/jobs/{job_id}/' \
-H 'x-api-key: YOUR_API_KEY'
While the job is processing, you’ll receive a status update:
{
  "id": "456",
  "status": "in_progress",
  "processed_at": "2024-01-01T00:00:00.000Z",
  "created_at": "2024-01-01T00:00:00.000Z",
  "updated_at": "2024-01-01T00:00:00.000Z",
  "workflow": "123",
  "results": []
}
Once processing is complete, you’ll receive the extracted data:
{
  "id": "456",
  "status": "processed",
  "processed_at": "2024-01-01T00:00:00.000Z",
  "created_at": "2024-01-01T00:00:00.000Z",
  "updated_at": "2024-01-01T00:00:01.000Z",
  "workflow": "123",
  "results": [
    {
      "field_name": "invoice_number",
      "value": "INV-001"
    },
    {
      "field_name": "issue_date",
      "value": "2024-01-01"
    }
    // ... additional extracted fields ...
  ]
}
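Because the results arrive as a list of field_name/value pairs, it is often convenient to flatten them into a dictionary. The sketch below is one way to do that; `extracted_fields` is a hypothetical helper name, and it relies only on the response shape shown in the two examples above.

```python
def extracted_fields(job):
    """Return {field_name: value} for a processed job, or None if not ready."""
    if job.get("status") != "processed":
        return None
    return {item["field_name"]: item["value"] for item in job.get("results", [])}


# Usage with the sample response from this guide:
job = {
    "id": "456",
    "status": "processed",
    "workflow": "123",
    "results": [
        {"field_name": "invoice_number", "value": "INV-001"},
        {"field_name": "issue_date", "value": "2024-01-01"},
    ],
}
print(extracted_fields(job))
# {'invoice_number': 'INV-001', 'issue_date': '2024-01-01'}
```

Returning None for an unfinished job lets the caller tell "not ready yet" apart from "finished with no fields", which would be an empty dictionary.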

Best Practices

  1. Polling Interval: When checking job status, implement exponential backoff in your polling logic to avoid overwhelming the API.
  2. Webhook Security: If using webhooks, implement proper security measures such as signature verification.
  3. Error Handling: Always handle potential error states in your integration code.
  4. Field Definitions: Provide clear, specific descriptions for each field to improve extraction accuracy.
  5. File Types: Ensure your documents are in supported formats (PDF, PNG, JPEG, etc.).
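The first recommendation, polling with exponential backoff, could look roughly like the following sketch. The names `backoff_delays` and `poll_job` are hypothetical, as are the default delays and timeout. The code also assumes that any status other than "in_progress" is terminal, since this guide only documents "in_progress" and "processed".

```python
import time


def backoff_delays(initial=1.0, factor=2.0, maximum=30.0):
    """Yield delays that grow by `factor` each step, capped at `maximum`."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay *= factor


def poll_job(fetch_job, job_id, timeout=300.0, sleep=time.sleep):
    """Call fetch_job(job_id) until the job leaves "in_progress".

    `fetch_job` is any callable that returns the decoded job JSON, for
    example a wrapper around GET /jobs/{job_id}/. Raises TimeoutError
    when the next wait would exceed the `timeout` budget (in seconds).
    """
    waited = 0.0
    for delay in backoff_delays():
        job = fetch_job(job_id)
        # Assumption: every status other than "in_progress" is final.
        if job.get("status") != "in_progress":
            return job
        if waited + delay > timeout:
            raise TimeoutError(f"job {job_id} still in progress after {waited:.0f}s")
        sleep(delay)
        waited += delay
```

Passing `fetch_job` and `sleep` as parameters keeps the retry logic independent of the HTTP client and makes it straightforward to test without network access or real waiting.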