This guide walks through the complete process of creating a workflow, submitting documents for processing, and retrieving results using the AnyFormat API.

Creating a Workflow

We recommend creating workflows in the AnyFormat platform, where you can visually configure fields and test with sample documents. Once your workflow is ready, copy its ID and skip to Processing a Document to start extracting via the API.
You can also create a workflow programmatically by sending a POST request to /v2/workflows/. The request body defines the fields you want to extract from each document.
curl -X POST 'https://api.anyformat.ai/v2/workflows/' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
  "name": "Invoice Processing Workflow",
  "description": "Extracts key information from invoices",
  "fields": [
    {
      "name": "invoice_number",
      "description": "The unique invoice identifier",
      "data_type": "string"
    },
    {
      "name": "issue_date",
      "description": "Date when the invoice was issued",
      "data_type": "date"
    }
  ]
}'
The API will respond with the basic workflow details:
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "Invoice Processing Workflow",
  "description": "Extracts key information from invoices",
  "created_at": "2024-01-01T00:00:00.000Z",
  "updated_at": "2024-01-01T00:00:00.000Z"
}
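The same request can be sketched in Python with only the standard library. The helper name build_create_workflow_request is hypothetical; the URL, headers, and body fields mirror the curl example above. The function only builds the request, so nothing is sent until it is passed to urllib.request.urlopen.

```python
import json
import urllib.request

API_BASE = "https://api.anyformat.ai/v2"


def build_create_workflow_request(api_key, name, description, fields):
    """Build (but do not send) the POST /v2/workflows/ request.

    Hypothetical helper: the body layout follows the curl example above.
    """
    body = {"name": name, "description": description, "fields": fields}
    return urllib.request.Request(
        f"{API_BASE}/workflows/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_create_workflow_request(
    "YOUR_API_KEY",
    "Invoice Processing Workflow",
    "Extracts key information from invoices",
    [
        {"name": "invoice_number",
         "description": "The unique invoice identifier",
         "data_type": "string"},
        {"name": "issue_date",
         "description": "Date when the invoice was issued",
         "data_type": "date"},
    ],
)
# To send it: workflow = json.load(urllib.request.urlopen(request))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any call is made.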

Processing a Document

Once you have created a workflow, you can submit documents for processing using the workflow ID:
curl -X POST 'https://api.anyformat.ai/v2/workflows/{workflow_id}/run/' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-F 'file=@/path/to/your/document.pdf'
The API responds with the file's UUID in the id field, along with the submission status:
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "accepted",
  "workflow_id": "550e8400-e29b-41d4-a716-446655440000"
}
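A minimal Python sketch of the same upload, assuming the third-party requests library that the polling example in this guide also uses. The function name submit_document and its post parameter are hypothetical; post exists only so the function can be exercised without a network connection.

```python
def submit_document(workflow_id, path, api_key, post=None):
    """Upload one file to a workflow and return the file id.

    Hypothetical helper. `post` defaults to requests.post.
    """
    if post is None:
        import requests  # third-party: pip install requests
        post = requests.post

    url = f"https://api.anyformat.ai/v2/workflows/{workflow_id}/run/"
    with open(path, "rb") as handle:
        response = post(
            url,
            headers={"Authorization": f"Bearer {api_key}"},
            # Multipart field name "file" matches the curl example above
            files={"file": handle},
        )
    response.raise_for_status()
    # The "id" field of the response identifies the submitted file
    return response.json()["id"]
```

Calling submit_document("550e8400-e29b-41d4-a716-446655440000", "/path/to/your/document.pdf", "YOUR_API_KEY") would return the id to use when retrieving results.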

Checking Status

Use the file id to poll for results. The endpoint returns 412 while processing and 200 when results are ready:
curl -X GET 'https://api.anyformat.ai/v2/files/{file_id}/extraction/' \
-H 'Authorization: Bearer YOUR_API_KEY'
While processing, you’ll receive a 412 Precondition Failed response:
{
  "error": "Extraction not yet available",
  "detail": "Extraction status is 'processing'. Results are available when status is 'processed'.",
  "error_code": "PRECONDITION_FAILED",
  "retryable": true,
  "request_id": "a1b2c3d4e5f67890abcdef1234567890"
}
Once processing is complete, you’ll receive 200 OK with the extracted data:
{
  "invoice_number": {
    "value": "INV-001",
    "confidence": 95,
    "evidence": [{"text": "Invoice #INV-001", "page_number": 1}],
    "verification_status": "not_verified",
    "value_unit": null
  },
  "issue_date": {
    "value": "2024-01-01",
    "confidence": 90,
    "evidence": [{"text": "Date: 2024-01-01", "page_number": 1}],
    "verification_status": "not_verified",
    "value_unit": null
  }
}
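Once the result arrives, each field carries its own value and confidence. The sketch below pulls out the plain values and lists fields that may need manual review. The function name and the threshold of 90 are illustrative, and the 0-100 confidence scale is an assumption inferred from the example values above.

```python
def fields_needing_review(extraction, min_confidence=90):
    """Return names of fields whose confidence is below min_confidence.

    Assumes the response shape shown above and a 0-100 confidence scale
    (inferred from the example, not confirmed). A missing or null
    confidence is treated as 0.
    """
    return [
        name
        for name, field in extraction.items()
        if (field.get("confidence") or 0) < min_confidence
    ]


# Trimmed copy of the example response above
extraction = {
    "invoice_number": {"value": "INV-001", "confidence": 95},
    "issue_date": {"value": "2024-01-01", "confidence": 90},
}

# Flatten to {field_name: value} for downstream use
values = {name: field["value"] for name, field in extraction.items()}
low = fields_needing_review(extraction, min_confidence=95)
```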

Polling Example

Prefer webhooks over polling for production integrations. Webhooks deliver results immediately without consuming your rate limit.
import time

import requests

headers = {"Authorization": "Bearer YOUR_API_KEY"}
file_id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
url = f"https://api.anyformat.ai/v2/files/{file_id}/extraction/"

timeout = 300   # give up after 5 minutes overall
base_delay = 5  # seconds before the first retry
max_delay = 30  # upper bound for the backoff delay

deadline = time.monotonic() + timeout
attempt = 0

while time.monotonic() < deadline:
    response = requests.get(url, headers=headers, timeout=30)

    if response.status_code == 200:
        results = response.json()
        print(results)
        break
    elif response.status_code == 412:
        error = response.json()
        if not error.get("retryable", False):
            # Terminal state (error, cancelled): stop polling
            print(f"Processing failed: {error.get('detail', '')}")
            break
        # Still processing, typically 10-60 seconds depending on
        # document size and complexity; back off up to max_delay
        time.sleep(min(base_delay * 1.5 ** attempt, max_delay))
        attempt += 1
    elif response.status_code == 429:
        # Rate limited: wait as long as the server asks
        time.sleep(int(response.headers.get("Retry-After", 10)))
    else:
        print(f"Error: HTTP {response.status_code}: {response.text}")
        break
else:
    # Loop ended because the deadline passed, not because of a break
    print("Polling timed out after 5 minutes")

Webhook Notifications

Instead of polling, you can configure webhooks to receive notifications when processing completes or fails. The supported event types are:
  • extraction.completed — processing finished successfully, results are available
  • extraction.failed — processing encountered an error
When an event fires, your HTTPS endpoint receives a POST request with the event payload. See the webhooks documentation for setup instructions.
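A handler for these events can be sketched as a small dispatcher. This guide does not specify the payload layout, so the sketch takes the event type as a separate argument and passes the payload through untouched. The function and callback names are hypothetical.

```python
def handle_event(event_type, payload, on_completed, on_failed):
    """Route a webhook event to the matching callback.

    The two event type strings come from the list above. Extracting
    event_type from the request is left to the caller, because the
    payload layout is not described in this guide.
    """
    handlers = {
        "extraction.completed": on_completed,
        "extraction.failed": on_failed,
    }
    handler = handlers.get(event_type)
    if handler is None:
        return False  # unknown event type: ignore it
    handler(payload)
    return True


completed, failed = [], []
handle_event("extraction.completed", {"example": 1}, completed.append, failed.append)
handle_event("extraction.failed", {"example": 2}, completed.append, failed.append)
```

Returning False for unknown types lets the endpoint acknowledge events it does not recognize instead of raising an error.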

Error Handling

The endpoint returns 412 PRECONDITION_FAILED whenever results aren’t ready. The retryable flag tells you whether to keep polling:
  • retryable: true — still running (pending, queued, in_progress, processing). Keep polling.
  • retryable: false — reached a terminal failure state (error, cancelled). Stop polling; retrying will not produce results.
Example of a terminal failure:
{
  "error": "Extraction not yet available",
  "detail": "Extraction status is 'error'. Results are available when status is 'processed'.",
  "error_code": "PRECONDITION_FAILED",
  "retryable": false,
  "request_id": "a1b2c3d4e5f67890abcdef1234567890"
}
Check the Error Handling documentation for the complete list of error codes and recommended retry patterns.
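The decision rule described above can be sketched as one function that maps a response to the next step. The function name and the three return values are illustrative, and treating 429 as retryable follows the polling example rather than this section.

```python
def next_action(status_code, body):
    """Map an extraction response to 'done', 'retry', or 'stop'."""
    if status_code == 200:
        return "done"  # results are in the response body
    if status_code == 412:
        # retryable is true while the extraction is still running and
        # false once it has reached a terminal failure state
        return "retry" if body.get("retryable") is True else "stop"
    if status_code == 429:
        return "retry"  # rate limited: wait, then try again
    return "stop"  # any other status is treated as a hard failure


still_running = {"error_code": "PRECONDITION_FAILED", "retryable": True}
terminal = {"error_code": "PRECONDITION_FAILED", "retryable": False}
```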

Best Practices

  1. Polling Strategy: Use exponential backoff when polling for results. Start with 5-second intervals and increase the delay between attempts. See the error handling retry pattern for a complete implementation.
  2. Webhook Security: If using webhooks, verify the signature using the secret returned when creating the webhook subscription.
  3. Error Handling: Always handle potential error states in your integration code. Check for both HTTP status codes and the error_code field.
  4. Field Definitions: Provide clear, specific descriptions for each field to improve accuracy.
  5. File Types: Ensure your documents are in supported formats: PDF, DOC, DOCX, TXT, HTML, HTM, RTF, ODT, PPT, PPTX, EPUB, XLSX, XLS, MD, MARKDOWN, PNG, JPG, JPEG, GIF, BMP, TIFF, EML, MSG, MP3, WAV. Maximum file size is 20 MB.
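The format and size limits in item 5 can be checked locally before uploading. The sketch below is a hypothetical pre-flight check: the extension list is copied from above, and it assumes 20 MB means 20 * 1024 * 1024 bytes, which this guide does not confirm.

```python
import os

SUPPORTED_EXTENSIONS = {
    "pdf", "doc", "docx", "txt", "html", "htm", "rtf", "odt", "ppt",
    "pptx", "epub", "xlsx", "xls", "md", "markdown", "png", "jpg",
    "jpeg", "gif", "bmp", "tiff", "eml", "msg", "mp3", "wav",
}
MAX_FILE_SIZE = 20 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def upload_problems(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks fine."""
    problems = []
    # Compare the extension case-insensitively, without the leading dot
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE:
        problems.append("file is larger than 20 MB")
    return problems
```

For a file on disk, call upload_problems(path, os.path.getsize(path)) and upload only if the returned list is empty.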