
Bank Statement Processing

Extract account details and individual transactions from bank statements (PDF or XLSX). This recipe uses the workflow results endpoint to retrieve all results as unified JSON, which is useful when you process many statements through the same workflow and want every result in one response.

Workflow Fields

We recommend creating this workflow in the anyformat platform, where you can test with sample statements and iterate on field descriptions. Copy the workflow ID to use with the API.
| Field | Type | Description |
|---|---|---|
| account_holder | string | Name on the account |
| account_number | string | Account number (masked or full) |
| statement_period_start | date | Start of statement period |
| statement_period_end | date | End of statement period |
| opening_balance | float | Balance at start of period |
| closing_balance | float | Balance at end of period |
| total_deposits | float | Sum of all deposits |
| total_withdrawals | float | Sum of all withdrawals |
| transaction_count | integer | Number of transactions |
| transactions | object | Individual transaction records |
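The Type column determines how each value comes back. In the example response further down, dates appear as ISO 8601 strings (`"2024-03-01"`) and monetary values as JSON numbers, so converting them to native Python types is straightforward. The sketch below assumes that format holds for every response; `FIELD_TYPES`, `CONVERTERS`, and `to_python` are hypothetical helpers, not part of the anyformat API.

```python
from datetime import date

# Data types from the table above, for the top-level scalar fields only.
FIELD_TYPES = {
    "account_holder": "string",
    "account_number": "string",
    "statement_period_start": "date",
    "statement_period_end": "date",
    "opening_balance": "float",
    "closing_balance": "float",
    "total_deposits": "float",
    "total_withdrawals": "float",
    "transaction_count": "integer",
}

# Assumption: dates arrive as ISO 8601 strings, as in the example response.
CONVERTERS = {"string": str, "date": date.fromisoformat, "float": float, "integer": int}


def to_python(extraction):
    """Convert each {"value": ...} pair to a native Python value; missing fields become None."""
    converted = {}
    for name, data_type in FIELD_TYPES.items():
        raw = extraction.get(name, {}).get("value")
        converted[name] = None if raw is None else CONVERTERS[data_type](raw)
    return converted
```

Returning `None` for missing fields keeps downstream code from having to distinguish "not extracted" from "extracted as empty".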

Field Configuration

{
  "fields": [
    {"name": "account_holder", "description": "Name of the account holder as shown on the statement", "data_type": "string"},
    {"name": "account_number", "description": "Bank account number, may be partially masked", "data_type": "string"},
    {"name": "statement_period_start", "description": "First day of the statement period", "data_type": "date"},
    {"name": "statement_period_end", "description": "Last day of the statement period", "data_type": "date"},
    {"name": "opening_balance", "description": "Account balance at the start of the statement period", "data_type": "float"},
    {"name": "closing_balance", "description": "Account balance at the end of the statement period", "data_type": "float"},
    {"name": "total_deposits", "description": "Total amount deposited during the statement period", "data_type": "float"},
    {"name": "total_withdrawals", "description": "Total amount withdrawn during the statement period", "data_type": "float"},
    {"name": "transaction_count", "description": "Total number of transactions in the statement period", "data_type": "integer"},
    {
      "name": "transactions",
      "description": "Individual transactions listed on the statement",
      "data_type": "object",
      "nested_fields": [
        {"name": "date", "description": "Date of the transaction", "data_type": "date"},
        {"name": "description", "description": "Transaction description or memo", "data_type": "string"},
        {"name": "amount", "description": "Transaction amount (positive for deposits, negative for withdrawals)", "data_type": "float"},
        {
          "name": "type",
          "description": "Type of transaction",
          "data_type": "enum",
          "enum_options": [
            {"name": "deposit", "description": "Incoming deposit"},
            {"name": "withdrawal", "description": "Outgoing withdrawal"},
            {"name": "fee", "description": "Bank fee or charge"},
            {"name": "transfer", "description": "Transfer between accounts"},
            {"name": "interest", "description": "Interest earned or charged"}
          ]
        }
      ]
    }
  ]
}
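If you keep the field configuration in version control, a short script can read it back and list the fields and enum options it defines, which is handy when writing downstream checks. This is a minimal sketch, assuming the configuration is stored as JSON in the shape above (abbreviated here); `field_paths` and `find_field` are hypothetical helpers.

```python
import json

# Abbreviated copy of the configuration above; in practice, json.load() the full file.
CONFIG = json.loads("""
{
  "fields": [
    {"name": "account_holder", "data_type": "string"},
    {"name": "closing_balance", "data_type": "float"},
    {
      "name": "transactions",
      "data_type": "object",
      "nested_fields": [
        {"name": "date", "data_type": "date"},
        {"name": "amount", "data_type": "float"},
        {
          "name": "type",
          "data_type": "enum",
          "enum_options": [
            {"name": "deposit"}, {"name": "withdrawal"}, {"name": "fee"},
            {"name": "transfer"}, {"name": "interest"}
          ]
        }
      ]
    }
  ]
}
""")


def field_paths(fields, prefix=""):
    """Yield (dotted_path, data_type) for every field, recursing into nested_fields."""
    for field in fields:
        path = f"{prefix}{field['name']}"
        yield path, field["data_type"]
        yield from field_paths(field.get("nested_fields", []), prefix=f"{path}.")


def find_field(fields, path):
    """Return the field definition at a dotted path such as 'transactions.type', or None."""
    head, _, rest = path.partition(".")
    for field in fields:
        if field["name"] == head:
            return field if not rest else find_field(field.get("nested_fields", []), rest)
    return None


paths = dict(field_paths(CONFIG["fields"]))
allowed_types = [opt["name"] for opt in find_field(CONFIG["fields"], "transactions.type")["enum_options"]]
```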

Process a Document

Upload each statement to the same workflow, one request per file:
# Upload a statement
curl -X POST 'https://api.anyformat.ai/v2/workflows/YOUR_WORKFLOW_ID/run/' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'file=@statement-march.pdf'
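To process a batch, repeat the upload for each file. The sketch below spaces requests one second apart to stay under the 60 requests per minute limit mentioned in the tips. It reuses the run endpoint and the `file` form field from the curl example and assumes the `requests` library; `upload_statements` and its injectable `send` and `sleep` parameters are hypothetical, added so the loop can be exercised without network access. The contents of the run response are not described here, so the sketch simply collects each parsed JSON body by filename.

```python
import time
from pathlib import Path

API_BASE = "https://api.anyformat.ai/v2"
MIN_INTERVAL = 60 / 60  # seconds between requests for a 60 requests/minute limit


def send_with_requests(url, path, api_key):
    """POST one file to the run endpoint as multipart form data (requests imported lazily)."""
    import requests

    with open(path, "rb") as fh:
        response = requests.post(
            url,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": (Path(path).name, fh)},
        )
    response.raise_for_status()
    return response.json()


def upload_statements(paths, workflow_id, api_key, send=send_with_requests,
                      min_interval=MIN_INTERVAL, sleep=time.sleep):
    """Upload each statement to the same workflow, pausing between requests."""
    url = f"{API_BASE}/workflows/{workflow_id}/run/"
    responses = {}
    for i, path in enumerate(paths):
        if i > 0:
            sleep(min_interval)  # space out uploads to respect the rate limit
        responses[Path(path).name] = send(url, path, api_key)
    return responses
```

For example, `upload_statements(["statement-feb.pdf", "statement-march.pdf"], "YOUR_WORKFLOW_ID", "YOUR_API_KEY")`. Results are keyed by filename, so two files with the same name in different folders would overwrite each other.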

Get Results in Bulk

After statements are processed, use the workflow results endpoint to retrieve all results as unified JSON.

JSON Export

import requests

WORKFLOW_ID = "YOUR_WORKFLOW_ID"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

# Get all results as unified JSON
response = requests.get(
    f"https://api.anyformat.ai/v2/workflows/{WORKFLOW_ID}/results/",
    headers=headers,
)
response.raise_for_status()
data = response.json()

# Each key is a filename
for filename, file_data in data.items():
    extraction = file_data.get("results", {}).get("extraction", {})
    holder = extraction.get("account_holder", {}).get("value", "Unknown")
    closing = extraction.get("closing_balance", {}).get("value", "N/A")
    print(f"{filename}: {holder}, closing balance ${closing}")
Or retrieve the same results with curl:
curl "https://api.anyformat.ai/v2/workflows/YOUR_WORKFLOW_ID/results/" \
  -H "Authorization: Bearer YOUR_API_KEY"
To retrieve results for a single file within the workflow, use the file_id query parameter:
curl "https://api.anyformat.ai/v2/workflows/YOUR_WORKFLOW_ID/results/?file_id=FILE_UUID" \
  -H "Authorization: Bearer YOUR_API_KEY"
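Bulk results are easiest to compare as one row per file. This minimal sketch assumes the response shape used in the Python example above (filename keys, each holding a `results.extraction` object of `{"value", "confidence"}` pairs); `results_url`, `summarize`, and `SUMMARY_FIELDS` are hypothetical helpers.

```python
from urllib.parse import urlencode

API_BASE = "https://api.anyformat.ai/v2"
SUMMARY_FIELDS = ["account_holder", "statement_period_end", "closing_balance"]


def results_url(workflow_id, file_id=None):
    """Build the workflow results URL, optionally filtered to one file via file_id."""
    url = f"{API_BASE}/workflows/{workflow_id}/results/"
    if file_id:
        url += "?" + urlencode({"file_id": file_id})
    return url


def summarize(data, fields=SUMMARY_FIELDS):
    """Flatten bulk results into one dict per file, keeping only each field's value."""
    rows = []
    for filename, file_data in data.items():
        extraction = file_data.get("results", {}).get("extraction", {})
        row = {"file": filename}
        for name in fields:
            row[name] = extraction.get(name, {}).get("value")
        rows.append(row)
    return rows
```

Because every row has the same keys, the output can be written straight to a spreadsheet with `csv.DictWriter`.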

Single File Results

You can also retrieve results for an individual statement by polling the file's extraction endpoint with its file ID.
import time

import requests

headers = {"Authorization": "Bearer YOUR_API_KEY"}
file_id = "FILE_UUID"  # ID of the uploaded statement

# Poll a single file's results
max_attempts = 60
base_delay = 5

for attempt in range(max_attempts):
    response = requests.get(
        f"https://api.anyformat.ai/v2/files/{file_id}/extraction/",
        headers=headers,
    )

    if response.status_code == 200:
        results = response.json()
        break
    elif response.status_code == 412:
        # Not ready yet: back off by 1.5x per attempt, capped at 30 seconds
        delay = min(base_delay * (1.5 ** min(attempt, 5)), 30)
        time.sleep(delay)
    else:
        raise RuntimeError(f"Error {response.status_code}: {response.text}")
else:
    raise TimeoutError("Processing timed out")

# Analyze transactions: list every bank fee
for txn in results["transactions"]:
    if txn["type"]["value"] == "fee":
        print(f"Fee: {txn['description']['value']}, ${abs(txn['amount']['value'])}")
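The delay formula in the polling loop grows by a factor of 1.5 per attempt and is capped at 30 seconds; since the exponent stops growing after attempt 5 (where 5 × 1.5⁵ ≈ 37.97 already exceeds the cap), every later attempt waits the full 30 seconds. The sketch below reproduces that schedule so you can see the worst case: if all 60 attempts return 412, the loop sleeps 1,715.9375 seconds, roughly 28.6 minutes, before raising `TimeoutError`. `backoff_delays` is a hypothetical helper mirroring the loop's parameters.

```python
def backoff_delays(max_attempts=60, base_delay=5, factor=1.5, cap=30):
    """Return the sleep applied after each 412 response, mirroring the polling loop."""
    return [min(base_delay * factor ** min(attempt, 5), cap) for attempt in range(max_attempts)]


delays = backoff_delays()
total_minutes = sum(delays) / 60  # worst-case time spent sleeping
```

If 28 minutes is longer than you are willing to wait, lower `max_attempts` rather than the cap, so early polls stay quick.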

Example Response

{
  "account_holder": {"value": "John Smith", "confidence": 97},
  "account_number": {"value": "****4521", "confidence": 95},
  "statement_period_start": {"value": "2024-03-01", "confidence": 94},
  "statement_period_end": {"value": "2024-03-31", "confidence": 94},
  "opening_balance": {"value": 12450.00, "confidence": 93},
  "closing_balance": {"value": 14280.50, "confidence": 92},
  "total_deposits": {"value": 5200.00, "confidence": 90},
  "total_withdrawals": {"value": 3369.50, "confidence": 89},
  "transaction_count": {"value": 23, "confidence": 87},
  "transactions": [
    {
      "date": {"value": "2024-03-01", "confidence": 94},
      "description": {"value": "Direct Deposit - Acme Corp", "confidence": 91},
      "amount": {"value": 3500.00, "confidence": 93},
      "type": {"value": "deposit", "confidence": 95}
    },
    {
      "date": {"value": "2024-03-05", "confidence": 93},
      "description": {"value": "Monthly service fee", "confidence": 90},
      "amount": {"value": -12.50, "confidence": 92},
      "type": {"value": "fee", "confidence": 96}
    },
    {
      "date": {"value": "2024-03-12", "confidence": 92},
      "description": {"value": "Transfer to Savings", "confidence": 88},
      "amount": {"value": -500.00, "confidence": 91},
      "type": {"value": "transfer", "confidence": 94}
    }
  ]
}
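Because the response includes both balances and period totals, the figures can be checked against each other: opening balance plus deposits minus withdrawals should equal the closing balance. For the example response, 12,450.00 + 5,200.00 − 3,369.50 = 14,280.50. This minimal sketch assumes `total_withdrawals` is extracted as a positive number (as in the example) and that fees are included in it; adjust for statements that report fees separately. `reconciles` is a hypothetical helper, and the one-cent tolerance is an assumption to absorb float rounding.

```python
BALANCE_FIELDS = ("opening_balance", "total_deposits", "total_withdrawals", "closing_balance")


def reconciles(extraction, tolerance=0.01):
    """Check opening + deposits - withdrawals == closing, within a small tolerance."""
    values = [extraction.get(name, {}).get("value") for name in BALANCE_FIELDS]
    if any(v is None for v in values):
        return False  # cannot reconcile without all four figures
    opening, deposits, withdrawals, closing = values
    return abs(opening + deposits - withdrawals - closing) <= tolerance
```

A statement that fails this check is a good candidate for manual review, since one of the four figures was likely misread.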

Tips

XLSX files typically yield better results than scanned PDF statements, since the data is structured in cells rather than requiring OCR.
The workflow results endpoint (/v2/workflows/{id}/results/) returns unified JSON with parse and extraction data for all files. Use the file_id parameter to filter to a single file.
  • Use float for all monetary values. Describe amounts as “positive for deposits, negative for withdrawals” to get consistent sign conventions.
  • The object type with nested_fields captures the full transaction table, with each row as an object in the array.
  • When uploading multiple statements, add a short delay between uploads to stay within rate limits (60 requests per minute).
  • Typing transaction_count as integer gives you a quick sanity check against the number of extracted transactions.
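The sign-convention and transaction-count tips can be turned into automated checks. This minimal sketch works on the single-file response shape shown in Example Response; `sanity_issues` is hypothetical. The expected signs are an assumption based on the amount description in the field configuration and the example data: deposits positive, withdrawals and fees negative. Transfers and interest can go either way, so they are not checked.

```python
# Assumption: sign expected for each transaction type; transfer/interest are left unchecked.
EXPECTED_SIGN = {"deposit": 1, "withdrawal": -1, "fee": -1}


def sanity_issues(extraction):
    """Return human-readable problems found in one statement's extraction."""
    issues = []
    transactions = extraction.get("transactions", [])
    count = extraction.get("transaction_count", {}).get("value")
    if count is not None and count != len(transactions):
        issues.append(f"transaction_count is {count} but {len(transactions)} transactions were extracted")
    for i, txn in enumerate(transactions):
        kind = txn.get("type", {}).get("value")
        amount = txn.get("amount", {}).get("value")
        sign = EXPECTED_SIGN.get(kind)
        # Flag amounts that are zero or have the opposite sign to their type
        if sign is not None and amount is not None and amount * sign <= 0:
            issues.append(f"transaction {i}: {kind} has amount {amount}")
    return issues
```

Run against the excerpt in Example Response, which lists only 3 of the 23 transactions, this would report the count mismatch and no sign problems.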

Next Steps

Response Formats

Learn about the unified JSON response format

Workflow Results

Export all results for a workflow