Resume Parsing

Extract structured candidate data from resumes and CVs including contact info, skills, education, and work history.

Workflow Fields

We recommend creating this workflow in the anyformat platform, where you can test with sample resumes and iterate on field descriptions. Then copy the workflow ID to use with the API.

Field	Type	Description
`candidate_name`	string	Full name of the candidate
`email`	string	Email address
`phone`	string	Phone number
`skills`	list	Technical and professional skills
`years_of_experience`	integer	Total years of professional experience
`education`	object	Educational background (returned as a list of entries)
`work_history`	object	Previous employment (returned as a list of entries, most recent first)

Field Configuration

{
  "fields": [
    {"name": "candidate_name", "description": "Full name of the candidate", "data_type": "string"},
    {"name": "email", "description": "Email address", "data_type": "string"},
    {"name": "phone", "description": "Phone number including country code if present", "data_type": "string"},
    {"name": "skills", "description": "List of technical skills, programming languages, tools, and professional competencies", "data_type": "list"},
    {"name": "years_of_experience", "description": "Total years of professional work experience", "data_type": "integer"},
    {
      "name": "education",
      "description": "Educational qualifications and degrees",
      "data_type": "object",
      "nested_fields": [
        {"name": "institution", "description": "University or school name", "data_type": "string"},
        {"name": "degree", "description": "Degree obtained (e.g., BSc Computer Science, MBA)", "data_type": "string"},
        {"name": "graduation_date", "description": "Date of graduation", "data_type": "date"}
      ]
    },
    {
      "name": "work_history",
      "description": "Previous jobs and roles, most recent first",
      "data_type": "object",
      "nested_fields": [
        {"name": "company", "description": "Company name", "data_type": "string"},
        {"name": "title", "description": "Job title", "data_type": "string"},
        {"name": "start_date", "description": "Start date of employment", "data_type": "date"},
        {"name": "end_date", "description": "End date of employment, or empty if current role", "data_type": "date"}
      ]
    }
  ]
}
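Before saving a configuration like the one above, a quick local sanity check can catch typos. The following is a minimal sketch, not part of the anyformat API: `check_fields` and `KNOWN_TYPES` are hypothetical names, and the set of types is only what appears on this page, so the platform may well accept others.

```python
import json

# Types that appear in this page's configuration (assumption: the
# platform may support more types than the ones shown here).
KNOWN_TYPES = {"string", "integer", "list", "date", "object"}


def check_fields(fields, path=""):
    """Return a list of human-readable problems found in a field list."""
    problems = []
    seen = set()
    for field in fields:
        name = field.get("name", "")
        label = f"{path}{name}" or "<unnamed>"
        if not name:
            problems.append(f"{label}: missing name")
        if name in seen:
            problems.append(f"{label}: duplicate name")
        seen.add(name)
        if not field.get("description"):
            problems.append(f"{label}: missing description")
        data_type = field.get("data_type")
        if data_type not in KNOWN_TYPES:
            problems.append(f"{label}: unexpected data_type {data_type!r}")
        nested = field.get("nested_fields")
        if data_type == "object":
            if not nested:
                problems.append(f"{label}: object without nested_fields")
            else:
                # Recurse so nested fields get the same checks
                problems.extend(check_fields(nested, path=f"{label}."))
        elif nested:
            problems.append(f"{label}: nested_fields on a non-object field")
    return problems


config = json.loads("""
{
  "fields": [
    {"name": "email", "description": "Email address", "data_type": "string"},
    {"name": "education", "description": "Educational qualifications and degrees",
     "data_type": "object",
     "nested_fields": [
       {"name": "degree", "description": "Degree obtained", "data_type": "string"},
       {"name": "graduation_date", "description": "", "data_type": "date"}
     ]}
  ]
}
""")

for problem in check_fields(config["fields"]):
    print(problem)
```

The sample deliberately leaves one description empty to show what a reported problem looks like. Since extraction quality depends on field descriptions, flagging empty ones is probably the most useful of these checks.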

Process a Document

curl -X POST 'https://api.anyformat.ai/v2/workflows/YOUR_WORKFLOW_ID/run/' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'file=@resume.docx'
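The same upload can be sketched in Python with the `requests` library. The URL, header, and `file` form field are taken from the curl command above; `run_url`, `auth_headers`, and `submit_resume` are hypothetical helper names, and the sketch assumes the endpoint answers with a JSON body. This page does not show that body, so it is returned as-is rather than picking out a particular key.

```python
API_BASE = "https://api.anyformat.ai/v2"


def run_url(workflow_id):
    """URL of the run endpoint for a workflow, as used in the curl example."""
    return f"{API_BASE}/workflows/{workflow_id}/run/"


def auth_headers(api_key):
    """Bearer-token header, as used in the curl example."""
    return {"Authorization": f"Bearer {api_key}"}


def submit_resume(path, workflow_id, api_key):
    """Upload one resume as multipart form data and return the decoded body.

    Assumption: the response is JSON. Its exact shape is not documented here.
    """
    # Third-party dependency; imported here so the helpers above stay stdlib-only
    import requests

    with open(path, "rb") as handle:
        response = requests.post(
            run_url(workflow_id),
            headers=auth_headers(api_key),
            files={"file": handle},  # same form field name as `-F 'file=@...'`
            timeout=60,
        )
    response.raise_for_status()
    return response.json()


# Example call (needs a real workflow ID, API key, and file):
# submit_resume("resume.docx", "YOUR_WORKFLOW_ID", "YOUR_API_KEY")
```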

Get Results

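import time

import requests

API_KEY = "YOUR_API_KEY"
file_id = "YOUR_FILE_ID"  # ID of the file submitted for processing

# Poll for results
max_attempts = 60
base_delay = 5  # seconds

for attempt in range(max_attempts):
    response = requests.get(
        f"https://api.anyformat.ai/v2/files/{file_id}/extraction/",
        headers={"Authorization": f"Bearer {API_KEY}"}
    )

    if response.status_code == 200:
        # Extraction finished
        results = response.json()
        break
    elif response.status_code == 412:
        # Not ready yet: wait with exponential backoff, capped at 30 seconds
        delay = min(base_delay * (1.5 ** min(attempt, 5)), 30)
        time.sleep(delay)
    else:
        raise Exception(f"Error: {response.json()['detail']}")
else:
    # Loop ended without a break: every attempt returned 412
    raise TimeoutError("Processing timed out")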

# Use extracted data
print(f"Candidate: {results['candidate_name']['value']}")
print(f"Email: {results['email']['value']}")
print(f"Skills: {', '.join(results['skills']['value'])}")
print(f"Experience: {results['years_of_experience']['value']} years")

for job in results["work_history"]:
    print(f"  - {job['title']['value']} at {job['company']['value']}")
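To see what the backoff in the polling loop amounts to, the delay formula can be evaluated on its own. This sketch reuses the constants from the loop above; `poll_delay` is a hypothetical helper name. With these values, the worst case is about 29 minutes of sleeping (not counting request time) before `TimeoutError` is raised.

```python
def poll_delay(attempt, base_delay=5, factor=1.5, cap=30):
    """Seconds to wait after a 412 on the given zero-based attempt."""
    return min(base_delay * (factor ** min(attempt, 5)), cap)


# One delay per attempt, matching max_attempts = 60 in the loop
delays = [poll_delay(attempt) for attempt in range(60)]

print(delays[:7])   # waits grow by 1.5x per attempt until the 30 s cap applies
print(sum(delays))  # total sleep time in seconds if every attempt returns 412
```

If typical resumes finish within a few seconds, lowering `max_attempts` may be a reasonable way to fail faster; the right value depends on your documents.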

Example Response

{
  "candidate_name": {"value": "Sarah Chen", "confidence": 98},
  "email": {"value": "sarah.chen@email.com", "confidence": 97},
  "phone": {"value": "+1-555-0142", "confidence": 93},
  "skills": {"value": ["Python", "TypeScript", "AWS", "PostgreSQL", "Docker", "React", "FastAPI"], "confidence": 88},
  "years_of_experience": {"value": 7, "confidence": 85},
  "education": [
    {
      "institution": {"value": "MIT", "confidence": 96},
      "degree": {"value": "BSc Computer Science", "confidence": 94},
      "graduation_date": {"value": "2017-06-15", "confidence": 90}
    }
  ],
  "work_history": [
    {
      "company": {"value": "Stripe", "confidence": 97},
      "title": {"value": "Senior Software Engineer", "confidence": 95},
      "start_date": {"value": "2021-03-01", "confidence": 88},
      "end_date": {"value": null, "confidence": 82}
    },
    {
      "company": {"value": "Datadog", "confidence": 96},
      "title": {"value": "Software Engineer", "confidence": 94},
      "start_date": {"value": "2018-01-15", "confidence": 87},
      "end_date": {"value": "2021-02-28", "confidence": 85}
    }
  ]
}
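Every leaf in the response above is wrapped as a value/confidence pair, which can be awkward to pass to downstream code. The sketch below assumes exactly that shape and shows one way to strip the wrappers and to list fields that may need manual review. `plain_values` and `low_confidence` are hypothetical helpers, and the threshold of 90 is an arbitrary illustration, not a recommended setting.

```python
def is_leaf(node):
    """A leaf is a dict shaped like {"value": ..., "confidence": ...}."""
    return isinstance(node, dict) and "value" in node and "confidence" in node


def plain_values(node):
    """Strip the value/confidence wrappers, keeping the nesting."""
    if is_leaf(node):
        return node["value"]
    if isinstance(node, dict):
        return {key: plain_values(child) for key, child in node.items()}
    if isinstance(node, list):
        return [plain_values(child) for child in node]
    return node


def low_confidence(node, threshold, path=""):
    """List (path, confidence) pairs for leaves scoring below the threshold."""
    if is_leaf(node):
        return [(path, node["confidence"])] if node["confidence"] < threshold else []
    found = []
    if isinstance(node, dict):
        for key, child in node.items():
            found += low_confidence(child, threshold, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            found += low_confidence(child, threshold, f"{path}[{index}]")
    return found


# A trimmed copy of the example response
results = {
    "candidate_name": {"value": "Sarah Chen", "confidence": 98},
    "years_of_experience": {"value": 7, "confidence": 85},
    "work_history": [
        {
            "company": {"value": "Stripe", "confidence": 97},
            "end_date": {"value": None, "confidence": 82},
        }
    ],
}

print(plain_values(results))
print(low_confidence(results, threshold=90))
```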

Tips

The list field type returns an array of values, which is ideal for skills, certifications, and languages. Be specific in the description about what counts as a “skill” to avoid capturing irrelevant items.

DOCX resumes typically yield better results than scanned PDFs since the text is natively accessible.
For `end_date`, describe it as “empty if current role” so that current positions come back as null.
Extracting `years_of_experience` as an integer gives you a number you can filter on directly without parsing.
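The last two tips can be illustrated with a short sketch that works on the response shape shown in Example Response. It assumes dates arrive as ISO `YYYY-MM-DD` strings, as they do in that example. `current_roles`, `tenure_years`, and `meets_minimum` are hypothetical helper names, and the reference date is fixed only to keep the example reproducible.

```python
from datetime import date


def current_roles(work_history):
    """Entries whose end_date value is null, i.e. roles still held."""
    return [job for job in work_history if job["end_date"]["value"] is None]


def tenure_years(job, today):
    """Length of a role in years, counting open-ended roles up to `today`."""
    start = date.fromisoformat(job["start_date"]["value"])
    end_value = job["end_date"]["value"]
    end = date.fromisoformat(end_value) if end_value else today
    return round((end - start).days / 365.25, 1)


def meets_minimum(results, minimum_years):
    """Integer comparison on years_of_experience; no string parsing needed."""
    return results["years_of_experience"]["value"] >= minimum_years


# A trimmed copy of the example response
results = {
    "years_of_experience": {"value": 7, "confidence": 85},
    "work_history": [
        {
            "company": {"value": "Stripe", "confidence": 97},
            "start_date": {"value": "2021-03-01", "confidence": 88},
            "end_date": {"value": None, "confidence": 82},
        },
        {
            "company": {"value": "Datadog", "confidence": 96},
            "start_date": {"value": "2018-01-15", "confidence": 87},
            "end_date": {"value": "2021-02-28", "confidence": 85},
        },
    ],
}

today = date(2024, 3, 1)  # fixed reference date for the example
for job in results["work_history"]:
    print(job["company"]["value"], tenure_years(job, today))
print([job["company"]["value"] for job in current_roles(results["work_history"])])
print(meets_minimum(results, minimum_years=5))
```

If a resume gives no usable experience figure, the `years_of_experience` value may be null, so production code would likely want to check for `None` before comparing.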

Next Steps

Run Workflow

Field Types
