Resume Parsing

Extract structured candidate data from resumes and CVs including contact info, skills, education, and work history.

Workflow Fields

We recommend creating this workflow in the anyformat platform where you can test with sample resumes and iterate on field descriptions. Copy the workflow ID to use with the API.

Field                  Type      Description
candidate_name         string    Full name of the candidate
email                  string    Email address
phone                  string    Phone number
skills                 list      Technical and professional skills
years_of_experience    integer   Total years of professional experience
education              object    Educational background
work_history           object    Previous employment

Field Configuration

{
  "fields": [
    {"name": "candidate_name", "description": "Full name of the candidate", "data_type": "string"},
    {"name": "email", "description": "Email address", "data_type": "string"},
    {"name": "phone", "description": "Phone number including country code if present", "data_type": "string"},
    {"name": "skills", "description": "List of technical skills, programming languages, tools, and professional competencies", "data_type": "list"},
    {"name": "years_of_experience", "description": "Total years of professional work experience", "data_type": "integer"},
    {
      "name": "education",
      "description": "Educational qualifications and degrees",
      "data_type": "object",
      "nested_fields": [
        {"name": "institution", "description": "University or school name", "data_type": "string"},
        {"name": "degree", "description": "Degree obtained (e.g., BSc Computer Science, MBA)", "data_type": "string"},
        {"name": "graduation_date", "description": "Date of graduation", "data_type": "date"}
      ]
    },
    {
      "name": "work_history",
      "description": "Previous jobs and roles, most recent first",
      "data_type": "object",
      "nested_fields": [
        {"name": "company", "description": "Company name", "data_type": "string"},
        {"name": "title", "description": "Job title", "data_type": "string"},
        {"name": "start_date", "description": "Start date of employment", "data_type": "date"},
        {"name": "end_date", "description": "End date of employment, or empty if current role", "data_type": "date"}
      ]
    }
  ]
}
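
Before saving a configuration like the one above, it can help to check it locally for missing keys. The sketch below is a hypothetical helper, not part of the anyformat API: `validate_fields` and `KNOWN_TYPES` are names introduced here, and the type set contains only the data types that appear on this page, so the platform may accept others.

```python
import json

# Data types that appear on this page; an assumption, not an exhaustive list.
KNOWN_TYPES = {"string", "integer", "list", "date", "object"}
REQUIRED_KEYS = ("name", "description", "data_type")


def validate_fields(fields, path=""):
    """Return a list of human-readable problems found in a field list."""
    problems = []
    seen = set()
    for index, field in enumerate(fields):
        name = field.get("name")
        label = path + (name or "<field %d>" % index)
        # Every field needs a non-empty name, description, and data_type.
        for key in REQUIRED_KEYS:
            if not field.get(key):
                problems.append(f"{label}: missing '{key}'")
        if name:
            if name in seen:
                problems.append(f"{label}: duplicate field name")
            seen.add(name)
        data_type = field.get("data_type")
        if data_type and data_type not in KNOWN_TYPES:
            problems.append(f"{label}: unrecognized data_type '{data_type}'")
        # Object fields are checked recursively through nested_fields.
        if data_type == "object":
            nested = field.get("nested_fields")
            if not nested:
                problems.append(f"{label}: object field has no nested_fields")
            else:
                problems.extend(validate_fields(nested, path=f"{label}."))
    return problems


config = json.loads("""
{
  "fields": [
    {"name": "email", "description": "Email address", "data_type": "string"},
    {"name": "education", "description": "Educational qualifications",
     "data_type": "object",
     "nested_fields": [
       {"name": "institution", "description": "", "data_type": "string"}
     ]}
  ]
}
""")

for problem in validate_fields(config["fields"]):
    print(problem)  # education.institution: missing 'description'
```

An empty description is flagged because the description is what tells the extraction what to look for.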

Process a Document

curl -X POST 'https://api.anyformat.ai/v2/workflows/YOUR_WORKFLOW_ID/run/' \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -F 'file=@resume.docx'
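
The same upload can be made from Python, which keeps the submit and poll steps in one script. This is a minimal sketch assuming the `requests` library; `build_run_request` and `submit_resume` are hypothetical helper names, and the shape of the JSON returned by the endpoint is not shown on this page, so the function simply returns whatever the API sends back.

```python
import os

import requests

API_BASE = "https://api.anyformat.ai/v2"


def build_run_request(workflow_id, api_key, filename, file_bytes):
    """Prepare (but do not send) the multipart upload, mirroring the curl call."""
    request = requests.Request(
        "POST",
        f"{API_BASE}/workflows/{workflow_id}/run/",
        headers={"Authorization": f"Bearer {api_key}"},
        # Same form field name as curl's -F 'file=@resume.docx'
        files={"file": (filename, file_bytes)},
    )
    return request.prepare()


def submit_resume(workflow_id, api_key, path):
    """Upload a resume and return the decoded response body.

    Assumption: the endpoint answers with JSON; inspect the result to find
    the file identifier needed for polling.
    """
    with open(path, "rb") as handle:
        prepared = build_run_request(
            workflow_id, api_key, os.path.basename(path), handle.read()
        )
    with requests.Session() as session:
        response = session.send(prepared, timeout=60)
    response.raise_for_status()
    return response.json()
```

Calling `submit_resume("YOUR_WORKFLOW_ID", "YOUR_API_KEY", "resume.docx")` sends the request. Splitting out `build_run_request` makes it possible to inspect the URL and headers without touching the network.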

Get Results

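import time

import requests

API_KEY = "YOUR_API_KEY"
file_id = "YOUR_FILE_ID"  # ID of the file submitted in the previous step

# Poll for results
max_attempts = 60
base_delay = 5  # seconds

for attempt in range(max_attempts):
    response = requests.get(
        f"https://api.anyformat.ai/v2/files/{file_id}/extraction/",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )

    if response.status_code == 200:
        # Results are ready
        results = response.json()
        break
    elif response.status_code == 412:
        # Not ready yet: wait, then retry with a growing delay capped at 30 seconds
        delay = min(base_delay * (1.5 ** min(attempt, 5)), 30)
        time.sleep(delay)
    else:
        raise Exception(f"Error: {response.json()['detail']}")
else:
    # The loop ran out of attempts without receiving a 200 response
    raise TimeoutError("Processing timed out")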

# Use extracted data
print(f"Candidate: {results['candidate_name']['value']}")
print(f"Email: {results['email']['value']}")
print(f"Skills: {', '.join(results['skills']['value'])}")
print(f"Experience: {results['years_of_experience']['value']} years")

for job in results["work_history"]:
    print(f"  - {job['title']['value']} at {job['company']['value']}")
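
To see how long the polling loop above can wait in the worst case, the delay formula can be evaluated on its own. The sketch below copies the formula from the loop; `poll_delay` is a name introduced here for illustration.

```python
def poll_delay(attempt, base_delay=5, factor=1.5, cap=30):
    """Seconds to sleep after a given attempt, using the loop's formula."""
    return min(base_delay * (factor ** min(attempt, 5)), cap)


delays = [poll_delay(attempt) for attempt in range(60)]

print(delays[:7])   # [5.0, 7.5, 11.25, 16.875, 25.3125, 30, 30]
print(sum(delays))  # 1715.9375
```

The delay reaches the 30-second cap at the sixth attempt, so if every one of the 60 attempts returns 412 the loop sleeps for roughly 28.6 minutes in total before raising TimeoutError. Lower max_attempts if that is longer than you want to wait.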

Example Response

{
  "candidate_name": {"value": "Sarah Chen", "confidence": 98},
  "email": {"value": "sarah.chen@email.com", "confidence": 97},
  "phone": {"value": "+1-555-0142", "confidence": 93},
  "skills": {"value": ["Python", "TypeScript", "AWS", "PostgreSQL", "Docker", "React", "FastAPI"], "confidence": 88},
  "years_of_experience": {"value": 7, "confidence": 85},
  "education": [
    {
      "institution": {"value": "MIT", "confidence": 96},
      "degree": {"value": "BSc Computer Science", "confidence": 94},
      "graduation_date": {"value": "2017-06-15", "confidence": 90}
    }
  ],
  "work_history": [
    {
      "company": {"value": "Stripe", "confidence": 97},
      "title": {"value": "Senior Software Engineer", "confidence": 95},
      "start_date": {"value": "2021-03-01", "confidence": 88},
      "end_date": {"value": null, "confidence": 82}
    },
    {
      "company": {"value": "Datadog", "confidence": 96},
      "title": {"value": "Software Engineer", "confidence": 94},
      "start_date": {"value": "2018-01-15", "confidence": 87},
      "end_date": {"value": "2021-02-28", "confidence": 85}
    }
  ]
}
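
Every leaf in the response above is wrapped as a value plus a confidence score. The sketch below shows one way to strip those wrappers and to list fields that may need manual review. `unwrap` and `low_confidence` are hypothetical helpers, and the threshold of 86 is an arbitrary value chosen for the example, not a recommendation.

```python
def is_leaf(node):
    """True for the {"value": ..., "confidence": ...} wrapper shown above."""
    return isinstance(node, dict) and "value" in node and "confidence" in node


def unwrap(node):
    """Return the same structure with every wrapper replaced by its value."""
    if is_leaf(node):
        return node["value"]
    if isinstance(node, dict):
        return {key: unwrap(child) for key, child in node.items()}
    if isinstance(node, list):
        return [unwrap(item) for item in node]
    return node


def low_confidence(node, threshold, path=""):
    """Yield (path, confidence) for each leaf scored below the threshold."""
    if is_leaf(node):
        if node["confidence"] < threshold:
            yield path, node["confidence"]
    elif isinstance(node, dict):
        for key, child in node.items():
            child_path = f"{path}.{key}" if path else key
            yield from low_confidence(child, threshold, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from low_confidence(item, threshold, f"{path}[{index}]")


# A trimmed copy of the example response
results = {
    "candidate_name": {"value": "Sarah Chen", "confidence": 98},
    "years_of_experience": {"value": 7, "confidence": 85},
    "work_history": [
        {
            "company": {"value": "Stripe", "confidence": 97},
            "end_date": {"value": None, "confidence": 82},
        }
    ],
}

print(unwrap(results)["work_history"][0]["company"])  # Stripe
print(list(low_confidence(results, threshold=86)))
# [('years_of_experience', 85), ('work_history[0].end_date', 82)]
```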

Tips

  • The list field type returns an array of values, which is ideal for skills, certifications, and languages. Be specific in the description about what counts as a “skill” to avoid capturing irrelevant items.
  • DOCX resumes typically yield better results than scanned PDFs since the text is natively accessible.
  • For end_date, describe it as “empty if current role” so processing returns null for current positions.
  • years_of_experience as an integer gives you a number you can filter on directly without parsing.
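
The last two tips can be illustrated with a short screening sketch. It assumes the response shape shown in Example Response; `current_role` and `matches` are hypothetical helper names, and the screening criteria are invented for the example.

```python
def current_role(results):
    """Return the first work_history entry whose end_date is null, if any."""
    for job in results.get("work_history", []):
        if job["end_date"]["value"] is None:
            return job
    return None


def matches(results, min_years, required_skills):
    """True if the candidate has enough experience and every required skill."""
    years = results["years_of_experience"]["value"]
    skills = {skill.lower() for skill in results["skills"]["value"]}
    wanted = {skill.lower() for skill in required_skills}
    # years is already an integer, so it can be compared without parsing
    return years is not None and years >= min_years and wanted <= skills


# A trimmed copy of the example response
results = {
    "skills": {"value": ["Python", "TypeScript", "AWS"], "confidence": 88},
    "years_of_experience": {"value": 7, "confidence": 85},
    "work_history": [
        {"company": {"value": "Stripe", "confidence": 97},
         "end_date": {"value": None, "confidence": 82}},
        {"company": {"value": "Datadog", "confidence": 96},
         "end_date": {"value": "2021-02-28", "confidence": 85}},
    ],
}

print(matches(results, min_years=5, required_skills=["python", "aws"]))  # True
print(matches(results, min_years=10, required_skills=["python"]))        # False
print(current_role(results)["company"]["value"])                         # Stripe
```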

Next Steps

  • Run Workflow: See all input methods, including file upload and text.
  • Field Types: Learn about list, object, and other field types.