★ Overview

A resume parser (CV parser) is used within human resource software and on recruitment websites, job boards, and candidate application portals to simplify and accelerate the application process. It does so by extracting and classifying thousands of attributes about the candidate.

A resume parser also provides the foundation for a semantic search of candidate data. The parser identifies different kinds of information within a resume or CV and tags each data point (for example, Work Experience, Educational Background, Skills, and Personal Details of the candidate).

Resume Parsing on Sense TRM has two essential steps:

[A] Text (Content) Extraction

[B] Information Extraction

Below is an overview of how the resume parser turns a candidate's resume from an unstructured format into a format that Sense TRM AI can read:

[A] Text (Content) Extraction

In the Content Extraction step, the resume text is extracted from the uploaded files, which can be raw documents in various formats such as .pdf, .doc, and .docx.
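As a rough illustration of this step, the Python sketch below extracts plain text from a .docx file using only the standard library. A .docx file is a ZIP archive whose body text is stored in `word/document.xml`. The function names `extract_text` and `extract_docx_text` are hypothetical, and this is not Sense TRM's extractor. PDF and legacy .doc files need dedicated third-party tools, so the sketch rejects them.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of WordprocessingML elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(data: bytes) -> str:
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's text can be split across several <w:t> runs
        lines.append("".join(t.text or "" for t in para.iter(f"{W_NS}t")))
    return "\n".join(lines)


def extract_text(filename: str, data: bytes) -> str:
    """Hypothetical dispatcher: choose an extractor from the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".docx":
        return extract_docx_text(data)
    if suffix == ".txt":
        return data.decode("utf-8", errors="replace")
    # .pdf and legacy .doc need third-party tools; not covered in this sketch
    raise ValueError(f"unsupported format: {suffix}")
```

Real .docx files can also carry text in headers and footers, which live in separate parts of the archive and are ignored here.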

[B] Information Extraction

The next step in resume parsing is extracting structured information from unstructured or semi-structured machine-readable documents.

A typical resume contains a candidate's Work Experience, Educational Background, Skills, and Personal Details, and this information can be laid out in many ways: tables, multi-line entries, sections, etc.
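To see why varied layouts make this step hard, consider a naive rule-based approach that groups lines under known section headings. This is a hypothetical sketch, not how Sense TRM works: the heading vocabulary in `SECTION_HEADINGS` and the `split_sections` name are assumptions, and the approach breaks down on tables and multi-column layouts, which is where learned models help.

```python
import re

# Hypothetical heading vocabulary; real resumes use many more variants
SECTION_HEADINGS = {
    "experience": "Work Experience",
    "work experience": "Work Experience",
    "education": "Educational Background",
    "skills": "Skills",
}


def split_sections(text: str) -> dict[str, list[str]]:
    """Group non-empty resume lines under the most recent recognized heading."""
    current = "Personal Details"  # lines before any heading (name, contact info)
    sections: dict[str, list[str]] = {current: []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Normalize "EDUCATION:" or "Skills -" to a lowercase letters-only key
        key = re.sub(r"[^a-z ]", "", line.lower()).strip()
        if key in SECTION_HEADINGS:
            current = SECTION_HEADINGS[key]
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections
```

For a simple single-column resume with heading lines such as `EDUCATION:` and `Skills`, the function returns a dictionary mapping each section name to the lines under it.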

Deep Learning algorithms for NLP (Natural Language Processing) help extract information from the content of a resume. Sense TRM has trained a custom Deep Learning NER (Named Entity Recognition) model, based on Google’s BERT language model, on over 100,000 resumes.

★ Understanding NER

Named Entity Recognition (NER) pulls information out of the extracted content. NER locates named entities in unstructured text and classifies them into predefined categories such as person names, organizations, and locations. These categories are part of Sense TRM's customized NER model (AI).

Consider the following two statements:

‘2000–2008: Professor at IIT Kanpur.’

‘B.Tech in Computer Science from IIT Kanpur’

Here, IIT Kanpur is treated as a Professional Organization (an employer) in the first statement and as an Educational Institute in the second. The context makes the difference:

1. The first statement contains “Professor”, which is a job title, so IIT Kanpur is tagged as a Professional Organization.

2. The second statement mentions a degree (B.Tech) and a major (Computer Science), so IIT Kanpur is tagged as an Educational Institute.
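The context cue described above can be illustrated with a toy keyword rule. Sense TRM's model learns such cues from training data rather than from a fixed list; the cue words and the `classify_org` helper below are hypothetical and only reuse the label names COM, INS, and OTH from the NER output described in this section.

```python
# Hypothetical cue words; a trained NER model learns these patterns instead
JOB_TITLE_CUES = {"professor", "engineer", "manager", "analyst"}
DEGREE_CUES = {"b.tech", "m.tech", "phd", "mba", "bachelor", "master"}


def classify_org(sentence: str) -> str:
    """Return 'COM', 'INS', or 'OTH' for the organization in a sentence."""
    # Lowercase words with surrounding punctuation stripped ("Kanpur." -> "kanpur")
    words = {w.strip(".,:;").lower() for w in sentence.split()}
    if words & DEGREE_CUES:
        return "INS"  # a degree nearby suggests an educational institute
    if words & JOB_TITLE_CUES:
        return "COM"  # a job title nearby suggests an employer
    return "OTH"
```

With the two statements above, `classify_org` returns COM for the first and INS for the second. A fixed list like this misses unseen titles and degrees, which is why a learned model is used.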

The snippet below, taken from our NER model's results, shows how the model recognizes and distinguishes the meanings of the phrase “IIT Kanpur” in different contexts. Each word has a corresponding label:

TIT - Designation

COM - Professional Organization

INS - Educational Institute

DEG - Degree

OTH - Other
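Because the model assigns one label per word, adjacent words with the same label have to be merged into entities such as “IIT Kanpur”. The sketch below assumes a simple list of `(word, label)` pairs without B-/I- prefixes; that format and the `group_entities` name are assumptions, not Sense TRM's actual output schema.

```python
from itertools import groupby


def group_entities(tagged: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Merge consecutive words sharing a label into (label, phrase) entities.

    Words labelled OTH are dropped because they are not entities.
    """
    entities = []
    for label, run in groupby(tagged, key=lambda pair: pair[1]):
        if label != "OTH":
            entities.append((label, " ".join(word for word, _ in run)))
    return entities
```

For example, `[("Professor", "TIT"), ("at", "OTH"), ("IIT", "COM"), ("Kanpur", "COM")]` becomes `[("TIT", "Professor"), ("COM", "IIT Kanpur")]`. One limitation of this plain scheme is that two different organizations written back to back would be merged into one; BIO-style tags (B-COM, I-COM) avoid that.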

The NER model alone does not achieve high accuracy in every case. Hence, Sense TRM has built NLP-based post-processing algorithms that sanitize the information extracted from resumes.
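Sense TRM's post-processing algorithms are not described in detail here, so the following is only a hypothetical example of the kind of sanitization such a step might perform: trimming stray punctuation, dropping values that are not valid email addresses, and removing case-insensitive duplicates. The field names, the `sanitize` helper, and the email pattern are assumptions.

```python
import re

# Deliberately simple email pattern (an assumption, not a full RFC 5322 check)
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")


def sanitize(entities: dict[str, list[str]]) -> dict[str, list[str]]:
    """Clean raw NER output: trim punctuation, validate emails, dedupe values."""
    cleaned: dict[str, list[str]] = {}
    for field, values in entities.items():
        seen: set[str] = set()
        result = []
        for value in values:
            value = value.strip(" \t,;:|-")  # leftovers from tables and separators
            if field == "Emails":
                value = value.lower()
                if not EMAIL_RE.match(value):
                    continue  # drop strings mis-tagged as emails
            key = value.lower()
            if value and key not in seen:  # keep the first spelling seen
                seen.add(key)
                result.append(value)
        cleaned[field] = result
    return cleaned
```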

★ Limitations of Resume Parser

Even with all the advancements and research in Deep Learning and other NLP technologies, no AI model achieves 100% accuracy. Improving model accuracy is a continuous process, because it depends on the size of the training data and the time needed for training.

Below are some of the cases where Sense TRM's parsing accuracy may be lower:

Complex resumes with multiple vertical/horizontal sections and multiple tables.
Inconsistent patterns, tabs, or whitespace.
Resumes with images, diagrams, art, etc.
Resumes created from screenshots, scanned copies, photographs, etc.
Incorrect information or formatting provided by the candidate.

Sense TRM keeps testing and updating these algorithms to improve the overall parsing quality.

★ Sense TRM's Structured Resume Format

The resume parser outputs each parsed resume in a structured format. A structured resume generated by the Sense TRM resume parser captures the following fields:

Name
1. First name
2. Middle name
3. Last name
Emails
Phone Number
Total Work Experience (Years)
Education Detail(s)
1. Institute
2. Degree
3. Major
4. Start year & month (*Sense TRM numbers months from 0 to 11)
5. End year & month
6. Whether this is the current institute
7. Description/summary
8. Grades
Experience Detail(s)
1. Company
2. Job Title
3. Years of Experience
4. Start year & month
5. End year & month
6. Industry
7. Whether this is the current company [BOOLEAN]
8. Description/Summary
9. Location
Skills
1. Core skills
2. All skills
3. Functional & Behavioural skills
Current Company
Current Location
Current Job Title
Latest Institute
Latest Degree
Latest Major
Highest Degree
Highest Major
Profile / Social links (e.g., LinkedIn / GitHub profile, website link, etc.) [LIST]
Overall summary/description
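The fields above map naturally onto a typed record. The sketch below models a single Experience Detail and estimates total work experience in years. The class and function names are hypothetical, the January = 0 mapping is an assumption based on the 0 to 11 month convention, and summing per-role months (ignoring overlapping roles) is one simple choice, not necessarily how Sense TRM computes its total.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ExperienceDetail:
    """Hypothetical subset of the Experience Detail fields."""
    company: str
    job_title: str
    start_year: int
    start_month: int                  # 0-11 (assumed: January = 0)
    end_year: Optional[int] = None
    end_month: Optional[int] = None   # 0-11
    is_current: bool = False


def months_worked(exp: ExperienceDetail, today: date) -> int:
    """Months between start and end, using today for current or open-ended roles."""
    if exp.is_current or exp.end_year is None or exp.end_month is None:
        # datetime.date months run 1-12, so shift to the 0-11 convention
        end_year, end_month = today.year, today.month - 1
    else:
        end_year, end_month = exp.end_year, exp.end_month
    return (end_year - exp.start_year) * 12 + (end_month - exp.start_month)


def total_experience_years(exps: list[ExperienceDetail], today: date) -> float:
    """Sum months across roles (overlaps are not merged) and convert to years."""
    return round(sum(months_worked(e, today) for e in exps) / 12, 1)
```

For example, a professor role at IIT Kanpur from January 2000 (month 0) to January 2008 (month 0) contributes 96 months, or 8.0 years. A production version would likely merge overlapping roles so that concurrent jobs are not double-counted.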

Learn more about Sense TRM AI: Deduplication of Candidate Resumes | Matching Engine
