A Resume Parser (CV parser) is a tool used in human resource software, on recruitment websites, job boards, and candidate application portals.
- Its primary purpose is to simplify and accelerate the application process.
- It works by extracting and classifying thousands of attributes about a candidate from their resume.
- A resume parser also forms the basis for a semantic search of candidate data.
- It identifies different types of information within a resume or CV and tags each data point, such as Work Experience, Educational Background, Skills, and Personal Details. A typical resume contains this type of information, presented in various formats like tables, multiple lines, or sections.
The Resume Parser is available at the top right of the header.
Sense CRM's Resume Parsing Process
Sense CRM's resume parsing involves two essential steps:
- Text (Content) Extraction
  - In this initial step, the content of the resume is extracted from uploaded files.
  - Sense CRM supports various raw document formats like .pdf, .doc, and .docx.
- Information Extraction
  - This is the subsequent step where structured information is extracted from unstructured or semi-structured machine-readable documents.
  - Deep Learning Algorithms in NLP (Natural Language Processing) are used to extract information from the resume content.
  - Sense CRM has trained a custom Deep Learning NER (Named Entity Recognition) model. This model is based on Google's BERT language model and was trained using over 100,000 resumes.
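The two steps above can be sketched as a small pipeline. This is only an illustration, not Sense CRM's implementation: `is_supported`, `parse_resume`, and the `extract_text` / `extract_entities` callables are hypothetical names, and the extension set contains only the formats named in this article.

```python
from pathlib import Path

# Formats named in this article; the real list may be longer (assumption).
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx"}


def is_supported(filename: str) -> bool:
    """Return True if the file extension is one of the formats listed above."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def parse_resume(filename, extract_text, extract_entities):
    """Run the two steps in order: text extraction, then information extraction.

    extract_text and extract_entities are hypothetical callables supplied by
    the caller; they stand in for the real text extractor and the NER model.
    """
    if not is_supported(filename):
        raise ValueError(f"Unsupported file type: {filename}")
    text = extract_text(filename)      # Step 1: text (content) extraction
    return extract_entities(text)      # Step 2: information extraction
```

Passing the two steps in as callables keeps the sketch independent of any particular PDF or DOCX library.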
Named Entity Recognition (NER) in Sense CRM
- NER plays a crucial role in fetching information from the extracted content.
- It works by locating and classifying named entities within the unstructured text into predefined categories. Examples of such categories include person names, organizations, and locations. These categories are part of Sense CRM's customized NER model.
- The model understands context. For instance, "IIT Kanpur" can be identified differently based on the surrounding text.
  - In the statement "2000–2008: Professor at IIT Kanpur," "IIT Kanpur" is tagged as a Professional Organization because "Professor" is identified as a Job Title.
  - In the statement "B.Tech in Computer Science from IIT Kanpur," "IIT Kanpur" is tagged as an Educational Organization because a degree and major are mentioned.
- The NER model assigns a label to each word. Examples of labels used by the Sense CRM NER model include:
  - TIT - Designation (Job Title)
  - COM - Professional Organization
  - INS - Educational Institute (Educational Organization)
  - DEG - Degree
  - OTH - Other
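To make the per-word labels concrete, the sketch below merges consecutive words that share a label into entities. It assumes one plain label per word (no B-/I- prefixes), which this article does not confirm. The label sequences are written by hand for illustration and are not real model output, and `group_entities` is a hypothetical helper.

```python
from itertools import groupby

# Label codes and names as listed in this article.
LABEL_NAMES = {
    "TIT": "Designation",
    "COM": "Professional Organization",
    "INS": "Educational Institute",
    "DEG": "Degree",
    "OTH": "Other",
}


def group_entities(tagged_tokens):
    """Merge consecutive tokens with the same label into (text, label name) pairs.

    Tokens labelled OTH are dropped because they carry no entity.
    """
    entities = []
    for label, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if label == "OTH":
            continue
        text = " ".join(token for token, _ in run)
        entities.append((text, LABEL_NAMES[label]))
    return entities


# Hand-written labels for illustration only.
work = [("Professor", "TIT"), ("at", "OTH"), ("IIT", "COM"), ("Kanpur", "COM")]
study = [("B.Tech", "DEG"), ("from", "OTH"), ("IIT", "INS"), ("Kanpur", "INS")]

print(group_entities(work))
# [('Professor', 'Designation'), ('IIT Kanpur', 'Professional Organization')]
print(group_entities(study))
# [('B.Tech', 'Degree'), ('IIT Kanpur', 'Educational Institute')]
```

The same two words, "IIT Kanpur", end up as different entity types purely because of the labels the model assigned in each context.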
Improving Accuracy: Post-Processing Algorithms
- Sense CRM recognizes that relying solely on the NER model will not always achieve high accuracy.
- Therefore, Sense CRM has developed post-processing algorithms using NLP to refine and "sanitize" the information extracted from resumes.
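This article does not describe the post-processing algorithms themselves, so the following is only a guess at the kind of clean-up such a step might perform: normalizing whitespace, tidying email addresses, and reducing phone numbers to digits. All function names and rules here are assumptions, not Sense CRM's actual logic.

```python
import re


def clean_text(value: str) -> str:
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    return re.sub(r"\s+", " ", value).strip()


def clean_emails(candidates):
    """Lowercase, trim stray punctuation, and drop invalid or duplicate entries."""
    seen, result = set(), []
    for raw in candidates:
        email = raw.strip().strip(".,;:<>()").lower()
        # Deliberately loose pattern: something@something.something
        if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email) and email not in seen:
            seen.add(email)
            result.append(email)
    return result


def clean_phone(raw: str) -> str:
    """Keep only the digits, plus a leading '+' if the input started with one."""
    digits = re.sub(r"\D", "", raw)
    return "+" + digits if raw.strip().startswith("+") else digits
```

For example, `clean_text("Senior \t Data\n Engineer ")` yields `"Senior Data Engineer"`, and `clean_phone("+91 (512) 555-0100")` yields `"+915125550100"`.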
Limitations of Resume Parsing
- No AI system, resume parsing included, achieves 100% accuracy, even with advances in Deep Learning and NLP.
- Improving model accuracy is an ongoing effort, dependent on factors like the size of the training data and training time. Sense CRM continuously tests and updates its algorithms to enhance overall parsing quality.
- Sense CRM's parsing accuracy may be affected in specific scenarios:
  - Complex resumes with multiple vertical/horizontal sections or numerous tables.
  - Resumes containing inconsistent patterns, tabs, or whitespace.
  - Resumes with images, diagrams, art, etc.
  - Resumes created from screenshots, scanned copies, photographs, etc.
  - When the candidate provides wrong information or uses an incorrect format.
Sense CRM's Structured Resume Format
- After parsing, the resume information is provided in a structured format.
- The structured resume generated by the Sense CRM parser includes a range of specific fields:
  - Name (First name, Middle name, Last name)
  - Emails
  - Phone Number
  - Total Work Experience (Years)
  - Education Detail(s) (Institute, Degree, Major, Start/End year & month, Whether current institute, Description/summary, Grades)
  - Experience Detail(s) (Company, Job Title, Years of Experience, Start/End year & month, Industry, Whether current company [BOOLEAN], Description/Summary, Location)
  - Skills (Core skills, All skills, Functional & Behavioural skills)
  - Current Company
  - Current Location
  - Current Job Title
  - Latest Institute
  - Latest Degree
  - Latest Major
  - Highest Degree
  - Highest Major
  - Profile / Social links (e.g., LinkedIn / GitHub profile, website link, etc.) [LIST]
  - Overall summary/description
This structured format makes the extracted candidate data easily searchable and usable within the system.
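As a rough picture of what such a structured record could look like in code, here is a sketch using Python dataclasses. It covers only a subset of the fields listed above. The class and attribute names are hypothetical and do not reflect Sense CRM's actual schema or API, and deriving the current company from the experience list is an assumption made for illustration.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Experience:
    company: str
    job_title: str
    start_year: Optional[int] = None
    end_year: Optional[int] = None
    is_current: bool = False          # "Whether current company [BOOLEAN]"
    location: Optional[str] = None


@dataclass
class Education:
    institute: str
    degree: str
    major: Optional[str] = None
    is_current: bool = False          # "Whether current institute"


@dataclass
class ParsedResume:
    first_name: str
    last_name: str
    emails: List[str] = field(default_factory=list)
    phone_number: Optional[str] = None
    total_experience_years: Optional[float] = None
    education: List[Education] = field(default_factory=list)
    experience: List[Experience] = field(default_factory=list)
    skills: List[str] = field(default_factory=list)
    profile_links: List[str] = field(default_factory=list)   # [LIST]

    @property
    def current_company(self) -> Optional[str]:
        """Company of the first experience entry flagged as current, if any."""
        for item in self.experience:
            if item.is_current:
                return item.company
        return None


# Made-up candidate data for illustration.
resume = ParsedResume(
    first_name="Asha",
    last_name="Rao",
    emails=["asha.rao@example.com"],
    experience=[Experience("Example Corp", "Data Engineer",
                           start_year=2019, is_current=True)],
)
print(resume.current_company)         # Example Corp
print(asdict(resume)["experience"])   # nested dataclasses become plain dicts
```

A typed record like this is one way to keep parsed fields searchable and easy to serialize; `asdict` converts it to plain dictionaries for storage or JSON output.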