How I Used Machine Learning to Decide Which Jobs to Apply For

In the age of automation, where algorithms dictate everything from what we watch to who we match with, job hunting remains stubbornly manual. Scrolling through endless job listings, tweaking resumes, and second-guessing every application — it’s a process both frustrating and inefficient. But what if we could shift the burden to machines?
As a Data Scientist, I believe in solving problems with data, and my own job search was no exception. Instead of sifting through countless job descriptions, hoping to stumble upon the right one, I built an ML-driven system to do it for me. This wasn’t just an experiment; it was a necessity. With too many jobs and too little time, I needed a smarter way to apply, one that ensured every application counted.
This is the story of how I used Machine Learning to cut through the noise, rank job opportunities, and make my job search more efficient.

Introduction

Finding a job can be overwhelming, especially when sifting through thousands of job postings across multiple platforms. The key challenge is not only crafting a strong resume but also applying to the right jobs that align with your skills and experience. Many job seekers waste time applying to roles that aren’t the best fit simply because they lack a systematic approach to filtering opportunities.

There are typically two ways to apply for a job:

Directly through job portals.
Via company websites, with redirection from job portals.

Being a Data Scientist with experience in Natural Language Processing (NLP) and Machine Learning (ML), I decided to use my skills to simplify my job search process. Instead of manually reviewing each job description (JD) to determine if it matched my profile, I built a system that automated this filtering process using various text similarity techniques.

The Problem: Too Many Jobs, Too Little Time

When searching for jobs, candidates often apply to many listings without knowing whether they are truly a good match. In my case, manual screening suggested that about 35% of the jobs I found were relevant to my profile. After implementing an ML-driven approach, that share fell to 14%, leaving a much shorter list of postings to review and apply to.

How Can Skills Be Used Beyond Work?

ML and Data Science are often thought of as technical tools for work, but they can also simplify everyday tasks. Just as data-driven decision-making helps businesses grow, it can also help individuals make better personal choices. My job search optimization is just one example of applying these techniques to real-life problems.

The Approach: Comparing Job Descriptions with My Resume

To determine job relevance, I used four different text similarity techniques:

Semantic Similarity (Sentence Transformers)
TF-IDF Similarity (Keyword Matching)
Jaccard Similarity (Word Overlap)
NER-Based Similarity (Named Entity Recognition)

Each method evaluates the match between my resume and job descriptions differently, and by combining them, I achieved a more accurate ranking system.

Step 1: Install and Load Dependencies

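# PyMuPDF provides the `fitz` module; the PyPI package named "fitz" is an unrelated project
!pip install sentence-transformers scikit-learn scipy spacy pymupdf numpy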
!python -m spacy download en_core_web_sm

import numpy as np
import spacy
import fitz  # PyMuPDF
import os
import json
from sentence_transformers import SentenceTransformer
from scipy.spatial.distance import cosine
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load Sentence Transformer model (for Semantic Similarity)
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Load English Named Entity Recognition (NER) model
nlp = spacy.load("en_core_web_sm")

1. Semantic Similarity

Used a pre-trained Sentence Transformer model to encode both job descriptions and my resume into numerical vectors.
Compared these vectors using cosine similarity, which measures how similar two texts are based on their meanings.

def semantic_similarity(text1: str, text2: str) -> float:
    """Computes semantic similarity using Sentence Transformers."""
    embedding1 = model.encode(text1)
    embedding2 = model.encode(text2)
    return 1 - cosine(embedding1, embedding2)  # Cosine similarity
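
For readers new to the metric, the cosine similarity returned by `1 - cosine(...)` can be written out by hand. The sketch below uses short plain-Python lists in place of real embeddings; the toy vectors and the `cosine_sim` name are made up for illustration and are not part of the pipeline above.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score about 1, whatever their length
same_direction = cosine_sim([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Vectors at right angles score 0
orthogonal = cosine_sim([1.0, 0.0], [0.0, 1.0])
```

Because only the angle matters, a long job description and a short resume can still score highly if they point in the same direction in embedding space.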

2. TF-IDF Similarity

Applied TF-IDF (Term Frequency-Inverse Document Frequency) to extract important keywords from both my resume and job descriptions.
Measured cosine similarity between TF-IDF vectors to find common key terms.

def tfidf_similarity(text1: str, text2: str) -> float:
    """Computes keyword-based similarity using TF-IDF."""
    vectorizer = TfidfVectorizer(stop_words='english')
    tfidf_matrix = vectorizer.fit_transform([text1, text2])
    return cosine_similarity(tfidf_matrix[0], tfidf_matrix[1])[0][0]
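
To make the weighting less of a black box, here is a minimal hand-rolled sketch of TF-IDF with the smoothed IDF formula that scikit-learn documents as its default, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalisation. It assumes a simple whitespace tokeniser and no stop-word removal, so its numbers will not match `TfidfVectorizer` exactly; the function names and the three-document corpus are illustrative only.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Returns one L2-normalised {term: weight} dict per document."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors

def sparse_cosine(u, v):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())

corpus = [
    "python sql machine learning",   # stands in for a resume
    "python sql dashboards",         # job description sharing two terms
    "java spring microservices",     # job description sharing none
]
vecs = tfidf_vectors(corpus)
close = sparse_cosine(vecs[0], vecs[1])
far = sparse_cosine(vecs[0], vecs[2])
```

One design point worth noting: when the vectorizer is fitted on just two texts, a term’s document frequency can only be 1 or 2, so the IDF part carries little information. Fitting on a larger collection of job descriptions is one way to let rare terms stand out.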

3. Jaccard Similarity

Compared sets of words from both texts to measure overlap.
Formula: Jaccard Similarity = (Intersection of words) / (Union of words)
Helps in checking direct term matches, ensuring key job-related words appear in my resume.

def jaccard_similarity(text1: str, text2: str) -> float:
    """Computes Jaccard similarity based on word overlap."""
    words1 = set(text1.lower().split())
    words2 = set(text2.lower().split())
    intersection = len(words1 & words2)
    union = len(words1 | words2)
    return round(intersection / union, 4) if union != 0 else 0
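
A small worked example shows how sensitive word overlap is to tokenisation. Splitting on whitespace keeps punctuation attached, so "NLP," and "NLP" count as different words. The sketch below compares that with a regex tokeniser; the `tokenize` helper and the two sample sentences are assumptions made for illustration.

```python
import re

def tokenize(text):
    """Lowercases and keeps alphanumeric runs (plus + and #, for terms like c++ or c#)."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

jd = "Python, SQL and NLP experience required."
resume = "Experience with Python, NLP, and Spark."

# Whitespace split: 3 shared tokens out of 9 distinct ones
raw = jaccard(set(jd.lower().split()), set(resume.lower().split()))
# Regex tokens: 4 shared tokens (python, and, nlp, experience) out of 8
score = jaccard(tokenize(jd), tokenize(resume))
```

Here the whitespace version gives one third while the punctuation-aware version gives 0.5, purely because of trailing commas and full stops.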

4. Named Entity Recognition (NER) Similarity

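Used spaCy’s NER model to extract named entities such as organizations, people, locations, and nationalities or groups from job descriptions. The stock en_core_web_sm model has no labels for skills or job titles; capturing those would require a custom-trained model or a rule-based component.
Measured how many entities were common between my resume and the job description.

def extract_named_entities(text: str) -> set:
    """Extracts named entities (organizations, people, places, groups) from text."""
    doc = nlp(text)
    # en_core_web_sm uses OntoNotes-style labels; it has no SKILL or JOB_TITLE label
    entities = [ent.text for ent in doc.ents if ent.label_ in {"ORG", "PERSON", "GPE", "NORP"}]
    return set(entities)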

def ner_similarity(text1, text2):
    """Calculates Named Entity Overlap Similarity."""
    entities1 = extract_named_entities(text1)
    entities2 = extract_named_entities(text2)
    
    if not entities1 or not entities2:
        return 0  # No entities found

    intersection = len(entities1 & entities2)
    union = len(entities1 | entities2)
    return round(intersection / union, 4) if union != 0 else 0
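
Since the stock en_core_web_sm model does not tag skills, one simple alternative is to match a hand-maintained skill vocabulary against both texts. This is a swapped-in technique, not what the functions above do; the vocabulary, the helper names, and the sample sentences are all hypothetical.

```python
import re

# Hypothetical vocabulary; extend it with the skills relevant to your field
SKILL_VOCAB = {"python", "sql", "nlp", "machine learning", "spark", "docker"}

def extract_skills(text, vocab=SKILL_VOCAB):
    """Returns the vocabulary entries that occur in the text as whole words or phrases."""
    lowered = text.lower()
    found = set()
    for skill in vocab:
        # Lookarounds stop "sql" from matching inside a longer token such as "mysql"
        pattern = r"(?<![a-z0-9])" + re.escape(skill) + r"(?![a-z0-9])"
        if re.search(pattern, lowered):
            found.add(skill)
    return found

def skill_overlap(jd, resume):
    """Share of the job description's skills that also appear in the resume."""
    jd_skills = extract_skills(jd)
    if not jd_skills:
        return 0.0
    return len(jd_skills & extract_skills(resume)) / len(jd_skills)

jd = "We need Python, SQL and machine learning; Docker is a plus."
resume = "Built NLP pipelines in Python and SQL."
overlap = skill_overlap(jd, resume)
```

Dividing by the number of skills in the job description, rather than by the union, answers the question "how much of what this role asks for do I cover?", which may be closer to what a job seeker wants to know.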

Combining the Scores for Final Decision

Each similarity score contributes to the final ranking based on weighted importance:

Semantic Similarity: 40%
TF-IDF Similarity: 25%
Jaccard Similarity: 20%
NER Similarity: 15%

def combined_similarity(jd: str, resume: str) -> dict:
    """Combines multiple similarity measures for better matching."""
    sim_semantic = semantic_similarity(jd, resume)
    sim_tfidf = tfidf_similarity(jd, resume)
    sim_jaccard = jaccard_similarity(jd, resume)
    sim_ner = ner_similarity(jd, resume)  # Named Entity Recognition Similarity

    # Weighted Average (Adjust weights as needed)
    final_score = (0.4 * sim_semantic) + (0.25 * sim_tfidf) + (0.2 * sim_jaccard) + (0.15 * sim_ner)

    return {
        "Semantic Similarity": round(sim_semantic, 4),
        "TF-IDF Similarity": round(sim_tfidf, 4),
        "Jaccard Similarity": round(sim_jaccard, 4),
        "NER Similarity": round(sim_ner, 4),
        "Final Score": round(final_score, 4)
    }
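
To make the arithmetic concrete, the sketch below applies the same four weights to a set of made-up scores and checks that the weights sum to 1, so the final score stays on the same scale as its inputs. The `weighted_score` helper and the example numbers are illustrative assumptions, not measured results.

```python
WEIGHTS = {"semantic": 0.40, "tfidf": 0.25, "jaccard": 0.20, "ner": 0.15}

def weighted_score(scores, weights=WEIGHTS):
    """Weighted average of similarity scores; weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)

# Made-up scores for a single job description
example_scores = {"semantic": 0.62, "tfidf": 0.30, "jaccard": 0.12, "ner": 0.20}

# 0.4*0.62 + 0.25*0.30 + 0.2*0.12 + 0.15*0.20 = 0.248 + 0.075 + 0.024 + 0.030
final = weighted_score(example_scores)
```

Keeping the weights in one dictionary makes it easy to experiment, for example by shifting weight from Jaccard to the semantic score if exact wording matters less in your field.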

The final score helped me determine whether a job was worth applying to. The result? A more efficient job search where I focused only on high-matching roles rather than sifting through irrelevant ones.

def pdf_to_text(pdf_path: str) -> str:
    doc = fitz.open(pdf_path)
    text = "\n".join([page.get_text("text") for page in doc])
    return text.strip()

pdf_path = "Resume.pdf"
pdf_text = pdf_to_text(pdf_path)  # extract the resume text from the PDF

jd_path = "job_description.txt"

# Read the job description as plain text
with open(jd_path, "r", encoding="utf-8") as f:
    jd_text = f.read()

final_score = combined_similarity(jd_text, pdf_text)

print(final_score)
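
The script above scores a single job description. To turn that into a ranked shortlist, one option is to loop over many descriptions and keep those above a cutoff. The sketch below takes the scoring function as a parameter so it runs on its own; the `rank_jobs` name, the 0.5 threshold, and the stand-in `toy_score` are assumptions, since the article does not state a cutoff.

```python
def rank_jobs(job_descriptions, resume_text, score_fn, threshold=0.5):
    """Scores every job description, keeps those at or above the threshold, best first."""
    scored = [
        (job_id, score_fn(jd, resume_text))
        for job_id, jd in job_descriptions.items()
    ]
    shortlisted = [(job_id, s) for job_id, s in scored if s >= threshold]
    return sorted(shortlisted, key=lambda pair: pair[1], reverse=True)

def toy_score(jd, resume):
    """Stand-in scorer (word-set Jaccard) so the example is self-contained."""
    a, b = set(jd.lower().split()), set(resume.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

jobs = {
    "job_a": "python sql nlp",
    "job_b": "java spring",
    "job_c": "python nlp spark sql",
}
shortlist = rank_jobs(jobs, "python sql nlp spark", toy_score, threshold=0.5)
```

In the full pipeline, `score_fn` could be something like `lambda jd, resume: combined_similarity(jd, resume)["Final Score"]`, with the threshold tuned by checking a handful of postings by hand.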

The Impact: Reducing My Job Search Pool

Before applying my ML-based filtering, I considered 35% of job postings relevant. However, after implementing this automated system, the match rate dropped to 14%, allowing me to apply only to the most relevant jobs. This saved time and increased my chances of landing the right role.

Conclusion

Using data science and machine learning in real-life problems, even beyond work, can significantly improve efficiency. Whether it’s job searching, automating tedious tasks, or making data-driven decisions in daily life, these skills are powerful. If you’re job hunting, consider leveraging ML techniques to streamline the process — you might just find your dream job faster!

With a few tweaks, you can also surface the reasons behind each score, such as the key points that matched. You could also automate the whole pipeline: crawl job portals with Selenium, score each posting, and generate a daily list of jobs to apply for without manual checking, saving you valuable time.
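
As one possible starting point for the "reasons" idea, the sketch below reports which terms from a job description appear in the resume and which are missing. It is a simple term-level explanation, not a method from the pipeline above; the stop-word list, the helper names, and the sample strings are illustrative assumptions.

```python
import re

# Tiny illustrative stop-word list; a real one would be longer
STOPWORDS = {"and", "or", "the", "a", "an", "with", "in", "of", "to", "for", "is"}

def key_terms(text):
    """Lowercased alphanumeric tokens with stop words removed."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}

def explain_match(jd, resume):
    """Lists the job description terms the resume covers and the ones it lacks."""
    jd_terms, resume_terms = key_terms(jd), key_terms(resume)
    return {
        "matched": sorted(jd_terms & resume_terms),
        "missing": sorted(jd_terms - resume_terms),
    }

report = explain_match("Python and SQL with Airflow", "Python, SQL, Spark")
```

The "missing" list doubles as a hint about which skills to highlight or learn before applying to similar roles.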