CareerBERT-G Open-Source Model: A German Tool Designed for Career Matching and Recommendation Systems

CareerBERT-G

Developed by lwolfrum2

A German sentence transformer model fine-tuned on the ESCO taxonomy and designed for career matching and recommendation systems

Text Embedding

Transformers

German #Career Matching #ESCO Classification #German NLP

Release Time: 2/26/2025

Model Overview

CareerBERT-G is a sentence transformer model fine-tuned from deepset/gbert-base, specifically designed to match resumes with ESCO job classifications, supporting career counseling and job recommendation systems.

Model Features

Career Matching Optimization

Optimized specifically for career counseling and job recommendation scenarios, effectively matching resumes with job descriptions

ESCO Classification Integration

Integrates the European Skills, Competences, and Occupations (ESCO) classification system, providing standardized occupational feature representation

Two-Stage Validation

Validated in two stages: an application-grounded evaluation on EURES job advertisements and a human-grounded evaluation on real-world resumes with feedback from human resources experts

Model Capabilities

Sentence Embedding Generation

Text Similarity Calculation

Occupational Feature Extraction

Resume and Job Matching

Use Cases

Career Counseling

Resume and Job Matching

Matching job seekers' resumes with ESCO job classifications to provide precise job recommendations

Demonstrated robust performance in human resources expert evaluations

Employment Services

Job Advertisement Analysis

Analyzing job advertisements on platforms like EURES to extract standardized occupational features

Outperformed traditional and state-of-the-art embedding methods

🚀 CareerBERT-G

A sentence transformer model fine-tuned on the ESCO taxonomy that can be used for sentence similarity tasks.

🚀 Quick Start

CareerBERT-G is a sentence transformer model fine-tuned on the ESCO taxonomy. Its base model is deepset/gbert-base. The corresponding paper is listed under Citing & Authors.

✨ Features

Sentence Similarity: It can be used to calculate the similarity between sentences, which is useful for tasks like sentence retrieval and clustering.
Feature Extraction: Extracts high-quality sentence embeddings for various NLP applications.

📦 Installation

Using this model becomes easy when you have sentence-transformers installed:

pip install -U sentence-transformers

💻 Usage Examples

Basic Usage

from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

# '{MODEL_NAME}' is a placeholder for the model's Hugging Face Hub ID
model = SentenceTransformer('{MODEL_NAME}')
embeddings = model.encode(sentences)
print(embeddings)
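
Once embeddings are available, matching a resume against occupations comes down to comparing vectors. The following is a minimal sketch of that step in plain Python, ranking candidates by cosine similarity. The three-dimensional vectors and the occupation labels are made-up stand-ins for real 768-dimensional outputs of model.encode, and rank_by_cosine is a hypothetical helper, not part of sentence-transformers.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors; avoids division by zero
    return dot / (norm_a * norm_b)


def rank_by_cosine(query, candidates, top_k=3):
    """Return the top_k (label, score) pairs, best match first."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for embeddings (real ones would come from model.encode).
resume_embedding = [0.9, 0.1, 0.2]
occupation_embeddings = {
    "Softwareentwickler/in": [0.8, 0.2, 0.1],
    "Krankenpfleger/in": [0.1, 0.9, 0.3],
    "Datenanalyst/in": [0.7, 0.1, 0.6],
}

ranking = rank_by_cosine(resume_embedding, occupation_embeddings, top_k=2)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

With real data, the occupation embeddings would typically be computed once and reused for every incoming resume, since only the query side changes.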

Advanced Usage

Without sentence-transformers, you can use the model like this: first, pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.

from transformers import AutoTokenizer, AutoModel
import torch


# Mean pooling: take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from Hugging Face Hub ('{MODEL_NAME}' is a placeholder for the model ID)
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
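
To see why the attention mask matters in mean pooling, the sketch below repeats the same averaging on a toy example in plain Python, without torch. The function masked_mean and the numbers are illustrative assumptions; the point is only that padding positions must be excluded from both the sum and the count.

```python
def masked_mean(token_embeddings, attention_mask):
    """Average token vectors, counting only positions where the mask is 1."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                totals[i] += value
    # Same guard as torch.clamp(..., min=1e-9): avoids division by zero
    count = max(sum(attention_mask), 1e-9)
    return [total / count for total in totals]


# One sentence with two real tokens and two padding positions (toy values).
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0], [9.0, 9.0]]
mask = [1, 1, 0, 0]

pooled = masked_mean(tokens, mask)
naive = [sum(column) / len(tokens) for column in zip(*tokens)]

print("masked mean:", pooled)  # padding ignored
print("naive mean:", naive)    # padding skews the result
```

Because sentences in a batch are padded to a common length, the naive average would make an embedding depend on the other sentences in the batch; the masked version does not.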

📚 Documentation

Evaluation Results

For an automated evaluation of this model, see the Sentence Embeddings Benchmark: https://seb.sbert.net

Training

The model was trained with the following parameters:

DataLoader: torch.utils.data.dataloader.DataLoader of length 3695 with parameters:

{'batch_size': 32, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}

Loss: sentence_transformers.losses.MultipleNegativesRankingLoss.MultipleNegativesRankingLoss with parameters:

{'scale': 20.0, 'similarity_fct': 'cos_sim'}
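
As a rough illustration of what this loss computes, the sketch below scores every anchor against every positive in a batch with scaled cosine similarity and applies cross-entropy, treating the matching pair as the correct class and all other in-batch positives as negatives. It is a plain-Python approximation for intuition under the parameters listed above (scale 20.0, cosine similarity), not the library implementation; the two-dimensional vectors are made up and assumed to be non-zero.

```python
import math


def cos_sim(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over scaled cosine scores; the target for row i is column i."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, positive) for positive in positives]
        # Numerically stable log-sum-exp over the row of scores
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(value - peak) for value in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch: each anchor is close to its own positive.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]

aligned_loss = multiple_negatives_ranking_loss(anchors, positives)
# Swapping the positives pairs every anchor with the wrong text.
shuffled_loss = multiple_negatives_ranking_loss(anchors, positives[::-1])

print(f"aligned pairs:  {aligned_loss:.4f}")
print(f"shuffled pairs: {shuffled_loss:.4f}")
```

The loss is small when each anchor is most similar to its own positive and large otherwise, which is why larger batches (more in-batch negatives) generally make the task harder and the training signal stronger.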

Parameters of the fit() method:

{
    "epochs": 1,
    "evaluation_steps": 0,
    "evaluator": "sentence_transformers.evaluation.RerankingEvaluator.RerankingEvaluator",
    "max_grad_norm": 1,
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 2e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 11821.1,
    "weight_decay": 0.01
}
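
The effect of the scheduler settings can be sketched with a few lines of Python. This assumes that WarmupLinear behaves like the usual linear-warmup, linear-decay schedule (the multiplier rises from 0 to 1 over the warmup steps, then falls linearly to 0) and that the total number of optimizer steps equals the DataLoader length times the number of epochs; both are assumptions for illustration, and warmup_linear_multiplier is a hypothetical helper rather than a library function.

```python
def warmup_linear_multiplier(step, warmup_steps, total_steps):
    """Learning-rate multiplier: linear ramp to 1.0, then linear decay to 0.0."""
    if step < warmup_steps:
        return step / max(1.0, warmup_steps)
    return max(0.0, (total_steps - step) / max(1.0, total_steps - warmup_steps))


BASE_LR = 2e-05         # "lr" from the parameters above
WARMUP_STEPS = 11821.1  # "warmup_steps" from the parameters above
TOTAL_STEPS = 3695 * 1  # assumption: DataLoader length x epochs

final_multiplier = warmup_linear_multiplier(TOTAL_STEPS, WARMUP_STEPS, TOTAL_STEPS)
print(f"multiplier at the end of training: {final_multiplier:.3f}")
print(f"learning rate at the end of training: {BASE_LR * final_multiplier:.2e}")

# For contrast, a hypothetical run whose warmup covers 10% of 1000 steps.
for step in (0, 50, 100, 550, 1000):
    print(step, warmup_linear_multiplier(step, 100, 1000))
```

Under these assumptions the listed warmup length exceeds the total number of steps, so the learning rate would still be ramping up when training ends and would stay well below the configured 2e-05. This is an observation derived from the listed numbers, not a statement made by the model authors.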

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)

📄 License

No license information provided in the original document.

📚 Citing & Authors

If you use this model in your research, please cite the following paper:

@article{ROSENBERGER2025127043,
title = {CareerBERT: Matching resumes to ESCO jobs in a shared embedding space for generic job recommendations},
journal = {Expert Systems with Applications},
volume = {275},
pages = {127043},
year = {2025},
issn = {0957-4174},
doi = {https://doi.org/10.1016/j.eswa.2025.127043},
url = {https://www.sciencedirect.com/science/article/pii/S0957417425006657},
author = {Julian Rosenberger and Lukas Wolfrum and Sven Weinzierl and Mathias Kraus and Patrick Zschech},
keywords = {Job consultation, Job markets, Job recommendation system, BERT, NLP},
abstract = {The rapidly evolving labor market, driven by technological advancements and economic shifts, presents significant challenges for traditional job matching and consultation services. In response, we introduce an advanced support tool for career counselors and job seekers based on CareerBERT, a novel approach that leverages the power of unstructured textual data sources, such as resumes, to provide more accurate and comprehensive job recommendations. In contrast to previous approaches that primarily focus on job recommendations based on a fixed set of concrete job advertisements, our approach involves the creation of a corpus that combines data from the European Skills, Competences, and Occupations (ESCO) taxonomy and EURopean Employment Services (EURES) job advertisements, ensuring an up-to-date and well-defined representation of general job titles in the labor market. Our two-step evaluation approach, consisting of an application-grounded evaluation using EURES job advertisements and a human-grounded evaluation using real-world resumes and Human Resources (HR) expert feedback, provides a comprehensive assessment of CareerBERT’s performance. Our experimental results demonstrate that CareerBERT outperforms both traditional and state-of-the-art embedding approaches while showing robust effectiveness in human expert evaluations. These results confirm the effectiveness of CareerBERT in supporting career consultants by generating relevant job recommendations based on resumes, ultimately enhancing the efficiency of job consultations and expanding the perspectives of job seekers. This research contributes to the field of NLP and job recommendation systems, offering valuable insights for both researchers and practitioners in the domain of career consulting and job matching.}
}
