🚀 CareerBERT-G
A Sentence Transformer Model finetuned on ESCO Taxonomy, which can be used for sentence similarity tasks.
🚀 Quick Start
CareerBERT-G is a Sentence Transformer Model finetuned on ESCO Taxonomy. Its base model is deepset/gbert-base. You can refer to the corresponding paper here.
✨ Features
- Sentence Similarity: It can be used to calculate the similarity between sentences, which is useful for tasks like sentence retrieval and clustering.
- Feature Extraction: Extracts high - quality sentence embeddings for various NLP applications.
📦 Installation
Using this model becomes easy when you have sentence-transformers installed:
pip install -U sentence-transformers
💻 Usage Examples
Basic Usage
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]
model = SentenceTransformer('{MODEL_NAME}')
embeddings = model.encode(sentences)
print(embeddings)
Advanced Usage
Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling - operation on - top of the contextualized word embeddings.
from transformers import AutoTokenizer, AutoModel
import torch
def mean_pooling(model_output, attention_mask):
token_embeddings = model_output[0]
input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
sentences = ['This is an example sentence', 'Each sentence is converted']
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}')
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
model_output = model(**encoded_input)
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
📚 Documentation
Evaluation Results
For an automated evaluation of this model, see the Sentence Embeddings Benchmark: https://seb.sbert.net
Training
The model was trained with the following parameters:
DataLoader:
torch.utils.data.dataloader.DataLoader
of length 3695 with parameters:
{'batch_size': 32, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
Loss:
sentence_transformers.losses.MultipleNegativesRankingLoss.MultipleNegativesRankingLoss
with parameters:
{'scale': 20.0, 'similarity_fct': 'cos_sim'}
Parameters of the fit()-Method:
{
"epochs": 1,
"evaluation_steps": 0,
"evaluator": "sentence_transformers.evaluation.RerankingEvaluator.RerankingEvaluator",
"max_grad_norm": 1,
"optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
"optimizer_params": {
"lr": 2e-05
},
"scheduler": "WarmupLinear",
"steps_per_epoch": null,
"warmup_steps": 11821.1,
"weight_decay": 0.01
}
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)
📄 License
No license information provided in the original document.
📚 Citing & Authors
If you use this model in your research, please cite the following paper:
@article{ROSENBERGER2025127043,
title = {CareerBERT: Matching resumes to ESCO jobs in a shared embedding space for generic job recommendations},
journal = {Expert Systems with Applications},
volume = {275},
pages = {127043},
year = {2025},
issn = {0957-4174},
doi = {https://doi.org/10.1016/j.eswa.2025.127043},
url = {https://www.sciencedirect.com/science/article/pii/S0957417425006657},
author = {Julian Rosenberger and Lukas Wolfrum and Sven Weinzierl and Mathias Kraus and Patrick Zschech},
keywords = {Job consultation, Job markets, Job recommendation system, BERT, NLP},
abstract = {The rapidly evolving labor market, driven by technological advancements and economic shifts, presents significant challenges for traditional job matching and consultation services. In response, we introduce an advanced support tool for career counselors and job seekers based on CareerBERT, a novel approach that leverages the power of unstructured textual data sources, such as resumes, to provide more accurate and comprehensive job recommendations. In contrast to previous approaches that primarily focus on job recommendations based on a fixed set of concrete job advertisements, our approach involves the creation of a corpus that combines data from the European Skills, Competences, and Occupations (ESCO) taxonomy and EURopean Employment Services (EURES) job advertisements, ensuring an up-to-date and well-defined representation of general job titles in the labor market. Our two-step evaluation approach, consisting of an application-grounded evaluation using EURES job advertisements and a human-grounded evaluation using real-world resumes and Human Resources (HR) expert feedback, provides a comprehensive assessment of CareerBERT’s performance. Our experimental results demonstrate that CareerBERT outperforms both traditional and state-of-the-art embedding approaches while showing robust effectiveness in human expert evaluations. These results confirm the effectiveness of CareerBERT in supporting career consultants by generating relevant job recommendations based on resumes, ultimately enhancing the efficiency of job consultations and expanding the perspectives of job seekers. This research contributes to the field of NLP and job recommendation systems, offering valuable insights for both researchers and practitioners in the domain of career consulting and job matching.}
}