CareerBERT-JG: Open-Source Job-Search Model for Career Counseling and Job Recommendation

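Developed by lwolfrum2

CareerBERT-JG is a sentence-transformers model fine-tuned on the ESCO taxonomy and designed for career counseling and job recommendation.

Tags: text embedding, German, #job-recommendation, #ESCO-taxonomy, #resume-matching

Downloads: 309

Released: 2/26/2025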

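Model Overview

Built on agne/jobGBERT, the model computes sentence similarity and supports applications such as career counseling and job recommendation.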

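Model Features

ESCO taxonomy fine-tuning

Fine-tuned specifically on the European Skills, Competences, and Occupations (ESCO) taxonomy, which makes it suited to analyzing the European labor market

Occupational embedding space

Maps resumes and job descriptions into a shared embedding space so that they can be compared and matched directly

Mask-aware mean pooling

Averages token embeddings with mean pooling and uses the attention mask so that padding tokens do not distort the result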

Model Capabilities

Sentence embedding generation

Text similarity computation

Occupation relevance analysis

Resume-to-job matching

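Use Cases

Career counseling

Resume-to-job matching

Recommends the most relevant ESCO occupations based on the content of a job seeker's resume

Showed better matching than traditional approaches in expert evaluation

Career development advice

Analyzes how well a person's existing skills match the requirements of a target position

Helps career counselors give data-driven advice

Recruiting systems

Automated resume screening

Quickly matches large numbers of resumes against job requirements

Improves the efficiency of HR departments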

🚀 CareerBERT-JG

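CareerBERT-JG is a sentence-transformers model fine-tuned on the ESCO taxonomy. It uses agne/jobGBERT as its base model and can be used to compute sentence similarity in settings such as career counseling and job recommendation.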

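🚀 Quick Start

The model can be called through either the sentence-transformers library or the HuggingFace Transformers library. Both are described below.

📦 Installation

To call the model through sentence-transformers, install the library first: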

pip install -U sentence-transformers

💻 Usage Examples

Basic usage (Sentence-Transformers)

With sentence-transformers installed, using the model is straightforward:

from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('{MODEL_NAME}')
embeddings = model.encode(sentences)
print(embeddings)
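The embeddings printed above become useful once they are compared with each other. The following is a minimal sketch of that step, assuming cosine similarity as the comparison (the similarity function listed in the training parameters). The 3-dimensional vectors and the occupation labels are made up for illustration and stand in for real 768-dimensional model outputs; `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (label, score) pairs sorted from most to least similar."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): in practice these would come from model.encode(...).
resume_vec = [0.9, 0.1, 0.0]
occupation_vecs = {
    "software developer": [0.8, 0.2, 0.1],
    "nurse": [0.0, 0.3, 0.9],
    "data analyst": [0.6, 0.6, 0.0],
}

ranking = rank_by_similarity(resume_vec, occupation_vecs)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

With real embeddings, the same ranking step would order ESCO occupations by how close they are to a resume in the shared embedding space.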

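Advanced usage (HuggingFace Transformers)

Without sentence-transformers, you can use the model as follows: first pass the input through the Transformer model, then apply the appropriate pooling operation to the contextualized token embeddings.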

from transformers import AutoTokenizer, AutoModel
import torch


#Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
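To see why the pooling step takes the attention mask into account, here is a small worked example in plain Python. It is only a sketch of the same arithmetic as `mean_pooling` above, applied to a single sentence; the token values, including the 9.0 placeholders at the padded positions, are invented for illustration, and `masked_mean` is a hypothetical helper name.

```python
def masked_mean(token_embeddings, attention_mask):
    """Average token vectors, counting only positions where the mask is 1."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    for vec, keep in zip(token_embeddings, attention_mask):
        for i in range(dim):
            totals[i] += vec[i] * keep
    # Guard against division by zero, mirroring torch.clamp(..., min=1e-9).
    count = max(sum(attention_mask), 1e-9)
    return [t / count for t in totals]


# One sentence: two real tokens followed by two padding positions.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0], [9.0, 9.0]]
mask = [1, 1, 0, 0]

pooled = masked_mean(tokens, mask)                           # uses real tokens only
naive = [sum(col) / len(tokens) for col in zip(*tokens)]     # padding included

print(pooled)  # [2.0, 3.0]
print(naive)   # [5.5, 6.0]
```

The unmasked average is pulled toward whatever values sit in the padded positions, so the same sentence would get a different embedding depending on how much padding its batch required.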

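📚 Documentation

Evaluation Results

For an automated evaluation of this model, see the Sentence Embeddings Benchmark: https://seb.sbert.net

Training

The model was trained with the following parameters:

DataLoader: torch.utils.data.dataloader.DataLoader of length 3695, with parameters: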

{'batch_size': 32, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}

Loss: sentence_transformers.losses.MultipleNegativesRankingLoss.MultipleNegativesRankingLoss, with parameters:

{'scale': 20.0, 'similarity_fct': 'cos_sim'}
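As a rough illustration of what this loss computes, the sketch below reimplements the idea in plain Python: each anchor is scored against every positive in the batch with scaled cosine similarity, and cross-entropy rewards the anchor's own positive while treating the other positives as negatives. This is a simplified stand-in, not the library's code; the 2-dimensional vectors are toy values, and only `scale=20.0` and the cosine similarity come from the parameters above.

```python
import math


def cos_sim(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, pos) for pos in positives]
        # Numerically stable log-sum-exp over this anchor's row of logits.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch (assumption): two anchors with matching and mismatched positives.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # positives[i] points the same way as anchors[i]
swapped = [[0.1, 0.9], [0.9, 0.1]]   # pairing deliberately reversed

loss_aligned = multiple_negatives_ranking_loss(anchors, aligned)
loss_swapped = multiple_negatives_ranking_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

The loss is small when each anchor is closest to its own positive and large when another item in the batch is closer, which is why larger batches supply more in-batch negatives per step.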

Parameters of the fit() method:

{
    "epochs": 1,
    "evaluation_steps": 0,
    "evaluator": "sentence_transformers.evaluation.RerankingEvaluator.RerankingEvaluator",
    "max_grad_norm": 1,
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 2e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 11821.1,
    "weight_decay": 0.01
}
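The "WarmupLinear" scheduler is generally understood as a linear ramp of the learning rate from zero up to its base value over the warmup steps, followed by a linear decay to zero at the end of training. The sketch below shows that shape under this assumption. The step counts (100 warmup, 1000 total) are illustrative values chosen for readability, not the ones used to train this model, and `warmup_linear_factor` is a hypothetical helper name.

```python
def warmup_linear_factor(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear ramp from 0 to 1 during warmup.
        return step / max(1.0, warmup_steps)
    # Linear decay from 1 to 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1.0, total_steps - warmup_steps))


base_lr = 2e-05  # the "lr" value from optimizer_params above

# Illustrative schedule lengths (assumption, not the training configuration).
warmup_steps, total_steps = 100, 1000

for step in (0, 50, 100, 550, 1000):
    lr = base_lr * warmup_linear_factor(step, warmup_steps, total_steps)
    print(step, lr)
```

Under this shape, the learning rate only reaches its base value if the warmup period is shorter than the total number of training steps.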

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)

Citing & Authors

If you use this model in your research, please cite the following paper:

@article{ROSENBERGER2025127043,
title = {CareerBERT: Matching resumes to ESCO jobs in a shared embedding space for generic job recommendations},
journal = {Expert Systems with Applications},
volume = {275},
pages = {127043},
year = {2025},
issn = {0957-4174},
doi = {https://doi.org/10.1016/j.eswa.2025.127043},
url = {https://www.sciencedirect.com/science/article/pii/S0957417425006657},
author = {Julian Rosenberger and Lukas Wolfrum and Sven Weinzierl and Mathias Kraus and Patrick Zschech},
keywords = {Job consultation, Job markets, Job recommendation system, BERT, NLP},
abstract = {The rapidly evolving labor market, driven by technological advancements and economic shifts, presents significant challenges for traditional job matching and consultation services. In response, we introduce an advanced support tool for career counselors and job seekers based on CareerBERT, a novel approach that leverages the power of unstructured textual data sources, such as resumes, to provide more accurate and comprehensive job recommendations. In contrast to previous approaches that primarily focus on job recommendations based on a fixed set of concrete job advertisements, our approach involves the creation of a corpus that combines data from the European Skills, Competences, and Occupations (ESCO) taxonomy and EURopean Employment Services (EURES) job advertisements, ensuring an up-to-date and well-defined representation of general job titles in the labor market. Our two-step evaluation approach, consisting of an application-grounded evaluation using EURES job advertisements and a human-grounded evaluation using real-world resumes and Human Resources (HR) expert feedback, provides a comprehensive assessment of CareerBERT's performance. Our experimental results demonstrate that CareerBERT outperforms both traditional and state-of-the-art embedding approaches while showing robust effectiveness in human expert evaluations. These results confirm the effectiveness of CareerBERT in supporting career consultants by generating relevant job recommendations based on resumes, ultimately enhancing the efficiency of job consultations and expanding the perspectives of job seekers. This research contributes to the field of NLP and job recommendation systems, offering valuable insights for both researchers and practitioners in the domain of career consulting and job matching.}
}