CareerBERT-G: an open-source German model for job matching and recommendation systems

CareerBERT-G

Developed by lwolfrum2

A German sentence-transformer model fine-tuned on the ESCO taxonomy, designed for job matching and recommendation systems

Text embeddings

Transformers

German #JobMatching #ESCO #GermanNLP

Downloads: 49

Released: 2/26/2025

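Model overview

CareerBERT-G is a sentence-transformer model fine-tuned from deepset/gbert-base. It is built to match resumes against ESCO occupations and to support career counseling and job recommendation systems.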

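Model features

Optimized for job matching

Tuned specifically for career counseling and job recommendation scenarios, so that resumes can be matched against job descriptions

ESCO integration

Incorporates the European Skills, Competences, and Occupations (ESCO) taxonomy to provide standardized representations of occupations

Two-step evaluation

Evaluated in two steps, first on EURES job advertisements and then by Human Resources (HR) experts, to check the model's practical performance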

Model capabilities

Sentence embedding generation

Text similarity computation

Occupational feature extraction

Resume-to-job matching

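Use cases

Career counseling

Resume-to-job matching

Matches a job seeker's resume against ESCO occupations to produce job recommendations

Showed robust effectiveness in the HR expert evaluation

Employment services

Job advertisement analysis

Analyzes job advertisements from platforms such as EURES and extracts standardized occupational features

Outperformed both traditional and state-of-the-art embedding approaches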

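🚀 CareerBERT-G

CareerBERT-G is a fine-tuned sentence-transformer model trained on the ESCO occupational taxonomy. It uses deepset/gbert-base as its base model and can be used for tasks such as sentence similarity and feature extraction.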

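🚀 Quick start

The model can be used through either the sentence-transformers library or the HuggingFace Transformers library. Both approaches are described below.

📦 Installation

To use the sentence-transformers library, install it first:

pip install -U sentence-transformers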

💻 Usage examples

Basic usage (Sentence-Transformers)

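from sentence_transformers import SentenceTransformer

# Example input sentences to embed
sentences = ["This is an example sentence", "Each sentence is converted"]

# '{MODEL_NAME}' is a template placeholder; replace it with this model's Hugging Face Hub ID
model = SentenceTransformer('{MODEL_NAME}')
embeddings = model.encode(sentences)  # one embedding vector per sentence
print(embeddings)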
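
Once embeddings are available, matching a resume against job descriptions comes down to ranking candidates by cosine similarity. The following is a minimal, dependency-free sketch of that ranking step, not the authors' pipeline. The helper names `cosine_similarity` and `rank_jobs`, the job titles, and the toy 3-dimensional vectors are all hypothetical; they stand in for the 768-dimensional vectors that `model.encode` would return.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat similarity to a zero vector as 0
    return dot / (norm_a * norm_b)


def rank_jobs(resume_embedding, job_embeddings, top_k=3):
    """Return the top_k (job_title, score) pairs, best match first."""
    scored = [
        (title, cosine_similarity(resume_embedding, emb))
        for title, emb in job_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for model.encode(...) output (made-up values).
resume = [0.9, 0.1, 0.0]
jobs = {
    "Softwareentwickler/in": [1.0, 0.0, 0.0],  # illustrative title, not taken from ESCO
    "Krankenpfleger/in": [0.0, 1.0, 0.0],
    "Buchhalter/in": [0.5, 0.5, 0.0],
}

for title, score in rank_jobs(resume, jobs, top_k=2):
    print(f"{title}: {score:.3f}")
```

With the real model, `resume` and the values in `jobs` would be rows of `model.encode(...)`. The sentence-transformers library also ships `util.cos_sim`, which performs the same computation on tensors.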

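Advanced usage (HuggingFace Transformers)

Without the sentence-transformers library, the model can be used as follows: first pass the input through the Transformer model, then apply the appropriate pooling operation to the contextualized token embeddings.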

from transformers import AutoTokenizer, AutoModel
import torch


# Mean pooling: take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
# ('{MODEL_NAME}' is a template placeholder; replace it with this model's Hub ID)
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
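
To see why the attention mask matters in the pooling step, the toy example below reproduces the same masked average for a single sentence in plain Python. It is only an illustration: `masked_mean_pool` is a hypothetical helper, and the 2-dimensional token vectors are made up so the arithmetic can be checked by hand.

```python
def masked_mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of one sentence, skipping positions where mask == 0."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
    # Same guard against division by zero as torch.clamp(..., min=1e-9)
    count = max(sum(attention_mask), 1e-9)
    return [s / count for s in summed]


# One sentence padded to 4 tokens; the last two positions are padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0], [100.0, 100.0]]
mask = [1, 1, 0, 0]

print(masked_mean_pool(tokens, mask))          # [2.0, 3.0]: padding ignored
print(masked_mean_pool(tokens, [1, 1, 1, 1]))  # [51.0, 51.5]: padding distorts the mean
```

Because batches are padded to a common length, averaging without the mask would let padding vectors leak into every shorter sentence's embedding.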

📚 Documentation

Evaluation results

For an automated evaluation of this model, see the Sentence Embeddings Benchmark: https://seb.sbert.net

Training details

The model was trained with the following parameters:

DataLoader: torch.utils.data.dataloader.DataLoader of length 3695 with parameters:

{'batch_size': 32, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}

Loss: sentence_transformers.losses.MultipleNegativesRankingLoss.MultipleNegativesRankingLoss with parameters:
```
{'scale': 20.0, 'similarity_fct': 'cos_sim'}
```
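
For intuition about these two parameters, here is a dependency-free sketch of the idea behind a multiple-negatives ranking loss: each anchor is scored against every positive in the batch with cosine similarity, the scores are multiplied by `scale`, and cross-entropy rewards the anchor's own positive over the others, which act as in-batch negatives. This is an illustrative reimplementation under those assumptions, not the library's code, and the 2-dimensional vectors are made up.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and every other positives[j] in the batch serves as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive is close to the wrong anchor

print(multiple_negatives_ranking_loss(anchors, aligned))  # near zero
print(multiple_negatives_ranking_loss(anchors, swapped))  # large
```

A larger `scale` sharpens the softmax over the batch, so small differences in cosine similarity translate into larger differences in loss.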

Parameters of the fit() method:

{
    "epochs": 1,
    "evaluation_steps": 0,
    "evaluator": "sentence_transformers.evaluation.RerankingEvaluator.RerankingEvaluator",
    "max_grad_norm": 1,
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 2e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 11821.1,
    "weight_decay": 0.01
}
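
The "WarmupLinear" scheduler name conventionally denotes a learning rate that ramps up linearly from zero during warmup and then decays linearly back to zero. The sketch below shows that shape in plain Python. The function name `warmup_linear_factor` is hypothetical, and the step counts are illustrative values chosen for readability, not the ones from this training run; only the base learning rate of 2e-05 comes from the parameters above.

```python
def warmup_linear_factor(step, warmup_steps, total_steps):
    """Learning-rate multiplier: linear ramp 0 -> 1, then linear decay 1 -> 0."""
    if step < warmup_steps:
        return step / max(1.0, warmup_steps)
    return max(0.0, (total_steps - step) / max(1.0, total_steps - warmup_steps))


BASE_LR = 2e-05  # "lr" from optimizer_params

# Illustrative step counts (hypothetical, not taken from the training run).
WARMUP, TOTAL = 100, 1000

for step in (0, 50, 100, 550, 1000):
    lr = BASE_LR * warmup_linear_factor(step, WARMUP, TOTAL)
    print(f"step {step:>4}: lr = {lr:.2e}")
```

The multiplier peaks at exactly 1.0 when warmup ends, so the configured `lr` is the maximum learning rate rather than a constant one.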

Full model architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)

Citing & authors

If you use this model, please cite the following paper:

@article{ROSENBERGER2025127043,
title = {CareerBERT: Matching resumes to ESCO jobs in a shared embedding space for generic job recommendations},
journal = {Expert Systems with Applications},
volume = {275},
pages = {127043},
year = {2025},
issn = {0957-4174},
doi = {https://doi.org/10.1016/j.eswa.2025.127043},
url = {https://www.sciencedirect.com/science/article/pii/S0957417425006657},
author = {Julian Rosenberger and Lukas Wolfrum and Sven Weinzierl and Mathias Kraus and Patrick Zschech},
keywords = {Job consultation, Job markets, Job recommendation system, BERT, NLP},
abstract = {The rapidly evolving labor market, driven by technological advancements and economic shifts, presents significant challenges for traditional job matching and consultation services. In response, we introduce an advanced support tool for career counselors and job seekers based on CareerBERT, a novel approach that leverages the power of unstructured textual data sources, such as resumes, to provide more accurate and comprehensive job recommendations. In contrast to previous approaches that primarily focus on job recommendations based on a fixed set of concrete job advertisements, our approach involves the creation of a corpus that combines data from the European Skills, Competences, and Occupations (ESCO) taxonomy and EURopean Employment Services (EURES) job advertisements, ensuring an up-to-date and well-defined representation of general job titles in the labor market. Our two-step evaluation approach, consisting of an application-grounded evaluation using EURES job advertisements and a human-grounded evaluation using real-world resumes and Human Resources (HR) expert feedback, provides a comprehensive assessment of CareerBERT’s performance. Our experimental results demonstrate that CareerBERT outperforms both traditional and state-of-the-art embedding approaches while showing robust effectiveness in human expert evaluations. These results confirm the effectiveness of CareerBERT in supporting career consultants by generating relevant job recommendations based on resumes, ultimately enhancing the efficiency of job consultations and expanding the perspectives of job seekers. This research contributes to the field of NLP and job recommendation systems, offering valuable insights for both researchers and practitioners in the domain of career consulting and job matching.}
}