Careerbert - g開源模型 - 專為職業匹配與推薦系統設計的德語工具

首頁

Careerbert G

由lwolfrum2開發

基於ESCO分類體系微調的德語句子轉換器模型，專為職業匹配和推薦系統設計

文本嵌入

Transformers

德語#職業匹配 #ESCO分類 #德語NLP

下載量 49

發布時間 : 2/26/2025

模型概述

CareerBERT-G是基於deepset/gbert-base微調的句子轉換器模型，專門用於將簡歷與ESCO職位分類匹配，支持職業諮詢和職位推薦系統。

模型特點

職業匹配優化

專門針對職業諮詢和職位推薦場景優化，能有效匹配簡歷與職位描述

ESCO分類整合

整合歐洲技能、能力與職業分類體系(ESCO)，提供標準化職業特徵表示

兩階段驗證

通過EURES職位廣告和人力資源專家評估兩階段驗證，確保模型實用性能

模型能力

句子嵌入生成

文本相似度計算

職業特徵提取

簡歷與職位匹配

使用案例

職業諮詢

簡歷與職位匹配

將求職者簡歷與ESCO職位分類進行匹配，提供精準職位推薦

在人力資源專家評估中展現出穩健效能

就業服務

職位廣告分析

分析EURES等平臺的職位廣告，提取標準化職業特徵

超越傳統和最先進的嵌入方法

🚀 CareerBERT - G

CareerBERT - G是一個經過微調的句子轉換器模型，它基於ESCO職業分類體系進行訓練。該模型以deepset/gbert - base為基礎模型，可用於計算句子相似度、特徵提取等任務。

🚀 快速開始

本模型可通過sentence - transformers庫或HuggingFace Transformers庫使用。下面分別介紹兩種使用方式。

📦 安裝指南

若使用sentence - transformers庫，需先安裝該庫：

pip install -U sentence-transformers

💻 使用示例

基礎用法（Sentence - Transformers）

from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('{MODEL_NAME}')
embeddings = model.encode(sentences)
print(embeddings)

高級用法（HuggingFace Transformers）

若不使用sentence - transformers庫，可按以下步驟使用模型：首先將輸入數據傳入Transformer模型，然後對上下文詞嵌入應用合適的池化操作。

from transformers import AutoTokenizer, AutoModel
import torch


#Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)

📚 詳細文檔

評估結果

若要對該模型進行自動化評估，可參考句子嵌入基準測試：https://seb.sbert.net

訓練細節

該模型使用以下參數進行訓練：

數據加載器： torch.utils.data.dataloader.DataLoader，長度為3695，參數如下：

{'batch_size': 32, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}

損失函數： sentence_transformers.losses.MultipleNegativesRankingLoss.MultipleNegativesRankingLoss，參數如下：
```
{'scale': 20.0, 'similarity_fct': 'cos_sim'}
```

fit()方法的參數：

{
    "epochs": 1,
    "evaluation_steps": 0,
    "evaluator": "sentence_transformers.evaluation.RerankingEvaluator.RerankingEvaluator",
    "max_grad_norm": 1,
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 2e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 11821.1,
    "weight_decay": 0.01
}

完整模型架構

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)

引用與作者

若使用該模型，請引用以下論文：

@article{ROSENBERGER2025127043,
title = {CareerBERT: Matching resumes to ESCO jobs in a shared embedding space for generic job recommendations},
journal = {Expert Systems with Applications},
volume = {275},
pages = {127043},
year = {2025},
issn = {0957-4174},
doi = {https://doi.org/10.1016/j.eswa.2025.127043},
url = {https://www.sciencedirect.com/science/article/pii/S0957417425006657},
author = {Julian Rosenberger and Lukas Wolfrum and Sven Weinzierl and Mathias Kraus and Patrick Zschech},
keywords = {Job consultation, Job markets, Job recommendation system, BERT, NLP},
abstract = {The rapidly evolving labor market, driven by technological advancements and economic shifts, presents significant challenges for traditional job matching and consultation services. In response, we introduce an advanced support tool for career counselors and job seekers based on CareerBERT, a novel approach that leverages the power of unstructured textual data sources, such as resumes, to provide more accurate and comprehensive job recommendations. In contrast to previous approaches that primarily focus on job recommendations based on a fixed set of concrete job advertisements, our approach involves the creation of a corpus that combines data from the European Skills, Competences, and Occupations (ESCO) taxonomy and EURopean Employment Services (EURES) job advertisements, ensuring an up-to-date and well-defined representation of general job titles in the labor market. Our two-step evaluation approach, consisting of an application-grounded evaluation using EURES job advertisements and a human-grounded evaluation using real-world resumes and Human Resources (HR) expert feedback, provides a comprehensive assessment of CareerBERT’s performance. Our experimental results demonstrate that CareerBERT outperforms both traditional and state-of-the-art embedding approaches while showing robust effectiveness in human expert evaluations. These results confirm the effectiveness of CareerBERT in supporting career consultants by generating relevant job recommendations based on resumes, ultimately enhancing the efficiency of job consultations and expanding the perspectives of job seekers. This research contributes to the field of NLP and job recommendation systems, offering valuable insights for both researchers and practitioners in the domain of career consulting and job matching.}
}