Open-source Jobbert Skill Extraction (SkillSpan) model: extracting skills from English job postings to support labor market analysis

Jobbert Skill Extraction

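Developed by jjzha

A BERT-based model, adapted to the job posting domain and trained on the SkillSpan dataset, that extracts hard and soft skills from English job postings. It is intended for analyzing labor market dynamics.

Sequence labeling

Transformers

#job-posting-skill-extraction #hard-and-soft-skill-classification #domain-adapted-BERT

Downloads: 1,675

Released: 4/6/2023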

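Model overview

The model automatically identifies and classifies hard and soft skills in job postings, which helps when analyzing labor market demand and skill trends.

Model features

Domain adaptation

Uses continued pre-training on job postings and language models optimized for long spans. The paper reports that domain-adapted models significantly outperform their non-adapted counterparts.

Standardized annotation guidelines

Ships with annotation guidelines for hard and soft skills, with annotations made by domain experts.

Large annotated dataset

The SKILLSPAN dataset contains 14.5K sentences and over 12.5K annotated spans.

Model capabilities

Text information extraction

Skill classification

Job posting analysis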

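Use cases

Human resources

Hiring demand analysis

Automatically analyzes the skill requirements in job advertisements to help companies follow market trends.

Identifies both hard-skill and soft-skill requirements.

Candidate matching

Matches the skills in a candidate's résumé against the requirements of a position.

Makes talent matching more efficient.

Education planning

Curriculum design

Analyzes the skills the market demands to guide course offerings at educational institutions.

Brings course content closer to what the job market needs.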
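
Once skills have been extracted per posting, the demand analysis and matching use cases above come down to counting and set operations. The following is a minimal sketch, assuming the extracted skills are already available as lists of strings. The names `normalize`, `skill_demand`, and `coverage`, and the lowercase/whitespace normalization, are illustrative choices and not part of the model or the paper.

```python
from collections import Counter


def normalize(skill: str) -> str:
    """Lowercase and collapse whitespace so 'SQL ' and 'sql' compare equal."""
    return " ".join(skill.lower().split())


def skill_demand(postings_skills):
    """Count in how many postings each (normalized) skill appears.

    postings_skills: iterable of lists of skill strings, one list per posting.
    A skill repeated inside one posting is counted once for that posting.
    """
    counts = Counter()
    for skills in postings_skills:
        counts.update({normalize(s) for s in skills})
    return counts


def coverage(resume_skills, job_skills):
    """Fraction of the job's required skills that also appear in the résumé."""
    required = {normalize(s) for s in job_skills}
    if not required:
        return 0.0
    have = {normalize(s) for s in resume_skills}
    return len(required & have) / len(required)
```

For instance, `skill_demand(postings).most_common(10)` would list the ten most requested skills, and `coverage(resume, posting)` gives a rough match score between 0 and 1. Exact string matching is a simplification; real postings would likely need synonym handling on top of this.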

🚀 Skill extraction model demo

This project demonstrates models from the SkillSpan paper (Zhang et al., 2022).

🚀 Quick start

This demo uses models from the paper, which addresses the task of extracting hard and soft skills from English job postings.
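
The page itself gives no loading code, so the following is only a sketch of how the model might be used through the Hugging Face `transformers` token-classification pipeline. The model id `jjzha/jobbert_skill_extraction` is an assumption inferred from the model and author names on this page, and `load_extractor` and `merge_adjacent` are hypothetical helper names. The pipeline may return neighboring pieces of one skill as separate groups, so the helper joins spans that touch or are separated by a single character.

```python
def load_extractor(model_id="jjzha/jobbert_skill_extraction"):
    """Build a token-classification pipeline (model id is an assumption)."""
    # Imported lazily so merge_adjacent can be used without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="first",
    )


def merge_adjacent(entities, text):
    """Join predicted spans that touch or are one character apart.

    entities: dicts with integer 'start' and 'end' character offsets,
              as returned by an aggregated token-classification pipeline.
    Returns the merged spans as substrings of text, in reading order.
    """
    merged = []
    for ent in sorted(entities, key=lambda e: e["start"]):
        if merged and ent["start"] - merged[-1]["end"] <= 1:
            merged[-1]["end"] = ent["end"]
        else:
            merged.append({"start": ent["start"], "end": ent["end"]})
    return [text[m["start"]:m["end"]] for m in merged]


# Usage (downloads the model on the first call):
#   extractor = load_extractor()
#   text = "Experience with machine learning and strong communication skills"
#   print(merge_adjacent(extractor(text), text))
```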

📚 Detailed documentation

The paper referenced by this project is:

@inproceedings{zhang-etal-2022-skillspan,
    title = "{S}kill{S}pan: Hard and Soft Skill Extraction from {E}nglish Job Postings",
    author = "Zhang, Mike  and
      Jensen, Kristian  and
      Sonniks, Sif  and
      Plank, Barbara",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.naacl-main.366",
    doi = "10.18653/v1/2022.naacl-main.366",
    pages = "4962--4984",
    abstract = "Skill Extraction (SE) is an important and widely-studied task useful to gain insights into labor market dynamics. However, there is a lacuna of datasets and annotation guidelines; available datasets are few and contain crowd-sourced labels on the span-level or labels from a predefined skill inventory. To address this gap, we introduce SKILLSPAN, a novel SE dataset consisting of 14.5K sentences and over 12.5K annotated spans. We release its respective guidelines created over three different sources annotated for hard and soft skills by domain experts. We introduce a BERT baseline (Devlin et al., 2019). To improve upon this baseline, we experiment with language models that are optimized for long spans (Joshi et al., 2020; Beltagy et al., 2020), continuous pre-training on the job posting domain (Han and Eisenstein, 2019; Gururangan et al., 2020), and multi-task learning (Caruana, 1997). Our results show that the domain-adapted models significantly outperform their non-adapted counterparts, and single-task outperforms multi-task learning.",
}

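Note that there is a second endpoint, jjzha/jobbert_knowledge_extraction. Knowledge can be regarded as hard skills, whereas skills cover both soft skills and applied skills.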
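
To get both kinds of output for one posting, the two endpoints can be run side by side. This is a rough sketch under assumptions: `jjzha/jobbert_skill_extraction` is an inferred model id, `load_pipes` and `extract_both` are hypothetical helper names, and both models are assumed to work with the `transformers` token-classification pipeline. `extract_both` accepts any callables that return dicts with `start` and `end` offsets, so it can be tried with stand-ins before downloading anything.

```python
def load_pipes():
    """Load the skill and knowledge pipelines (model ids are assumptions)."""
    from transformers import pipeline

    skill = pipeline(
        "token-classification",
        model="jjzha/jobbert_skill_extraction",
        aggregation_strategy="first",
    )
    knowledge = pipeline(
        "token-classification",
        model="jjzha/jobbert_knowledge_extraction",
        aggregation_strategy="first",
    )
    return skill, knowledge


def extract_both(text, skill_pipe, knowledge_pipe):
    """Run both extractors on text and return their spans under separate keys."""

    def spans(pipe):
        # Slice the original text so casing and spacing are preserved.
        return [text[e["start"]:e["end"]] for e in pipe(text)]

    return {"skills": spans(skill_pipe), "knowledge": spans(knowledge_pipe)}


# Usage (downloads both models on the first call):
#   skill_pipe, knowledge_pipe = load_pipes()
#   print(extract_both("Strong teamwork and knowledge of Python", skill_pipe, knowledge_pipe))
```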