Span Marker Roberta Large Fewnerd Fine Super
This is a SpanMarker model based on roberta-large for fine-grained named entity recognition, trained on the FewNERD dataset.
Downloads: 53
Released: 3/30/2023
Model Overview
The model uses the SpanMarker architecture with a roberta-large encoder to recognize many kinds of named entities in text, making it well suited to information extraction.
Model Highlights
Fine-grained entity recognition
Supports 66 fine-grained entity types covering people, locations, organizations, and more
Strong base model
Built on the roberta-large encoder for strong semantic understanding
SpanMarker architecture
Uses the SpanMarker approach to handle entity-boundary identification effectively
Capabilities
Named entity recognition
Fine-grained entity classification
Text information extraction
Use Cases
Information extraction
Recognizing people in news
Identify the people mentioned in news text and their types
Can accurately recognize person entities such as "Amelia Earhart"
Geographic information extraction
Recognize locations, buildings, and other geographic entities in text
Can recognize geographic entities such as "Paris" and "the Atlantic"
Content analysis
Film and TV analysis
Recognize films, TV programs, and similar works mentioned in text
Can accurately recognize works such as "Under Siege"
🚀 SpanMarker with roberta-large on the FewNERD dataset
This is a SpanMarker model for named entity recognition, trained on the FewNERD dataset, using roberta-large as the underlying encoder. See train.py for the training script.
🚀 Quick Start
Direct use
```python
from span_marker import SpanMarkerModel

# Download the model from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-fewnerd-fine-super")
# Run inference
entities = model.predict("Most of the Steven Seagal movie ``Under Siege`` (co-starring Tommy Lee Jones) was filmed aboard the Battleship USS Alabama, which is docked on Mobile Bay at Battleship Memorial Park and open to the public.")
```
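`predict` returns a list of entity dictionaries. A minimal sketch of post-processing that output, assuming each dict carries `span` and `label` keys (key names may differ between span_marker versions); the `sample` data below is hypothetical, not actual model output:

```python
from collections import defaultdict

def group_entities_by_label(entities):
    """Group predicted entity spans by their fine-grained label.

    Assumes each entity is a dict with 'span' and 'label' keys,
    as SpanMarkerModel.predict returns in recent versions.
    """
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["label"]].append(entity["span"])
    return dict(grouped)

# Hypothetical predictions for the sentence above
sample = [
    {"span": "Steven Seagal", "label": "person-actor", "score": 0.99},
    {"span": "Under Siege", "label": "art-film", "score": 0.98},
    {"span": "Tommy Lee Jones", "label": "person-actor", "score": 0.99},
]
print(group_entities_by_label(sample))
# {'person-actor': ['Steven Seagal', 'Tommy Lee Jones'], 'art-film': ['Under Siege']}
```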
Downstream use
You can fine-tune this model on your own dataset.
Click to expand
```python
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download the model from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-fewnerd-fine-super")
# Specify a dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003")  # e.g. CoNLL2003
# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("tomaarsen/span-marker-roberta-large-fewnerd-fine-super-finetuned")
```
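If your own data uses string tags rather than integer ids, it must be converted into the "tokens"/"ner_tags" column format before training. A minimal sketch under the assumption that the raw data is lists of tokens with parallel string tags (the helper names here are illustrative, not part of span_marker):

```python
def build_label_mapping(tag_sequences):
    """Map string NER tags to integer ids, keeping 'O' at index 0.

    A sketch for preparing a custom dataset; the final label set
    must match the label scheme in the model's configuration.
    """
    labels = sorted({tag for seq in tag_sequences for tag in seq} - {"O"})
    label2id = {"O": 0}
    label2id.update({label: i + 1 for i, label in enumerate(labels)})
    return label2id

def encode_tags(tag_sequences, label2id):
    """Convert each string-tag sequence into an integer 'ner_tags' row."""
    return [[label2id[tag] for tag in seq] for seq in tag_sequences]

tags = [["O", "person-actor", "O"], ["art-film", "O"]]
mapping = build_label_mapping(tags)
print(mapping)                      # {'O': 0, 'art-film': 1, 'person-actor': 2}
print(encode_tags(tags, mapping))   # [[0, 2, 0], [1, 0]]
```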
✨ Key Features
- Usable for named entity recognition tasks.
- Uses roberta-large as the base encoder, providing strong feature extraction.
- Supports fine-tuning on your own dataset.
📦 Installation
The source documentation does not include installation steps; the library is typically installed with `pip install span_marker`.
📚 Documentation
Model Details
Model Description
Property | Details |
---|---|
Model type | SpanMarker |
Encoder | roberta-large |
Maximum sequence length | 256 tokens |
Maximum entity length | 8 words |
Training dataset | FewNERD |
Language | English |
License | cc-by-sa-4.0 |
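The 8-word maximum entity length means a span-based model only needs to score candidate spans up to that length. A sketch of the span enumeration this implies, as an illustration of the idea rather than span_marker's internal code:

```python
def enumerate_spans(tokens, max_length=8):
    """List every candidate (start, end) word span up to max_length words.

    Illustrates how a maximum entity length bounds the candidate set a
    span-based NER model must classify; SpanMarker's actual
    implementation differs in detail.
    """
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_length, len(tokens)) + 1):
            spans.append((start, end))
    return spans

tokens = "Battleship USS Alabama is docked".split()
spans = enumerate_spans(tokens, max_length=3)
print(len(spans))  # 12 candidate spans for 5 tokens with max length 3
```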
Model Labels
Label | Examples |
---|---|
art-broadcastprogram | "Street Cents", "The Gale Storm Show : Oh , Susanna", "Corazones" |
art-film | "Shawshank Redemption", "Bosch", "L'Atlantide" |
art-music | "Hollywood Studio Symphony", "Champion Lover", "Atkinson , Danko and Ford ( with Brockie and Hilton )" |
art-other | "Aphrodite of Milos", "Venus de Milo", "The Today Show" |
art-painting | "Production/Reproduction", "Cofiwch Dryweryn", "Touit" |
art-writtenart | "Imelda de ' Lambertazzi", "Time", "The Seven Year Itch" |
building-airport | "Sheremetyevo International Airport", "Newark Liberty International Airport", "Luton Airport" |
building-hospital | "Memorial Sloan-Kettering Cancer Center", "Hokkaido University Hospital", "Yeungnam University Hospital" |
building-hotel | "Flamingo Hotel", "The Standard Hotel", "Radisson Blu Sea Plaza Hotel" |
building-library | "British Library", "Berlin State Library", "Bayerische Staatsbibliothek" |
building-other | "Alpha Recording Studios", "Henry Ford Museum", "Communiplex" |
building-restaurant | "Fatburger", "Carnegie Deli", "Trumbull" |
building-sportsfacility | "Sports Center", "Glenn Warner Soccer Facility", "Boston Garden" |
building-theater | "Pittsburgh Civic Light Opera", "National Paris Opera", "Sanders Theatre" |
event-attack/battle/war/militaryconflict | "Jurist", "Vietnam War", "Easter Offensive" |
event-disaster | "the 1912 North Mount Lyell Disaster", "1990s North Korean famine", "1693 Sicily earthquake" |
event-election | "March 1898 elections", "Elections to the European Parliament", "1982 Mitcham and Morden by-election" |
event-other | "Eastwood Scoring Stage", "Union for a Popular Movement", "Masaryk Democratic Movement" |
event-protest | "Russian Revolution", "French Revolution", "Iranian Constitutional Revolution" |
event-sportsevent | "World Cup", "Stanley Cup", "National Champions" |
location-GPE | "Croatian", "the Republic of Croatia", "Mediterranean Basin" |
location-bodiesofwater | "Arthur Kill", "Norfolk coast", "Atatürk Dam Lake" |
location-island | "new Samsat district", "Staten Island", "Laccadives" |
location-mountain | "Ruweisat Ridge", "Salamander Glacier", "Miteirya Ridge" |
location-other | "Northern City Line", "Victoria line", "Cartuther" |
location-park | "Gramercy Park", "Shenandoah National Park", "Painted Desert Community Complex Historic District" |
location-road/railway/highway/transit | "NJT", "Friern Barnet Road", "Newark-Elizabeth Rail Link" |
organization-company | "Church 's Chicken", "Dixy Chicken", "Texas Chicken" |
organization-education | "MIT", "Barnard College", "Belfast Royal Academy and the Ulster College of Physical Education" |
organization-government/governmentagency | "Supreme Court", "Congregazione dei Nobili", "Diet" |
organization-media/newspaper | "Al Jazeera", "Clash", "TimeOut Melbourne" |
organization-other | "IAEA", "4th Army", "Defence Sector C" |
organization-politicalparty | "Al Wafa ' Islamic", "Kenseitō", "Shimpotō" |
organization-religion | "Jewish", "UPCUSA", "Christian" |
organization-showorganization | "Mr. Mister", "Lizzy", "Bochumer Symphoniker" |
organization-sportsleague | "China League One", "NHL", "First Division" |
organization-sportsteam | "Arsenal", "Luc Alphand Aventures", "Tottenham" |
other-astronomything | "Algol", "`` Caput Larvae ''", "Zodiac" |
other-award | "GCON", "Grand Commander of the Order of the Niger", "Order of the Republic of Guinea and Nigeria" |
other-biologything | "BAR", "N-terminal lipid", "Amphiphysin" |
other-chemicalthing | "carbon dioxide", "sulfur", "uranium" |
other-currency | "$", "Travancore Rupee", "lac crore" |
other-disease | "bladder cancer", "French Dysentery Epidemic of 1779", "hypothyroidism" |
other-educationaldegree | "Bachelor", "Master", "BSc ( Hons ) in physics" |
other-god | "El", "Fujin", "Raijin" |
other-language | "Latin", "Breton-speaking", "English" |
other-law | "Leahy–Smith America Invents Act ( AIA", "Thirty Years ' Peace", "United States Freedom Support Act" |
other-livingthing | "monkeys", "patchouli", "insects" |
other-medical | "Pediatrics", "pediatrician", "amitriptyline" |
person-actor | "Tchéky Karyo", "Ellaline Terriss", "Edmund Payne" |
person-artist/author | "George Axelrod", "Gaetano Donizett", "Hicks" |
person-athlete | "Jaguar", "Tozawa", "Neville" |
person-director | "Bob Swaim", "Frank Darabont", "Richard Quine" |
person-other | "Richard Benson", "Holden", "Campbell" |
person-politician | "Emeric", "Rivière", "William" |
person-scholar | "Stalmine", "Stedman", "Wurdack" |
person-soldier | "Helmuth Weidling", "Joachim Ziegler", "Krukenberg" |
product-airplane | "Luton", "Spey-equipped FGR.2s", "EC135T2 CPDS" |
product-car | "100EX", "Phantom", "Corvettes - GT1 C6R" |
product-food | "red grape", "yakiniku", "V. labrusca" |
product-game | "Airforce Delta", "Splinter Cell", "Hardcore RPG" |
product-other | "Fairbottom Bobs", "X11", "PDP-1" |
product-ship | "HMS `` Chinkara ''", "Congress", "Essex" |
product-software | "Wikipedia", "Apdf", "AmiPDF" |
product-train | "Royal Scots Grey", "High Speed Trains", "55022" |
product-weapon | "AR-15 's", "ZU-23-2M Wróbel", "ZU-23-2MR Wróbel II" |
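The 66 labels above follow a "coarsetype-finetype" naming convention. A small sketch of splitting them for analysis; note that some fine types contain `/` (e.g. `location-road/railway/highway/transit`), so only the first hyphen separates the two levels:

```python
def split_label(label):
    """Split a FewNERD fine-grained label like 'art-film' into
    its (coarse, fine) parts, splitting only on the first hyphen."""
    coarse, _, fine = label.partition("-")
    return coarse, fine

print(split_label("art-film"))  # ('art', 'film')
print(split_label("location-road/railway/highway/transit"))
# ('location', 'road/railway/highway/transit')
```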
Training Details
Training set metrics
Training set metric | Min | Median | Max |
---|---|---|---|
Sentence length (tokens) | 1 | 24.4945 | 267 |
Entities per sentence | 0 | 2.5832 | 88 |
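Metrics like those in the table above are simple corpus statistics. A sketch of how they can be computed over a tokenized corpus (the toy `corpus` here is illustrative; the actual numbers come from the FewNERD training split):

```python
from statistics import median

def length_stats(sentences):
    """Return (min, median, max) token counts over a tokenized corpus,
    mirroring the sentence-length row of the metrics table."""
    lengths = [len(tokens) for tokens in sentences]
    return min(lengths), median(lengths), max(lengths)

corpus = [["Paris", "is", "nice"], ["Hello"], ["The", "USS", "Alabama", "sailed"]]
print(length_stats(corpus))  # (1, 3, 4)
```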
Training hyperparameters
- Learning rate: 1e-05
- Train batch size: 8
- Eval batch size: 8
- Seed: 42
- Optimizer: Adam (β1=0.9, β2=0.999, ε=1e-08)
- LR scheduler type: linear
- LR scheduler warmup ratio: 0.1
- Number of epochs: 3
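A linear scheduler with warmup ratio 0.1 ramps the learning rate up over the first 10% of steps, then decays it linearly to zero. A sketch of the schedule's shape under those hyperparameters (an illustration, not transformers' exact scheduler code):

```python
def linear_schedule_lr(step, total_steps, base_lr=1e-05, warmup_ratio=0.1):
    """Learning rate at a given step under linear warmup + linear decay,
    using the hyperparameters above (lr 1e-05, warmup ratio 0.1)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # ramp linearly from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # decay linearly from base_lr down to 0 over the remaining steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

total = 1000
print(linear_schedule_lr(50, total))   # halfway through warmup (≈5e-06)
print(linear_schedule_lr(100, total))  # end of warmup, peak lr (≈1e-05)
```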
Training hardware
- On cloud: No
- GPU: 1 x NVIDIA GeForce RTX 3090
- CPU: 13th Gen Intel(R) Core(TM) i7-13700K
- RAM: 31.78 GB
Framework versions
- Python: 3.9.16
- SpanMarker: 1.3.1.dev
- Transformers: 4.29.2
- PyTorch: 2.0.1+cu118
- Datasets: 2.14.3
- Tokenizers: 0.13.2
🔧 Technical Details
The source documentation does not provide specific implementation details, so this section is omitted.
📄 License
This model is released under the cc-by-sa-4.0 license.