Span Marker Roberta Large Fewnerd Fine Super

A SpanMarker model built on roberta-large for fine-grained named entity recognition, trained on the FewNERD dataset.

Downloads: 53
Released: 3/30/2023

Model Overview

The model combines the SpanMarker architecture with a roberta-large encoder to recognize a wide range of named entities in text, making it well suited to information extraction.
Model Features

Fine-grained entity recognition
Recognizes 66 fine-grained entity types spanning people, locations, organizations, and more.

Strong base model
Built on the roberta-large encoder, providing strong semantic understanding.

SpanMarker architecture
Uses the SpanMarker method, which handles entity boundary detection effectively.

Model Capabilities

Named entity recognition
Fine-grained entity classification
Text information extraction
Use Cases

Information extraction

News person recognition
Identify people mentioned in news text along with their types.
Accurately recognizes person entities such as 'Amelia Earhart'.

Geographic information extraction
Recognize locations, buildings, and other geographic entities in text.
Recognizes geographic entities such as 'Paris' and 'the Atlantic'.

Content analysis

Film and TV analysis
Identify films, TV programs, and similar works mentioned in text.
Accurately recognizes works such as 'Under Siege'.
🚀 SpanMarker with roberta-large on the FewNERD dataset

This is a SpanMarker model for named entity recognition, trained on the FewNERD dataset. It uses roberta-large as the underlying encoder. See train.py for the training script.
🚀 Quick Start

Direct use

```python
from span_marker import SpanMarkerModel

# Download the model from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-fewnerd-fine-super")

# Run inference
entities = model.predict("Most of the Steven Seagal movie ``Under Siege`` (co-starring Tommy Lee Jones) was filmed aboard the Battleship USS Alabama, which is docked on Mobile Bay at Battleship Memorial Park and open to the public.")
```
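The `predict` call returns a list of entity dicts. As a rough illustration of how such results might be post-processed, the sketch below groups spans by their fine-grained FewNERD label; the dict keys (`span`, `label`, `score`) and the sample values are assumptions for illustration, not output taken from the card.

```python
from collections import defaultdict

# Hypothetical predict() output for the sentence above; the keys
# "span", "label", and "score" are assumptions, not from the card.
entities = [
    {"span": "Steven Seagal", "label": "person-actor", "score": 0.99},
    {"span": "Under Siege", "label": "art-film", "score": 0.98},
    {"span": "Tommy Lee Jones", "label": "person-actor", "score": 0.99},
    {"span": "USS Alabama", "label": "product-ship", "score": 0.97},
]

# Group recognized spans by their fine-grained label
by_label = defaultdict(list)
for ent in entities:
    by_label[ent["label"]].append(ent["span"])

print(dict(by_label))
```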
Downstream use

You can fine-tune this model on your own dataset.

```python
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download the model from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-fewnerd-fine-super")

# Load a dataset with "tokens" and "ner_tags" columns, e.g. CoNLL2003
dataset = load_dataset("conll2003")

# Initialize a Trainer with the pretrained model and the dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("tomaarsen/span-marker-roberta-large-fewnerd-fine-super-finetuned")
```
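A custom dataset must mirror the CoNLL-style layout the Trainer expects: a `tokens` list with an aligned `ner_tags` list of integer label ids per example. A minimal sketch of one such example (the label ids here are illustrative placeholders, not the FewNERD label scheme):

```python
# One training example in the layout the Trainer consumes:
# word-level tokens plus one integer NER tag per token.
example = {
    "tokens": ["Amelia", "Earhart", "flew", "across", "the", "Atlantic"],
    # Illustrative ids: 0 = outside, 1 = person, 2 = location
    "ner_tags": [1, 1, 0, 0, 0, 2],
}

# The two columns must stay aligned token-for-token.
assert len(example["tokens"]) == len(example["ner_tags"])
```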
✨ Key Features

- Usable for named entity recognition tasks.
- Uses roberta-large as the base encoder, giving strong feature extraction.
- Can be fine-tuned on your own dataset.
📦 Installation

The original card does not cover installation; SpanMarker is typically installed from PyPI with `pip install span_marker`.
📚 Documentation

Model Details

Model Description

Property | Details
---|---
Model type | SpanMarker
Encoder | roberta-large
Maximum sequence length | 256 tokens
Maximum entity length | 8 words
Training dataset | FewNERD
Language | English
License | cc-by-sa-4.0
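The 8-word maximum entity length bounds the candidate spans the model scores: conceptually, every contiguous span of up to 8 words is a candidate. A sketch of that enumeration (my own illustration of the idea, not the library's internals):

```python
def candidate_spans(tokens, max_len=8):
    """Enumerate all contiguous (start, end) word spans of at most max_len words."""
    spans = []
    for start in range(len(tokens)):
        # end is exclusive; cap the span length at max_len words
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            spans.append((start, end))
    return spans

words = "Battleship Memorial Park is open to the public".split()
spans = candidate_spans(words)
```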
Model Sources

Model Labels

Label | Examples
---|---
art-broadcastprogram | "Street Cents", "The Gale Storm Show : Oh , Susanna", "Corazones" |
art-film | "Shawshank Redemption", "Bosch", "L'Atlantide" |
art-music | "Hollywood Studio Symphony", "Champion Lover", "Atkinson , Danko and Ford ( with Brockie and Hilton )" |
art-other | "Aphrodite of Milos", "Venus de Milo", "The Today Show" |
art-painting | "Production/Reproduction", "Cofiwch Dryweryn", "Touit" |
art-writtenart | "Imelda de ' Lambertazzi", "Time", "The Seven Year Itch" |
building-airport | "Sheremetyevo International Airport", "Newark Liberty International Airport", "Luton Airport" |
building-hospital | "Memorial Sloan-Kettering Cancer Center", "Hokkaido University Hospital", "Yeungnam University Hospital" |
building-hotel | "Flamingo Hotel", "The Standard Hotel", "Radisson Blu Sea Plaza Hotel" |
building-library | "British Library", "Berlin State Library", "Bayerische Staatsbibliothek" |
building-other | "Alpha Recording Studios", "Henry Ford Museum", "Communiplex" |
building-restaurant | "Fatburger", "Carnegie Deli", "Trumbull" |
building-sportsfacility | "Sports Center", "Glenn Warner Soccer Facility", "Boston Garden" |
building-theater | "Pittsburgh Civic Light Opera", "National Paris Opera", "Sanders Theatre" |
event-attack/battle/war/militaryconflict | "Jurist", "Vietnam War", "Easter Offensive" |
event-disaster | "the 1912 North Mount Lyell Disaster", "1990s North Korean famine", "1693 Sicily earthquake" |
event-election | "March 1898 elections", "Elections to the European Parliament", "1982 Mitcham and Morden by-election" |
event-other | "Eastwood Scoring Stage", "Union for a Popular Movement", "Masaryk Democratic Movement" |
event-protest | "Russian Revolution", "French Revolution", "Iranian Constitutional Revolution" |
event-sportsevent | "World Cup", "Stanley Cup", "National Champions" |
location-GPE | "Croatian", "the Republic of Croatia", "Mediterranean Basin" |
location-bodiesofwater | "Arthur Kill", "Norfolk coast", "Atatürk Dam Lake" |
location-island | "new Samsat district", "Staten Island", "Laccadives" |
location-mountain | "Ruweisat Ridge", "Salamander Glacier", "Miteirya Ridge" |
location-other | "Northern City Line", "Victoria line", "Cartuther" |
location-park | "Gramercy Park", "Shenandoah National Park", "Painted Desert Community Complex Historic District" |
location-road/railway/highway/transit | "NJT", "Friern Barnet Road", "Newark-Elizabeth Rail Link" |
organization-company | "Church 's Chicken", "Dixy Chicken", "Texas Chicken" |
organization-education | "MIT", "Barnard College", "Belfast Royal Academy and the Ulster College of Physical Education" |
organization-government/governmentagency | "Supreme Court", "Congregazione dei Nobili", "Diet" |
organization-media/newspaper | "Al Jazeera", "Clash", "TimeOut Melbourne" |
organization-other | "IAEA", "4th Army", "Defence Sector C" |
organization-politicalparty | "Al Wafa ' Islamic", "Kenseitō", "Shimpotō" |
organization-religion | "Jewish", "UPCUSA", "Christian" |
organization-showorganization | "Mr. Mister", "Lizzy", "Bochumer Symphoniker" |
organization-sportsleague | "China League One", "NHL", "First Division" |
organization-sportsteam | "Arsenal", "Luc Alphand Aventures", "Tottenham" |
other-astronomything | "Algol", "`` Caput Larvae ''", "Zodiac" |
other-award | "GCON", "Grand Commander of the Order of the Niger", "Order of the Republic of Guinea and Nigeria" |
other-biologything | "BAR", "N-terminal lipid", "Amphiphysin" |
other-chemicalthing | "carbon dioxide", "sulfur", "uranium" |
other-currency | "$", "Travancore Rupee", "lac crore" |
other-disease | "bladder cancer", "French Dysentery Epidemic of 1779", "hypothyroidism" |
other-educationaldegree | "Bachelor", "Master", "BSc ( Hons ) in physics" |
other-god | "El", "Fujin", "Raijin" |
other-language | "Latin", "Breton-speaking", "English" |
other-law | "Leahy–Smith America Invents Act ( AIA", "Thirty Years ' Peace", "United States Freedom Support Act" |
other-livingthing | "monkeys", "patchouli", "insects" |
other-medical | "Pediatrics", "pediatrician", "amitriptyline" |
person-actor | "Tchéky Karyo", "Ellaline Terriss", "Edmund Payne" |
person-artist/author | "George Axelrod", "Gaetano Donizett", "Hicks" |
person-athlete | "Jaguar", "Tozawa", "Neville" |
person-director | "Bob Swaim", "Frank Darabont", "Richard Quine" |
person-other | "Richard Benson", "Holden", "Campbell" |
person-politician | "Emeric", "Rivière", "William" |
person-scholar | "Stalmine", "Stedman", "Wurdack" |
person-soldier | "Helmuth Weidling", "Joachim Ziegler", "Krukenberg" |
product-airplane | "Luton", "Spey-equipped FGR.2s", "EC135T2 CPDS" |
product-car | "100EX", "Phantom", "Corvettes - GT1 C6R" |
product-food | "red grape", "yakiniku", "V. labrusca" |
product-game | "Airforce Delta", "Splinter Cell", "Hardcore RPG" |
product-other | "Fairbottom Bobs", "X11", "PDP-1" |
product-ship | "HMS `` Chinkara ''", "Congress", "Essex" |
product-software | "Wikipedia", "Apdf", "AmiPDF" |
product-train | "Royal Scots Grey", "High Speed Trains", "55022" |
product-weapon | "AR-15 's", "ZU-23-2M Wróbel", "ZU-23-2MR Wróbel II" |
Training Details

Training Set Metrics

Training set metric | Min | Median | Max
---|---|---|---
Sentence length | 1 | 24.4945 | 267
Entities per sentence | 0 | 2.5832 | 88
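Metrics of this shape can be reproduced for any tokenized dataset with a few lines of stdlib Python; a sketch on toy stand-in data (the actual FewNERD loading is omitted):

```python
from statistics import median

# Toy stand-in data: per-sentence token counts and entity counts
sentence_lengths = [1, 12, 24, 30, 267]
entities_per_sentence = [0, 1, 2, 3, 88]

def summarize(values):
    """Return the min/median/max summary used in the table above."""
    return {"min": min(values), "median": median(values), "max": max(values)}

length_stats = summarize(sentence_lengths)
entity_stats = summarize(entities_per_sentence)
```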
Training Hyperparameters

- Learning rate: 1e-05
- Train batch size: 8
- Eval batch size: 8
- Seed: 42
- Optimizer: Adam (β1=0.9, β2=0.999, ε=1e-08)
- LR scheduler type: linear
- LR scheduler warmup ratio: 0.1
- Number of epochs: 3
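With a linear scheduler and a 0.1 warmup ratio, the learning rate climbs from 0 to the 1e-05 peak over the first 10% of steps and then decays linearly back to 0. A minimal sketch of that schedule (a hand-rolled illustration, not the Transformers implementation):

```python
def linear_schedule_lr(step, total_steps, peak_lr=1e-05, warmup_ratio=0.1):
    """Linear warmup to peak_lr over the first warmup_ratio of steps, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: ramp up proportionally to the step count
        return peak_lr * step / warmup_steps
    # Decay phase: ramp down over the remaining steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)
```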
Training Hardware

- Cloud: No
- GPU: 1 x NVIDIA GeForce RTX 3090
- CPU: 13th Gen Intel(R) Core(TM) i7-13700K
- RAM: 31.78 GB

Framework Versions

- Python: 3.9.16
- SpanMarker: 1.3.1.dev
- Transformers: 4.29.2
- PyTorch: 2.0.1+cu118
- Datasets: 2.14.3
- Tokenizers: 0.13.2

🔧 Technical Details

The original card provides no further implementation details, so this section is omitted.

📄 License

This model is released under cc-by-sa-4.0.