SciBERTNER Open-Source Model - Free Recognition of 6 Predefined Entity Types in Scientific Literature

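Developed by Kashob

A scientific literature entity recognition model based on SciBERT that recognizes six predefined scientific entity types.

Sequence Labeling

Transformers

Language: English

License: MIT

#ScientificEntityRecognition #SciBERTFineTuning #ScientificTextProcessing

Downloads: 78

Released: 4/12/2024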

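Model Overview

This model is designed for entity recognition in scientific literature. It recognizes six scientific entity types, including materials, methods, and metrics.

Model Features

Science-Domain Specific

Fine-tuned for scientific literature so that it can recognize domain-specific entities such as materials and methods.

Multi-Class Recognition

Supports six predefined scientific entity types: Generic, Material, Method, Metric, OtherScientificTerm, and Task.

Built on SciBERT

Builds on the SciBERT pretrained model, whose pretraining on scientific papers gives it a grounding in scientific language.

Model Capabilities

Scientific entity recognition

Text annotation

Information extraction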

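Use Cases

Academic Research

Literature Metadata Extraction

Automatically extract key information such as research methods and experimental materials from research papers.

The extracted entities can be used to build a structured literature database.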
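
As a rough illustration of what a "structured record" could look like, the sketch below groups already-extracted entities by type for one paper. The function name `build_record`, the `paper_id` field, and the sample spans are hypothetical and not part of the model or its API; the input is assumed to be a list of `(entity_type, text)` pairs produced by some earlier decoding step.

```python
from collections import defaultdict


def build_record(paper_id, spans):
    """Group (entity_type, text) spans into one record, dropping repeats but keeping order."""
    grouped = defaultdict(list)
    for entity_type, text in spans:
        if text not in grouped[entity_type]:
            grouped[entity_type].append(text)
    return {"paper_id": paper_id, "entities": dict(grouped)}


# Hypothetical spans for one paper (illustrative data, not real model output).
spans = [
    ("Method", "recurrence"),
    ("Metric", "sample efficiency"),
    ("Method", "self-attention"),
    ("Method", "recurrence"),  # duplicate mention, kept only once
]
record = build_record("paper-001", spans)
print(record)
# {'paper_id': 'paper-001', 'entities': {'Method': ['recurrence', 'self-attention'], 'Metric': ['sample efficiency']}}
```

A record in this shape can be serialized to JSON or inserted as a row in whatever store the literature database uses.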

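Knowledge Graph Construction

Identify entities in scientific literature as a first step toward extracting entity relations and building a domain knowledge graph.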
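
The model tags entities but does not predict relations between them, so a graph needs some additional relation signal. One simple stand-in, sketched below under that assumption, is sentence-level co-occurrence: two entities are linked whenever they appear in the same sentence. The helper `cooccurrence_edges` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences):
    """Count unordered pairs of distinct entities that share a sentence."""
    edges = Counter()
    for entities in sentences:
        unique = sorted(set(entities))  # sort so each pair has one canonical order
        for left, right in combinations(unique, 2):
            edges[(left, right)] += 1
    return edges


# Each inner list holds the (entity_type, text) pairs found in one sentence (illustrative data).
sentences = [
    [("OtherScientificTerm", "transformers"), ("Method", "recurrence")],
    [
        ("OtherScientificTerm", "recurrent connections"),
        ("Metric", "sample efficiency"),
        ("Method", "self-attention"),
    ],
]
edges = cooccurrence_edges(sentences)
print(len(edges))  # 4
```

Co-occurrence counts are only a weak proxy for a real relation; a dedicated relation-extraction step would be needed for typed edges.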

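Research Assistance

Literature Review Automation

Automatically extract key methods and technical terms from multiple papers.

Speeds up the literature survey process.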
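
One way to summarize terms across several papers, shown here only as a sketch, is to count in how many documents each term of a given type appears. The helper `top_terms` and the sample documents are hypothetical; matching is by lowercased string, which is a simplifying assumption.

```python
from collections import Counter


def top_terms(doc_spans, entity_type, n=3):
    """Return the n terms of the given type that appear in the most documents."""
    counts = Counter()
    for spans in doc_spans:
        # A set ensures each term is counted at most once per document.
        terms = {text.lower() for kind, text in spans if kind == entity_type}
        counts.update(terms)
    return counts.most_common(n)


# One list of (entity_type, text) pairs per paper (illustrative data).
docs = [
    [("Method", "self-attention"), ("Method", "recurrence")],
    [("Method", "Self-attention"), ("Task", "translation")],
    [("Method", "self-attention")],
]
print(top_terms(docs, "Method", n=2))
# [('self-attention', 3), ('recurrence', 1)]
```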

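🚀 SciBERT-Based Scientific Entity Recognition Model

This model is a SciBERT-based scientific entity recognition model. It recognizes a fixed set of scientific entity types for use in scientific text processing.

🚀 Quick Start

Model Details

Model Description

This is a SciBERT-based model for the scientific entity recognition task. The predefined entity types are: 'Generic', 'Material', 'Method', 'Metric', 'OtherScientificTerm', and 'Task'.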
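
The tags printed in the usage example carry `B-` and `I-` prefixes alongside `O`, which suggests a BIO tagging scheme over these six types. Assuming that scheme, the label space can be sketched as follows. The ordering here is hypothetical; the authoritative mapping is whatever `config.id2label` returns for the model.

```python
# Entity types as listed in the model description.
ENTITY_TYPES = ["Generic", "Material", "Method", "Metric", "OtherScientificTerm", "Task"]


def build_bio_labels(entity_types):
    """Return 'O' plus a B-/I- pair for each entity type (assumed BIO scheme)."""
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")  # first token of an entity
        labels.append(f"I-{entity_type}")  # continuation token of an entity
    return labels


BIO_LABELS = build_bio_labels(ENTITY_TYPES)
print(len(BIO_LABELS))  # 13
```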

Model Sources

Repository: NA
Paper: in preparation
Demo: NA

Usage Example

Basic Usage

import torch  # needed for torch.no_grad() below
from transformers import AutoConfig, AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer, the fine-tuned model, and its config from the Hub.
tokenizer = AutoTokenizer.from_pretrained('Kashob/SciBERTNER')
model = AutoModelForTokenClassification.from_pretrained('Kashob/SciBERTNER')
config = AutoConfig.from_pretrained('Kashob/SciBERTNER')
id2tag = config.id2label  # maps label id -> tag string

text = 'The paper tackles the problem of endowing Transformers with the ability to encode information about the past via recurrence. The proposed architecture can leverage the recurrent connections to improve the sample efficiency while maintaining expressivity due to the use of self-attention.'.split()

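# The input is already a list of words, so tell the tokenizer not to split on whitespace again.
inputs = tokenizer(text, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(**inputs)
    predictions = outputs.logits.argmax(-1)  # highest-scoring label id per token

# Map ids back to subword tokens and label names for inspection.
tokenized_text = tokenizer.convert_ids_to_tokens(inputs['input_ids'].tolist()[0])
predicted_labels = [id2tag[label_id] for label_id in predictions[0].tolist()]
print(tokenized_text)
print(predicted_labels)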

Output: 
['[CLS]', 'the', 'paper', 'tackle', '##s', 'the', 'problem', 'of', 'endow', '##ing', 'transformers', 'with', 'the', 'ability', 'to', 'encode', 'information', 'about', 'the', 'past', 'via', 'recurrence', '.', 'the', 'proposed', 'architecture', 'can', 'leverage', 'the', 'recurrent', 'connections', 'to', 'improve', 'the', 'sample', 'efficiency', 'while', 'maintaining', 'express', '##ivity', 'due', 'to', 'the', 'use', 'of', 'self', '-', 'attention', '.', '[SEP]']
['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-OtherScientificTerm', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-Method', 'O', 'O', 'O', 'B-Generic', 'O', 'O', 'O', 'B-OtherScientificTerm', 'I-OtherScientificTerm', 'O', 'O', 'O', 'B-Metric', 'I-Metric', 'O', 'O', 'B-Metric', 'I-OtherScientificTerm', 'O', 'O', 'O', 'O', 'O', 'B-Method', 'I-OtherScientificTerm', 'I-OtherScientificTerm', 'O', 'O']
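
The output above is at the subword level, so pieces such as 'express' and '##ivity' are tagged separately, and an `I-` tag does not always match the type of the preceding `B-` tag. The sketch below is one possible post-processing step, not part of the model: it joins `##` pieces back into words, keeps the label of the first piece, and groups tags into spans using the type of the opening `B-` tag. The helper names and the lenient handling of mismatched `I-` tags are assumptions.

```python
def merge_wordpieces(tokens, labels, special=("[CLS]", "[SEP]", "[PAD]")):
    """Join '##' continuation pieces onto the previous word, keeping the first piece's label."""
    words, word_labels = [], []
    for token, label in zip(tokens, labels):
        if token in special:
            continue
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
            word_labels.append(label)
    return words, word_labels


def bio_to_spans(words, word_labels):
    """Group word-level BIO tags into (entity_type, text) spans."""
    spans, current_type, current_words = [], None, []
    for word, label in zip(words, word_labels):
        prefix, _, entity_type = label.partition("-")
        if prefix == "I" and current_type is not None:
            current_words.append(word)  # lenient: extend the open span even if the type differs
            continue
        if current_type is not None:  # any other tag closes the open span
            spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
        if prefix in ("B", "I"):  # a stray I- with no open span starts a new one
            current_type, current_words = entity_type, [word]
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans


# Tail of the printed output above.
tokens = ['maintaining', 'express', '##ivity', 'due', 'to', 'the', 'use', 'of',
          'self', '-', 'attention', '.', '[SEP]']
labels = ['O', 'B-Metric', 'I-OtherScientificTerm', 'O', 'O', 'O', 'O', 'O',
          'B-Method', 'I-OtherScientificTerm', 'I-OtherScientificTerm', 'O', 'O']
words, word_labels = merge_wordpieces(tokens, labels)
spans = bio_to_spans(words, word_labels)
print(spans)
# [('Metric', 'expressivity'), ('Method', 'self - attention')]
```

Taking the type from the opening `B-` tag is a design choice; a stricter decoder could instead start a new span whenever the `I-` type changes.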