SciBERTNER open-source model: free recognition of six predefined entity types in scientific literature

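SciBERTNER

Developed by Kashob

A SciBERT-based entity recognition model for scientific literature that recognizes six predefined scientific entity types.

Sequence labeling

Transformers

English | Open-source license: MIT | #Scientific entity recognition #SciBERT fine-tuning #Research text processing

Downloads: 78

Release date: 4/12/2024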

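Model overview

This model is dedicated to entity recognition in scientific literature and identifies six types of scientific entity, including materials, methods, and metrics.

Model features

Built for the scientific domain

Fine-tuned on scientific literature to recognize domain-specific entities such as materials and methods.

Multi-category recognition

Recognizes six predefined scientific entity types: Generic, Material, Method, Metric, OtherScientificTerm, and Task.

SciBERT-based

Builds on the pretrained SciBERT model, which was trained on scientific text.

Model capabilities

Scientific entity recognition

Text annotation

Information extraction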

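Use cases

Academic research

Literature metadata extraction

Automatically extract key information, such as research methods and experimental materials, from research papers.

The extracted entities can be used to build a structured literature database.

Knowledge graph construction

Recognize the entities in scientific literature that can serve as nodes when building a domain knowledge graph.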
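
As a rough illustration of the metadata-extraction use case, the sketch below groups already-extracted entities by type into one record per paper. The function name `build_record`, the record layout, and the id `paper-001` are hypothetical. The sample entities are taken from the quick-start output on this page.

```python
from collections import defaultdict

def build_record(paper_id, entities):
    """Group (text, entity_type) pairs into one structured record per paper."""
    grouped = defaultdict(set)
    for text, entity_type in entities:
        # Lower-case so that case variants of a term collapse into one entry
        grouped[entity_type].add(text.lower())
    return {
        "paper_id": paper_id,  # hypothetical identifier, not produced by the model
        "entities": {t: sorted(terms) for t, terms in sorted(grouped.items())},
    }

# Entities as they appear in the quick-start output
sample_entities = [
    ("recurrence", "Method"),
    ("sample efficiency", "Metric"),
    ("expressivity", "Metric"),
]
record = build_record("paper-001", sample_entities)
print(record)
```

A record of this shape can be stored as a row in a database or a JSON document. The entity strings could also serve as candidate node labels for a knowledge graph, although the model itself predicts no relations between them.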

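Research support

Literature review automation

Automatically extract the main methods and technical terms from multiple papers.

Speeds up the literature survey process.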
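
One possible way to support a literature review is to count how many papers mention each extracted term. The sketch below assumes entities are already available as (text, type) pairs per paper. The helper `count_terms` and the two toy papers are made up for illustration and are not model output.

```python
from collections import Counter

def count_terms(per_paper_entities, entity_type="Method"):
    """Count in how many papers each term of the given type appears."""
    counts = Counter()
    for entities in per_paper_entities:
        # Use a set so that a term is counted at most once per paper
        terms = {text.lower() for text, etype in entities if etype == entity_type}
        counts.update(terms)
    return counts

# Toy input (illustrative only)
paper_a = [("self-attention", "Method"), ("recurrence", "Method")]
paper_b = [("Self-Attention", "Method"), ("sample efficiency", "Metric")]

method_counts = count_terms([paper_a, paper_b])
print(method_counts.most_common(1))  # [('self-attention', 2)]
```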

🚀 SciBERT-based model for scientific entity recognition

This is a SciBERT-based model specialized for entity recognition in the scientific domain. The predefined entity types are 'Generic', 'Material', 'Method', 'Metric', 'OtherScientificTerm', and 'Task'.
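
The sample output on this page uses tags such as `B-Method` and `I-Metric`, which suggests a BIO tagging scheme over these six types. The sketch below enumerates the tag set such a scheme implies. The helper `build_bio_labels` is hypothetical, and the actual tag order and ids in the model's `config.id2label` may differ.

```python
# The six entity types named in the model description
ENTITY_TYPES = ["Generic", "Material", "Method", "Metric", "OtherScientificTerm", "Task"]

def build_bio_labels(entity_types):
    """Return 'O' plus a B-/I- tag pair for every entity type."""
    labels = ["O"]  # 'O' marks tokens outside any entity
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")  # first token of an entity
        labels.append(f"I-{entity_type}")  # subsequent tokens of the same entity
    return labels

labels = build_bio_labels(ENTITY_TYPES)
print(len(labels))  # 13
```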

🚀 Quick start

To use this model, refer to the code below.

Basic usage

import torch
from transformers import AutoConfig, AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer, the fine-tuned model, and its config
tokenizer = AutoTokenizer.from_pretrained('Kashob/SciBERTNER')
model = AutoModelForTokenClassification.from_pretrained('Kashob/SciBERTNER')
config = AutoConfig.from_pretrained('Kashob/SciBERTNER')
id2tag = config.id2label  # maps predicted class ids to tag strings

# Split the input into words before tokenization
text = 'The paper tackles the problem of endowing Transformers with the ability to encode information about the past via recurrence. The proposed architecture can leverage the recurrent connections to improve the sample efficiency while maintaining expressivity due to the use of self-attention.'.split()

inputs = tokenizer(text, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():  # inference only, so no gradients are needed
    outputs = model(**inputs)
    predictions = outputs.logits.argmax(-1)  # most likely tag id per token

# Pair each sub-word token with its predicted tag
tokenized_text = tokenizer.convert_ids_to_tokens(inputs['input_ids'].tolist()[0])
predicted_labels = [id2tag[label_id] for label_id in predictions[0].tolist()]
print(tokenized_text)
print(predicted_labels)

Output: 
['[CLS]', 'the', 'paper', 'tackle', '##s', 'the', 'problem', 'of', 'endow', '##ing', 'transformers', 'with', 'the', 'ability', 'to', 'encode', 'information', 'about', 'the', 'past', 'via', 'recurrence', '.', 'the', 'proposed', 'architecture', 'can', 'leverage', 'the', 'recurrent', 'connections', 'to', 'improve', 'the', 'sample', 'efficiency', 'while', 'maintaining', 'express', '##ivity', 'due', 'to', 'the', 'use', 'of', 'self', '-', 'attention', '.', '[SEP]']
['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-OtherScientificTerm', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-Method', 'O', 'O', 'O', 'B-Generic', 'O', 'O', 'O', 'B-OtherScientificTerm', 'I-OtherScientificTerm', 'O', 'O', 'O', 'B-Metric', 'I-Metric', 'O', 'O', 'B-Metric', 'I-OtherScientificTerm', 'O', 'O', 'O', 'O', 'O', 'B-Method', 'I-OtherScientificTerm', 'I-OtherScientificTerm', 'O', 'O']
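
The output above is per sub-word token, so a post-processing step is usually needed to obtain whole entities. The following is a minimal sketch of one way to do that, not part of the model or of the Transformers library. The function `extract_entities` is hypothetical. It assumes that a `##` piece inherits the tag of the first piece of its word, and that an `I-` tag whose type differs from the open entity starts a new entity. The input is a contiguous slice of the output shown above.

```python
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}

def extract_entities(tokens, labels):
    """Merge WordPiece tokens and BIO tags into (text, type) pairs."""
    entities = []             # finished (text, type) pairs
    words, etype = [], None   # words and type of the currently open entity

    def close():
        nonlocal words, etype
        if words:
            entities.append((" ".join(words), etype))
        words, etype = [], None

    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            close()
            continue
        if token.startswith("##"):
            # Continuation piece: glue it onto the previous word if an entity
            # is open, and ignore its own tag (assumption stated above)
            if words:
                words[-1] += token[2:]
            continue
        if label == "O":
            close()
            continue
        prefix, _, label_type = label.partition("-")
        if prefix == "B" or label_type != etype:
            close()            # a B- tag or a type change starts a new entity
            etype = label_type
        words.append(token)
    close()
    return entities

# Contiguous slice of the tokens and tags printed above
tokens = ['the', 'recurrent', 'connections', 'to', 'improve', 'the', 'sample',
          'efficiency', 'while', 'maintaining', 'express', '##ivity']
tags = ['O', 'B-OtherScientificTerm', 'I-OtherScientificTerm', 'O', 'O', 'O',
        'B-Metric', 'I-Metric', 'O', 'O', 'B-Metric', 'I-OtherScientificTerm']

entities = extract_entities(tokens, tags)
print(entities)
```

Words are joined with a single space, so punctuation inside an entity (for example the hyphen in "self-attention") would come out space-separated. Mapping back to character offsets in the original text would avoid that.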