GeoBERT open-source model: free named entity recognition for geoscience text

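Developed by botryan96

GeoBERT is a named entity recognition (NER) model, fine-tuned from SciBERT on the Geoscientific Corpus dataset and intended for text analysis in the geosciences.

Sequence labeling

Transformers

#GeologicalEntityRecognition #SciBERTFineTuning #GeoscienceNER

Downloads: 338

Released: 11/8/2022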

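Model overview

The model recognizes four main semantic types related to the geosciences: GeoPetro (geoscience terms), GeoMeth (tools or methods), GeoLoc (geological locations), and GeoTime (geological time).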

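Model highlights

High-accuracy recognition

Reported to achieve high precision, recall, and F1 scores on all four geoscience entity types.

Domain-specific

Tuned specifically for geoscience text, so it can pick out specialist terms and concepts.

Multi-class recognition

Identifies four classes of geological information at once: terms, methods, locations, and time.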

Model capabilities

Geoscience text analysis

Named entity recognition

Geological information extraction

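Use cases

Academic research

Geological literature analysis

Automatically extract specialist terms, methods, and time information from geological literature

Makes literature search and analysis more efficient

Energy exploration

Exploration report processing

Extract key geological locations and feature information from exploration reports

Supports decision-making and resource assessment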

🚀 GeoBERT

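GeoBERT is a named entity recognition (NER) model obtained by fine-tuning SciBERT on the Geoscientific Corpus dataset. It was trained on the labeled Geoscientific Corpus dataset (about 1 million sentences).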

🚀 Quick start

How to use GeoBERT with HuggingFace

Load GeoBERT and its sub-word tokenizer:

from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("botryan96/GeoBERT")
model = AutoModelForTokenClassification.from_pretrained("botryan96/GeoBERT")

# Define the pipeline (pass the `model` object loaded above)
from transformers import pipeline
ner_machine = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

#Define the sentence
sentence = 'In North America, the water storage in the seepage face model is higher than the base case because positive pore pressure is requisite for drainage through a seepage face boundary condition. The result from the resistivity data supports the notion, especially in the northern part of the Sandstone Sediment formation. The active formation of America has a big potential for Oil and Gas based on the seismic section, has been activated since the Paleozoic'

#Deploy the NER Machine
ner_machine(sentence)
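
With aggregation_strategy="simple", the pipeline returns a list of dictionaries carrying the keys entity_group, score, word, start, and end. The sketch below shows one possible way to turn that list into inline annotations using the character offsets. The helper name `annotate` and the sample entities are illustrative assumptions (hand-written placeholders in the pipeline's output shape, not real GeoBERT predictions).

```python
def annotate(text, entities):
    """Wrap each detected span as [span](LABEL), using character offsets."""
    pieces, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        start, end = ent["start"], ent["end"]
        if start < cursor:
            continue  # skip spans that overlap one already emitted
        pieces.append(text[cursor:start])
        pieces.append("[" + text[start:end] + "](" + ent["entity_group"] + ")")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Placeholder entities in the pipeline's output format (NOT real model output).
text = "Drainage in North America since the Paleozoic"
sample = [
    {"entity_group": "GeoLoc", "score": 0.99, "word": "north america", "start": 12, "end": 25},
    {"entity_group": "GeoTime", "score": 0.98, "word": "paleozoic", "start": 36, "end": 45},
]

annotated = annotate(text, sample)
print(annotated)
```

In real use, `sample` would be replaced by the result of `ner_machine(sentence)` and `text` by `sentence`.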

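✨ Key features

Intended use

The NER output of this model is intended to identify four main semantic types, or categories, related to the geosciences.

GeoPetro: for any entity that is a geoscience term.
GeoMeth: for all tools or methods associated with the geosciences.
GeoLoc: for identifying geological locations.
GeoTime: for identifying geological time scale entities.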
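
As a rough illustration of how these four categories might be used downstream, the sketch below buckets aggregated pipeline results by type and drops low-confidence hits. It assumes the pipeline's entity_group values match the four names above exactly; that is an assumption worth checking against `model.config.id2label`. The function name, the 0.5 threshold, and the sample entities are all hypothetical placeholders, not real model output.

```python
# Assumed label names, taken from the category list above.
GEO_TYPES = ("GeoPetro", "GeoMeth", "GeoLoc", "GeoTime")


def group_by_type(entities, min_score=0.5):
    """Collect entity text per category, keeping only confident predictions."""
    grouped = {label: [] for label in GEO_TYPES}
    for ent in entities:
        label = ent["entity_group"]
        if label in grouped and ent["score"] >= min_score:
            grouped[label].append(ent["word"])
    return grouped


# Placeholder entities in the pipeline's output format (NOT real model output).
sample = [
    {"entity_group": "GeoLoc", "score": 0.99, "word": "north america"},
    {"entity_group": "GeoMeth", "score": 0.97, "word": "resistivity"},
    {"entity_group": "GeoTime", "score": 0.30, "word": "active"},  # below threshold
]

grouped = group_by_type(sample)
for label, words in grouped.items():
    print(label, words)
```

The threshold is a design choice: a higher value trades recall for precision, which may suit tasks such as building a curated term index.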

🔧 Technical details

Training hyperparameters

The following hyperparameters were used during training:

Optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 2e-05, 'decay_steps': 14000, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
Training precision: mixed_float16 (mixed 16-bit floating point)
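
The learning-rate entry above is a PolynomialDecay schedule with power 1.0, which amounts to a linear ramp from 2e-05 down to 0 over 14,000 steps. The pure-Python sketch below reproduces that schedule with the standard polynomial decay formula for the non-cycling case. The function name is illustrative, and the defaults are copied from the configuration above.

```python
def polynomial_decay(step, initial_lr=2e-05, end_lr=0.0, decay_steps=14000, power=1.0):
    """Learning rate at a given step for a non-cycling polynomial decay schedule."""
    step = min(step, decay_steps)  # hold at end_lr once decay_steps is reached
    fraction_left = 1.0 - step / decay_steps
    return (initial_lr - end_lr) * fraction_left ** power + end_lr


# Learning rate at the start, midpoint, and end of the decay window.
for step in (0, 7000, 14000):
    print(step, polynomial_decay(step))
```

With power set to 1.0 the rate at the midpoint is half the initial value, and any step past 14,000 stays at the end rate.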