🚀 GeoBERT
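GeoBERT is a named entity recognition (NER) model obtained by fine-tuning SciBERT on a geoscience corpus. It was trained on a labeled geoscience corpus dataset of roughly 1 million sentences.
🚀 Quick Start
How to call GeoBERT with HuggingFace
Load GeoBERT and its sub-word tokenizer: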
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("botryan96/GeoBERT")
model = AutoModelForTokenClassification.from_pretrained("botryan96/GeoBERT")
# Build an NER pipeline; "simple" aggregation merges sub-word tokens into whole entity spans
from transformers import pipeline
ner_machine = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple")
sentence = 'In North America, the water storage in the seepage face model is higher than the base case because positive pore pressure is requisite for drainage through a seepage face boundary condition. The result from the resistivity data supports the notion, especially in the northern part of the Sandstone Sediment formation. The active formation of America has a big potential for Oil and Gas based on the seismic section, has been activated since the Paleozoic'
ner_machine(sentence)
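With `aggregation_strategy="simple"`, the pipeline call returns a list of dictionaries, each with the keys `entity_group`, `score`, `word`, `start`, and `end`. The following is a minimal sketch of one way to post-process that list by grouping entities per type and dropping low-confidence predictions. The helper `group_entities`, the `min_score` threshold, and the sample predictions are hypothetical illustrations, not real model output.

```python
from collections import defaultdict


def group_entities(predictions, min_score=0.5):
    """Group aggregated NER predictions by entity type.

    Predictions scoring below `min_score` (an assumed threshold) are dropped.
    """
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)


# Hypothetical predictions in the shape the pipeline returns (values are made up).
sample_predictions = [
    {"entity_group": "GeoLoc", "score": 0.99, "word": "North America", "start": 3, "end": 16},
    {"entity_group": "GeoTime", "score": 0.98, "word": "Paleozoic", "start": 440, "end": 449},
    {"entity_group": "GeoPetro", "score": 0.42, "word": "seepage", "start": 43, "end": 50},
]

grouped = group_entities(sample_predictions)
print(grouped)
```

In practice, `sample_predictions` would be replaced by the result of `ner_machine(sentence)`.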
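✨ Key Features
Intended Use
The NER output of this model is intended to identify four main semantic types, or categories, related to the geosciences.
- GeoPetro: any entity that falls under general geoscience terminology.
- GeoMeth: any tool or method related to the geosciences.
- GeoLoc: geological locations.
- GeoTime: geological time scale entities.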
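To make the four categories concrete, here is a small sketch that marks predicted spans inline using the character offsets (`start`, `end`) found in aggregated NER output. It assumes the model's label names match the category names above. The sentence, the label assignments, and the helpers `make_span` and `annotate` are hypothetical and chosen only for illustration.

```python
def make_span(text, word, label):
    """Build a prediction-like dict for the first occurrence of `word` in `text`."""
    start = text.index(word)
    return {"entity_group": label, "word": word, "start": start, "end": start + len(word)}


def annotate(text, predictions):
    """Wrap each predicted span in [LABEL: ...] markers, assuming spans do not overlap."""
    pieces = []
    cursor = 0
    for pred in sorted(predictions, key=lambda p: p["start"]):
        pieces.append(text[cursor:pred["start"]])
        pieces.append(f"[{pred['entity_group']}: {text[pred['start']:pred['end']]}]")
        cursor = pred["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hypothetical sentence and label assignments, one per category (not model output).
text = "Seismic sections of the North Sea show Jurassic sandstone."
predictions = [
    make_span(text, "North Sea", "GeoLoc"),
    make_span(text, "Jurassic", "GeoTime"),
    make_span(text, "sandstone", "GeoPetro"),
    make_span(text, "Seismic sections", "GeoMeth"),
]

annotated = annotate(text, predictions)
print(annotated)
```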
🔧 Technical Details
Training Hyperparameters
The following hyperparameters were used during training:
- Optimizer:
{'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 2e-05, 'decay_steps': 14000, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
- Training precision: mixed_float16
Framework Versions
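The optimizer configuration above uses a `PolynomialDecay` schedule with `power` 1.0 and `cycle` False, which amounts to a linear decay from 2e-05 to 0.0 over 14000 steps. The sketch below reimplements that formula in plain Python to show the learning rate at a given step. The function name `polynomial_decay` is an illustrative stand-in, not the Keras class itself.

```python
def polynomial_decay(step, initial_lr=2e-05, end_lr=0.0, decay_steps=14000, power=1.0):
    """Learning rate at `step` for a non-cycling polynomial decay schedule.

    Default values are taken from the optimizer configuration listed above.
    """
    # Without cycling, the schedule holds at end_lr once decay_steps is reached.
    step = min(step, decay_steps)
    return (initial_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr


# Print the learning rate at the start, midpoint, and end of training.
for step in (0, 7000, 14000):
    print(step, polynomial_decay(step))
```

With `power` set to 1.0 the rate halves at the midpoint of training and reaches zero at step 14000.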
- Transformers 4.22.1
- TensorFlow 2.10.0
- Datasets 2.4.0
- Tokenizers 0.12.1
📚 Detailed Documentation
Model Performance (metric: seqeval)
| Entity | Precision | Recall | F1 |
|---|---|---|---|
| GeoLoc | 0.9727 | 0.9591 | 0.9658 |
| GeoMeth | 0.9433 | 0.9447 | 0.9445 |
| GeoPetro | 0.9767 | 0.9745 | 0.9756 |
| GeoTime | 0.9695 | 0.9666 | 0.9680 |
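For readers less familiar with these metrics, F1 is the harmonic mean of precision and recall. The sketch below applies that formula to the GeoPetro row. Note that seqeval computes its scores from entity-level counts, so recomputing F1 from the rounded precision and recall shown here can differ slightly in the last digit. The helper `f1_score` is an illustrative function, not part of seqeval.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for GeoPetro, taken from the performance table.
geopetro_f1 = f1_score(0.9767, 0.9745)
print(round(geopetro_f1, 4))
```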