
MatTPUSciBERT

Developed by lfoppiano
A language model based on the SciBERT framework and pretrained on 700,000 full-text materials-science papers, designed to improve text comprehension in materials science
Downloads: 161
Released: 9/21/2022

Model Overview

MatTPUSciBERT is a BERT model optimized for materials science. Through domain-specific pretraining and vocabulary expansion, it significantly improves named entity recognition and physical-quantity extraction from materials-science literature.

Model Features

Materials Science Domain Optimization
Pretrained on 700,000 full-text materials science papers, significantly enhancing domain text comprehension
Expanded Domain Vocabulary
Extended the original SciBERT vocabulary with 100 materials science-specific terms extracted using KeyBERT
TPU-efficient Training
Two-phase training (800k steps + 100k steps) conducted on Google Cloud TPU to optimize training efficiency
Multi-task Validation
Model performance validated on two typical tasks: superconductor named entity recognition and physical quantity extraction
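The practical effect of the vocabulary expansion above is that frequent domain terms stay whole instead of being split into subword pieces. The following is a minimal sketch of WordPiece-style greedy longest-match tokenization using toy vocabularies (the word lists are illustrative, not SciBERT's actual vocabulary):

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match subword split, WordPiece-style.

    Continuation pieces carry the '##' prefix, as in BERT tokenizers.
    Returns ['[UNK]'] if no piece matches at some position.
    """
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            sub = word[start:end]
            cand = ("##" + sub) if start > 0 else sub
            if cand in vocab:
                piece = cand
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]
        tokens.append(piece)
        start = end
    return tokens

# Toy base vocabulary without the domain term...
base_vocab = {"per", "##ov", "##sk", "##ite"}
# ...and the same vocabulary after adding it, as the KeyBERT-driven
# expansion described above would.
domain_vocab = base_vocab | {"perovskite"}

print(wordpiece_tokenize("perovskite", base_vocab))    # → ['per', '##ov', '##sk', '##ite']
print(wordpiece_tokenize("perovskite", domain_vocab))  # → ['perovskite']
```

Fewer, more meaningful tokens per domain term is one reason domain-adapted vocabularies tend to help downstream extraction tasks.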

Model Capabilities

Materials science text comprehension
Superconductor named entity recognition
Physical quantity extraction
Scientific literature information extraction

Use Cases

Materials Science Research
Superconductor Material Discovery
Automatically identify new superconductor materials and their properties from scientific literature
Achieved an F1-score of 83.61%, outperforming comparable models
Material Property Quantitative Analysis
Automatically extract physical quantity data of materials reported in literature
Achieved an F1-score of 87.46%, comparable to baseline models
Scientific Literature Mining
Materials Database Construction
Automatically extract structured material data from large volumes of literature
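The F1-scores quoted in the use cases above are typically computed at the entity level: a predicted entity counts as correct only if its span and label both match a gold entity exactly. A minimal sketch of that metric, assuming entities are represented as `(start, end, label)` tuples (a common but here illustrative representation):

```python
def entity_f1(gold, pred):
    """Entity-level F1: exact match on (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)                       # true positives
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: the second predicted span is off by one character,
# so it does not count as a match under exact-span evaluation.
gold = [(0, 5, "material"), (10, 14, "tc")]
pred = [(0, 5, "material"), (10, 13, "tc")]
print(round(entity_f1(gold, pred), 2))  # → 0.5
```

Strict span matching is what makes these scores conservative: near-misses on entity boundaries score zero.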