OntoProtein
A protein pretraining model that incorporates structured knowledge from the Gene Ontology (GO). It learns protein sequence representations with two objectives: masked language modeling and knowledge embedding.
Downloads 69
Release Time: 3/2/2022
Model Overview
The first general framework to integrate Gene Ontology knowledge into protein pretraining. It learns joint embeddings of proteins and GO terms from a large-scale knowledge graph built for this purpose.
Model Features
Knowledge-Enhanced Pretraining
Integrates structured Gene Ontology knowledge into pretraining, refining protein representations through contrastive learning with negative sampling over the knowledge graph
Dual-Objective Optimization
Jointly trains masked language modeling (MLM) on protein sequences and knowledge graph embedding (KE)
Large-Scale Knowledge Graph
Builds a new knowledge graph of GO terms and their associated proteins, in which every node is described by either text or a sequence
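The dual objective and the negative-sampling term described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the model's actual training code: it assumes a TransE-style distance for scoring triples, a margin-based log-sigmoid loss, and a weighting factor alpha between the two objectives. All function names, the margin, and the toy vectors are hypothetical.

```python
import math

def transe_distance(head, relation, tail):
    """L2 distance ||h + r - t||; smaller means the triple looks more plausible.
    (Assumed scoring function, chosen for illustration.)"""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def ke_loss(positive, negatives, margin=1.0):
    """Negative-sampling loss for one true triple and its corrupted copies.
    Pulls the true triple inside the margin and pushes corrupted ones outside."""
    pos_term = -log_sigmoid(margin - transe_distance(*positive))
    neg_term = -sum(
        log_sigmoid(transe_distance(*neg) - margin) for neg in negatives
    ) / len(negatives)
    return pos_term + neg_term

def joint_loss(mlm_loss, knowledge_loss, alpha=1.0):
    """Weighted sum of the two objectives (alpha is a hypothetical weight)."""
    return mlm_loss + alpha * knowledge_loss

# Toy embeddings: a protein, a relation, and the GO term it is annotated with.
protein = [1.0, 0.0]
relation = [0.0, 1.0]
go_term = [1.0, 1.0]        # protein + relation lands exactly on this term
wrong_term = [-2.0, 3.0]    # corrupted tail used as a negative sample

knowledge_loss = ke_loss(
    (protein, relation, go_term),
    [(protein, relation, wrong_term)],
)
total = joint_loss(mlm_loss=2.0, knowledge_loss=knowledge_loss, alpha=0.5)
```

In this sketch a true triple scored close to zero distance yields a small loss, while swapping the true and corrupted tails yields a larger one, which is the signal a contrastive objective of this kind relies on.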
Model Capabilities
Protein sequence representation learning
Gene function prediction
Protein-knowledge graph joint embedding
Use Cases
Biomedical Research
Protein Function Annotation
Predicts the functions of uncharacterized proteins using GO knowledge-enhanced protein representations
Improves functional prediction accuracy compared to traditional methods
Protein-Protein Interaction Prediction
Computes protein similarity in the knowledge-aware embedding space
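As a rough illustration of comparing proteins in an embedding space, the sketch below ranks candidate proteins by cosine similarity to a query. Cosine similarity is an assumption made here for simplicity; the source does not say which similarity measure is used. The embedding vectors and protein names are made-up placeholders, not model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, candidates):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Placeholder embeddings; in practice these would come from the pretrained encoder.
query_embedding = [1.0, 0.0, 1.0]
candidate_embeddings = {
    "protein_A": [1.0, 0.0, 1.0],   # same direction as the query
    "protein_B": [0.0, 1.0, 0.0],   # orthogonal to the query
    "protein_C": [1.0, 1.0, 1.0],   # partially aligned
}

ranking = rank_by_similarity(query_embedding, candidate_embeddings)
```

A ranking like this gives only a similarity signal; an interaction predictor would normally train a classifier on top of the paired embeddings.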