
OntoProtein

Developed by zjunlp
A protein pretraining model that incorporates structured knowledge from the Gene Ontology (GO) and optimizes protein sequence representations with two objectives: masked language modeling and knowledge embedding.
Downloads: 69
Release Time: 3/2/2022

Model Overview

OntoProtein is presented as the first general framework to integrate Gene Ontology knowledge into protein pretraining. It learns joint embeddings of proteins and GO terms by constructing a large-scale knowledge graph.

Model Features

Knowledge-Enhanced Pretraining
Integrates structured Gene Ontology knowledge and refines protein representations through contrastive learning over the knowledge graph, using negative sampling to generate contrasting examples
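The negative-sampling step can be illustrated with a small sketch: for a true (protein, relation, GO term) triple, it generates corrupted triples by swapping in a different tail entity and skips any swap that would recreate a known true triple. Corrupting only the tail, the relation name "enables", and all identifiers are assumptions for illustration; this is not presented as OntoProtein's exact sampling procedure.

```python
import random
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

def sample_negatives(
    triple: Triple,
    entities: List[str],
    known: Set[Triple],
    k: int,
    rng: random.Random,
) -> List[Triple]:
    """Return up to k corrupted triples by replacing the tail entity.

    Candidates that would recreate a known (true) triple are skipped,
    so the sampled negatives are not accidental positives.
    """
    head, rel, tail = triple
    candidates = [e for e in entities if e != tail and (head, rel, e) not in known]
    rng.shuffle(candidates)
    return [(head, rel, e) for e in candidates[:k]]

# Toy graph with hypothetical protein and GO-term identifiers.
known = {
    ("protein_1", "enables", "go_term_A"),
    ("protein_1", "enables", "go_term_B"),
}
entities = ["go_term_A", "go_term_B", "go_term_C", "go_term_D"]
negs = sample_negatives(
    ("protein_1", "enables", "go_term_A"), entities, known, k=2, rng=random.Random(0)
)
print(negs)
```

Here go_term_B is excluded because (protein_1, enables, go_term_B) is already a true triple, so only go_term_C and go_term_D can appear as corrupted tails.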
Dual-Objective Optimization
Trains a masked language modeling (MLM) objective on protein sequences jointly with a knowledge graph embedding (KE) objective
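A minimal sketch of how the two objectives might be combined, assuming a TransE-style distance d = ||h + r - t|| for the KE term, a margin-based negative-sampling loss with margin gamma, a mean cross-entropy over masked positions for the MLM term, and a weighted sum L = L_MLM + alpha * L_KE. The scoring function, gamma, alpha, and the toy vectors and probabilities are all assumptions, not values taken from OntoProtein.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def l2_distance(h: Vector, r: Vector, t: Vector) -> float:
    """TransE-style distance ||h + r - t||_2 (assumed scoring function)."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def ke_loss(pos_dist: float, neg_dists: List[float], gamma: float) -> float:
    """Negative-sampling KE loss: push the true triple's distance below the
    margin gamma and each corrupted triple's distance above it."""
    loss = -log_sigmoid(gamma - pos_dist)
    loss -= sum(log_sigmoid(d - gamma) for d in neg_dists) / len(neg_dists)
    return loss

def mlm_loss(true_token_probs: List[float]) -> float:
    """Mean cross-entropy over masked positions, given the probability the
    model assigns to the correct amino acid at each masked position."""
    return -sum(math.log(p) for p in true_token_probs) / len(true_token_probs)

def total_loss(mlm: float, ke: float, alpha: float = 1.0) -> float:
    """Weighted sum of both objectives; alpha is an assumed hyperparameter."""
    return mlm + alpha * ke

# Toy 2-D embeddings (hypothetical), chosen so h + r equals the true tail exactly.
h, r = [0.5, 0.25], [0.25, 0.5]
t_true, t_corrupt = [0.75, 0.75], [2.0, -1.0]
pos = l2_distance(h, r, t_true)    # 0.0
neg = l2_distance(h, r, t_corrupt)
ke = ke_loss(pos, [neg], gamma=1.0)
mlm = mlm_loss([0.9, 0.6, 0.8])
print(total_loss(mlm, ke, alpha=1.0))
```

Because both terms are summed into one scalar, a single backward pass would update the shared protein encoder from both the sequence signal and the knowledge-graph signal; alpha controls how strongly the graph term pulls on it.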
Large-Scale Knowledge Graph
Builds a new knowledge graph of GO terms and their associated proteins, in which every node is described either by text (GO terms) or by an amino-acid sequence (proteins)
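One way such a graph could be represented in code is sketched below: protein nodes carry a sequence, GO-term nodes carry a text description, and edges are (head, relation, tail) triples. The class names, relation names, toy sequence, and descriptions are hypothetical and are not taken from the actual OntoProtein knowledge graph.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

@dataclass(frozen=True)
class ProteinNode:
    node_id: str
    sequence: str  # amino-acid sequence describing the protein

@dataclass(frozen=True)
class GOTermNode:
    node_id: str
    description: str  # natural-language text describing the GO term

Node = Union[ProteinNode, GOTermNode]

@dataclass
class KnowledgeGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    triples: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_triple(self, head: str, relation: str, tail: str) -> None:
        # Both endpoints must already exist, so every edge links described nodes.
        if head not in self.nodes or tail not in self.nodes:
            raise KeyError(f"unknown node in triple ({head}, {relation}, {tail})")
        self.triples.append((head, relation, tail))

    def describe(self, node_id: str) -> str:
        """Return the raw input an encoder would consume for this node."""
        node = self.nodes[node_id]
        return node.sequence if isinstance(node, ProteinNode) else node.description

kg = KnowledgeGraph()
kg.add_node(ProteinNode("protein_1", "MKTAYIAKQR"))  # toy sequence
kg.add_node(GOTermNode("go_term_A", "toy description of a molecular function"))
kg.add_node(GOTermNode("go_term_B", "toy description of a broader parent function"))
kg.add_triple("protein_1", "enables", "go_term_A")   # protein-to-GO edge
kg.add_triple("go_term_A", "is_a", "go_term_B")      # GO-to-GO edge
print(kg.describe("protein_1"))
```

Keeping the raw description on each node, rather than a precomputed vector, lets a sequence encoder and a text encoder produce the embeddings at training time.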

Model Capabilities

Protein sequence representation learning
Gene function prediction
Protein-knowledge graph joint embedding

Use Cases

Biomedical Research
Protein Function Annotation
Predicts the functions of uncharacterized proteins using representations enhanced with GO knowledge
Reported to improve function prediction accuracy over traditional methods
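To illustrate how a joint protein/GO embedding space could support annotation, the sketch below ranks candidate GO terms for a protein by how well each completes the triple (protein, relation, ?), using an assumed TransE-style distance ||h + r - t||. The embeddings are hand-picked toy values rather than model outputs, and this ranking scheme is an illustration, not the model's documented annotation pipeline.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def distance(h: Vector, r: Vector, t: Vector) -> float:
    """||h + r - t||_2: a smaller value means a more plausible triple."""
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h, r, t)))

def rank_go_terms(
    protein: Vector, relation: Vector, go_terms: Dict[str, Vector], top_k: int
) -> List[Tuple[str, float]]:
    """Score every candidate GO term and return the top_k closest ones."""
    scored = [(name, distance(protein, relation, emb)) for name, emb in go_terms.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:top_k]

# Hypothetical 2-D embeddings for one protein, one relation, and three GO terms.
protein = [0.5, 0.25]
enables = [0.25, 0.5]  # relation embedding
go_terms = {
    "go_term_A": [0.75, 0.75],
    "go_term_B": [0.75, 0.5],
    "go_term_C": [-1.0, 2.0],
}
print(rank_go_terms(protein, enables, go_terms, top_k=2))
# [('go_term_A', 0.0), ('go_term_B', 0.25)]
```

With these toy vectors, h + r equals go_term_A's embedding exactly, so it ranks first at distance 0.0, followed by go_term_B at 0.25.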
Protein-Protein Interaction Prediction
Computes protein similarity in a knowledge-aware embedding space
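A common way to compare two proteins in an embedding space is the cosine similarity of their vectors. The page does not say which similarity measure is used, so the choice of cosine similarity and the two-dimensional toy embeddings below are assumptions for illustration.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical protein embeddings.
emb_p1 = [3.0, 4.0]
emb_p2 = [6.0, 8.0]   # same direction as emb_p1
emb_p3 = [-4.0, 3.0]  # orthogonal to emb_p1
print(cosine_similarity(emb_p1, emb_p2))  # 1.0
print(cosine_similarity(emb_p1, emb_p3))  # 0.0
```

Cosine similarity ignores vector length and compares only direction, which is why emb_p1 and emb_p2 score 1.0 despite different magnitudes.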