ProtT5-XL-UniRef50

Developed by Rostlab
A protein language model based on the T5-3B architecture, pre-trained with self-supervised learning to capture biophysical properties of protein sequences
Downloads 78.45k
Release Time: 3/2/2022

Model Overview

This model is pre-trained on the UniRef50 dataset with a masked language modeling objective. It extracts biologically meaningful feature representations from protein sequences and is suited to tasks such as protein structure prediction and functional analysis

Model Features

Large-scale pre-training
Pre-trained on the UniRef50 dataset containing 45 million protein sequences
Biophysical property capture
The learned features reflect biophysical properties that govern protein tertiary structure
Dual-purpose design
Supports both direct feature extraction and fine-tuning for specific downstream tasks
Efficient masking strategy
Randomly masks 15% of the amino acids in each sequence; 90% of the masked positions are replaced by [MASK] and 10% by a random amino acid
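
The masking scheme described above can be illustrated with a minimal sketch. It is not the training code used by Rostlab. The function name `mask_sequence`, the per-position independent sampling, and the choice to draw the random replacement from the 20 standard amino acids excluding the original residue are all assumptions made for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed vocabulary)
MASK_TOKEN = "[MASK]"                 # token name as written in the text above


def mask_sequence(residues, mask_prob=0.15, mask_token_prob=0.9, rng=None):
    """Corrupt a list of residues and return (corrupted, targets).

    Each position is selected independently with probability mask_prob,
    so roughly 15% of residues are masked. targets holds the original
    residue at selected positions and None everywhere else.
    """
    rng = rng or random.Random()
    corrupted = list(residues)
    targets = [None] * len(corrupted)
    for i, aa in enumerate(residues):
        if rng.random() >= mask_prob:
            continue  # position left untouched
        targets[i] = aa
        if rng.random() < mask_token_prob:
            corrupted[i] = MASK_TOKEN  # about 90% of selected positions
        else:
            # about 10%: swap in a different, randomly chosen amino acid
            corrupted[i] = rng.choice([c for c in AMINO_ACIDS if c != aa])
    return corrupted, targets
```

Calling `mask_sequence(list("MKTAYIAKQR"))` returns the corrupted residue list together with the targets a model would be trained to recover.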

Model Capabilities

Protein sequence feature extraction
Protein secondary structure prediction
Subcellular localization prediction
Membrane protein detection
Protein function prediction
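
Feature extraction, the first capability listed above, might look like the following sketch using the Hugging Face `transformers` library. Several details are assumptions not stated on this page: the model identifier `Rostlab/prot_t5_xl_uniref50`, the convention of separating residues with spaces, and the mapping of the rare residues U, Z, O and B to X. The helper names `prepare_sequence` and `embed` are hypothetical.

```python
import re


def prepare_sequence(sequence):
    """Upper-case the sequence, map rare residues U, Z, O, B to X,
    and separate residues with spaces (assumed tokenizer input format)."""
    cleaned = re.sub(r"[UZOB]", "X", sequence.upper())
    return " ".join(cleaned)


def embed(sequences, model_name="Rostlab/prot_t5_xl_uniref50"):
    """Return one per-residue embedding matrix per input sequence.

    The model id is an assumption; loading it triggers a multi-gigabyte download.
    """
    # Imported lazily so prepare_sequence works without these packages installed.
    import torch
    from transformers import T5EncoderModel, T5Tokenizer  # needs sentencepiece

    tokenizer = T5Tokenizer.from_pretrained(model_name, do_lower_case=False)
    model = T5EncoderModel.from_pretrained(model_name).eval()

    batch = tokenizer(
        [prepare_sequence(s) for s in sequences],
        padding="longest",
        return_tensors="pt",
    )
    with torch.no_grad():
        hidden = model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
        ).last_hidden_state

    # Keep the first len(s) rows: this drops right-side padding and the
    # end-of-sequence token that the tokenizer appends.
    return [hidden[i, : len(s)] for i, s in enumerate(sequences)]
```

Only the encoder is loaded here because feature extraction needs hidden states rather than generated text. The per-residue rows can be averaged to obtain a single vector per protein for tasks such as localization prediction.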

Use Cases

Structural biology
Protein secondary structure prediction
Predicts 3-state or 8-state secondary structures of proteins
Achieved 81% accuracy (3-state) on the CASP12 dataset
Cell biology
Subcellular localization prediction
Predicts the localization of proteins within cells
Achieved 81% accuracy on the DeepLoc dataset
Membrane protein detection
Distinguishes membrane-bound proteins from water-soluble proteins
Achieved 91% accuracy on the DeepLoc dataset
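
The secondary structure figures above are commonly reported as per-residue accuracy, the fraction of residues whose predicted state matches the reference. This page does not say how the numbers were computed, so the sketch below is an assumption about the metric, and `per_residue_accuracy` is a hypothetical helper.

```python
def per_residue_accuracy(predicted, reference):
    """Fraction of positions where the predicted state equals the reference.

    Both arguments are equal-length strings of state labels,
    for example H (helix), E (strand) and C (coil) in the 3-state case.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# One mismatch out of eight residues gives 7/8.
print(per_residue_accuracy("HHHEECCC", "HHHEEECC"))  # 0.875
```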