DistilProtBert

Developed by yarongef
A distilled version of the ProtBert-UniRef100 model for protein feature extraction and fine-tuning on downstream tasks
Downloads: 1,965
Release date: 3/30/2022

Model Overview

DistilProtBert is a distilled protein language model pre-trained with a masked language modeling (MLM) objective. It operates on protein sequences written as uppercase amino acids.
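
Because the model is pre-trained with an MLM objective, it can be queried directly for masked residue prediction. The snippet below is a minimal sketch assuming DistilProtBert follows the ProtBert input convention of space-separated uppercase residues; the model ID matches the Hub listing, and the example sequence is an arbitrary illustration.

```python
# Minimal sketch of masked residue prediction, assuming DistilProtBert
# follows the ProtBert convention: uppercase amino acids separated by
# spaces. The example sequence is arbitrary.
from transformers import BertForMaskedLM, BertTokenizer, pipeline

model_name = "yarongef/DistilProtBert"
tokenizer = BertTokenizer.from_pretrained(model_name, do_lower_case=False)
model = BertForMaskedLM.from_pretrained(model_name)

unmasker = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Mask one residue and inspect the top predictions.
sequence = "M K T A Y I A K Q R Q I S F V K [MASK] H F S R Q"
for prediction in unmasker(sequence, top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))
```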

Model Features

Distilled model
Distilled from the ProtBert-UniRef100 model, with a reduced parameter count while maintaining high performance
Efficient pre-training
Pre-trained with a combination of masked language modeling, soft-label cross-entropy against the teacher, and a cosine teacher-student loss (a schematic sketch of this combination follows the list)
Uppercase amino acid support
Specifically optimized for uppercase amino acid sequences
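
To make the combination of objectives concrete, here is a schematic PyTorch sketch in the DistilBERT style. KL divergence over temperature-softened distributions stands in for the soft-label teacher-student cross-entropy (they differ only by the teacher's entropy, a constant), and the weights alpha/beta/gamma and temperature T are illustrative assumptions, not values reported for DistilProtBert.

```python
# Schematic sketch of the three pre-training objectives named above,
# in the DistilBERT style. The weights (alpha, beta, gamma) and the
# temperature T are illustrative assumptions, not values from the
# DistilProtBert paper.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      labels, T=2.0, alpha=1.0, beta=1.0, gamma=1.0):
    vocab = student_logits.size(-1)

    # Hard-label MLM loss; positions labeled -100 (unmasked) are ignored.
    mlm = F.cross_entropy(student_logits.reshape(-1, vocab),
                          labels.reshape(-1), ignore_index=-100)

    # Soft-label teacher-student loss on temperature-softened distributions.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)

    # Cosine loss pulling student hidden states toward the teacher's.
    flat_s = student_hidden.reshape(-1, student_hidden.size(-1))
    flat_t = teacher_hidden.reshape(-1, teacher_hidden.size(-1))
    cos = F.cosine_embedding_loss(flat_s, flat_t,
                                  flat_s.new_ones(flat_s.size(0)))

    return alpha * mlm + beta * soft + gamma * cos
```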

Model Capabilities

Protein feature extraction (see the embedding sketch after this list)
Protein sequence classification
Protein secondary structure prediction
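
For the feature-extraction capability, a minimal sketch follows, again assuming the ProtBert input convention. Mean-pooling the last hidden state is one common choice of per-protein embedding, not a recommendation from the model card.

```python
# Minimal feature-extraction sketch, assuming the ProtBert input
# convention. Mean-pooling over the last hidden state (including
# special tokens, for simplicity) yields one vector per protein.
import torch
from transformers import BertModel, BertTokenizer

model_name = "yarongef/DistilProtBert"
tokenizer = BertTokenizer.from_pretrained(model_name, do_lower_case=False)
model = BertModel.from_pretrained(model_name)
model.eval()

sequence = "M K T A Y I A K Q R Q I S F V K S H F S R Q"
inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_dim)

embedding = hidden.mean(dim=1)  # per-protein embedding
print(embedding.shape)
```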

Use Cases

Bioinformatics
Secondary structure prediction
Predict protein secondary structure (3-state)
Achieved 72%, 81%, and 79% accuracy on the CASP12, TS115, and CB513 datasets respectively
Membrane protein prediction
Predict whether a protein is a membrane protein (a fine-tuning sketch for this kind of binary classification task follows this section)
Achieved 86% accuracy on the DeepLoc dataset
Protein authenticity detection
Distinguish real proteins from randomly shuffled versions of themselves
Achieved AUCs of 0.92, 0.91, and 0.87 on the single, double, and triple shuffling tasks respectively
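
To make the fine-tuning workflow behind tasks like membrane protein prediction concrete, here is a hedged sketch using the standard Hugging Face Trainer API. The two-sequence dataset is a hypothetical placeholder; real DeepLoc-style data would need to be prepared separately as (sequence, label) pairs.

```python
# Hedged fine-tuning sketch for a binary sequence classification task
# such as membrane protein prediction. The tiny inline dataset is a
# placeholder, not real DeepLoc data.
import torch
from transformers import (BertForSequenceClassification, BertTokenizer,
                          Trainer, TrainingArguments)

model_name = "yarongef/DistilProtBert"
tokenizer = BertTokenizer.from_pretrained(model_name, do_lower_case=False)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)

class ProteinDataset(torch.utils.data.Dataset):
    """Wraps (sequence, label) pairs; sequences are space-separated residues."""
    def __init__(self, sequences, labels):
        self.encodings = tokenizer(sequences, truncation=True, padding=True,
                                   max_length=512)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# Toy placeholder data; replace with real membrane/non-membrane sequences.
train_ds = ProteinDataset(["M K T A Y I A K Q R", "G S H M A D E E K L"], [1, 0])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_ds,
)
trainer.train()
```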