Nucleotide Transformer 2.5b Multi Species

Developed by InstaDeepAI
A DNA sequence analysis model pre-trained on genomes from 850 species, supporting tasks such as molecular phenotype prediction
Downloads 2,714
Release Time: 4/5/2023

Model Overview

This model is a 2.5-billion-parameter language model for genomics. It is pre-trained on DNA sequences from 850 species and is used to predict molecular phenotypes directly from sequence. According to its developers, it generalizes better and predicts more accurately than traditional methods.

Model Features

Multi-species genome integration
Integrates genome data from 850 species, including model and non-model organisms
Large-scale pre-training
Trained for 300 billion tokens on a dataset covering 174 billion nucleotides
Efficient tokenization strategy
Tokenizes DNA into 6-mers where possible and falls back to single nucleotides otherwise, with a vocabulary size of 4105
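
The 6-mer tokenization described above can be illustrated with a short sketch. This is not the model's actual tokenizer: the function name kmer_tokenize and the exact fallback rule (emit a single nucleotide whenever the next six characters are not all A, C, G, or T, or fewer than six remain) are assumptions made for illustration.

```python
from typing import List

K = 6                      # k-mer length used by the sketch
NUCLEOTIDES = set("ACGT")  # letters allowed inside a k-mer token


def kmer_tokenize(sequence: str, k: int = K) -> List[str]:
    """Greedy left-to-right k-mer tokenization with single-letter fallback.

    Hypothetical helper: the fallback rule is an assumption, not the
    published tokenizer's exact behavior.
    """
    seq = sequence.upper()
    tokens: List[str] = []
    i = 0
    while i < len(seq):
        window = seq[i:i + k]
        if len(window) == k and set(window) <= NUCLEOTIDES:
            # A full, unambiguous k-mer is available: emit it as one token.
            tokens.append(window)
            i += k
        else:
            # Too short or contains an ambiguous base (e.g. N): emit one letter.
            tokens.append(seq[i])
            i += 1
    return tokens


tokens = kmer_tokenize("ATGCGTACGT")
n_possible_kmers = 4 ** K  # number of distinct 6-mers over A, C, G, T
```

There are 4^6 = 4096 possible 6-mers, so most of the 4105-entry vocabulary is taken up by 6-mer tokens. The remaining entries presumably cover single-nucleotide and special tokens, though the exact breakdown is not stated here.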

Model Capabilities

DNA sequence analysis
Molecular phenotype prediction
Genomic feature extraction
Masked nucleotide prediction
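
Masked nucleotide prediction means hiding some tokens and asking the model to recover them. The sketch below shows only the input-preparation step, under stated assumptions: the 15% masking rate is the common default for masked language models rather than a figure given on this page, and the "<mask>" string and the mask_tokens helper are hypothetical names.

```python
import random
from typing import Dict, List, Tuple

MASK_TOKEN = "<mask>"  # hypothetical placeholder, not taken from the model's vocabulary


def mask_tokens(
    tokens: List[str], mask_rate: float = 0.15, seed: int = 0
) -> Tuple[List[str], Dict[int, str]]:
    """Replace a random subset of tokens with MASK_TOKEN.

    Returns the masked sequence and a position -> original-token map,
    which is what a masked language model would be trained to predict.
    The 15% default rate is an assumption.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # seeded so the example is reproducible
    n_masked = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    masked = list(tokens)
    targets: Dict[int, str] = {}
    for pos in positions:
        targets[pos] = masked[pos]  # remember the hidden token
        masked[pos] = MASK_TOKEN
    return masked, targets


example = ["ATGCGT", "ACGTAC", "GGCTAA", "TTGACC", "A", "C"]
masked, targets = mask_tokens(example)
```

A model would then score candidate tokens at each masked position, and the probabilities it assigns to the entries in targets measure how well it has learned the sequence context.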

Use Cases

Genomics research
Regulatory element identification
Identify functional regulatory regions in DNA sequences
Reported to give more accurate predictions than existing methods
Cross-species comparative analysis
Analyze genomic similarities and differences across species
Biomedical research
Disease-associated variant prediction
Predict the effect of DNA sequence variants on disease