Nucleotide Transformer 2.5b 1000g

Developed by InstaDeepAI
A 2.5-billion-parameter DNA sequence foundation model pre-trained on 3,202 genetically diverse human genomes and intended for molecular phenotype prediction.
Downloads 122
Release Date: 4/4/2023

Model Overview

The Nucleotide Transformer is a family of language models pre-trained on whole-genome DNA sequences. Across the family, the models draw on both human and multi-species genomic data; this checkpoint is the variant pre-trained on 3,202 human genomes. The models are designed to perform well on molecular phenotype prediction.

Model Features

Multi-source Genome Pre-training
The model family draws on more than 3,200 human genomes and genomes from more than 850 species, covering extensive genetic diversity. This checkpoint uses the human genomes only.
Efficient Tokenization Strategy
Tokenizes sequences into 6-mers wherever possible and falls back to single-nucleotide tokens otherwise, balancing information per token against computational cost
Large-scale Parameters
At 2.5 billion parameters, the model has the capacity to capture complex patterns in genomic features
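
The 6-mer tokenization with single-nucleotide fallback described above can be illustrated with a small sketch. This is not the model's actual tokenizer: the function name `tokenize_6mer` and the greedy left-to-right rule are assumptions made for illustration, and special tokens and vocabulary IDs are omitted.

```python
def tokenize_6mer(sequence, k=6):
    """Greedy k-mer tokenization with a single-nucleotide fallback.

    Hypothetical helper for illustration only; the real tokenizer may
    differ in how it handles ambiguous bases and sequence ends.
    """
    valid = set("ACGT")
    seq = sequence.upper()
    tokens = []
    i = 0
    while i < len(seq):
        chunk = seq[i:i + k]
        # Emit a full k-mer only if k unambiguous bases are available.
        if len(chunk) == k and set(chunk) <= valid:
            tokens.append(chunk)
            i += k
        else:
            # Otherwise fall back to one token per nucleotide.
            tokens.append(seq[i])
            i += 1
    return tokens


if __name__ == "__main__":
    print(tokenize_6mer("ATGCGTACGTTA"))  # two 6-mer tokens
    print(tokenize_6mer("ATGCGTAC"))      # one 6-mer, then single bases
```

A 12-nucleotide sequence becomes 2 tokens instead of 12, which is where the efficiency gain comes from: the same context window covers roughly six times more sequence.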

Model Capabilities

DNA Sequence Embedding Generation
Genomic Variant Prediction
Molecular Phenotype Inference
Masked Nucleotide Prediction
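
For embedding generation, a common approach is to average the model's per-token hidden states while ignoring padding positions. The sketch below shows only that pooling step in plain Python. It assumes the per-token vectors and the attention mask have already been obtained from the model; `mean_pool` is a hypothetical helper name, not part of any library.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0.

    token_embeddings: list of equal-length lists of floats, one per token.
    attention_mask:   list of 0/1 flags, 1 for real tokens, 0 for padding.
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("embeddings and mask must have the same length")
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for j, value in enumerate(vector):
                totals[j] += value
            count += 1
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [total / count for total in totals]


if __name__ == "__main__":
    # Three token vectors; the last one is padding and is ignored.
    vectors = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
    print(mean_pool(vectors, [1, 1, 0]))
```

Masking matters because padded positions would otherwise pull the sequence embedding toward an arbitrary padding vector.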

Use Cases

Genomics Research
Genetic Variation Analysis
Identify functional genomic regions through sequence embeddings
Intended to improve variant effect prediction accuracy over traditional methods
Cross-species Comparison
Analyze conserved regions using features from the multi-species pre-trained models in the family
Biomedical Applications
Disease Risk Prediction
Disease association studies based on whole-genome sequences
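
One way to use sequence embeddings for the variant analysis use cases above is to embed a reference sequence and its variant-carrying counterpart and then compare the two vectors. The sketch below covers the two model-independent steps: applying a single-nucleotide variant and computing a cosine distance. The helper names `apply_snv` and `cosine_distance` are hypothetical, and the embedding step itself is assumed to happen elsewhere.

```python
import math


def apply_snv(sequence, position, ref, alt):
    """Return the sequence with a single-nucleotide variant applied.

    position is 0-based; the reference base is checked before substitution.
    """
    if sequence[position] != ref:
        raise ValueError("reference base does not match the sequence")
    return sequence[:position] + alt + sequence[position + 1:]


def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


if __name__ == "__main__":
    reference = "ATGCGTACGTTA"
    variant = apply_snv(reference, 3, "C", "T")
    print(variant)
    # Placeholder vectors standing in for model embeddings of each sequence.
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

A larger distance suggests the variant shifts the model's representation more strongly. Whether that shift tracks biological effect is an empirical question that requires validation against labeled variants.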