Nucleotide Transformer V2 500m Multi Species

Developed by InstaDeepAI
Part of a family of foundational language models pre-trained on whole-genome DNA sequences, drawing on over 3,200 human genomes and 850 genomes from a wide range of species.
Downloads: 6,166
Release Time: 7/27/2023

Model Overview

This is a Transformer model with 500 million parameters for DNA sequence analysis, designed to produce accurate predictions of molecular phenotypes from sequence.

Model Features

Multi-species genome integration
Integrates genomic data from 850 different species, including model and non-model organisms
Large-scale pre-training
Pre-trained on 174 billion nucleotides (approximately 29 billion tokens)
Advanced architecture
A second-generation Transformer architecture that uses rotary position embeddings and gated linear units
Efficient tokenization
Tokenizes sequences into 6-mers wherever possible and falls back to single nucleotides otherwise, with a vocabulary size of 4,105
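To make the tokenization strategy concrete, here is a simplified tokenizer written for illustration only; it is not the library's implementation, and `build_vocab`, `tokenize` and `encode` are hypothetical names. It assumes the rule is: split greedily into 6-mers over A, C, G and T, and emit a single-character token for `N` or for bases left over when fewer than six valid bases remain. It omits special tokens, so its vocabulary (4,096 6-mers plus 5 single characters, 4,101 entries) is smaller than the model's 4,105; the remaining entries are presumably special tokens. Because most tokens cover six nucleotides, 174 billion nucleotides correspond to roughly 174 / 6 ≈ 29 billion tokens, consistent with the pre-training figure above.

```python
from itertools import product

K = 6
BASES = "ACGT"


def build_vocab(k: int = K) -> dict[str, int]:
    """Hypothetical vocabulary: all 4**k k-mers over ACGT, then single-character tokens.
    Special tokens are omitted, so this is NOT the model's real token-id mapping."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    singles = list(BASES) + ["N"]
    return {token: idx for idx, token in enumerate(kmers + singles)}


def tokenize(seq: str, k: int = K) -> list[str]:
    """Greedy left-to-right split: take a k-mer when the next k characters are all
    in ACGT; otherwise emit the current character as its own token."""
    seq = seq.upper()
    tokens: list[str] = []
    i = 0
    while i < len(seq):
        window = seq[i:i + k]
        if len(window) == k and all(c in BASES for c in window):
            tokens.append(window)  # full 6-mer
            i += k
        else:
            tokens.append(seq[i])  # fallback: single character (e.g. N or a trailing base)
            i += 1
    return tokens


def encode(seq: str, vocab: dict[str, int]) -> list[int]:
    """Map tokens to ids; raises KeyError for characters outside ACGTN."""
    return [vocab[t] for t in tokenize(seq)]


vocab = build_vocab()
print(len(vocab))               # 4101
print(tokenize("ACGTACGTN"))    # ['ACGTAC', 'G', 'T', 'N']
```

The greedy rule means an `N` breaks the sequence into independent stretches, each of which is cut into 6-mers from its start with any remainder tokenized base by base.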
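The two architectural components named in the feature list can be sketched in NumPy. This is an illustrative sketch under stated assumptions, not the model's code: the rotary embedding rotates interleaved pairs of dimensions using the common base of 10,000, and the gated linear unit uses a SiLU gate (the SwiGLU variant). The source only says "gated linear units", and the real implementation may pair dimensions differently or use another gate; `rotary_embed` and `swiglu` are hypothetical names.

```python
import numpy as np


def rotary_embed(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotary position embedding (RoPE) for x of shape (seq_len, dim), dim even.

    Each interleaved pair (x[:, 2i], x[:, 2i+1]) is rotated by the angle
    positions * base**(-2i/dim). Pairing convention and base are assumptions.
    """
    dim = x.shape[-1]
    if dim % 2:
        raise ValueError("dim must be even")
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)  # (dim/2,) rotation frequencies
    angles = positions[:, None] * inv_freq[None, :]    # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    out[:, 0::2] = x_even * cos - x_odd * sin          # 2-D rotation, first coordinate
    out[:, 1::2] = x_even * sin + x_odd * cos          # 2-D rotation, second coordinate
    return out


def swiglu(x: np.ndarray, w_gate: np.ndarray, w_up: np.ndarray, w_down: np.ndarray) -> np.ndarray:
    """Feed-forward block with a SiLU-gated linear unit:
    (silu(x @ w_gate) * (x @ w_up)) @ w_down."""
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))                # SiLU(z) = z * sigmoid(z)
    return (silu * (x @ w_up)) @ w_down


q = np.array([[1.0, 2.0, 3.0, 4.0]])
k = np.array([[0.5, -1.0, 2.0, 0.0]])
s1 = rotary_embed(q, np.array([3])) @ rotary_embed(k, np.array([1])).T
s2 = rotary_embed(q, np.array([10])) @ rotary_embed(k, np.array([8])).T
print(np.allclose(s1, s2))  # True
```

Because each pair is rotated by an angle proportional to its position, the dot product between a rotated query at position m and a rotated key at position n depends only on n - m, so shifting both positions by the same amount leaves the attention score unchanged.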

Model Capabilities

DNA sequence analysis
Molecular phenotype prediction
Genomic feature extraction
Sequence embedding generation
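Sequence embedding generation usually means reducing the model's per-token hidden states to one vector per sequence. A common way to do this is masked mean pooling, sketched below. It assumes the hidden states and attention mask have already been obtained, for example by loading the checkpoint `InstaDeepAI/nucleotide-transformer-v2-500m-multi-species` through Hugging Face `transformers` with `trust_remote_code=True` (treat that identifier and loading path as assumptions). `mean_pool` is a hypothetical helper; only the pooling step is shown, so the example runs without downloading the model.

```python
import numpy as np


def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average per-token embeddings into one vector per sequence, skipping padding.

    hidden_states:  (batch, seq_len, dim) array, e.g. the final-layer hidden states.
    attention_mask: (batch, seq_len) array, 1/True for real tokens, 0/False for padding.
    Returns a (batch, dim) array.
    """
    # Broadcastable mask of shape (batch, seq_len, 1) in the embeddings' dtype.
    mask = attention_mask.astype(hidden_states.dtype)[..., None]
    # Sum only the unmasked token vectors for each sequence.
    summed = (hidden_states * mask).sum(axis=1)
    # Number of real tokens per sequence; clipped to 1 to avoid dividing by zero.
    counts = np.clip(mask.sum(axis=1), 1.0, None)
    return summed / counts


# Toy example: two real tokens and one padding token that must be ignored.
hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pool(hidden, mask))  # [[2. 3.]]
```

Excluding padding matters because sequences in a batch are usually padded to a common length; averaging over padded positions would make the embedding depend on the batch rather than on the sequence.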

Use Cases

Genomics research
Regulatory element prediction
Use the model to predict regulatory elements in DNA sequences
Reported to be more accurate than existing methods
Cross-species comparison
Analyze genomic similarities and differences between different species
Biomedical research
Disease-related variant analysis
Identify DNA sequence variants associated with diseases
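The source does not say how variants or species are compared. One commonly used zero-shot heuristic with sequence embeddings is to measure how far the embedding moves when a variant is introduced; the same cosine similarity can also compare embeddings of corresponding sequences from two species. The sketch below works under those assumptions: `apply_snv`, `cosine_similarity`, `variant_score` and `toy_embed` are hypothetical helpers, and `toy_embed` (base counts) only stands in for a real embedder such as pooled model hidden states.

```python
from typing import Callable, Sequence

import numpy as np


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based position pos replaced by alt."""
    if not 0 <= pos < len(seq):
        raise IndexError("pos is outside the sequence")
    return seq[:pos] + alt + seq[pos + 1:]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def variant_score(seq: str, pos: int, alt: str,
                  embed: Callable[[str], Sequence[float]]) -> float:
    """Embedding-shift score: 1 - cosine similarity between the embeddings of the
    reference and the alternate sequence. Larger values mean a larger shift."""
    ref_vec = embed(seq)
    alt_vec = embed(apply_snv(seq, pos, alt))
    return 1.0 - cosine_similarity(ref_vec, alt_vec)


def toy_embed(s: str) -> list[float]:
    """Stand-in embedder for illustration only: counts of A, C, G, T."""
    return [float(s.count(b)) for b in "ACGT"]


print(apply_snv("ACGT", 1, "T"))                          # ATGT
print(round(variant_score("AACG", 0, "T", toy_embed), 3))  # 0.184
```

Such a score only ranks variants by how much they perturb the representation; whether a larger shift indicates disease relevance would need to be validated against labeled data.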