Nucleotide Transformer 500m Human Ref

Developed by InstaDeepAI
A 500M-parameter Transformer model pre-trained on the human reference genome. It belongs to the Nucleotide Transformer family, which as a whole integrates DNA sequence information from over 3,200 diverse human genomes and 850 species
Downloads 4,482
Release Time: 4/4/2023

Model Overview

The Nucleotide Transformer is a series of foundation language models pre-trained on whole-genome DNA sequences. The models are built for genomics and aim to provide accurate molecular phenotype predictions

Model Features

Multi-source Genome Integration
The model family integrates DNA sequence information from over 3,200 diverse human genomes and 850 species; this variant is pre-trained on the human reference genome
Large-scale Pre-training
Trained on 300 billion tokens using 8 A100 80GB GPUs
6-mer Tokenization Strategy
Employs 6-mer tokenization with a vocabulary size of 4105 for effective DNA sequence processing
Dual Framework Support
Provides both TensorFlow and PyTorch versions
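
The 6-mer tokenization mentioned above can be illustrated with a small, dependency-free sketch. It is not the model's actual tokenizer: the function name `tokenize` and the fallback rule (single characters for a trailing remainder or for a window containing a non-ACGT symbol such as N) are simplifying assumptions. It does show where most of the 4105-entry vocabulary comes from: there are 4^6 = 4096 possible 6-mers over A, C, G and T, and the remaining entries presumably cover single-nucleotide and special tokens.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
K = 6

# All 4**6 = 4096 possible 6-mers over A, C, G, T.
KMERS = {"".join(p) for p in product(NUCLEOTIDES, repeat=K)}


def tokenize(sequence):
    """Illustrative 6-mer tokenizer (simplified assumption, not the real one).

    Reads left to right. A full window of six A/C/G/T characters becomes
    one token; otherwise the current character is emitted on its own.
    """
    sequence = sequence.upper()
    tokens = []
    i = 0
    while i < len(sequence):
        window = sequence[i:i + K]
        if window in KMERS:
            tokens.append(window)
            i += K
        else:
            # Trailing remainder shorter than K, or a symbol such as N.
            tokens.append(sequence[i])
            i += 1
    return tokens


if __name__ == "__main__":
    print(len(KMERS))
    print(tokenize("ATTCCGATTCCGATT"))
```

In this sketch a 15-nucleotide sequence yields two 6-mer tokens followed by three single-nucleotide tokens, which is why sequence lengths that are multiples of six tokenize most compactly.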

Model Capabilities

DNA Sequence Analysis
Molecular Phenotype Prediction
Genomic Feature Extraction
DNA Sequence Mask Prediction

Use Cases

Genomics Research
DNA Sequence Feature Extraction
Extracts high-level feature representations from DNA sequences
Applicable to downstream genomics tasks
Molecular Phenotype Prediction
Predicts molecular phenotypes associated with DNA sequences
Reported to give more accurate predictions than existing methods
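
The feature-extraction use case might look roughly like the sketch below. It assumes the `transformers` and `torch` packages are installed and that the model is published on the Hugging Face Hub under the identifier `InstaDeepAI/nucleotide-transformer-500m-human-ref`; that identifier is an assumption, not something stated on this page. The names `masked_mean` and `embed_sequences` are illustrative only. The sketch takes the last hidden layer and averages it over non-padding tokens to get one vector per sequence, which is one common pooling choice rather than the only one.

```python
def masked_mean(token_vectors, mask):
    """Average the token vectors whose mask entry is 1 (skips padding)."""
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def embed_sequences(
    sequences,
    # Assumed Hub identifier; adjust if the model lives elsewhere.
    model_name="InstaDeepAI/nucleotide-transformer-500m-human-ref",
):
    """Return one mean-pooled embedding (a list of floats) per DNA sequence."""
    # Imported lazily so masked_mean can be used without these packages.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    model.eval()

    # Pad to the longest sequence in the batch; the tokenizer also
    # returns an attention mask marking the real (non-padding) tokens.
    batch = tokenizer(sequences, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**batch, output_hidden_states=True)

    last_hidden = outputs.hidden_states[-1].tolist()  # [batch][token][dim]
    masks = batch["attention_mask"].tolist()          # [batch][token]
    return [masked_mean(vecs, m) for vecs, m in zip(last_hidden, masks)]
```

Calling `embed_sequences(["ATTCCGATTCCGATTCCG"])` downloads the weights on first use. The resulting vectors can then be passed to a downstream classifier or regressor for molecular phenotype prediction.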