Agro Nucleotide Transformer 1b

Developed by InstaDeepAI
AgroNT is a DNA language model trained on edible plant genomes, capable of learning universal representations of nucleotide sequences.
Downloads 4,869
Release Time: 8/1/2023

Model Overview

AgroNT is a DNA language model trained primarily on edible plant genomes. It uses a Transformer architecture and a masked language modeling objective to learn universal representations of nucleotide sequences.

Model Features

Large-scale genome training
The model is trained on genomic data from 48 different plant species, comprising approximately 10.5 million genomic sequences.
6-mer tokenization
Uses a non-overlapping 6-mer tokenizer to convert genomic nucleotide sequences into tokens, with a vocabulary containing 4096 possible 6-mer combinations.
Long context window
The model supports a context window of 1024 tokens, corresponding to approximately 6144 base pairs.
Efficient pre-training
Pre-training uses an effective batch size of 1.5 million tokens over 315,000 update steps, for a total of 472.5 billion training tokens.
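
The figures above can be checked with a short Python sketch. It is an illustration only, not the model's actual tokenizer: the names `VOCAB` and `tokenize_6mers` are hypothetical, the token ids are arbitrary, and dropping leftover bases at the end of a sequence is an assumption made here for simplicity (the real tokenizer may handle them, and special tokens, differently).

```python
from itertools import product

K = 6                 # k-mer length stated in the model description
NUCLEOTIDES = "ACGT"

# Every possible 6-mer over A, C, G, T. The ids are illustrative only and
# are not the ids used by the real AgroNT vocabulary.
VOCAB = {"".join(p): i for i, p in enumerate(product(NUCLEOTIDES, repeat=K))}


def tokenize_6mers(sequence: str) -> list[str]:
    """Split a DNA sequence into non-overlapping 6-mers.

    Assumption: trailing bases that do not fill a complete 6-mer are dropped.
    """
    seq = sequence.upper()
    usable = len(seq) - len(seq) % K
    return [seq[i:i + K] for i in range(0, usable, K)]


# Context window: 1024 tokens of 6 bases each.
CONTEXT_TOKENS = 1024
context_bp = CONTEXT_TOKENS * K

# Pre-training budget: tokens per batch times number of update steps.
BATCH_TOKENS = 1_500_000
UPDATE_STEPS = 315_000
total_tokens = BATCH_TOKENS * UPDATE_STEPS

if __name__ == "__main__":
    print(len(VOCAB))                        # vocabulary size
    print(tokenize_6mers("ACGTACGTACGTAC"))  # two 6-mers, last 2 bases dropped
    print(context_bp, total_tokens)
```

Because the 6-mers do not overlap, each token covers six bases, which is why a 1024-token window spans roughly 6144 base pairs.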

Model Capabilities

Genomic sequence representation learning
Masked nucleotide prediction
Genomic sequence embedding generation
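
As a rough sketch of the embedding capability, the snippet below loads the model with the Hugging Face `transformers` library and mean-pools the last hidden layer over non-padding tokens. Several things here are assumptions rather than facts from this page: the repository id `InstaDeepAI/agro-nucleotide-transformer-1b`, the helper name `embed_sequences`, and the choice of mean pooling as the way to reduce token embeddings to one vector per sequence.

```python
# Assumed Hugging Face repository id; verify it before use.
MODEL_ID = "InstaDeepAI/agro-nucleotide-transformer-1b"


def embed_sequences(sequences, model_id=MODEL_ID):
    """Return one mean-pooled embedding per DNA sequence (hypothetical helper)."""
    # Imported lazily so the sketch can be read without these packages installed.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    model.eval()

    # Pad to the longest sequence in the batch and return PyTorch tensors.
    batch = tokenizer(sequences, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**batch, output_hidden_states=True)

    hidden = out.hidden_states[-1]  # shape: (batch, tokens, hidden_dim)
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)

    # Average only over real tokens, ignoring padding positions.
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)


if __name__ == "__main__":
    # Downloads the model weights on first run.
    vectors = embed_sequences(["ATGGCGTACGATCGTACG", "TTGACCGATAGC"])
    print(vectors.shape)
```

Calling `embed_sequences` downloads the model weights, so it needs network access and enough memory for a 1b-parameter model.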

Use Cases

Genomics research
Plant genome analysis
Uses the model's learned representations of plant genomes to support genome analysis and comparison.
Genomic sequence prediction
Predicts masked portions of genomic sequences, assisting in genome sequencing and annotation.