🚀 Evo-1 (Phase 1)
Evo is a biological foundation model capable of long-context modeling and design. It uses the StripedHyena architecture to model sequences at single-nucleotide, byte-level resolution with near-linear scaling of compute and memory relative to context length.
🚀 Quick Start
We identified and fixed an issue caused by an incorrect permutation of some projections, which affected generation quality. To use the fixed model revision, load the model as follows:
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "togethercomputer/evo-1-8k-base"  # or "togethercomputer/evo-1-131k-base"

config = AutoConfig.from_pretrained(model_name, trust_remote_code=True, revision="1.1_fix")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    trust_remote_code=True,
    revision="1.1_fix",
)
✨ Features
- Long-context Modeling: Capable of long-context modeling and design in the biological domain.
- StripedHyena Architecture: Enables single-nucleotide, byte-level resolution sequence modeling with near-linear scaling of compute and memory.
- Intermediate Checkpoints: We release the weights of 15 intermediate checkpoints from phase 1 and phase 2 of pretraining.
📦 Installation
To use StripedHyena outside of the playground, you will need to install custom kernels. Please follow the instructions from the standalone repository.
💻 Usage Examples
Example usage is provided in the standalone repo.
📚 Documentation
About
Evo uses the StripedHyena architecture to enable modeling of sequences at single-nucleotide, byte-level resolution with near-linear scaling of compute and memory relative to context length. Evo has 7 billion parameters and is trained on OpenGenome, a prokaryotic whole-genome dataset containing ~300 billion tokens.
Evo-1 (Phase 1) is our first model in the Evo family, trained at a context length of 8k.
| Checkpoint Name | Description |
|---|---|
| evo-1-8k-base | A model pretrained with 8,192 context. We use this model as the base model for molecular-scale finetuning tasks. |
| evo-1-131k-base | A model pretrained with 131,072 context using evo-1-8k-base as the initialization. We use this model to reason about and generate sequences at the genome scale. |
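As a sketch of how one of these checkpoints might be used to reason about a sequence, the snippet below scores a DNA string with per-nucleotide log-likelihoods. The checkpoint names come from the table above; the repository id prefix, the tokenizer interface, and the `.logits` output field are assumptions about the remote code, not documented guarantees.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repository id for the genome-scale checkpoint in the table above.
ckpt = "togethercomputer/evo-1-131k-base"

config = AutoConfig.from_pretrained(ckpt, trust_remote_code=True, revision="1.1_fix")
model = AutoModelForCausalLM.from_pretrained(ckpt, config=config, trust_remote_code=True, revision="1.1_fix").eval()
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True, revision="1.1_fix")

sequence = "ATGCGTACGTTAGC"
input_ids = tokenizer(sequence, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, seq_len, vocab)

# Log-likelihood of each nucleotide given the preceding context.
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
token_ll = log_probs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
print(token_ll.sum().item())  # total log-likelihood of the sequence
```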
Model Architecture
StripedHyena is a deep signal processing, hybrid architecture composed of multi-head attention and gated convolutions arranged in Hyena blocks, improving over decoder-only Transformers.
Some highlights of the architecture:
- Efficient autoregressive generation via a recurrent mode (>500k tokens of generation on a single 80GB GPU)
- Significantly faster training and finetuning at long context (>3x at 131k)
- Improved scaling laws over state-of-the-art architectures (e.g., Transformer++) on both natural language and biological sequences.
- Robust to training beyond the compute-optimal frontier, e.g., training well beyond Chinchilla-optimal token amounts (see the preprint for details).
Parametrization for Inference and Finetuning
One of the advantages of deep signal processing models is their flexibility. Different parametrizations of convolutions can be used depending on the memory, expressivity and causality requirements of pretraining, finetuning or inference workloads.
The main classes are:
StripedHyena is a mixed precision model. Make sure to keep your `poles` and `residues` in `float32` precision, especially for longer prompts or training.
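A minimal sketch of one way to enforce this after loading. Matching the poles and residues by parameter-name substring is an assumption about the checkpoint's parameter naming, not a documented API.

```python
import torch

# Upcast convolution poles and residues to float32 while leaving the rest of the
# model in its original (lower) precision. The substring match is an assumption.
for name, param in model.named_parameters():
    if "poles" in name or "residues" in name:
        param.data = param.data.to(torch.float32)
```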
📄 License
This project is licensed under the Apache-2.0 license.
Cite
@article{nguyen2024sequence,
  author  = {Eric Nguyen and Michael Poli and Matthew G. Durrant and Brian Kang and Dhruva Katrekar and David B. Li and Liam J. Bartie and Armin W. Thomas and Samuel H. King and Garyk Brixi and Jeremy Sullivan and Madelena Y. Ng and Ashley Lewis and Aaron Lou and Stefano Ermon and Stephen A. Baccus and Tina Hernandez-Boussard and Christopher Ré and Patrick D. Hsu and Brian L. Hie},
  title   = {Sequence modeling and design from molecular to genome scale with Evo},
  journal = {Science},
  volume  = {386},
  number  = {6723},
  pages   = {eado9336},
  year    = {2024},
  doi     = {10.1126/science.ado9336},
  url     = {https://www.science.org/doi/abs/10.1126/science.ado9336}
}