🚀 DBRX Instruct
- DBRX Instruct is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. It specializes in few-turn interactions.
- Both DBRX Instruct and its underlying pretrained base model, DBRX Base, are released under an open license.
- This repository is for DBRX Instruct. You can find DBRX Base here.
- For full details on the DBRX models, read our technical blog post.
🚀 Quick Start
NOTE: This is DBRX Instruct, which has been instruction finetuned. If you're looking for the base model, use DBRX Base.
Getting started with DBRX models is easy with the transformers library; see 📦 Installation below for the required packages and the optional hf_transfer setup for faster downloads.
You need to request access to this repository to download the model. Once granted, obtain an access token with read
permission and supply it below.
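As an alternative to passing token= on every call, the token can be registered once per session with the huggingface_hub login helper; a minimal sketch (the placeholder token below is not a real value):
from huggingface_hub import login

# Log in once; subsequent from_pretrained() calls pick up the cached token.
# Replace hf_YOUR_TOKEN with the read-scoped token issued for your account.
login(token="hf_YOUR_TOKEN")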
Complete code examples for running the model on multiple GPUs, including an optional FlashAttention2 configuration, are given in 💻 Usage Examples below.
✨ Features
- Fine - grained MoE Architecture: DBRX uses a fine - grained mixture - of - experts (MoE) architecture with 132B total parameters, 36B of which are active on any input. It has 16 experts and chooses 4, providing 65x more possible combinations of experts compared to other open MoE models like Mixtral - 8x7B and Grok - 1.
- Advanced Techniques: It uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA), along with a converted version of the GPT - 4 tokenizer.
- Large - scale Pretraining: Pretrained on 12T tokens of text and code data with a maximum context length of 32K tokens. Curriculum learning was used during pretraining to improve model quality.
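As a quick check of the expert-combination claim above, the number of possible active-expert subsets follows from the binomial coefficient; an illustrative sketch (the Mixtral expert counts are taken from its public description, not from this card):
import math

# DBRX: 16 experts, 4 active per token.
dbrx_combos = math.comb(16, 4)      # 1820 possible expert subsets
# Mixtral-8x7B: 8 experts, 2 active per token.
mixtral_combos = math.comb(8, 2)    # 28 possible expert subsets

print(dbrx_combos, mixtral_combos, dbrx_combos / mixtral_combos)  # 1820 28 65.0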
📦 Installation
The model requires ~264GB of RAM and the following packages:
pip install "transformers>=4.40.0"
Optionally, for faster downloads:
pip install hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
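A quick way to confirm the installed transformers version meets the requirement before loading the model (an illustrative check, not part of the original instructions):
import transformers
from packaging import version  # installed as a transformers dependency

assert version.parse(transformers.__version__) >= version.parse("4.40.0"), \
    "DBRX support requires transformers >= 4.40.0"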
💻 Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and the model; device_map="auto" shards the weights across available GPUs.
tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", token="hf_YOUR_TOKEN")
model = AutoModelForCausalLM.from_pretrained("databricks/dbrx-instruct", device_map="auto", torch_dtype=torch.bfloat16, token="hf_YOUR_TOKEN")

# Format the conversation with the chat template and move the input tensors to the GPU.
input_text = "What does it take to build a great LLM?"
messages = [{"role": "user", "content": input_text}]
input_ids = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")

# Generate up to 200 new tokens and decode the result (prompt included).
outputs = model.generate(**input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
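The decode call above prints the prompt together with the completion. To print only the newly generated tokens, slice off the prompt length first; a minimal sketch reusing the variables from the example above:
# input_ids is the BatchEncoding returned by apply_chat_template(return_dict=True, ...)
prompt_length = input_ids["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))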
Advanced Usage
If your GPU system supports FlashAttention2, you can add attn_implementation="flash_attention_2" as a keyword argument to AutoModelForCausalLM.from_pretrained() to achieve faster inference:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", token="hf_YOUR_TOKEN")
model = AutoModelForCausalLM.from_pretrained("databricks/dbrx-instruct", device_map="auto", torch_dtype=torch.bfloat16, token="hf_YOUR_TOKEN", attn_implementation="flash_attention_2")
input_text = "What does it take to build a great LLM?"
messages = [{"role": "user", "content": input_text}]
input_ids = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
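For interactive use, generation can also be streamed token by token with transformers' TextStreamer; a minimal sketch, assuming the model, tokenizer, and input_ids are set up as in the examples above:
from transformers import TextStreamer

# Prints tokens to stdout as they are generated, omitting the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
outputs = model.generate(**input_ids, max_new_tokens=200, streamer=streamer)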
📚 Documentation
Model Overview
DBRX is a transformer-based decoder-only large language model (LLM) trained using next-token prediction.
- Inputs: It only accepts text-based inputs with a context length of up to 32,768 tokens (see the configuration sketch after this list).
- Outputs: It only produces text-based outputs.
- Model Architecture: More detailed information can be found in our technical blog post.
- License: Databricks Open Model License
- Acceptable Use Policy: Databricks Open Model Acceptable Use Policy
- Version: 1.0
- Owner: Databricks, Inc.
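The advertised context length can be checked from the model configuration without downloading the weights; a minimal sketch, assuming the published DBRX configuration exposes a max_seq_len field:
from transformers import AutoConfig

config = AutoConfig.from_pretrained("databricks/dbrx-instruct", token="hf_YOUR_TOKEN")
print(config.max_seq_len)  # expected: 32768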
Usage
- Download on Hugging Face: DBRX Base and DBRX Instruct are available for download on Hugging Face. This is the HF repository for DBRX Instruct; DBRX Base can be found here.
- GitHub Repository: The DBRX model repository can be found on GitHub here.
- Databricks Foundation Model APIs: DBRX Base and DBRX Instruct are available with Databricks Foundation Model APIs via both pay-per-token and provisioned throughput endpoints (a hedged usage sketch follows this list).
- Fine-tuning: For more information on fine-tuning using LLM Foundry, see our LLM pretraining and fine-tuning documentation.
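A hedged sketch of calling a pay-per-token endpoint through the OpenAI-compatible interface that Databricks Foundation Model APIs expose; the workspace URL, token, and the databricks-dbrx-instruct endpoint name below are assumptions for illustration (requires pip install openai):
from openai import OpenAI

# Assumed values: replace with your workspace URL and a Databricks personal access token.
client = OpenAI(
    base_url="https://YOUR-WORKSPACE.cloud.databricks.com/serving-endpoints",
    api_key="YOUR_DATABRICKS_TOKEN",
)
response = client.chat.completions.create(
    model="databricks-dbrx-instruct",  # assumed pay-per-token endpoint name
    messages=[{"role": "user", "content": "What does it take to build a great LLM?"}],
    max_tokens=200,
)
print(response.choices[0].message.content)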
Limitations and Ethical Considerations
Training Dataset Limitations
- The DBRX models were trained on 12T tokens of text with a knowledge cutoff date of December 2023.
- The training mix contains both natural-language and code examples, with the vast majority of the data in English. DBRX was not tested for non-English proficiency, so it should be treated as a generalist model for English-language text.
- DBRX does not have multimodal capabilities.
Associated Risks and Recommendations
All foundation models carry risks and may output inaccurate, incomplete, biased, or offensive information. Users should evaluate outputs for accuracy and appropriateness before use. Databricks recommends using retrieval-augmented generation (RAG) in scenarios where accuracy is critical. Anyone using or fine-tuning DBRX should perform additional safety testing for their specific application.
Intended Uses
Intended Use Cases
The DBRX models are open, general-purpose LLMs licensed for commercial and research applications. They can be fine-tuned for various domain-specific tasks, and DBRX Instruct can be used for few-turn question answering in English and for coding tasks.
Out - of - Scope Use Cases
DBRX models are not intended for out-of-the-box use in non-English languages, do not support native code execution or function calling, and should not be used in violation of applicable laws or of the license and acceptable use policy.
Training Stack
MoE models are complex to train. DBRX Base and DBRX Instruct training was supported by Databricks’ infrastructure, including Composer, Streaming, Megablocks, and LLM Foundry.
- Composer: Core library for large-scale training, providing an optimized training loop, checkpointing, logging, model sharding, and more.
- Streaming: Enables fast, low-cost, and scalable training on large datasets from cloud storage, handling challenges such as deterministic resumption and high-quality shuffling at scale (a brief usage sketch follows this list).
- Megablocks: A lightweight library for MoE training, supporting “dropless MoE” for deterministic outputs.
- LLM Foundry: Ties all libraries together for a simple LLM pretraining, fine - tuning, and inference experience.
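For a sense of how Streaming is used, a minimal sketch with the mosaicml-streaming StreamingDataset; the remote path and local cache directory below are placeholders, not values from the DBRX training setup:
from streaming import StreamingDataset
from torch.utils.data import DataLoader

# Streams pre-converted MDS shards from object storage, caching them locally.
dataset = StreamingDataset(remote="s3://YOUR-BUCKET/mds-shards", local="/tmp/mds-cache", shuffle=True)
loader = DataLoader(dataset, batch_size=8)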
Evaluation
DBRX outperforms established open-source and open-weight base models on the Databricks Model Gauntlet, the Hugging Face Open LLM Leaderboard, and HumanEval. Full evaluation details are in our technical blog post.
Acknowledgements
The DBRX models were made possible largely thanks to the open-source community, especially the MegaBlocks library and PyTorch FSDP.
📄 License
The DBRX models are released under the Databricks Open Model License.

