🚀 DBRX Instruct
- DBRX Instruct is a mixture-of-experts (MoE) large language model trained from scratch by Databricks. It specializes in few-turn interactions.
- Both DBRX Instruct and its underlying pretrained base model, DBRX Base, are released under an open license.
- This repository is for DBRX Instruct. You can find DBRX Base here.
- For full details on the DBRX models, read our technical blog post.
🚀 Quick Start
NOTE: This is DBRX Instruct, which has been instruction finetuned. If you're looking for the base model, use DBRX Base.
Getting started with DBRX models is easy with the transformers library; see 📦 Installation below for the required packages and the optional hf_transfer setup for faster downloads.
You need to request access to this repository to download the model. Once granted, obtain an access token with read
permission and supply it below.
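As an alternative to passing token= on every call, the token can be registered once per session with the huggingface_hub login helper; a minimal sketch (the placeholder token below is not a real value):
from huggingface_hub import login

# Log in once; subsequent from_pretrained() calls pick up the cached token.
# Replace hf_YOUR_TOKEN with the read-scoped token issued for your account.
login(token="hf_YOUR_TOKEN")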
Complete code examples for running the model on multiple GPUs, including an optional FlashAttention2 configuration, are given in 💻 Usage Examples below.
✨ Features
- Fine - grained MoE Architecture: DBRX uses a fine - grained mixture - of - experts (MoE) architecture with 132B total parameters, 36B of which are active on any input. It has 16 experts and chooses 4, providing 65x more possible combinations of experts compared to other open MoE models like Mixtral - 8x7B and Grok - 1.
- Advanced Techniques: It uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA), along with a converted version of the GPT - 4 tokenizer.
- Large - scale Pretraining: Pretrained on 12T tokens of text and code data with a maximum context length of 32K tokens. Curriculum learning was used during pretraining to improve model quality.
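As a quick check of the expert-combination claim above, the number of possible active-expert subsets follows from the binomial coefficient; an illustrative sketch (the Mixtral expert counts are taken from its public description, not from this card):
import math

# DBRX: 16 experts, 4 active per token.
dbrx_combos = math.comb(16, 4)      # 1820 possible expert subsets
# Mixtral-8x7B: 8 experts, 2 active per token.
mixtral_combos = math.comb(8, 2)    # 28 possible expert subsets

print(dbrx_combos, mixtral_combos, dbrx_combos / mixtral_combos)  # 1820 28 65.0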
📦 Installation
The model requires ~264GB of RAM and the following packages:
pip install "transformers>=4.40.0"
Optionally, for faster downloads:
pip install hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
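A quick way to confirm the installed transformers version meets the requirement before loading the model (an illustrative check, not part of the original instructions):
import transformers
from packaging import version  # installed as a transformers dependency

assert version.parse(transformers.__version__) >= version.parse("4.40.0"), \
    "DBRX support requires transformers >= 4.40.0"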
💻 Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and the model; device_map="auto" shards the weights across available GPUs.
tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", token="hf_YOUR_TOKEN")
model = AutoModelForCausalLM.from_pretrained("databricks/dbrx-instruct", device_map="auto", torch_dtype=torch.bfloat16, token="hf_YOUR_TOKEN")

# Format the conversation with the chat template and move the input tensors to the GPU.
input_text = "What does it take to build a great LLM?"
messages = [{"role": "user", "content": input_text}]
input_ids = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")

# Generate up to 200 new tokens and decode the result (prompt included).
outputs = model.generate(**input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
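The decode call above prints the prompt together with the completion. To print only the newly generated tokens, slice off the prompt length first; a minimal sketch reusing the variables from the example above:
# input_ids is the BatchEncoding returned by apply_chat_template(return_dict=True, ...)
prompt_length = input_ids["input_ids"].shape[1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))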
Advanced Usage
If your GPU system supports FlashAttention2, you can add attn_implementation="flash_attention_2" as a keyword argument to AutoModelForCausalLM.from_pretrained() to achieve faster inference:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", token="hf_YOUR_TOKEN")
model = AutoModelForCausalLM.from_pretrained("databricks/dbrx-instruct", device_map="auto", torch_dtype=torch.bfloat16, token="hf_YOUR_TOKEN", attn_implementation="flash_attention_2")
input_text = "What does it take to build a great LLM?"
messages = [{"role": "user", "content": input_text}]
input_ids = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
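For interactive use, generation can also be streamed token by token with transformers' TextStreamer; a minimal sketch, assuming the model, tokenizer, and input_ids are set up as in the examples above:
from transformers import TextStreamer

# Prints tokens to stdout as they are generated, omitting the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
outputs = model.generate(**input_ids, max_new_tokens=200, streamer=streamer)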
📚 Documentation
Model Overview
DBRX is a transformer-based decoder-only large language model (LLM) trained using next-token prediction.
- Inputs: It only accepts text-based inputs with a context length of up to 32,768 tokens (see the configuration sketch after this list).
- Outputs: It only produces text-based outputs.
- Model Architecture: More detailed information can be found in our technical blog post.
- License: Databricks Open Model License
- Acceptable Use Policy: Databricks Open Model Acceptable Use Policy
- Version: 1.0
- Owner: Databricks, Inc.
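The advertised context length can be checked from the model configuration without downloading the weights; a minimal sketch, assuming the published DBRX configuration exposes a max_seq_len field:
from transformers import AutoConfig

config = AutoConfig.from_pretrained("databricks/dbrx-instruct", token="hf_YOUR_TOKEN")
print(config.max_seq_len)  # expected: 32768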
Usage
- Download on Hugging Face: DBRX Base and DBRX Instruct are available for download on Hugging Face. This is the HF repository for DBRX Instruct; DBRX Base can be found here.
- GitHub Repository: The DBRX model repository can be found on GitHub here.
- Databricks Foundation Model APIs: DBRX Base and DBRX Instruct are available with Databricks Foundation Model APIs via both pay-per-token and provisioned throughput endpoints (a hedged usage sketch follows this list).
- Fine-tuning: For more information on fine-tuning using LLM Foundry, see our LLM pretraining and fine-tuning documentation.
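A hedged sketch of calling a pay-per-token endpoint through the OpenAI-compatible interface that Databricks Foundation Model APIs expose; the workspace URL, token, and the databricks-dbrx-instruct endpoint name below are assumptions for illustration (requires pip install openai):
from openai import OpenAI

# Assumed values: replace with your workspace URL and a Databricks personal access token.
client = OpenAI(
    base_url="https://YOUR-WORKSPACE.cloud.databricks.com/serving-endpoints",
    api_key="YOUR_DATABRICKS_TOKEN",
)
response = client.chat.completions.create(
    model="databricks-dbrx-instruct",  # assumed pay-per-token endpoint name
    messages=[{"role": "user", "content": "What does it take to build a great LLM?"}],
    max_tokens=200,
)
print(response.choices[0].message.content)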
Limitations and Ethical Considerations
Training Dataset Limitations
- The DBRX models were trained on 12T tokens of text with a knowledge cutoff date of December 2023.
- The training mix contains both natural-language and code examples, with the vast majority of the data in English. DBRX was not tested for non-English proficiency, so it should be treated as a generalist model for English-language text.
- DBRX does not have multimodal capabilities.
Associated Risks and Recommendations
All foundation models carry risks and may output inaccurate, incomplete, biased, or offensive information. Users should evaluate outputs for accuracy and appropriateness before use. Databricks recommends using retrieval-augmented generation (RAG) in scenarios where accuracy is critical. Anyone using or fine-tuning DBRX should perform additional safety testing for their specific application.
Intended Uses
Intended Use Cases
The DBRX models are open, general-purpose LLMs licensed for commercial and research applications. They can be fine-tuned for various domain-specific tasks, and DBRX Instruct can be used for few-turn question answering in English and for coding tasks.
Out - of - Scope Use Cases
DBRX models are not intended for out-of-the-box use in non-English languages, do not support native code execution or function calling, and should not be used in violation of applicable laws or of the license and acceptable use policy.
Training Stack
MoE models are complex to train. DBRX Base and DBRX Instruct training was supported by Databricks’ infrastructure, including Composer, Streaming, Megablocks, and LLM Foundry.
- Composer: Core library for large-scale training, providing an optimized training loop, checkpointing, logging, model sharding, and more.
- Streaming: Enables fast, low-cost, and scalable training on large datasets from cloud storage, handling challenges such as deterministic resumption and high-quality shuffling at scale (a brief usage sketch follows this list).
- Megablocks: A lightweight library for MoE training, supporting “dropless MoE” for deterministic outputs.
- LLM Foundry: Ties all libraries together for a simple LLM pretraining, fine - tuning, and inference experience.
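For a sense of how Streaming is used, a minimal sketch with the mosaicml-streaming StreamingDataset; the remote path and local cache directory below are placeholders, not values from the DBRX training setup:
from streaming import StreamingDataset
from torch.utils.data import DataLoader

# Streams pre-converted MDS shards from object storage, caching them locally.
dataset = StreamingDataset(remote="s3://YOUR-BUCKET/mds-shards", local="/tmp/mds-cache", shuffle=True)
loader = DataLoader(dataset, batch_size=8)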
Evaluation
DBRX outperforms established open-source and open-weight base models on the Databricks Model Gauntlet, the Hugging Face Open LLM Leaderboard, and HumanEval. Full evaluation details are in our technical blog post.
Acknowledgements
The DBRX models were made possible largely thanks to the open-source community, especially the MegaBlocks library and PyTorch FSDP.
📄 License
The DBRX models are released under the Databricks Open Model License.

