Arctic Model
Arctic is a pre-trained dense-MoE hybrid transformer architecture developed by the Snowflake AI Research Team, offering both base and instruct-tuned versions for free use under the Apache-2.0 license.
Quick Start
Arctic is a dense-MoE hybrid transformer architecture pre-trained from scratch by the Snowflake AI Research Team. We are releasing model checkpoints for both the base and instruct-tuned versions of Arctic under an Apache-2.0 license. This allows you to freely use them in your research, prototypes, and products.
For more information on Arctic and links to other relevant resources, like our series of cookbooks about training custom MoE models and producing high-quality training data, visit our blog.
For the latest details about Snowflake Arctic, including tutorials, refer to our GitHub repo: Snowflake Arctic GitHub.
Try a live demo with our Streamlit app.
Model developers: Snowflake AI Research Team
License: Apache-2.0
Input: Models input text only.
Output: Models generate text and code only.
Model Release Date: April 24th, 2024.
Features
- Arctic combines a 10B dense transformer model with a residual 128x3.66B MoE MLP, resulting in 480B total and 17B active parameters chosen using top-2 gating (see the parameter-count sketch after this list).
- It is currently supported with `transformers` by leveraging the custom code feature.
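To make these parameter counts concrete, here is a quick back-of-the-envelope check (a sketch based only on the figures above; the per-expert size is approximate):

```python
dense_params = 10e9      # 10B dense transformer
num_experts = 128        # 128 MoE experts
expert_params = 3.66e9   # ~3.66B parameters per expert
top_k = 2                # top-2 gating: two experts active per token

total = dense_params + num_experts * expert_params
active = dense_params + top_k * expert_params

print(f"Total parameters:  {total / 1e9:.0f}B")   # ~478B, quoted as ~480B
print(f"Active parameters: {active / 1e9:.1f}B")  # ~17.3B, quoted as ~17B
```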
Installation
Arctic is currently supported with `transformers`. To use it, you need to add `trust_remote_code=True` to your `AutoTokenizer` and `AutoModelForCausalLM` calls. We recommend using a `transformers` version at or above 4.39:
```bash
pip install "transformers>=4.39.0"
```
Arctic leverages several features from DeepSpeed. You need to install DeepSpeed 0.14.2 or higher to get all the required features:
```bash
pip install "deepspeed>=0.14.2"
```
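If you want to confirm that both requirements are met in your environment, a quick check (a minimal sketch using only the standard library) looks like this:

```python
from importlib.metadata import version

# Verify the installed versions meet the minimum requirements.
print("transformers:", version("transformers"))  # should be >= 4.39.0
print("deepspeed:", version("deepspeed"))        # should be >= 0.14.2
```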
Usage Examples
Basic Usage
Arctic is currently supported with `transformers` by leveraging the custom code feature. To use this, simply add `trust_remote_code=True` to your `AutoTokenizer` and `AutoModelForCausalLM` calls.
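A minimal loading sketch might look like the following (note that the full model is very large; the advanced example below shows the quantized multi-GPU setup we recommend):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is required to load Arctic's custom model code.
tokenizer = AutoTokenizer.from_pretrained(
    "Snowflake/snowflake-arctic-instruct",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "Snowflake/snowflake-arctic-instruct",
    trust_remote_code=True,
    device_map="auto",
)
```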
Advanced Usage
Due to the model size, we recommend using a single 8xH100 instance from your favorite cloud provider, such as AWS p5.48xlarge or Azure ND96isr_H100_v5.
In this example, we are using FP8 quantization provided by DeepSpeed in the backend. We can also use FP6 quantization by specifying `q_bits=6` in the `QuantizationConfig` (see the variant after the example below). The `"150GiB"` setting for `max_memory` is required until we can get DeepSpeed's FP quantization supported natively as an `HFQuantizer`, which we are actively working on.
```python
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"  # enable faster Hub downloads

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from deepspeed.linear.config import QuantizationConfig

tokenizer = AutoTokenizer.from_pretrained(
    "Snowflake/snowflake-arctic-instruct",
    trust_remote_code=True
)

# FP8 quantization via DeepSpeed (use q_bits=6 for FP6 instead).
quant_config = QuantizationConfig(q_bits=8)

model = AutoModelForCausalLM.from_pretrained(
    "Snowflake/snowflake-arctic-instruct",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map="auto",
    ds_quantization_config=quant_config,
    max_memory={i: "150GiB" for i in range(8)},  # cap per-GPU memory on all 8 GPUs
    torch_dtype=torch.bfloat16,
)

content = "5x + 35 = 7x - 60 + 10. Solve for x"
messages = [{"role": "user", "content": content}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")
outputs = model.generate(input_ids=input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```
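To switch to FP6 quantization instead, only the `QuantizationConfig` changes:

```python
# FP6 quantization: identical setup, just a different bit width.
quant_config = QuantizationConfig(q_bits=6)
```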
The Arctic GitHub page has additional code snippets and examples for running inference.
Technical Details
Arctic combines a 10B dense transformer model with a residual 128x3.66B MoE MLP, resulting in 480B total and 17B active parameters chosen using top-2 gating. For more details about Arctic's model architecture, training process, data, etc., see our series of cookbooks.
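As an illustration of the top-2 gating mechanism described above, here is a minimal sketch (not Arctic's actual implementation; shapes and names are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def top2_gate(router_logits: torch.Tensor):
    """Pick the top-2 experts per token and renormalize their weights.

    router_logits: (num_tokens, num_experts) scores from a learned router.
    Returns per-token expert indices and mixing weights.
    """
    probs = F.softmax(router_logits, dim=-1)
    weights, expert_ids = probs.topk(k=2, dim=-1)          # top-2 experts per token
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize to sum to 1
    return expert_ids, weights

# Example: 4 tokens routed across 128 experts.
logits = torch.randn(4, 128)
expert_ids, weights = top2_gate(logits)
print(expert_ids.shape, weights.shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```

Only the two selected experts run for each token, which is why the active parameter count (about 17B) is so much smaller than the total (about 480B).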
License
The model is released under the Apache-2.0 license.