# Arctic: A Dense-MoE Hybrid Transformer Model
Arctic is a dense-MoE hybrid transformer architecture developed by the Snowflake AI Research Team. Pre-trained model checkpoints are available for both base and instruct-tuned versions under the Apache-2.0 license, enabling free use in research, prototypes, and products.
## Quick Start

Arctic is currently supported with `transformers` by leveraging the custom code feature. To use it, simply add `trust_remote_code=True` to your `AutoTokenizer` and `AutoModelForCausalLM` calls. We recommend using a `transformers` version at or above 4.39:

```bash
pip install "transformers>=4.39.0"
```
Arctic also leverages several features from DeepSpeed. You'll need DeepSpeed 0.14.2 or higher to get all of the required features:

```bash
pip install "deepspeed>=0.14.2"
```
## Features

- Hybrid Architecture: Combines a 10B dense transformer model with a residual 128×3.66B MoE MLP, resulting in 480B total and 17B active parameters chosen using top-2 gating (see the sketch after this list).
- Free to Use: Released under an Apache-2.0 license, allowing free use in various projects.
- Multiple Versions: Available in both base and instruct-tuned versions.
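For intuition, here is a minimal, self-contained sketch of top-2 gating over a pool of expert MLPs. This is illustrative only: the module names, sizes, and routing details are invented for clarity and do not reflect Arctic's actual implementation.

```python
# Toy residual MoE MLP with top-2 gating -- NOT Arctic's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTop2MoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        # The router scores each token against every expert
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); each token is routed to its top-2 experts,
        # so only 2/n_experts of the expert parameters are active per token.
        logits = self.router(x)
        weights, idx = torch.topk(logits, k=2, dim=-1)   # (tokens, 2)
        weights = F.softmax(weights, dim=-1)             # renormalize over the top 2
        out = torch.zeros_like(x)
        for slot in range(2):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return x + out  # residual connection around the MoE MLP

moe = ToyTop2MoE(d_model=64, d_ff=256, n_experts=8)
y = moe(torch.randn(10, 64))  # 10 tokens, each handled by 2 of 8 experts
```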
## Installation

To use Arctic, install the required libraries as mentioned above:

```bash
pip install "transformers>=4.39.0"
pip install "deepspeed>=0.14.2"
```
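To confirm that the installed versions meet these minimums, here is a quick sanity check (a minimal sketch; it assumes `packaging`, which ships as a dependency of `transformers`, is available):

```python
# Verify the installed versions meet the minimums stated above
import deepspeed
import transformers
from packaging import version

assert version.parse(transformers.__version__) >= version.parse("4.39.0")
assert version.parse(deepspeed.__version__) >= version.parse("0.14.2")
print("transformers", transformers.__version__, "| deepspeed", deepspeed.__version__)
```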
## Usage Examples

### Basic Usage

```python
import os
# Enable the faster hf_transfer download backend before anything imports huggingface_hub
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from deepspeed.linear.config import QuantizationConfig

tokenizer = AutoTokenizer.from_pretrained(
    "Snowflake/snowflake-arctic-instruct",
    trust_remote_code=True,
)

# 8-bit quantization of the model weights via DeepSpeed
quant_config = QuantizationConfig(q_bits=8)

model = AutoModelForCausalLM.from_pretrained(
    "Snowflake/snowflake-arctic-instruct",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map="auto",
    ds_quantization_config=quant_config,
    max_memory={i: "150GiB" for i in range(8)},  # cap per-GPU memory on an 8-GPU node
    torch_dtype=torch.bfloat16,
)

content = "5x + 35 = 7x - 60 + 10. Solve for x"
messages = [{"role": "user", "content": content}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")

outputs = model.generate(input_ids=input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```
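For the prompt above, the decoded output should include the model's worked solution; the equation simplifies to 2x = 85, so the expected answer is x = 42.5.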
### Advanced Usage

Due to the model size, we recommend using a single 8xH100 instance from a cloud provider such as AWS p5.48xlarge or Azure ND96isr_H100_v5. You can also use FP6 quantization by specifying `q_bits=6` in the `QuantizationConfig`.
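The only change from the Basic Usage example above is the `q_bits` value passed to `QuantizationConfig`:

```python
from deepspeed.linear.config import QuantizationConfig

# FP6 instead of the 8-bit setting used in Basic Usage
quant_config = QuantizationConfig(q_bits=6)
# ...then pass ds_quantization_config=quant_config to from_pretrained() as before
```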
## Documentation

For more information on Arctic, including details about its architecture, training process, data, and more, see our series of cookbooks.

The Arctic GitHub page has additional code snippets and examples for running inference:

- Example with pure HF: https://github.com/Snowflake-Labs/snowflake-arctic/blob/main/inference
- Tutorial using vLLM: https://github.com/Snowflake-Labs/snowflake-arctic/tree/main/inference/vllm
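As a rough orientation before reading the vLLM tutorial, serving Arctic there might look like the sketch below. This is an untested, assumption-laden sketch: engine arguments such as `tensor_parallel_size=8` are guesses for an 8-GPU node, and the linked tutorial is authoritative.

```python
from vllm import LLM, SamplingParams

# Assumptions: an 8-GPU node; trust_remote_code for Arctic's custom model code
llm = LLM(
    model="Snowflake/snowflake-arctic-instruct",
    trust_remote_code=True,
    tensor_parallel_size=8,
)
params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate(["5x + 35 = 7x - 60 + 10. Solve for x"], params)
print(outputs[0].outputs[0].text)
```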
## Technical Details

Arctic combines a 10B dense transformer model with a residual 128×3.66B MoE MLP, resulting in 480B total and 17B active parameters chosen using top-2 gating.
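Those headline numbers can be sanity-checked with back-of-the-envelope arithmetic (rounded; this assumes the active count is the dense trunk plus the two routed experts):

```python
dense = 10e9             # dense transformer parameters
expert = 3.66e9          # parameters per expert
n_experts = 128

total = dense + n_experts * expert  # ~478.5B, quoted as ~480B
active = dense + 2 * expert         # top-2 gating -> ~17.3B, quoted as ~17B
print(f"total ~= {total / 1e9:.1f}B, active ~= {active / 1e9:.1f}B")
```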
## License

This model is released under the Apache-2.0 license.
## Model Information

| Property | Value |
| --- | --- |
| Developer | Snowflake AI Research Team |
| Architecture | Dense-MoE hybrid transformer (10B dense + residual 128×3.66B MoE MLP) |
| Total parameters | 480B |
| Active parameters | 17B (top-2 gating) |
| Versions | Base and instruct-tuned |
| License | Apache-2.0 |