Granite-3.1-3B-A800M-Instruct
Granite-3.1-3B-A800M-Instruct is a 3B parameter long-context instruct model, which can be used to build AI assistants for multiple domains, including business applications.
Quick Start
Granite-3.1-3B-A800M-Instruct is a 3B parameter long-context instruct model finetuned from Granite-3.1-3B-A800M-Base using a combination of permissively licensed open-source instruction datasets and internally collected synthetic datasets tailored to long-context problems. The model was developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging.
Features
Supported Languages
English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.1 models for languages beyond these twelve.
Intended Use
The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications.
Capabilities
- Summarization
- Text classification
- Text extraction
- Question-answering
- Retrieval Augmented Generation (RAG)
- Code related tasks
- Function-calling tasks (a prompt-construction sketch follows this list)
- Multilingual dialog use cases
- Long-context tasks including long document/meeting summarization, long document QA, etc.
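To make the function-calling capability concrete, here is a minimal prompt-construction sketch. It assumes a recent transformers release whose `apply_chat_template` accepts a `tools` argument; the `get_stock_price` schema is a hypothetical placeholder, not an API shipped with the model. See Usage Examples below for the full generation loop.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.1-3b-a800m-Instruct")

# Hypothetical tool definition in JSON-schema form (placeholder for illustration).
tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current price of a stock ticker.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string", "description": "Stock ticker symbol"}},
            "required": ["ticker"],
        },
    },
}]

chat = [{"role": "user", "content": "What is IBM trading at right now?"}]

# The chat template renders the tool schemas into the prompt; the model is then
# expected to reply with a structured tool call that your code parses and executes.
prompt = tokenizer.apply_chat_template(chat, tools=tools, tokenize=False, add_generation_prompt=True)
print(prompt)
```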
Installation
Install the following libraries:
```shell
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
```
Usage Examples
Basic Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-3.1-3b-a800m-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# device_map="auto" places the model on a GPU when one is available.
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

chat = [
    {"role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location."},
]
# Render the structured chat into the model's prompt format.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# Move the input tensors to the same device the model was placed on.
input_tokens = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output))
```
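The `batch_decode` call above returns the full sequence, prompt included. A minimal follow-up sketch, reusing the variables from the example above, that decodes only the newly generated tokens:

```python
# Slice off the prompt so only the model's reply is decoded.
new_tokens = output[0][input_tokens["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```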
Documentation
Evaluation Results
HuggingFace Open LLM Leaderboard V1
| Models | ARC-Challenge | Hellaswag | MMLU | TruthfulQA | Winogrande | GSM8K | Avg |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Granite-3.1-8B-Instruct | 62.62 | 84.48 | 65.34 | 66.23 | 75.37 | 73.84 | 71.31 |
| Granite-3.1-2B-Instruct | 54.61 | 75.14 | 55.31 | 59.42 | 67.48 | 52.76 | 60.79 |
| Granite-3.1-3B-A800M-Instruct | 50.42 | 73.01 | 52.19 | 49.71 | 64.87 | 48.97 | 56.53 |
| Granite-3.1-1B-A400M-Instruct | 42.66 | 65.97 | 26.13 | 46.77 | 62.35 | 33.88 | 46.29 |
HuggingFace Open LLM Leaderboard V2
| Models | IFEval | BBH | MATH Lvl 5 | GPQA | MUSR | MMLU-Pro | Avg |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Granite-3.1-8B-Instruct | 72.08 | 34.09 | 21.68 | 8.28 | 19.01 | 28.19 | 30.55 |
| Granite-3.1-2B-Instruct | 62.86 | 21.82 | 11.33 | 5.26 | 4.87 | 20.21 | 21.06 |
| Granite-3.1-3B-A800M-Instruct | 55.16 | 16.69 | 10.35 | 5.15 | 2.51 | 12.75 | 17.10 |
| Granite-3.1-1B-A400M-Instruct | 46.86 | 6.18 | 4.08 | 0.00 | 0.78 | 2.41 | 10.05 |
Model Architecture
Granite-3.1-3B-A800M-Instruct is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture (the "A800M" suffix denotes roughly 800M active parameters per token). Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
| Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE |
| :--- | :---: | :---: | :---: | :---: |
| Embedding size | 2048 | 4096 | 1024 | 1536 |
| Number of layers | 40 | 40 | 24 | 32 |
| Attention head size | 64 | 128 | 64 | 64 |
| Number of attention heads | 32 | 32 | 16 | 24 |
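As a concrete illustration of the MLP-with-SwiGLU component named above, here is a minimal sketch of a SwiGLU feed-forward block in PyTorch. The hidden size of 1536 matches the 3B MoE embedding size from the table, while `ffn_dim` is an arbitrary placeholder rather than a published value:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """Feed-forward block with SwiGLU: silu(x @ W_gate) * (x @ W_up), then W_down."""

    def __init__(self, hidden_size: int, ffn_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, ffn_dim, bias=False)
        self.up_proj = nn.Linear(hidden_size, ffn_dim, bias=False)
        self.down_proj = nn.Linear(ffn_dim, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gated activation: the silu-gated branch modulates the linear "up" branch.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

# hidden_size matches the 3B MoE embedding size above; ffn_dim is illustrative only.
mlp = SwiGLUMLP(hidden_size=1536, ffn_dim=512)
out = mlp(torch.randn(1, 8, 1536))  # (batch, sequence, hidden)
print(out.shape)  # torch.Size([1, 8, 1536])
```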
License
Apache 2.0
Additional Information