Granite-3.0-8B-Instruct
Granite-3.0-8B-Instruct is an 8B-parameter model finetuned from Granite-3.0-8B-Base on a mix of permissively licensed open-source instruction datasets and internally collected synthetic datasets. It can be used to build AI assistants for multiple domains, with capabilities such as summarization and text classification.
Quick Start
Installation
First, install the necessary libraries:
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
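After installation, a quick check such as the sketch below can confirm that the three packages are visible to the current Python environment. The helper name `installed_versions` is made up for this example; it relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata

# Packages installed in the step above.
REQUIRED = ["torch", "accelerate", "transformers"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```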
Usage
Then, use the following code snippet for text generation:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use a concrete device: "auto" is valid for device_map but not for .to()
device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = "ibm-granite/granite-3.0-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()

# Build the prompt with the model's chat template
chat = [
    {"role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location."},
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# Tokenize the prompt and move the tensors to the same device as the model
input_tokens = tokenizer(chat, return_tensors="pt").to(device)

# Generate up to 100 new tokens
output = model.generate(**input_tokens, max_new_tokens=100)

# Decode the token IDs back into text
output = tokenizer.batch_decode(output)
print(output)
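For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so the snippet above prints the prompt together with the answer. A minimal sketch of keeping only the generated part is shown below; it uses plain Python lists and made-up token IDs so that it runs without downloading the model, and the helper name `new_tokens_only` is hypothetical.

```python
from typing import List, Sequence


def new_tokens_only(output_ids: Sequence[Sequence[int]], prompt_length: int) -> List[List[int]]:
    """Drop the first `prompt_length` IDs from every generated sequence."""
    return [list(sequence[prompt_length:]) for sequence in output_ids]


# Toy data: a 3-token prompt followed by 2 generated tokens (IDs are invented).
toy_output = [[101, 102, 103, 7, 8]]
print(new_tokens_only(toy_output, prompt_length=3))
```

With the real tensors, the same idea would be the slice `output[:, input_tokens["input_ids"].shape[1]:]`, applied to the result of `model.generate` before decoding; passing `skip_special_tokens=True` to `tokenizer.batch_decode` additionally removes special tokens from the text.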
⨠Features
- Multilingual Support: Supports English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Can be finetuned for other languages.
- Diverse Capabilities: Capable of summarization, text classification, text extraction, question answering, Retrieval Augmented Generation (RAG), code-related tasks, function-calling tasks, and multilingual dialog use cases.
Documentation
Model Summary
Granite-3.0-8B-Instruct is finetuned from Granite-3.0-8B-Base using a combination of open-source instruction datasets with permissive licenses and internally collected synthetic datasets. It was developed with a structured chat format using several techniques: supervised finetuning, model alignment using reinforcement learning, and model merging.
Model Architecture
Granite-3.0-8B-Instruct is based on a decoder-only dense transformer architecture. Its core components are grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings.
| Property | Details |
| --- | --- |
| Model Type | Decoder-only dense transformer |
| Embedding size | 4096 |
| Number of layers | 40 |
| Attention head size | 128 |
| Number of attention heads | 32 |
| Number of KV heads | 8 |
| MLP hidden size | 12800 |
| MLP activation | SwiGLU |
| Initialization std | 0.1 |
| Sequence Length | 4096 |
| Position Embedding | RoPE |
| # Parameters | 8.1B |
| # Active Parameters | 8.1B |
| # Training tokens | 12T |
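The figures in the table can be cross-checked with some rough arithmetic. The sketch below estimates the non-embedding parameter count and the KV-cache size per token from the listed dimensions. It rests on assumptions that this document does not state: no bias terms, two weight-only RMSNorm layers per block plus one final norm, three MLP projections for SwiGLU (gate, up, down), and a 2-byte (bf16/fp16) cache. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraniteDims:
    """Dimensions copied from the architecture table."""
    hidden_size: int = 4096
    num_layers: int = 40
    head_dim: int = 128
    num_heads: int = 32
    num_kv_heads: int = 8
    mlp_hidden_size: int = 12800


def params_per_layer(d: GraniteDims) -> int:
    # Query and output projections use all attention heads.
    q_and_o = 2 * d.hidden_size * d.num_heads * d.head_dim
    # Key and value projections use only the (fewer) KV heads: this is GQA.
    k_and_v = 2 * d.hidden_size * d.num_kv_heads * d.head_dim
    # SwiGLU MLP: gate, up, and down projections (assumed, no biases).
    mlp = 3 * d.hidden_size * d.mlp_hidden_size
    # Two RMSNorm weight vectors per block (assumed).
    norms = 2 * d.hidden_size
    return q_and_o + k_and_v + mlp + norms


def non_embedding_params(d: GraniteDims) -> int:
    # All blocks plus one final RMSNorm (assumed).
    return d.num_layers * params_per_layer(d) + d.hidden_size


def kv_cache_bytes_per_token(d: GraniteDims, bytes_per_value: int = 2) -> int:
    # One key and one value vector per KV head, per layer.
    return 2 * d.num_layers * d.num_kv_heads * d.head_dim * bytes_per_value


dims = GraniteDims()
print(f"non-embedding parameters: {non_embedding_params(dims) / 1e9:.2f}B")
print(f"KV cache per token: {kv_cache_bytes_per_token(dims)} bytes")
print(f"KV cache at 4096 tokens: {4096 * kv_cache_bytes_per_token(dims) / 2**20:.0f} MiB")
```

Under these assumptions the blocks account for roughly 7.97B parameters. The shared input/output embedding matrix adds vocabulary size × 4096 on top of that, but the vocabulary size is not given in this document, so the sketch leaves it out. With 8 KV heads instead of 32, the cache is a quarter of the size that full multi-head attention would need.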
Training Data
The SFT data comes mainly from three sources: publicly available datasets with permissive licenses, internal synthetic data targeting specific capabilities, and a small amount of human-curated data. Detailed dataset attribution can be found in the Granite Technical Report and Accompanying Author List.
Infrastructure
The model was trained on IBM's supercomputing cluster, Blue Vela, which is equipped with NVIDIA H100 GPUs. The cluster runs on 100% renewable energy and provides scalable, efficient infrastructure for training.
Ethical Considerations and Limitations
The model is primarily finetuned on English instruction-response pairs, together with multilingual data covering eleven other languages. Its performance on non-English tasks may therefore differ from its performance on English tasks. Providing a small number of examples (few-shot) can improve accuracy. The model may also produce inaccurate, biased, or unsafe responses, so proper safety testing and tuning are recommended.
Technical Details
Model Evaluation
The model has been evaluated on the following datasets for text-generation tasks:
| Dataset | pass@1 Value |
| --- | --- |
| IFEval | 52.27 |
| MT-Bench | 8.22 |
| AGI-Eval | 40.52 |
| MMLU | 65.82 |
| MMLU-Pro | 34.45 |
| OBQA | 46.6 |
| SIQA | 71.21 |
| Hellaswag | 82.61 |
| WinoGrande | 77.51 |
| TruthfulQA | 60.32 |
| BoolQ | 88.65 |
| SQuAD 2.0 | 21.58 |
| ARC-C | 64.16 |
| GPQA | 33.81 |
| BBH | 51.55 |
| HumanEvalSynthesis | 64.63 |
| HumanEvalExplain | 57.16 |
| HumanEvalFix | 65.85 |
| MBPP | 49.6 |
| GSM8K | 68.99 |
| MATH | 30.94 |
| PAWS-X (7 langs) | 64.94 |
| MGSM (6 langs) | 48.2 |
License
This model is released under the Apache 2.0 license.
Additional Resources
- Learn about the latest updates with Granite: https://www.ibm.com/granite
- Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
- Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources
Important Note
Although this model has been aligned with safety in mind, it may produce inaccurate, biased, or unsafe responses. The community is urged to conduct proper safety testing and tuning for specific tasks.
Usage Tip
When dealing with non-English tasks, introducing a small number of examples (few-shot) can help the model generate more accurate outputs.
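One way to supply few-shot examples is to place them in the chat history as earlier user and assistant turns, followed by the actual query. The sketch below shows this under the assumption that the task is a simple German sentiment classification; the helper name `build_few_shot_chat` and the example sentences are invented for illustration.

```python
from typing import Dict, List, Tuple


def build_few_shot_chat(examples: List[Tuple[str, str]], query: str) -> List[Dict[str, str]]:
    """Turn (input, expected output) pairs plus a final query into chat messages."""
    chat: List[Dict[str, str]] = []
    for user_text, assistant_text in examples:
        chat.append({"role": "user", "content": user_text})
        chat.append({"role": "assistant", "content": assistant_text})
    # The real question goes last, as an unanswered user turn.
    chat.append({"role": "user", "content": query})
    return chat


# Made-up few-shot examples for a German sentiment task.
examples = [
    ("Klassifiziere die Stimmung: 'Das Essen war ausgezeichnet.'", "positiv"),
    ("Klassifiziere die Stimmung: 'Der Service war sehr langsam.'", "negativ"),
]
chat = build_few_shot_chat(examples, "Klassifiziere die Stimmung: 'Ich komme gerne wieder.'")
for message in chat:
    print(message["role"], "->", message["content"])
```

The resulting list has the same shape as the `chat` variable in the Quick Start snippet, so it can be passed to `tokenizer.apply_chat_template` in the same way.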