Granite-3.0-8B-Instruct Open-Source Model - Free Deployment, Trained on Multiple Datasets with Wide Applications

Granite 3.0 8b Instruct

Developed by ibm-granite

Granite-3.0-8B-Instruct is an 8-billion-parameter model fine-tuned from Granite-3.0-8B-Base, trained using a combination of various open-source instruction datasets and internally synthesized datasets.

Large Language Model

Transformers

Open Source License: Apache-2.0
#Multilingual Instruction Fine-tuning #Commercial AI Assistant #Code Generation Optimization

Downloads 24.29k

Release Time: 10/2/2024

Model Overview

This model was developed through supervised fine-tuning, model alignment using reinforcement learning, and model merging. It uses a structured chat format and is designed to respond to general instructions, which makes it suitable for building AI assistants across various domains.

Model Features

Multilingual Support

Supports 12 languages including English and Chinese, suitable for multilingual dialogue scenarios.

High-Performance Inference

Scores 65.82 on MMLU and 68.99 on GSM8K.

Structured Dialogue Format

Adopts a structured dialogue format, ideal for building AI assistants and commercial applications.

Extensive Training Data

Trained using a combination of various open-source instruction datasets and internally synthesized datasets.

Model Capabilities

Text Generation

Instruction Following

Multilingual Processing

Code Generation

Mathematical Reasoning

Logical Reasoning

Commonsense Reasoning

Reading Comprehension

Use Cases

Commercial Applications

AI Assistant

Build general-purpose AI assistants to respond to various instructions and questions.

Scores 8.22 on MT-Bench.

Text Summarization

Generate concise summaries of text.

Education

Q&A System

Answer student questions and provide learning assistance.

Scores 65.82 on MMLU.

Programming

Code Generation

Generate code snippets based on descriptions.

Scores 64.63 on HumanEvalSynthesis.

🚀 Granite-3.0-8B-Instruct

Granite-3.0-8B-Instruct is an 8B-parameter model. It is finetuned from Granite-3.0-8B-Base using a mix of open-source instruction datasets with permissive licenses and internally collected synthetic datasets. It can be used to build AI assistants for multiple domains, offering capabilities such as summarization and text classification.

🚀 Quick Start

Installation

First, install the necessary libraries:

pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
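
Before loading the model, it can help to confirm that the packages above are importable in the active environment. The following is a minimal sketch using only the standard library; the helper name `check_packages` is hypothetical and not part of any Granite or Transformers tooling.

```python
import importlib.util


def check_packages(names):
    """Return a dict mapping each top-level package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Package names taken from the pip commands above.
    for name, found in check_packages(["torch", "accelerate", "transformers"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```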

Usage

Then, use the following code snippet for text generation:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "auto"
model_path = "ibm-granite/granite-3.0-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device)
model.eval()
# change input text as desired
chat = [
    { "role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location." },
]
chat = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
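# tokenize the text and move it to the device the model was loaded on
# ("auto" is a valid device_map value but not a valid tensor device)
input_tokens = tokenizer(chat, return_tensors="pt").to(model.device)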
# generate output tokens
output = model.generate(**input_tokens, 
                        max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# print output
print(output)

✨ Features

Multilingual Support: Supports English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Can be finetuned for other languages.
Diverse Capabilities: Capable of summarization, text classification, text extraction, question answering, Retrieval Augmented Generation (RAG), code-related tasks, function-calling tasks, and multilingual dialog use cases.

📚 Documentation

Model Summary

Granite-3.0-8B-Instruct is finetuned from Granite-3.0-8B-Base using a combination of open-source instruction datasets with permissive licenses and internally collected synthetic datasets. It is developed with a structured chat format using several techniques, including supervised finetuning, model alignment using reinforcement learning, and model merging.

Model Architecture

Granite-3.0-8B-Instruct is based on a decoder-only dense transformer architecture. Its core components are grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings.

Property	Details
Model Type	Decoder-only dense transformer
Embedding size	4096
Number of layers	40
Attention head size	128
Number of attention heads	32
Number of KV heads	8
MLP hidden size	12800
MLP activation	SwiGLU
Initialization std	0.1
Sequence Length	4096
Position Embedding	RoPE
# Parameters	8.1B
# Active Parameters	8.1B
# Training tokens	12T
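
As a rough sanity check, the values in the table can be combined to estimate the parameter count and the KV-cache footprint. This is a back-of-the-envelope sketch, not the authors' accounting: it assumes bias-free linear layers, two RMSNorm weight vectors per layer plus a final one, and a vocabulary size of 49155, which is not stated in the table. All function names are illustrative.

```python
# Architecture values copied from the table above.
EMBED = 4096        # embedding size
LAYERS = 40         # number of layers
HEAD_DIM = 128      # attention head size
N_HEADS = 32        # number of attention (query) heads
N_KV_HEADS = 8      # number of KV heads (GQA)
MLP_HIDDEN = 12800  # MLP hidden size
SEQ_LEN = 4096      # sequence length

# ASSUMPTION: the vocabulary size is not listed in the table; 49155 is used for illustration.
VOCAB = 49155


def attention_params():
    """Q and O projections use all heads; K and V use only the KV heads (GQA)."""
    q = EMBED * N_HEADS * HEAD_DIM
    k_and_v = 2 * EMBED * N_KV_HEADS * HEAD_DIM
    o = N_HEADS * HEAD_DIM * EMBED
    return q + k_and_v + o


def mlp_params():
    """SwiGLU uses three weight matrices: gate, up, and down projections."""
    return 3 * EMBED * MLP_HIDDEN


def total_params(vocab=VOCAB):
    """Shared input/output embeddings are counted once."""
    per_layer = attention_params() + mlp_params() + 2 * EMBED  # plus two norm vectors
    return LAYERS * per_layer + EMBED + vocab * EMBED           # plus final norm, embeddings


def kv_cache_bytes(tokens=SEQ_LEN, bytes_per_value=2):
    """Keys and values for every layer and KV head; 2 bytes per value assumes bf16/fp16."""
    return 2 * LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_value * tokens


if __name__ == "__main__":
    print(f"estimated parameters: {total_params() / 1e9:.2f}B")
    print(f"KV cache at {SEQ_LEN} tokens: {kv_cache_bytes() / 2**20:.0f} MiB")
```

Under these assumptions the estimate comes out at roughly 8.17B parameters, consistent with the 8.1B listed in the table. With 8 KV heads instead of 32, the KV cache is a quarter of what full multi-head attention would need at the same sequence length.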

Training Data

The SFT data mainly comes from three sources: publicly available datasets with permissive licenses, internal synthetic data targeting specific capabilities, and a small amount of human-curated data. Detailed dataset attribution can be found in the Granite Technical Report and Accompanying Author List.

Infrastructure

The model is trained using IBM's supercomputing cluster, Blue Vela, equipped with NVIDIA H100 GPUs. The cluster uses 100% renewable energy, providing a scalable and efficient infrastructure for training.

Ethical Considerations and Limitations

The model is primarily finetuned using English instruction-response pairs and multilingual data for eleven languages. Its performance on non-English tasks may vary. Introducing a small number of examples (few-shot) can improve accuracy. The model may also produce inaccurate, biased, or unsafe responses, so proper safety testing and tuning are recommended.

🔧 Technical Details

Model Evaluation

The model has been evaluated on various datasets for text-generation tasks:

Dataset	pass@1 Value
IFEval	52.27
MT-Bench	8.22
AGI-Eval	40.52
MMLU	65.82
MMLU-Pro	34.45
OBQA	46.6
SIQA	71.21
Hellaswag	82.61
WinoGrande	77.51
TruthfulQA	60.32
BoolQ	88.65
SQuAD 2.0	21.58
ARC-C	64.16
GPQA	33.81
BBH	51.55
HumanEvalSynthesis	64.63
HumanEvalExplain	57.16
HumanEvalFix	65.85
MBPP	49.6
GSM8K	68.99
MATH	30.94
PAWS-X (7 langs)	64.94
MGSM (6 langs)	48.2
MGSM (6 langs)	48.2
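
The table reports its values as pass@1. For code benchmarks such as HumanEvalSynthesis and MBPP, pass@k is commonly computed with the unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass the tests. The sketch below illustrates that estimator only. Whether the reported numbers were produced this way, and with how many samples, is not stated here.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Illustrative counts only, not data from the evaluation above.
    toy_results = [(10, 3), (10, 0), (10, 10)]
    print(mean_pass_at_k(toy_results, k=1))
```

For k = 1 the estimator reduces to c / n, the fraction of samples that pass.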

📄 License

This model is released under the Apache 2.0 license.

Additional Resources

Learn about the latest updates with Granite: https://www.ibm.com/granite
Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources

⚠️ Important Note

Although this model has been aligned with safety in mind, it may produce inaccurate, biased, or unsafe responses. The community is urged to conduct proper safety testing and tuning for specific tasks.

💡 Usage Tip

When dealing with non-English tasks, introducing a small number of examples (few-shot) can help the model generate more accurate outputs.
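
One way to apply this tip is to place the examples in the conversation as earlier user/assistant turns before the real question. The sketch below builds such a message list in the same format as the `chat` variable in the Quick Start. The helper name `build_few_shot_chat` and the German examples are illustrative assumptions, not part of the model's documentation.

```python
def build_few_shot_chat(examples, query):
    """Turn (input, expected_output) pairs into alternating chat turns, then add the query."""
    messages = []
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages


if __name__ == "__main__":
    # Hypothetical German sentiment-classification examples.
    examples = [
        ("Klassifiziere die Stimmung: 'Der Film war großartig.'", "positiv"),
        ("Klassifiziere die Stimmung: 'Das Essen war kalt und fade.'", "negativ"),
    ]
    chat = build_few_shot_chat(examples, "Klassifiziere die Stimmung: 'Der Service war sehr freundlich.'")
    for message in chat:
        print(message["role"], "->", message["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)` in place of the single-turn `chat` used in the Quick Start.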
