Granite-3.1-3B-A800M-Instruct
Granite-3.1-3B-A800M-Instruct is a 3B parameter long-context instruct model, which can be used to build AI assistants for multiple domains, including business applications.
Quick Start
Granite-3.1-3B-A800M-Instruct is a 3B parameter long-context instruct model finetuned from Granite-3.1-3B-A800M-Base using a combination of permissively licensed open-source instruction datasets and internally collected synthetic datasets tailored to long-context problems. The model was developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging.
Features
Supported Languages
English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may finetune Granite 3.1 models for languages beyond these twelve.
Intended Use
The model is designed to respond to general instructions and can be used to build AI assistants for multiple domains, including business applications.
Capabilities
- Summarization
- Text classification
- Text extraction
- Question-answering
- Retrieval Augmented Generation (RAG)
- Code related tasks
- Function-calling tasks (a prompt-construction sketch follows this list)
- Multilingual dialog use cases
- Long-context tasks including long document/meeting summarization, long document QA, etc.
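To make the function-calling capability concrete, here is a minimal prompt-construction sketch. It assumes a recent transformers release whose `apply_chat_template` accepts a `tools` argument; the `get_stock_price` schema is a hypothetical placeholder, not an API shipped with the model. See Usage Examples below for the full generation loop.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.1-3b-a800m-Instruct")

# Hypothetical tool definition in JSON-schema form (placeholder for illustration).
tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current price of a stock ticker.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string", "description": "Stock ticker symbol"}},
            "required": ["ticker"],
        },
    },
}]

chat = [{"role": "user", "content": "What is IBM trading at right now?"}]

# The chat template renders the tool schemas into the prompt; the model is then
# expected to reply with a structured tool call that your code parses and executes.
prompt = tokenizer.apply_chat_template(chat, tools=tools, tokenize=False, add_generation_prompt=True)
print(prompt)
```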
Installation
Install the following libraries:
```shell
pip install torch torchvision torchaudio
pip install accelerate
pip install transformers
```
Usage Examples
Basic Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-3.1-3b-a800m-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
# device_map="auto" places the model on a GPU when one is available.
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

chat = [
    {"role": "user", "content": "Please list one IBM Research laboratory located in the United States. You should only output its name and location."},
]
# Render the structured chat into the model's prompt format.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# Move the input tensors to the same device the model was placed on.
input_tokens = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**input_tokens, max_new_tokens=100)
print(tokenizer.batch_decode(output))
```
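The `batch_decode` call above returns the full sequence, prompt included. A minimal follow-up sketch, reusing the variables from the example above, that decodes only the newly generated tokens:

```python
# Slice off the prompt so only the model's reply is decoded.
new_tokens = output[0][input_tokens["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```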
Documentation
Evaluation Results
HuggingFace Open LLM Leaderboard V1
| Models | ARC-Challenge | Hellaswag | MMLU | TruthfulQA | Winogrande | GSM8K | Avg |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Granite-3.1-8B-Instruct | 62.62 | 84.48 | 65.34 | 66.23 | 75.37 | 73.84 | 71.31 |
| Granite-3.1-2B-Instruct | 54.61 | 75.14 | 55.31 | 59.42 | 67.48 | 52.76 | 60.79 |
| Granite-3.1-3B-A800M-Instruct | 50.42 | 73.01 | 52.19 | 49.71 | 64.87 | 48.97 | 56.53 |
| Granite-3.1-1B-A400M-Instruct | 42.66 | 65.97 | 26.13 | 46.77 | 62.35 | 33.88 | 46.29 |
HuggingFace Open LLM Leaderboard V2
| Models | IFEval | BBH | MATH Lvl 5 | GPQA | MUSR | MMLU-Pro | Avg |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Granite-3.1-8B-Instruct | 72.08 | 34.09 | 21.68 | 8.28 | 19.01 | 28.19 | 30.55 |
| Granite-3.1-2B-Instruct | 62.86 | 21.82 | 11.33 | 5.26 | 4.87 | 20.21 | 21.06 |
| Granite-3.1-3B-A800M-Instruct | 55.16 | 16.69 | 10.35 | 5.15 | 2.51 | 12.75 | 17.10 |
| Granite-3.1-1B-A400M-Instruct | 46.86 | 6.18 | 4.08 | 0.00 | 0.78 | 2.41 | 10.05 |
Model Architecture
Granite-3.1-3B-A800M-Instruct is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture (the "A800M" suffix denotes roughly 800M active parameters per token). Core components of this architecture are: GQA and RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings.
| Model | 2B Dense | 8B Dense | 1B MoE | 3B MoE |
| :--- | :---: | :---: | :---: | :---: |
| Embedding size | 2048 | 4096 | 1024 | 1536 |
| Number of layers | 40 | 40 | 24 | 32 |
| Attention head size | 64 | 128 | 64 | 64 |
| Number of attention heads | 32 | 32 | 16 | 24 |
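As a concrete illustration of the MLP-with-SwiGLU component named above, here is a minimal sketch of a SwiGLU feed-forward block in PyTorch. The hidden size of 1536 matches the 3B MoE embedding size from the table, while `ffn_dim` is an arbitrary placeholder rather than a published value:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """Feed-forward block with SwiGLU: silu(x @ W_gate) * (x @ W_up), then W_down."""

    def __init__(self, hidden_size: int, ffn_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, ffn_dim, bias=False)
        self.up_proj = nn.Linear(hidden_size, ffn_dim, bias=False)
        self.down_proj = nn.Linear(ffn_dim, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gated activation: the silu-gated branch modulates the linear "up" branch.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

# hidden_size matches the 3B MoE embedding size above; ffn_dim is illustrative only.
mlp = SwiGLUMLP(hidden_size=1536, ffn_dim=512)
out = mlp(torch.randn(1, 8, 1536))  # (batch, sequence, hidden)
print(out.shape)  # torch.Size([1, 8, 1536])
```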
License
Apache 2.0
Additional Information