🚀 SILMA AI
SILMA.AI is a leading Generative AI startup dedicated to empowering Arabic speakers with state-of-the-art AI solutions.
🚀 Quick Start
Installation
First, install the Transformers library with:
```bash
pip install -U transformers sentencepiece
```
Usage Examples
Basic Usage
Below are some code snippets to quickly start running the model. Copy the snippet relevant to your use case.
Running with the pipeline API
```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="silma-ai/SILMA-9B-Instruct-v1.0",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

messages = [
    # "Write a message apologizing to my manager at work for being absent today for health reasons."
    {"role": "user", "content": "اكتب رسالة تعتذر فيها لمديري في العمل عن الحضور اليوم لأسباب مرضية."},
]

outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
```
Example output:

```
السلام عليكم ورحمة الله وبركاته

أودّ أن أعتذر عن عدم الحضور إلى العمل اليوم بسبب مرضي. أشعر بالسوء الشديد وأحتاج إلى الراحة. سأعود إلى العمل فور تعافيي.

شكراً لتفهمكم.

مع تحياتي،
[اسمك]
```

(Translation: "Peace be upon you and the mercy and blessings of God. I would like to apologize for not coming to work today due to my illness. I feel very unwell and need to rest. I will return to work as soon as I recover. Thank you for your understanding. Best regards, [Your Name]")
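By default the pipeline uses the model's generation defaults. Standard `generate` arguments can be passed straight through the pipeline call. A minimal sketch, continuing from the snippet above; the sampling values are illustrative, not official recommendations:

```python
# Continuing from the pipeline snippet above; pass generation arguments
# directly through the pipeline call. Values below are illustrative.
outputs = pipe(
    messages,
    max_new_tokens=256,
    do_sample=True,   # sample instead of greedy decoding
    temperature=0.7,  # illustrative value
    top_p=0.9,        # illustrative value
)
print(outputs[0]["generated_text"][-1]["content"].strip())
```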
Advanced Usage
Running the model on a single / multi GPU
First, install the accelerate library:

```bash
pip install accelerate
```
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    # "You are an intelligent assistant that answers users' questions."
    {"role": "system", "content": "أنت مساعد ذكي للإجابة عن أسئلة المستخدمين."},
    # "Which is farther from Earth, the Sun or the Moon?"
    {"role": "user", "content": "أيهما أبعد عن الأرض، الشمس أم القمر؟"},
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```
Example output:

```
الشمس
```

(Translation: "The Sun.")
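If GPU memory is limited, the same model can be loaded with 4-bit quantization through the optional bitsandbytes integration. A minimal sketch, assuming bitsandbytes is installed (`pip install bitsandbytes`); the quantization settings are illustrative:

```python
# A minimal sketch, assuming the optional bitsandbytes package is installed.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "silma-ai/SILMA-9B-Instruct-v1.0"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load weights in 4-bit precision
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config,
)
```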
You can ensure the correct chat template is applied by using `tokenizer.apply_chat_template`, as follows:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    # "You are an intelligent assistant that answers users' questions."
    {"role": "system", "content": "أنت مساعد ذكي للإجابة عن أسئلة المستخدمين."},
    # "Write Python code to generate a sequence of even numbers."
    {"role": "user", "content": "اكتب كود بايثون لتوليد متسلسلة أرقام زوجية."},
]

input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True).to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]).split("<start_of_turn>model")[-1])
```
Example output:

```python
def generate_even_numbers(n):
    """
    This function generates a list of even numbers from 1 to n.

    Args:
        n: The upper limit of the range.

    Returns:
        A list of even numbers.
    """
    return [i for i in range(1, n + 1) if i % 2 == 0]

n = 10
even_numbers = generate_even_numbers(n)
print(f"The first {n} even numbers are: {even_numbers}")
```
✨ Features
Our Flagship Model: SILMA 1.0
- SILMA 1.0 was the TOP-RANKED open-weights Arabic LLM (until February 2025), with a compact 9-billion-parameter size that surpasses models over seven times larger 🏆
💡 Usage Tip
For RAG use cases, please use SILMA Kashif v1.0, as it has been specifically trained for question-answering tasks; a prompt sketch follows below.
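As a rough illustration of how such a model can be called with retrieved context (a minimal sketch: the retrieval step is out of scope here, and the exact Kashif checkpoint id below is an assumption, so verify the released name on the Hugging Face Hub):

```python
import torch
from transformers import pipeline

# Model id is an assumption; verify the exact SILMA Kashif checkpoint
# name on the Hugging Face Hub before use.
pipe = pipeline(
    "text-generation",
    model="silma-ai/SILMA-Kashif-2B-Instruct-v1.0",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

# In a real RAG system, `context` would come from your retriever
# (vector store, BM25, etc.); here it is a hypothetical placeholder.
context = "SILMA.AI is a Generative AI startup focused on Arabic-language models."
question = "What does SILMA.AI focus on?"

messages = [
    {
        "role": "user",
        "content": f"Answer the question using only the context below.\n\n"
                   f"Context:\n{context}\n\nQuestion: {question}",
    },
]

outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1]["content"].strip())
```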
What makes SILMA exceptional?
- SILMA is a small language model that outperforms 72B models on most Arabic language tasks, making it more practical for business use cases
- SILMA is built on top of Google's robust Gemma foundation models, combining the strengths of both to give you unparalleled performance
- SILMA is an open-weights model, free to use in accordance with our open license
👥 Our Team
We are a team of seasoned Arabic AI experts who understand the nuances of the language and cultural considerations, enabling us to build solutions that truly resonate with Arabic users.
Authors: silma.ai
📚 Documentation
Model Information
| Property | Details |
|----------|---------|
| Model Type | SILMA-9B-Instruct-v1.0 |
| Library Name | transformers |
| Pipeline Tag | text-generation |
| License | gemma |
| Languages Supported | Arabic (ar), English (en) |
| Tags | conversational |
Model Performance
The model has been evaluated on various datasets, and here are some of the results: