Sarvam-M: An Open-Source Multilingual Language Model with Strong Hindi and English Reasoning

Sarvam M

Developed by sarvamai

Sarvam-M is a multilingual, hybrid-reasoning, text-only language model based on Mistral-Small. It is optimized for Indian languages and English, with strong reasoning capabilities and cultural adaptability.

Large Language Model

Transformers

Supports Multiple Languages
Open Source License: Apache-2.0
Tags: #Indian Multilingual Reasoning #Hybrid Thinking Modes #Mathematical Programming Enhancement

Downloads 1,824

Release Time : 5/20/2025

Model Overview

Sarvam-M is a versatile language model supporting both thinking and non-thinking modes, suitable for complex logical reasoning, mathematical problems, programming tasks, and general conversations.

Model Features

Hybrid Thinking Modes

Supports both 'thinking' and 'non-thinking' modes, suitable for complex logical reasoning and efficient general conversations respectively.

Advanced Indian Language Capabilities

Post-trained specifically on Indian languages and English, with a character that reflects Indian cultural values. Supports both native scripts and romanized text.

Exceptional Reasoning Abilities

Outperforms most models of similar size on coding and math benchmarks. Compared with the base Mistral-Small model, it improves mathematical benchmarks by 21.6% and programming benchmarks by 17.6%.

Seamless Chat Experience

Comprehensively supports native scripts and Romanized versions of Indian languages, providing a smooth multilingual conversation experience.

Model Capabilities

Multilingual text generation

Complex logical reasoning

Mathematical problem-solving

Programming assistance

Culturally adaptive conversations

Use Cases

Education

Mathematical Problem-Solving

Helps students solve complex mathematical problems, providing step-by-step reasoning processes.

Achieved an 86% improvement in Romanized Indian language GSM-8K benchmarks.

Software Development

Programming Assistance

Assists developers in writing and debugging code, providing programming suggestions.

Programming benchmarks improved by 17.6%.

Multilingual Services

Multilingual Customer Support

Provides customer service support in various Indian languages.

Indian language benchmarks improved by an average of 20%.

🚀 Sarvam-M

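sarvam-m is a multilingual, hybrid-reasoning, text-only language model built on Mistral-Small. Compared with the base model, it improves Indian language benchmarks by an average of 20%, math benchmarks by 21.6% and programming benchmarks by 17.6%. It also offers a seamless multilingual chatting experience.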

🚀 Quick Start

The following code snippet shows how to run sarvam-m with Transformers.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sarvamai/sarvam-m"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# prepare the model input
prompt = "Who are you and what is your purpose on this planet?"

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    enable_thinking=True,  # Switches between thinking and non-thinking modes. Default is True.
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(**model_inputs, max_new_tokens=8192)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]) :].tolist()
output_text = tokenizer.decode(output_ids)

if "</think>" in output_text:
    reasoning_content = output_text.split("</think>")[0].rstrip("\n")
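    # removesuffix (Python 3.9+) drops the exact "</s>" EOS marker; rstrip("</s>")
    # would also strip any trailing "<", "/", "s" or ">" characters from the answer.
    content = output_text.split("</think>")[-1].lstrip("\n").removesuffix("</s>")
else:
    reasoning_content = ""
    content = output_text.removesuffix("</s>")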

print("reasoning content:", reasoning_content)
print("content:", content)
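The parsing step above can be wrapped in a reusable helper. The sketch below is illustrative only. split_reasoning is a hypothetical name, not part of Transformers or sarvam-m. Like the Quick Start, it assumes two things about the decoded text: the reasoning trace ends with "</think>", and the text may end with the "</s>" end-of-sequence marker. It uses str.removesuffix (Python 3.9+) rather than rstrip("</s>"), because rstrip treats its argument as a set of characters.

```python
def split_reasoning(output_text: str, end_tag: str = "</think>", eos: str = "</s>") -> tuple[str, str]:
    """Hypothetical helper: split decoded sarvam-m output into (reasoning, answer).

    Assumes the reasoning trace is terminated by end_tag and that the decoded
    text may end with a single eos marker, as in the Quick Start example.
    """
    # Remove one trailing EOS marker as an exact suffix.
    text = output_text.removesuffix(eos)
    if end_tag not in text:
        # Non-thinking mode (or no trace emitted): everything is the answer.
        return "", text
    parts = text.split(end_tag)
    # Mirror the Quick Start: reasoning before the first tag, answer after the last.
    reasoning = parts[0].rstrip("\n")
    answer = parts[-1].lstrip("\n")
    return reasoning, answer


reasoning, answer = split_reasoning("Let me think.\n</think>\n\nYes</s>")
print(repr(reasoning))  # 'Let me think.'
print(repr(answer))     # 'Yes'
```

With the rstrip("</s>") version, the same answer "Yes</s>" would come back as "Ye".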

⚠️ Important Note

For thinking mode, we recommend temperature=0.5; for non-thinking mode, temperature=0.2.
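In Transformers, temperature only takes effect when sampling is enabled. Unless the model's generation config already turns sampling on, model.generate() decodes greedily and ignores it. The sketch below is minimal and assumes you pass its result to model.generate() as in the Quick Start. recommended_generate_kwargs is a hypothetical helper, and its 8192-token default simply mirrors the Quick Start.

```python
def recommended_generate_kwargs(enable_thinking: bool, max_new_tokens: int = 8192) -> dict:
    """Hypothetical helper: model.generate() keyword arguments using the
    recommended sarvam-m temperatures (0.5 thinking, 0.2 non-thinking)."""
    return {
        "max_new_tokens": max_new_tokens,
        # Sampling must be on for temperature to have any effect.
        "do_sample": True,
        "temperature": 0.5 if enable_thinking else 0.2,
    }


# Usage with the Quick Start objects (not executed here):
# generated_ids = model.generate(**model_inputs, **recommended_generate_kwargs(enable_thinking=True))
print(recommended_generate_kwargs(enable_thinking=False, max_new_tokens=1024))
```

Keep enable_thinking in apply_chat_template and in this helper in sync, so the prompt mode and the sampling temperature match.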

✨ Features

Hybrid Thinking Mode: A single versatile model supporting both "think" and "non-think" modes. Use the think mode for complex logical reasoning, mathematical problems, and coding tasks, or switch to non-think mode for efficient, general-purpose conversation.
Advanced Indic Skills: Specifically post-trained on Indian languages alongside English, embodying a character that authentically reflects and emphasizes Indian cultural values.
Superior Reasoning Capabilities: Outperforms most similarly-sized models on coding and math benchmarks, demonstrating exceptional reasoning abilities.
Seamless Chatting Experience: Full support for both Indic scripts and romanized versions of Indian languages, providing a smooth and accessible multilingual conversation experience.


💻 Usage Examples


Advanced Usage

Using the Sarvam API

from openai import OpenAI

base_url = "https://api.sarvam.ai/v1"
model_name = "sarvam-m"
api_key = "Your-API-Key"  # get it from https://dashboard.sarvam.ai/

client = OpenAI(
    base_url=base_url,
    api_key=api_key,
).with_options(max_retries=1)

messages = [
    {"role": "system", "content": "You're a helpful AI assistant"},
    {"role": "user", "content": "Explain quantum computing in simple terms"},
]

response1 = client.chat.completions.create(
    model=model_name,
    messages=messages,
    reasoning_effort="medium",  # Enables thinking mode; pass None to disable it.
    max_completion_tokens=4096,
)
print("First response:", response1.choices[0].message.content)

# Building messages for the second turn (using previous response as context)
messages.extend(
    [
        {
            "role": "assistant",
            "content": response1.choices[0].message.content,
        },
        {"role": "user", "content": "Can you give an analogy for superposition?"},
    ]
)

response2 = client.chat.completions.create(
    model=model_name,
    messages=messages,
    reasoning_effort="medium",
    max_completion_tokens=8192,
)
print("Follow-up response:", response2.choices[0].message.content)
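The second turn above follows the usual chat-completions pattern: resend the full history, with the previous assistant reply and the new user message appended. The sketch below expresses that step as a pure function that returns a new list instead of mutating messages in place, so a failed request leaves the original history untouched. next_turn is a hypothetical name, not part of the OpenAI client or the Sarvam API.

```python
def next_turn(messages: list[dict], assistant_reply: str, user_followup: str) -> list[dict]:
    """Hypothetical helper: build the message list for the next request."""
    return [
        *messages,  # copy the existing history
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": user_followup},
    ]


history = [{"role": "user", "content": "Explain quantum computing in simple terms"}]
# In practice assistant_reply would be response1.choices[0].message.content.
history2 = next_turn(history, "Quantum computers use qubits...", "Can you give an analogy for superposition?")
print(len(history), len(history2))  # 1 3
```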

Using a local vLLM server

from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id

messages = [{"role": "user", "content": "Why is 42 the best number?"}]

# By default, thinking mode is enabled.
# If you want to disable thinking, add:
# extra_body={"chat_template_kwargs": {"enable_thinking": False}}
response = client.chat.completions.create(model=model, messages=messages)
output_text = response.choices[0].message.content

if "</think>" in output_text:
    reasoning_content = output_text.split("</think>")[0].rstrip("\n")
    content = output_text.split("</think>")[-1].lstrip("\n")
else:
    reasoning_content = ""
    content = output_text

print("reasoning content:", reasoning_content)
print("content:", content)

# For the next round, add the model's response directly as assistant turn.
messages.append(
    {"role": "assistant", "content": output_text}
)
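The vLLM server enables thinking by default, so only the disable case needs the chat_template_kwargs override from the comment in the example. The sketch below assumes the override is forwarded through the OpenAI Python client's extra_body parameter. vllm_thinking_kwargs is a hypothetical helper.

```python
def vllm_thinking_kwargs(enable_thinking: bool) -> dict:
    """Hypothetical helper: extra keyword arguments for
    client.chat.completions.create() against a vLLM server running sarvam-m."""
    if enable_thinking:
        # Thinking is the server-side default; nothing to override.
        return {}
    return {"extra_body": {"chat_template_kwargs": {"enable_thinking": False}}}


# Usage with the client above (not executed here):
# response = client.chat.completions.create(model=model, messages=messages, **vllm_thinking_kwargs(False))
print(vllm_thinking_kwargs(False))
```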

📚 Documentation

Learn more about sarvam-m in our detailed blog post.

Refer to API docs here: sarvam Chat Completions API docs

reasoning_effort accepts three values, low, medium and high, for consistency with the OpenAI API spec. Setting any of them only enables sarvam-m's thinking mode; passing None disables it.
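Any of the three values only switches thinking on, so callers who think in terms of on and off can map a boolean to reasoning_effort. This is a minimal sketch. reasoning_effort_for is a hypothetical helper, not part of the Sarvam API, and its default of "medium" is taken from the examples.

```python
from typing import Optional

VALID_EFFORTS = ("low", "medium", "high")


def reasoning_effort_for(thinking: bool, effort: str = "medium") -> Optional[str]:
    """Hypothetical helper: value for the reasoning_effort argument.

    None disables thinking; any valid effort level enables it.
    """
    if not thinking:
        return None
    if effort not in VALID_EFFORTS:
        raise ValueError(f"reasoning_effort must be one of {VALID_EFFORTS}, got {effort!r}")
    return effort


# Usage (not executed here):
# client.chat.completions.create(model="sarvam-m", messages=messages,
#                                reasoning_effort=reasoning_effort_for(thinking=True))
print(reasoning_effort_for(True), reasoning_effort_for(False))  # medium None
```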

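For easy deployment, use vllm>=0.8.5 and create an OpenAI-compatible API endpoint with vllm serve sarvamai/sarvam-m. By default the server listens on port 8000, matching the http://localhost:8000/v1 base URL in the vLLM example.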
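Before running vllm serve, you might check that the installed vLLM meets the >=0.8.5 requirement. The rough sketch below uses only the standard library, and its helper names are hypothetical. The parser is a simplification of full PEP 440 handling, for which packaging.version is the robust choice. It compares only the leading numeric release, so a pre-release such as 0.8.5rc1 is treated as 0.8.5.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VLLM = (0, 8, 5)


def parse_release(v: str) -> tuple[int, ...]:
    """Leading numeric release segment, e.g. '0.8.5.post1' -> (0, 8, 5)."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def vllm_is_recent_enough(installed: str, minimum: tuple[int, ...] = MIN_VLLM) -> bool:
    # Tuple comparison: (0, 8) < (0, 8, 5), so "0.8" correctly fails the check.
    return parse_release(installed) >= minimum


try:
    installed = version("vllm")
except PackageNotFoundError:
    print("vllm is not installed")
else:
    print(installed, "OK" if vllm_is_recent_enough(installed) else "too old: need vllm>=0.8.5")
```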


📄 License

The model is licensed under the Apache-2.0 license.

Property	Details
Library Name	transformers
License	apache-2.0
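Supported Languages	en (English), bn (Bengali), hi (Hindi), kn (Kannada), gu (Gujarati), mr (Marathi), ml (Malayalam), or (Odia), pa (Punjabi), ta (Tamil), te (Telugu)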
Base Model	mistralai/Mistral-Small-3.1-24B-Base-2503
Base Model Relation	finetune
