# 🚀 Zero-Mistral-24B
Zero-Mistral-24B is an improved text-only version of mistralai/Mistral-Small-3.1-24B-Instruct-2503, adapted mainly for Russian and English. The vision features of the original Mistral model have been removed. At the SFT stage it was trained primarily on the Big Russian Dataset and a proprietary dataset from Shkolkovo.online. The model has good math skills and some reasoning ability, and it retains the original Mistral's long-context support of up to 128k tokens.
## ✨ Features
- Language Adaptation: Adapted for both Russian and English, making it suitable for a wider range of users.
- Feature Removal: Removed vision features from the original Mistral model, focusing solely on text processing.
- Training Data: Trained on high-quality datasets, including the Big Russian Dataset and a proprietary dataset.
- Math and Reasoning: Demonstrates good math skills and reasoning abilities.
- Long Context: Preserves the long-context capabilities of up to 128k tokens.
## 📦 Installation

### vLLM Installation
Make sure you install vLLM >= 0.8.4:

```bash
pip install --upgrade vllm
```
Also make sure you have mistral_common >= 1.5.4 installed:

```bash
pip install --upgrade mistral_common
```
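To confirm what is actually installed, a quick check with the standard library (a minimal sketch):

```python
# Print the installed versions of the two requirements above.
from importlib.metadata import version

print(version("vllm"))            # expect >= 0.8.4
print(version("mistral_common"))  # expect >= 1.5.4
```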
You can also make use of a ready-to-go Docker image or one from Docker Hub.
## 💻 Usage Examples

### Recommended System Prompts
```python
prompts = {
    "generic": "You are a virtual assistant. You answer people's questions, help and support them. You are created to be helpful, harmless, and honest. You answer in the language the question was asked in or as the user requests.",
    "think": """You are a virtual assistant. You answer people's questions, help and support them. You are created to be helpful, harmless, and honest. You answer in the language the question was asked in or as the user requests.
Answer in the following format:
<think>Reasoning: ...</think>
...""",
    "task": "You are a virtual assistant. You answer people's questions, help and support them. You are created to be helpful, harmless, and honest. You answer in the language the question was asked in or as the user requests. Solve the task according to the instructions below. Don't apologize and don't build a dialogue.",
    "task_think": """You are a virtual assistant. You answer people's questions, help and support them. You are created to be helpful, harmless, and honest. You answer in the language the question was asked in or as the user requests. Solve the task according to the instructions below. Don't apologize and don't build a dialogue.
Answer in the following format:
<think>Reasoning: ...</think>
...""",
    "english_generic": """You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
Your knowledge base was last updated on 2023-10-01. The current date is 2025-01-30.
When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?")
""",
    "english_think": """You are Mistral Small 3, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
Your knowledge base was last updated on 2023-10-01. The current date is 2025-01-30.
When you're not sure about some information, you say that you don't have the information and don't make up anything.
If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?")
Answer in the following format:
<think>Reasoning: ...</think>
""",
}
```
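A prompt from this dict goes into the system turn of a chat request. A minimal sketch (the user question is illustrative):

```python
# Pair a recommended system prompt with a user turn; the question is illustrative.
messages = [
    {"role": "system", "content": prompts["think"]},
    {"role": "user", "content": "How many prime numbers are there between 1 and 20?"},
]
```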
### vLLM Server Usage
- Spin up a server:

```bash
vllm serve ZeroAgency/Zero-Mistral-24B --enable-prefix-caching --dtype bfloat16 --max-model-len 32768 --tool-call-parser mistral --enable-auto-tool-choice
```

Note: running Zero-Mistral-24B on GPU requires ~55 GB of GPU RAM in bf16 or fp16.
- To query the server, you can use a simple Python snippet:
```python
import requests
import json

url = "http://<your-server>:8000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
model = "ZeroAgency/Zero-Mistral-24B"

messages = [
    {
        "role": "system",
        "content": """You are a virtual assistant. You answer people's questions, help and support them. You are created to be helpful, harmless, and honest. You answer in the language the question was asked in or as the user requests. Solve the task according to the instructions below. Don't apologize and don't build a dialogue.
Answer in the following format:
<think>Reasoning: ...</think>
...""",
    },
    {  # Task from https://3.shkolkovo.online/catalog/2552/93150
        "role": "user",
        "content": """The first worker makes 9 more parts per hour than the second worker. The first worker completes an order of 216 parts 4 hours faster than the second worker who completes the same order. How many parts does the first worker make per hour?""",
    },
]

data = {"model": model, "messages": messages}

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
# <think> Let x be the number of parts the second worker makes per hour. Then the first worker makes x + 9 parts per hour. Let's make a table: First worker Second worker Number of parts per hour x + 9 x Number of hours 216 : (x + 9) 216 : x Difference in number of hours 4 216 : (x + 9) − 216 : x = 4 216x − 216(x + 9) = 4x(x + 9) 216x − 216x − 1944 = 4x^2 + 36x 1944 = 4x^2 + 36x 4x^2 + 36x − 1944 = 0 D = 36^2 + 4 · 4 · 1944 = 1296 + 31104 = 32400 = 180^2 x1 = −36 + 180 : 8 = 144 : 8 = 18 x2 = −36 − 180 : 8 < 0 - not suitable for the problem. Then the first worker makes 18 + 9 = 27 parts per hour. </think>
# 27
```
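Responses generated with the `*_think` prompts wrap the model's reasoning in `<think>...</think>` tags, as in the output above. A minimal sketch of splitting the reasoning from the final answer, reusing the `response` object from the snippet above:

```python
import re

content = response.json()["choices"][0]["message"]["content"]

# Separate the <think>...</think> reasoning block from the final answer.
match = re.search(r"<think>(.*?)</think>\s*(.*)", content, re.DOTALL)
if match:
    reasoning, answer = match.group(1).strip(), match.group(2).strip()
    print("Answer:", answer)  # "27" for the task above
else:
    print(content)  # no reasoning block found
```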
### vLLM Offline Usage
```python
from vllm import LLM
from vllm.sampling_params import SamplingParams

# Note that running this model on GPU requires over 60 GB of GPU RAM.
llm = LLM(model="ZeroAgency/Zero-Mistral-24B", tokenizer_mode="mistral", tensor_parallel_size=8)
```
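From here, generation uses vLLM's offline chat API. A minimal sketch continuing the snippet above, with illustrative sampling values and an illustrative question rather than recommended ones:

```python
# A minimal sketch of offline generation; sampling values are illustrative.
sampling_params = SamplingParams(max_tokens=1024, temperature=0.15)

messages = [
    {"role": "system", "content": "You are a virtual assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```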
## 📚 Documentation

### Model Details

#### Model Description
| Property | Details |
|---|---|
| Developed by | ZeroAgency.ru |
| Funded by | ZeroAgency.ru and Shkolkovo.online |
| Shared by | Alexander Kozhevnikov (developer) |
| Model Type | LLM |
| Language(s) (NLP) | Russian, English |
| License | MIT |
| Finetuned from model | mistralai/Mistral-Small-3.1-24B-Instruct-2503 |
### Model versions

- Merged 16-bit: the original 16-bit merged version for transformers.
- GGUF: assorted GGUF quantizations (BF16, F16, Q8_0, Q6_K, Q4_K_M, IQ4_XS, etc.).
### Benchmarks for the main 16-bit merged version

#### MERA
MERA score: 0.623
| Task | Result | Metric |
|---|---|---|
| LCS | 0.194 | Accuracy |
| RCB | 0.607 / 0.592 | Avg. F1 / Accuracy |
| USE | 0.452 | Grade Norm |
| RWSD | 0.55 | Accuracy |
| PARus | 0.942 | Accuracy |
| ruTiE | 0.868 | Accuracy |
| MultiQ | 0.781 / 0.629 | F1-score / EM |
| CheGeKa | 0.397 / 0.322 | F1 / EM |
| ruModAr | 0.971 | EM |
| MaMuRAMu | 0.832 | Accuracy |
| ruMultiAr | 0.354 | EM |
| ruCodeEval | 0 / 0 / 0 | pass@k ¯\_(ツ)_/¯ |
| MathLogicQA | 0.613 | Accuracy |
| ruWorldTree | 0.987 / 0.987 | Avg. F1 / Accuracy |
| ruOpenBookQA | 0.913 / 0.913 | Avg. F1 / Accuracy |
#### Open Task Evaluation

| Task | Result | Metric |
|---|---|---|
| BPS | 0.981 | Accuracy |
| ruMMLU | 0.778 | Accuracy |
| SimpleAr | 0.997 | EM |
| ruHumanEval | 0.006 / 0.006 / 0.006 | pass@k ¯\_(ツ)_/¯ |
| ruHHH | 0.916 | Accuracy |
| ruHateSpeech | 0.834 | Accuracy |
| ruDetox | 0.341 / 0.843 / 0.624 / 0.66 | Overall average score (J) / Meaning preservation (SIM) / Naturalness (FL) / Style transfer accuracy (STA) |
| ruEthics | [[0.386, 0.399, 0.41, 0.333, 0.327], [0.421, 0.427, 0.452, 0.375, 0.363], [0.653, 0.65, 0.697, 0.596, 0.573]] | 5 MCC |
## 📄 License
The model is released under the MIT license.

