# 🚀 Magistral-Small-2506-Vision
This is an experimental vision checkpoint of Magistral-Small-2506, inspired by the Devstral vision experiment at https://huggingface.co/ngxson/Devstral-Small-Vision-2505-GGUF. Magistral Small is a GRPO-trained reasoning fine-tune of Mistral Small 3.1, which is a vision-capable large language model (LLM).
In its technical report, Mistral indicates that Magistral was fine-tuned on text-only data. Nevertheless, the authors reported results on the MMMU, MMMU-Pro, and MathVista benchmarks, showing modest improvements despite the text-only training. This suggests that Magistral's reasoning capabilities generalize to multimodal inputs.
Mistral removed Magistral's vision encoder in their official release, possibly because of the performance gap between text-only and multimodal inputs.
In this model, I grafted Mistral Small 3.1's vision encoder onto Magistral Small. No further training was conducted, so the text-only performance of this model should match Mistral's official release.
## ✨ Features
- Multilingual Support: This model supports a wide range of languages including English, French, German, Spanish, Portuguese, Italian, Japanese, Korean, Russian, Chinese, Arabic, Farsi, Indonesian, Malay, Nepali, Polish, Romanian, Serbian, Swedish, Turkish, Ukrainian, Vietnamese, Hindi, and Bengali.
- Vision Integration: It integrates a vision encoder from Mistral Small 3.1, enabling potential multimodal processing.
## 📦 Installation
The model was tested with vLLM and should be compatible with any toolkit supporting Mistral Small 3.1. Note that the Transformers implementation of Mistral 3 does not work well.
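For serving, a vLLM launch command along these lines should work. The flags follow vLLM's documented recipe for Mistral-format checkpoints; the model path is a placeholder you should replace with the actual repository or local path.

```shell
# Serve the checkpoint with vLLM's OpenAI-compatible server.
# Mistral-format checkpoints need the mistral tokenizer/config/load modes.
vllm serve your_model_path \
  --tokenizer-mode mistral \
  --config-format mistral \
  --load-format mistral
```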
## 💻 Usage Examples

### Basic Usage

Make sure to use the system prompt provided in the `SYSTEM_PROMPT.txt` file (from Mistral's docs) and the sampling parameters `temperature=0.7`, `top_p=0.95`.
```python
from vllm import LLM, SamplingParams

# Load the model; Mistral-format checkpoints use the mistral tokenizer mode
model = LLM(model="your_model_path", tokenizer_mode="mistral")
sampling_params = SamplingParams(temperature=0.7, top_p=0.95)

# Prepend the recommended system prompt to the user input
with open("SYSTEM_PROMPT.txt", "r") as f:
    system_prompt = f.read()

prompt = system_prompt + "\n\nYour input prompt here"
outputs = model.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)
```
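Since the point of this checkpoint is vision, an image can be passed through vLLM's OpenAI-style chat interface. The helper below only builds the message list; the commented-out serving code is a sketch, and the model path and image URL are placeholders.

```python
def build_vision_messages(system_prompt, user_text, image_url):
    """Build OpenAI-style chat messages with one text part and one image part,
    in the format accepted by vLLM's LLM.chat()."""
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_text},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]

# Example usage (requires vLLM and the model weights; names are illustrative):
# from vllm import LLM, SamplingParams
# llm = LLM(model="your_model_path", tokenizer_mode="mistral")
# outputs = llm.chat(
#     build_vision_messages(system_prompt, "Describe this image.", "https://example.com/cat.png"),
#     SamplingParams(temperature=0.7, top_p=0.95),
# )
# print(outputs[0].outputs[0].text)
```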
## 🔧 Technical Details
Magistral Small is a fine-tuned version of Mistral Small 3.1, trained using the GRPO method for reasoning. Mistral reported that Magistral was fine-tuned on text-only data, yet it showed some improvements on multimodal benchmarks. By grafting Mistral Small 3.1's vision encoder onto Magistral Small, this model aims to recover multimodal capabilities while preserving text-only performance.
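The grafting step amounts to copying vision-related tensors from the donor checkpoint into the text-only one. The sketch below illustrates the idea; the tensor-name prefixes are assumptions for illustration, not the exact keys used in the released weights.

```python
def merge_vision_weights(text_state_dict, donor_state_dict,
                         vision_prefixes=("vision_encoder.", "vision_language_adapter.")):
    """Copy vision-related tensors from a donor checkpoint into a text-only one.

    The prefixes above are illustrative; the real checkpoints may use
    different key names. Text weights are never overwritten.
    """
    merged = dict(text_state_dict)
    for name, tensor in donor_state_dict.items():
        if name.startswith(vision_prefixes):
            merged[name] = tensor
    return merged
```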
## 📄 License

This model is licensed under the Apache-2.0 license.
## ⚠️ Important Note

There may still be configuration errors in this model that reduce performance. Let me know if you encounter any weird behavior!
## 💡 Usage Tip

Make sure to use the system prompt from the `SYSTEM_PROMPT.txt` file and the sampling parameters `temperature=0.7`, `top_p=0.95` for better results.
| Property | Details |
|---|---|
| Base Model | mistralai/Magistral-Small-2506, mistralai/Mistral-Small-3.1-24B-Instruct-2503 |
| Pipeline Tag | image-text-to-text |
| Library Name | vLLM |
| Supported Languages | en, fr, de, es, pt, it, ja, ko, ru, zh, ar, fa, id, ms, ne, pl, ro, sr, sv, tr, uk, vi, hi, bn |
| License | Apache-2.0 |