# 🚀 Magistral-Small-2506-Vision
This is an experimental vision checkpoint of Magistral-Small-2506, inspired by the Devstral vision experiment at https://huggingface.co/ngxson/Devstral-Small-Vision-2505-GGUF. Magistral Small is a GRPO-trained reasoning fine-tune of Mistral Small 3.1, which is a vision-capable large language model (LLM).
In its technical report, Mistral indicates that Magistral was fine-tuned on text-only data. Nevertheless, the authors reported results on the MMMU, MMMU-Pro, and MathVista benchmarks, showing modest improvements despite the text-only training. This suggests that Magistral's reasoning capabilities generalize to multimodal inputs.
Mistral removed Magistral's vision encoder in their official release, possibly because of the performance gap between text-only and multimodal inputs.
In this model, I grafted Mistral Small 3.1's vision encoder onto Magistral Small. No further training was conducted, so the text-only performance of this model should match Mistral's official release.
## ✨ Features
- Multilingual Support: This model supports a wide range of languages including English, French, German, Spanish, Portuguese, Italian, Japanese, Korean, Russian, Chinese, Arabic, Farsi, Indonesian, Malay, Nepali, Polish, Romanian, Serbian, Swedish, Turkish, Ukrainian, Vietnamese, Hindi, and Bengali.
- Vision Integration: It integrates a vision encoder from Mistral Small 3.1, enabling potential multimodal processing.
## 📦 Installation
The model was tested with vLLM and should be compatible with any toolkit supporting Mistral Small 3.1. Note that the Transformers implementation of Mistral 3 does not work well.
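For serving, a vLLM launch command along these lines should work. The flags follow vLLM's documented recipe for Mistral-format checkpoints; the model path is a placeholder you should replace with the actual repository or local path.

```shell
# Serve the checkpoint with vLLM's OpenAI-compatible server.
# Mistral-format checkpoints need the mistral tokenizer/config/load modes.
vllm serve your_model_path \
  --tokenizer-mode mistral \
  --config-format mistral \
  --load-format mistral
```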
## 💻 Usage Examples

### Basic Usage

Make sure to use the system prompt provided in the `SYSTEM_PROMPT.txt` file (from Mistral's docs) and the sampling parameters `temperature=0.7`, `top_p=0.95`.
```python
from vllm import LLM, SamplingParams

# Load the model; Mistral-format checkpoints use the mistral tokenizer mode
model = LLM(model="your_model_path", tokenizer_mode="mistral")
sampling_params = SamplingParams(temperature=0.7, top_p=0.95)

# Prepend the recommended system prompt to the user input
with open("SYSTEM_PROMPT.txt", "r") as f:
    system_prompt = f.read()

prompt = system_prompt + "\n\nYour input prompt here"
outputs = model.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)
```
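Since the point of this checkpoint is vision, an image can be passed through vLLM's OpenAI-style chat interface. The helper below only builds the message list; the commented-out serving code is a sketch, and the model path and image URL are placeholders.

```python
def build_vision_messages(system_prompt, user_text, image_url):
    """Build OpenAI-style chat messages with one text part and one image part,
    in the format accepted by vLLM's LLM.chat()."""
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_text},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]

# Example usage (requires vLLM and the model weights; names are illustrative):
# from vllm import LLM, SamplingParams
# llm = LLM(model="your_model_path", tokenizer_mode="mistral")
# outputs = llm.chat(
#     build_vision_messages(system_prompt, "Describe this image.", "https://example.com/cat.png"),
#     SamplingParams(temperature=0.7, top_p=0.95),
# )
# print(outputs[0].outputs[0].text)
```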
## 🔧 Technical Details
Magistral Small is a fine-tuned version of Mistral Small 3.1, trained using the GRPO method for reasoning. Mistral reported that Magistral was fine-tuned on text-only data, yet it showed some improvements on multimodal benchmarks. By grafting Mistral Small 3.1's vision encoder onto Magistral Small, this model aims to recover multimodal capabilities while preserving text-only performance.
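The grafting step amounts to copying vision-related tensors from the donor checkpoint into the text-only one. The sketch below illustrates the idea; the tensor-name prefixes are assumptions for illustration, not the exact keys used in the released weights.

```python
def merge_vision_weights(text_state_dict, donor_state_dict,
                         vision_prefixes=("vision_encoder.", "vision_language_adapter.")):
    """Copy vision-related tensors from a donor checkpoint into a text-only one.

    The prefixes above are illustrative; the real checkpoints may use
    different key names. Text weights are never overwritten.
    """
    merged = dict(text_state_dict)
    for name, tensor in donor_state_dict.items():
        if name.startswith(vision_prefixes):
            merged[name] = tensor
    return merged
```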
## 📄 License

This model is licensed under the Apache-2.0 license.
## ⚠️ Important Note

There may still be configuration errors in this model that reduce performance. Let me know if you encounter any weird behavior!
## 💡 Usage Tip

Make sure to use the system prompt from the `SYSTEM_PROMPT.txt` file and the sampling parameters `temperature=0.7`, `top_p=0.95` for better results.
| Property | Details |
|---|---|
| Base Model | mistralai/Magistral-Small-2506, mistralai/Mistral-Small-3.1-24B-Instruct-2503 |
| Pipeline Tag | image-text-to-text |
| Library Name | vLLM |
| Supported Languages | en, fr, de, es, pt, it, ja, ko, ru, zh, ar, fa, id, ms, ne, pl, ro, sr, sv, tr, uk, vi, hi, bn |
| License | Apache-2.0 |