MMR1-Math-v0-7B: An Open-Source Multimodal Model for Math Tasks with State-of-the-Art Performance

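MMR1 Math V0 7B

Developed by MMR1

A large multimodal model specialized for math tasks that achieves state-of-the-art performance among open-source 7B multimodal models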

Image-Text-to-Text

Transformers

Language: English | License: Apache-2.0 | #MultimodalMathReasoning #SampleEfficientTraining #7B-SOTA

Downloads: 75

Release date: 2025-03-11

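Model Overview

MMR1-Math-v0-7B is a large multimodal model built on Qwen2.5-VL-7B-Instruct and focused on mathematical reasoning. Trained on only 6k curated samples, it reaches state-of-the-art performance among open-source 7B multimodal models and performs well on several math reasoning benchmarks.

Model Features

SOTA performance

Sets a new benchmark for math tasks among open-source 7B multimodal models

Efficient training

Reaches top-tier results with only 6k high-quality samples and 6 hours of RL training

Data strategy

High-quality public data, sampled uniformly by task difficulty and mathematical reasoning diversity

GRPO training

Efficient RL training on 64 H100 GPUs (15 epochs)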
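
GRPO (Group Relative Policy Optimization) scores each sampled answer relative to the other answers drawn for the same prompt, so it needs no separate value model. The sketch below shows only that group-relative normalization step in plain Python. The reward values and the function name `group_relative_advantages` are illustrative assumptions, not taken from the MMR1 training code, and implementations differ on details such as which standard-deviation estimator they use.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rule-based rewards for 4 sampled answers to one math problem:
# 1.0 = correct final answer, 0.0 = incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Correct answers receive a positive advantage and incorrect ones a negative advantage. When all answers in a group get the same reward, every advantage is zero and the group contributes no learning signal.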

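Model Capabilities

Multimodal mathematical reasoning

Image-text understanding

Solving complex math problems

Logical reasoning

Use Cases

Education

Math problem solving

Helps students understand and solve complex math problems

Scores 71.0 on the MathVista benchmark

Research

Multimodal reasoning research

Provides a baseline model for multimodal reasoning research

Outperforms comparable models on several math reasoning benchmarks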

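🚀 MMR1: Advancing the Frontiers of Multimodal Reasoning

MMR1 is a project focused on multimodal reasoning. Its MMR1-Math-v0 model performs strongly on multimodal math tasks, reaching SOTA among open-source 7B models with only 6k training samples.

🚀 Quick Start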

import torch  # needed for torch.bfloat16 below
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
# Default: load the model on the available device(s)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "MMR1/MMR1-Math-v0-7B",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    device_map="auto",
)
# Default processor
processor = AutoProcessor.from_pretrained("MMR1/MMR1-Math-v0-7B")
# Example input
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "path/to/image.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
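
`model.generate` returns the prompt tokens followed by the newly generated tokens, which is why the snippet above slices each output by the length of its input before decoding. A minimal sketch of that trimming step on plain Python lists follows; the token ids are made up, and `trim_prompt_tokens` is a hypothetical helper name used only for illustration.

```python
def trim_prompt_tokens(input_ids, generated_ids):
    """Drop the leading prompt tokens from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]


# Made-up token ids: each generated row starts with its own prompt.
input_ids = [[101, 7, 8], [101, 9, 0]]
generated_ids = [[101, 7, 8, 42, 43], [101, 9, 0, 55, 56]]

new_tokens = trim_prompt_tokens(input_ids, generated_ids)
print(new_tokens)  # [[42, 43], [55, 56]]
```

Without this step, the decoded text would repeat the full chat prompt before the model's answer.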

Batch Inference

# Sample messages for batch inference
messages1 = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/image1.jpg"},
            {"type": "image", "image": "file:///path/to/image2.jpg"},
            {"type": "text", "text": "What are the common elements in these pictures?"},
        ],
    }
]
messages2 = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
# Combine messages for batch processing
messages = [messages1, messages2]
# Preparation for batch inference
texts = [
    processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True)
    for msg in messages
]
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=texts,
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")
# Batch Inference
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_texts = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_texts)

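✨ Key Features

Model Highlights

SOTA performance: sets a new state-of-the-art benchmark on math-related multimodal tasks among open-source 7B models.
Minimal training data: achieves top-tier performance with only 6k high-quality samples drawn from public training datasets.
Efficient training with GRPO: 6 hours of reinforcement learning on 64 H100 GPUs, for 15 epochs.
Public, high-quality data: sourced from public datasets and rigorously filtered, with a balance of difficulty levels and math problem types.
Balanced data strategy: uniform data sampling based on task difficulty (filtering out overly simple problems) and mathematical reasoning diversity.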
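
The balanced data strategy above can be sketched as a two-step procedure: drop problems that are too easy, then draw evenly from each (difficulty, topic) group. This is only an illustration under stated assumptions: the `pass_rate`, `difficulty`, and `topic` fields, the 0.9 threshold, and the function name `balanced_sample` are all hypothetical, and the actual MMR1 filtering criteria are not described in this document.

```python
import random
from collections import defaultdict


def balanced_sample(problems, n_total, max_pass_rate=0.9, seed=0):
    """Filter out overly easy problems, then sample evenly per group."""
    rng = random.Random(seed)
    # Assumption: "pass_rate" is the fraction of attempts a base model solves.
    kept = [p for p in problems if p["pass_rate"] <= max_pass_rate]

    # Group the remaining problems by (difficulty, topic).
    groups = defaultdict(list)
    for p in kept:
        groups[(p["difficulty"], p["topic"])].append(p)

    # Give every group the same quota (at least one problem each).
    per_group = max(1, n_total // max(1, len(groups)))
    sample = []
    for key in sorted(groups):
        members = groups[key]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample[:n_total]


# Toy dataset with hypothetical labels.
rows = [
    ("easy", "geometry", 1.0),    # filtered out: too easy
    ("medium", "geometry", 0.6),
    ("medium", "geometry", 0.5),
    ("medium", "algebra", 0.4),
    ("hard", "algebra", 0.1),
    ("hard", "geometry", 0.2),
]
problems = [
    {"id": i, "difficulty": d, "topic": t, "pass_rate": pr}
    for i, (d, t, pr) in enumerate(rows)
]

subset = balanced_sample(problems, n_total=4)
print([(p["difficulty"], p["topic"]) for p in subset])
```

With four remaining groups and a budget of four, each group contributes exactly one problem, so no single difficulty level or topic dominates the training set.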

📚 Documentation

📰 News

[2025.03.11] 🔥🔥 Released MMR1-Math-v0, which achieves SOTA with only 6k training samples!

Links

Code: https://github.com/LengSicong/MMR1
The model is presented in the paper "LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL".

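Model Description

MMR1-Math-v0-7B is a large multimodal model specialized for math tasks. It achieves state-of-the-art performance among open-source 7B multimodal models and remains competitive with proprietary models that have far more parameters, despite being trained on only 6k carefully curated samples.

Evaluation Results

We evaluated the model with VLMEvalKit on four math reasoning benchmarks: MathVista_MINI, MathVision, LogicVista, and MathVerse_MINI.

We also report results on the MathVerse_MINI_Vision_Only_cot (MathVerse_V) subset for consistency with the VLMEvalKit leaderboard. The table below compares our model with a range of open-source and proprietary models.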

Model	Size	MathVista	MathVision	LogicVista	MathVerse	MathVerse_V
Closed-source models
[GPT-4o 1120](https://openai.com/index/gpt-4o-system-card/)	-	60.0	31.2	52.8	40.6	-
Gemini-2.0-flash	-	70.4	43.6	52.3	47.8	-
[Claude3.7-Sonnet](https://www.anthropic.com/news/claude-3-7-sonnet)	-	66.8	41.9	58.2	46.7	-
R1-related models
[LLaVA-CoT](https://github.com/PKU-YuanGroup/LLaVA-CoT)	11B	52.5	19.9	39.6	22.6	-
[Open-R1-Multimodal](https://github.com/EvolvingLMMs-Lab/open-r1-multimodal)	7B	60.6	-	-	-	-
Mulberry	7B	63.1	-	-	-	-
LMM-R1	3B	63.2	26.4	-	-	41.6
[R1-Onevision](https://github.com/Fancy-MLLM/R1-Onevision?tab=readme-ov-file)	7B	-	26.2	-	-	44.1
[MM-Eureka](https://github.com/ModalMinds/MM-EUREKA)	8B	67.1	22.2	-	-	40.4
[MM-Eureka](https://github.com/ModalMinds/MM-EUREKA)	38B	64.2	26.6	-	-	48.9
Open-source models
[Ovis2-8b](https://github.com/AIDC-AI/Ovis)	8B	71.8	25.9	39.4	42.3	-
[MiniCPM-o-2.6](https://github.com/OpenBMB/MiniCPM-o)	8B	71.9	21.7	36.0	35.0	-
[Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL) (official)	7B	68.2	25.4	47.9	41.1	-
[Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL) (reproduced)	7B	67.5	25.6	46.8	42.5	46.9
Ours
MMR1-math-v0	7B	71.0	30.2	50.8	45.1	49.8
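
To make the comparison concrete, the short snippet below computes the per-benchmark gain of MMR1-math-v0 over the reproduced Qwen2.5-VL 7B row, using only the numbers in the table. Treating the reproduced row as the baseline is an assumption made for illustration; it is the only Qwen2.5-VL row that reports all five columns.

```python
benchmarks = ["MathVista", "MathVision", "LogicVista", "MathVerse", "MathVerse_V"]
mmr1 = [71.0, 30.2, 50.8, 45.1, 49.8]             # MMR1-math-v0 row
qwen_reproduced = [67.5, 25.6, 46.8, 42.5, 46.9]  # Qwen2.5-VL (reproduced) row

# Gain in score points, rounded to one decimal to match the table's precision.
gains = {
    name: round(ours - base, 1)
    for name, ours, base in zip(benchmarks, mmr1, qwen_reproduced)
}
print(gains)
# {'MathVista': 3.5, 'MathVision': 4.6, 'LogicVista': 4.0, 'MathVerse': 2.6, 'MathVerse_V': 2.9}
```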

📄 License

This project is licensed under Apache-2.0.

📚 Citation

If you find MMR1 useful for your research and applications, please cite it using the following BibTeX:

@misc{MMR1-Math2025,
  title={MMR1: Advancing the Frontiers of Multimodal Reasoning},
  author={Sicong Leng* and Jing Wang* and Jiaxi Li* and Hao Zhang* and Zhiqiang Hu and Boqiang Zhang and Hang Zhang and Yuming Jiang and Xin Li and Fan Wang and Yu Rong and Aixin Sun† and Shijian Lu†},
  year={2025},
  howpublished={\url{https://github.com/LengSicong/MMR1}},
}