🚀 Skywork-R1V2
Skywork-R1V2-38B is an advanced open-source multimodal reasoning model that achieves excellent results across multiple benchmarks, combining strong visual reasoning with text understanding to offer a new solution for the multimodal domain.
🚀 Quick Start
1. Clone the repository
git clone https://github.com/SkyworkAI/Skywork-R1V.git
cd Skywork-R1V/inference
2. Set up the environment
# For Transformers
conda create -n r1-v python=3.10 && conda activate r1-v
bash setup.sh
# For vLLM
conda create -n r1v-vllm python=3.10 && conda activate r1v-vllm
pip install -U vllm
3. Run the inference script
Inference with Transformers
CUDA_VISIBLE_DEVICES="0,1" python inference_with_transformers.py \
--model_path path \
--image_paths image1_path \
--question "your question"
Inference with vLLM
python inference_with_vllm.py \
--model_path path \
--image_paths image1_path image2_path \
--question "your question" \
--tensor_parallel_size 4
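When driving the script above from another program, it can help to assemble the argument vector programmatically rather than interpolating a shell string. The sketch below is a minimal, hypothetical helper (not part of the repository) that builds the `inference_with_vllm.py` command line using the flags shown above; passing a list to `subprocess.run` avoids quoting problems when the question contains spaces.

```python
import shlex
import subprocess

def build_vllm_command(model_path, image_paths, question, tensor_parallel_size=4):
    """Assemble argv for inference_with_vllm.py using the flags from the quick start.

    This is an illustrative wrapper, not an API provided by Skywork-R1V.
    """
    return [
        "python", "inference_with_vllm.py",
        "--model_path", model_path,
        "--image_paths", *image_paths,  # the flag accepts one or more image paths
        "--question", question,
        "--tensor_parallel_size", str(tensor_parallel_size),
    ]

cmd = build_vllm_command("path/to/model", ["image1.png", "image2.png"], "your question")
print(shlex.join(cmd))          # inspect the exact command before launching
# subprocess.run(cmd, check=True)  # uncomment to actually run the script
```

A list-form `argv` also makes it straightforward to vary `--tensor_parallel_size` per machine without re-quoting the rest of the command.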
✨ Key Features
Skywork-R1V2-38B is an advanced open-source multimodal reasoning model that demonstrates excellent performance across multiple benchmarks:
- Scores 73.6% on MMMU, the highest result among all open-source models to date.
- Achieves 62.6% on OlympiadBench, well ahead of other open-source models.
- Performs strongly on MathVision, MMMU-Pro, and MathVista, rivaling proprietary commercial models.
- Overall, R1V2 is a high-performance open-source vision-language model (VLM) with strong visual reasoning and text understanding capabilities.
🔧 Model Details
📚 Documentation
Evaluation
Comparison with large-scale open-source models
Figure: comparison with large-scale open-source models
Comparison with proprietary models
Figure: comparison with proprietary models
Evaluation results of state-of-the-art LLMs and VLMs
The first six benchmark columns report text reasoning (%); the last five report multimodal reasoning (%). "–" indicates no reported result.

| Model | Vision | AIME24 | LiveCodebench | LiveBench | IFEval | BFCL | GPQA | MMMU (val) | MathVista (mini) | MathVision (mini) | OlympiadBench | MMMU-Pro |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R1V2‑38B | ✅ | 78.9 | 63.6 | 73.2 | 82.9 | 66.3 | 61.6 | 73.6 | 74.0 | 49.0 | 62.6 | 52.0 |
| R1V1‑38B | ✅ | 72.0 | 57.2 | 54.6 | 72.5 | 53.5 | – | 68.0 | 67.0 | – | 40.4 | – |
| Deepseek‑R1‑671B | ❌ | 74.3 | 65.9 | 71.6 | 83.3 | 60.3 | 71.5 | – | – | – | – | – |
| GPT‑o1 | ❌ | 79.8 | 63.4 | 72.2 | – | – | – | – | – | – | – | – |
| GPT‑o4‑mini | ✅ | 93.4 | 74.6 | 78.1 | – | – | 49.9 | 81.6 | 84.3 | 58.0 | – | – |
| Claude 3.5 Sonnet | ✅ | – | – | – | – | – | 65.0 | 66.4 | 65.3 | – | – | – |
| Kimi k1.5 long-cot | ✅ | – | – | – | – | – | – | 70.0 | 74.9 | – | – | – |
| Qwen2.5‑VL‑72B‑Instruct | ✅ | – | – | – | – | – | – | 70.2 | 74.8 | – | – | – |
| InternVL2.5‑78B | ✅ | – | – | – | – | – | – | 70.1 | 72.3 | – | 33.2 | – |
📄 License
This project is released under the MIT License.
📖 Citation
If you use Skywork-R1V in your research, please cite:
@misc{chris2025skyworkr1v2multimodalhybrid,
title={Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning},
author={Chris and Yichen Wei and Yi Peng and Xiaokun Wang and Weijie Qiu and Wei Shen and Tianyidan Xie and Jiangbo Pei and Jianhao Zhang and Yunzhuo Hao and Xuchen Song and Yang Liu and Yahui Zhou},
year={2025},
eprint={2504.16656},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.16656},
}
@misc{peng2025skyworkr1vpioneeringmultimodal,
title={Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought},
author={Yi Peng and Chris and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},
year={2025},
eprint={2504.05599},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.05599},
}