🚀 Skywork-R1V2
Skywork-R1V2-38B is an advanced open-source multimodal reasoning model. It performs strongly across a range of benchmarks, combining powerful visual reasoning with solid text understanding, and offers a new option for multimodal applications.
🚀 Quick Start
1. Clone the Repository
git clone https://github.com/SkyworkAI/Skywork-R1V.git
cd Skywork-R1V/inference
2. Set Up the Environment
# For Transformers
conda create -n r1-v python=3.10 && conda activate r1-v
bash setup.sh
# For vLLM
conda create -n r1v-vllm python=3.10 && conda activate r1v-vllm
pip install -U vllm
3. Run the Inference Scripts
Transformers Inference
CUDA_VISIBLE_DEVICES="0,1" python inference_with_transformers.py \
--model_path path \
--image_paths image1_path \
--question "your question"
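If you prefer calling the model from Python rather than through the script, the sketch below shows one possible approach with Transformers. It assumes the checkpoint exposes an InternVL-style `model.chat` interface via `trust_remote_code=True` and uses simplified single-tile image preprocessing; the Hugging Face repo id, image path, and question string are placeholders. The bundled `inference_with_transformers.py` remains the reference implementation.

```python
# Minimal sketch: load the model with Transformers and ask one question about one image.
# Assumptions: the checkpoint ships InternVL-style remote code (model.chat),
# "Skywork/Skywork-R1V2-38B" is the checkpoint id, and single-tile 448x448
# preprocessing is acceptable (the official script uses dynamic tiling).
import torch
from PIL import Image
from torchvision import transforms as T
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "Skywork/Skywork-R1V2-38B"  # placeholder: local path or HF repo id

model = AutoModel.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
).eval()
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# ImageNet-normalized 448x448 tensor, shape (1, 3, 448, 448).
transform = T.Compose([
    T.Resize((448, 448)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
image = Image.open("image1_path").convert("RGB")  # placeholder image path
pixel_values = transform(image).unsqueeze(0).to(torch.bfloat16).cuda()

question = "<image>\nyour question"  # placeholder question
response = model.chat(
    tokenizer,
    pixel_values,
    question,
    generation_config=dict(max_new_tokens=1024, do_sample=False),
)
print(response)
```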
vLLM Inference
python inference_with_vllm.py \
--model_path path \
--image_paths image1_path image2_path \
--question "your question" \
--tensor_parallel_size 4
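Similarly, the sketch below shows how one might query the model through vLLM's offline `LLM` API instead of the script. The repo id, image paths, question, and `<image>` placeholder format are assumptions; consult the model card for the exact chat template the checkpoint expects. The bundled `inference_with_vllm.py` remains the reference implementation.

```python
# Minimal sketch: offline multi-image inference with vLLM's LLM API.
# Assumptions: "Skywork/Skywork-R1V2-38B" is the checkpoint id, vLLM supports
# the architecture via trust_remote_code, and a plain "<image>\n" placeholder
# per image matches the model's expected prompt format.
from PIL import Image
from vllm import LLM, SamplingParams

MODEL_PATH = "Skywork/Skywork-R1V2-38B"  # placeholder: local path or HF repo id

llm = LLM(
    model=MODEL_PATH,
    trust_remote_code=True,
    tensor_parallel_size=4,            # shard across 4 GPUs, as in the CLI example
    limit_mm_per_prompt={"image": 2},  # allow up to two images per prompt
)

images = [Image.open(p).convert("RGB") for p in ("image1_path", "image2_path")]  # placeholder paths
prompt = "<image>\n<image>\nyour question"  # placeholder question

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": images}},
    SamplingParams(temperature=0.0, max_tokens=1024),
)
print(outputs[0].outputs[0].text)
```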
✨ Key Features
As an advanced open-source multimodal reasoning model, Skywork-R1V2-38B delivers strong results across multiple benchmarks:
- On MMMU it scores 73.6%, the highest result among open-source models to date.
- On OlympiadBench it reaches 62.6%, well ahead of other open-source models.
- It also performs strongly on MathVision, MMMU-Pro, and MathVista, rivaling proprietary commercial models.
- Overall, R1V2 is a high-performance open-source vision-language model (VLM) with strong visual reasoning and text understanding capabilities.
🔧 Model Details
📚 Detailed Documentation
Evaluation
Comparison with larger-scale open-source models
Figure: comparison with larger-scale open-source models.
Comparison with proprietary models
Figure: comparison with proprietary models.
Evaluation results of state-of-the-art LLMs and VLMs
Columns AIME24 through GPQA are text-reasoning benchmarks (%); columns MMMU (val) through MMMU-Pro are multimodal-reasoning benchmarks (%).

| Model | Vision Support | AIME24 | LiveCodeBench | LiveBench | IFEval | BFCL | GPQA | MMMU (val) | MathVista (mini) | MathVision (mini) | OlympiadBench | MMMU-Pro |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R1V2-38B | ✅ | 78.9 | 63.6 | 73.2 | 82.9 | 66.3 | 61.6 | 73.6 | 74.0 | 49.0 | 62.6 | 52.0 |
| R1V1-38B | ✅ | 72.0 | 57.2 | 54.6 | 72.5 | 53.5 | – | 68.0 | 67.0 | – | 40.4 | – |
| DeepSeek-R1-671B | ❌ | 74.3 | 65.9 | 71.6 | 83.3 | 60.3 | 71.5 | – | – | – | – | – |
| GPT-o1 | ❌ | 79.8 | 63.4 | 72.2 | – | – | – | – | – | – | – | – |
| GPT-o4-mini | ✅ | 93.4 | 74.6 | 78.1 | – | – | 49.9 | 81.6 | 84.3 | 58.0 | – | – |
| Claude 3.5 Sonnet | ✅ | – | – | – | – | – | 65.0 | 66.4 | 65.3 | – | – | – |
| Kimi k1.5 long-cot | ✅ | – | – | – | – | – | – | 70.0 | 74.9 | – | – | – |
| Qwen2.5-VL-72B-Instruct | ✅ | – | – | – | – | – | – | 70.2 | 74.8 | – | – | – |
| InternVL2.5-78B | ✅ | – | – | – | – | – | – | 70.1 | 72.3 | – | 33.2 | – |
📄 License
This project is open-sourced under the MIT License.
📖 Citation
If you use Skywork-R1V in your research, please cite:
@misc{chris2025skyworkr1v2multimodalhybrid,
title={Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning},
author={Chris and Yichen Wei and Yi Peng and Xiaokun Wang and Weijie Qiu and Wei Shen and Tianyidan Xie and Jiangbo Pei and Jianhao Zhang and Yunzhuo Hao and Xuchen Song and Yang Liu and Yahui Zhou},
year={2025},
eprint={2504.16656},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.16656},
}
@misc{peng2025skyworkr1vpioneeringmultimodal,
title={Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought},
author={Yi Peng and Chris and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},
year={2025},
eprint={2504.05599},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.05599},
}