Skywork-R1V2-38B is a state-of-the-art open-source multimodal reasoning model, combining robust visual reasoning with strong text comprehension and leading other open-source models on benchmarks such as MMMU and OlympiadBench.
Model Features
Multimodal Reasoning Capability
Scored 73.6% on the MMMU benchmark, the highest among all open-source models.
Outstanding Visual Understanding
Reached 62.6% on OlympiadBench, significantly outperforming other open-source models.
Comparable to Commercial Models
Performs strongly on MathVision, MMMU-Pro, and MathVista, approaching the performance of commercial closed-source models.
Open Source Accessibility
Fully open-source, available via Hugging Face and ModelScope model repositories.
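Since the weights are published on both hubs, they can be loaded with standard tooling. Below is a minimal sketch of a visual-question-answering call via the `transformers` library; the repo id `Skywork/Skywork-R1V2-38B` and the chat-message layout are assumptions based on common Hugging Face VLM conventions, so check the model card for the exact documented usage:

```python
"""Illustrative sketch (not the official usage) of querying Skywork-R1V2-38B.
The repo id and chat-template interface below are assumptions; consult the
model card on Hugging Face or ModelScope for the documented API."""

MODEL_ID = "Skywork/Skywork-R1V2-38B"  # assumed Hugging Face repo id


def build_vqa_messages(image_path: str, question: str) -> list[dict]:
    """Pair an image with a question in a chat-style message list."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]


def answer(image_path: str, question: str) -> str:
    """Lazily load the model and generate an answer (needs large GPU memory)."""
    import torch
    from transformers import AutoModel, AutoProcessor

    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )
    inputs = processor.apply_chat_template(
        build_vqa_messages(image_path, question),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    return processor.decode(output[0], skip_special_tokens=True)
```

The model call is wrapped in a function so the message-building logic can be reused with other serving stacks (e.g. an OpenAI-compatible vLLM endpoint) without pulling in the full 38B checkpoint.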
Model Capabilities
Multimodal Reasoning
Visual Question Answering
Image Understanding
Complex Problem Solving
Cross-modal Information Processing
Use Cases
Education
Math Problem Solving
Analyze and solve problems containing mathematical formulas and diagrams.
Achieved 74.0% accuracy on the MathVista benchmark.
Science Problem Solving
Understand scientific charts and answer related questions.
Achieved 62.6% accuracy on the OlympiadBench benchmark.
Research
Multimodal Research
Used for cutting-edge research in vision-language models.
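The accuracy figures quoted in the use cases above are exact-match percentages over a benchmark's question set. A minimal illustrative scorer, assuming simple string answers (this is not the official evaluation harness, which typically applies benchmark-specific answer extraction):

```python
def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match the reference answer,
    after trivial normalization (case and surrounding whitespace)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return hits / len(references)
```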
Skywork-R1V2
Skywork-R1V2-38B is a state-of-the-art open-source multimodal reasoning model. It combines powerful visual reasoning and text understanding, achieving top-tier performance across multiple benchmarks.
Evaluation Results of State-of-the-Art LLMs and VLMs

The first six benchmarks (AIME24 through GPQA) measure text reasoning; the remaining five (MMMU through MMMU-Pro) measure multimodal reasoning. All scores are percentages.

| Model | Supports Vision | AIME24 | LiveCodebench | LiveBench | IFEVAL | BFCL | GPQA | MMMU (val) | MathVista (mini) | MathVision (mini) | OlympiadBench | MMMU-Pro |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R1V2-38B | ✅ | 78.9 | 63.6 | 73.2 | 82.9 | 66.3 | 61.6 | 73.6 | 74.0 | 49.0 | 62.6 | 52.0 |
| R1V1-38B | ✅ | 72.0 | 57.2 | 54.6 | 72.5 | 53.5 | — | 68.0 | 67.0 | — | 40.4 | — |
| DeepSeek-R1-671B | ❌ | 74.3 | 65.9 | 71.6 | 83.3 | 60.3 | 71.5 | — | — | — | — | — |
| GPT-o1 | ❌ | 79.8 | 63.4 | 72.2 | — | — | — | — | — | — | — | — |
| GPT-o4-mini | ✅ | 93.4 | 74.6 | 78.1 | — | — | 49.9 | 81.6 | 84.3 | 58.0 | — | — |
| Claude 3.5 Sonnet | ✅ | — | — | — | — | — | — | 65.0 | 66.4 | 65.3 | — | — |
| Kimi k1.5 long-cot | ✅ | — | — | — | — | — | — | 70.0 | 74.9 | — | — | — |
| Qwen2.5-VL-72B-Instruct | ✅ | — | — | — | — | — | — | 70.2 | 74.8 | — | — | — |
| InternVL2.5-78B | ✅ | — | — | — | — | — | — | 70.1 | 72.3 | — | 33.2 | — |
License
This project is released under the MIT license.
Citation
If you use Skywork-R1V in your research, please cite:
@misc{chris2025skyworkr1v2multimodalhybrid,
title={Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning},
author={Chris and Yichen Wei and Yi Peng and Xiaokun Wang and Weijie Qiu and Wei Shen and Tianyidan Xie and Jiangbo Pei and Jianhao Zhang and Yunzhuo Hao and Xuchen Song and Yang Liu and Yahui Zhou},
year={2025},
eprint={2504.16656},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.16656},
}
@misc{peng2025skyworkr1vpioneeringmultimodal,
title={Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought},
author={Yi Peng and Chris and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},
year={2025},
eprint={2504.05599},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.05599},
}