The INFRL-Qwen2.5-VL-72B Vision-Language Model is Open-Source! Outstanding Performance in Multiple Visual Reasoning Tests

INFRL Qwen2.5 VL 72B Preview Ggufs Fully Quantized

Developed by GeorgyGUF

An improved vision-language model based on Qwen2.5-VL-72B-Instruct, excelling in multiple visual reasoning benchmarks

Text-to-Image | English | Open Source License: Apache-2.0 | #Visual Reasoning Enhancement #Mathematical Visual Question Answering #Multimodal Large Model

Downloads 230

Release Time: 5/14/2025

Model Overview

A multimodal model with enhanced visual reasoning capabilities, achieving the best performance among open-source models in mathematical visual understanding tasks

Model Features

Exceptional Visual Reasoning Capabilities

Top performance in visual reasoning benchmarks such as MathVision, MathVista, and MathVerse

Reinforcement Learning Optimization

Utilizes rule-based reward reinforcement learning to enhance visual comprehension

Multimodal Understanding

Capable of processing both visual and linguistic information for complex cross-modal reasoning

Model Capabilities

Visual Question Answering

Mathematical Problem Visual Understanding

Chart Analysis

Cross-modal Reasoning

Use Cases

EdTech

Visual Solution for Math Problems

Analyzing math problems containing diagrams and formulas

Achieved 77.8% accuracy on the MathVista (testmini) benchmark

Scientific Research

Scientific Chart Analysis

Understanding and interpreting complex charts in research papers

🚀 INFRL-Qwen2.5-VL-72B-Preview

INFRL-Qwen2.5-VL-72B-Preview enhances visual reasoning capabilities based on the Qwen2.5-VL-72B-Instruct model and achieves top performance on multiple visual reasoning benchmarks.

🚀 Quick Start

This section provides a high-level overview of the model. INFRL-Qwen2.5-VL-72B-Preview builds on the Qwen2.5-VL-72B-Instruct model to improve visual reasoning. As of March 25th, 2025, it is the best-performing open-source VL model on several visual reasoning benchmarks, including MathVision, MathVista, and MathVerse.

✨ Features

Enhanced Visual Reasoning: Improves upon the base model Qwen2.5-VL-72B-Instruct for better visual reasoning performance.
Top-Tier Performance: Achieves the best results on multiple visual reasoning benchmarks as of March 25th, 2025.

📚 Documentation

Evaluation

The following table shows the performance of different models on various visual reasoning benchmarks:

Models	MathVision (test)	MathVista (testmini)	MathVerse (testmini)
GPT4o	30.6	60	41.2
Gemini-2.0-Flash	41.3	70.1	50.6
Claude 3.5 Sonnet	33.5	67.7	47.8
QvQ-72B	35.9	71.4	48.6
InternVL2.5-78B	34.9	72.3	51.7
Qwen-VL-2.5-72B	38.1	74.8	57.18
INFRL-VL-Preview	41.9	77.8	58.84
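
To make the comparison with the base model concrete, the short sketch below computes the per-benchmark gain of INFRL-VL-Preview over Qwen-VL-2.5-72B. The scores are copied from the table above; the names `BENCHMARKS`, `SCORES`, and `gains` are illustrative and not part of any released code.

```python
# Scores copied from the evaluation table above, in the order:
# MathVision (test), MathVista (testmini), MathVerse (testmini).
BENCHMARKS = ("MathVision", "MathVista", "MathVerse")
SCORES = {
    "Qwen-VL-2.5-72B": (38.1, 74.8, 57.18),
    "INFRL-VL-Preview": (41.9, 77.8, 58.84),
}


def gains(base, tuned):
    """Return the point difference (tuned minus base) for each benchmark."""
    return {
        name: round(t - b, 2)  # round to hide floating-point noise
        for name, b, t in zip(BENCHMARKS, SCORES[base], SCORES[tuned])
    }


if __name__ == "__main__":
    for name, delta in gains("Qwen-VL-2.5-72B", "INFRL-VL-Preview").items():
        print(f"{name}: +{delta} points")
```

By this calculation the model improves on its base by 3.8 points on MathVision, 3.0 on MathVista, and 1.66 on MathVerse.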

We plan to release a code repository for VLM evaluation. This repository will support RL training with simple rule-based rewards and will be aligned with LLM-Judge results. Stay tuned!
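
The actual reward used for training is not described here, so the following is only a minimal sketch of what a simple rule-based reward could look like for math questions. It assumes, hypothetically, that the model writes its final answer inside `\boxed{...}` and that a response earns 1.0 for a matching answer and 0.0 otherwise. The function names and the answer format are assumptions, not the authors' implementation.

```python
import re

# Assumption: the final answer is wrapped in \boxed{...} (no nested braces).
ANSWER_PATTERN = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_answer(response):
    """Return the last boxed answer in the response, or None if absent."""
    matches = ANSWER_PATTERN.findall(response)
    return matches[-1].strip() if matches else None


def rule_based_reward(response, reference):
    """Hypothetical binary reward: 1.0 if the answer matches, else 0.0."""
    answer = extract_answer(response)
    if answer is None:
        return 0.0  # no parseable answer, no reward
    try:
        # Compare numerically when both sides parse as numbers.
        return 1.0 if abs(float(answer) - float(reference)) < 1e-6 else 0.0
    except ValueError:
        # Otherwise fall back to a case-insensitive string match,
        # e.g. for multiple-choice letters.
        return 1.0 if answer.lower() == reference.strip().lower() else 0.0
```

A rule of this kind is cheap to evaluate and deterministic, which is one reason such rewards are often checked against a slower LLM judge.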

Contributors

Supervisors

Wei Chu • Yuan Qi

VL Team

Haozhe Wang • Zuming Huang

RL Team

Haozhe Wang • Chao Qu • Long Li

Thanks

We would like to thank Jiaran Hao and Liuyihan Song for their support in the RL infrastructure.

Citation

If you find our model useful, please consider citing:

@misc {INFRL_VL_Preview,
	author       = { Wang, Haozhe and Huang, Zuming and Qu, Chao and Chu, Wei and Qi, Yuan },
	title        = { INFRL-Qwen2.5-VL-72B-Preview },
	year         = 2025,
	url          = { https://huggingface.co/infly/INFRL-Qwen2.5-VL-72B-Preview},
	publisher    = { Hugging Face }
}

📄 License

This project is licensed under the Apache-2.0 license.

Model Information

Property	Details
Base Model	Qwen/Qwen2.5-VL-72B-Instruct
Language	en
License	apache-2.0
Tags	transformers, multimodal
Pipeline Tag	visual-question-answering
