
Visualthinker R1 Zero

Developed by turningpoint-ai
The first multimodal reasoning model to reproduce the 'aha moment' and increasing response length, using only a 2B model without supervised fine-tuning
Downloads 578
Release Time: 2/28/2025

Model Overview

Built on the Qwen2-VL-2B base model and trained with reinforcement learning on the SAT dataset to strengthen reasoning on vision-centric tasks.
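The card does not specify the reward used during reinforcement learning. DeepSeek-R1-Zero-style training typically combines a rule-based format reward with an answer-accuracy reward and normalizes rewards within each group of sampled completions (GRPO). The sketch below assumes that setup. The `<think>`/`<answer>` template and every function name here are hypothetical illustrations, not the project's code.

```python
import re
from statistics import mean, pstdev

# Assumed output template (hypothetical): reasoning in <think>...</think>,
# followed by the final answer in <answer>...</answer>.
FORMAT_RE = re.compile(r"^<think>.*?</think>\s*<answer>(.*?)</answer>$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed template, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 if the extracted answer matches the reference (case-insensitive)."""
    match = FORMAT_RE.match(completion.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().lower() == ground_truth.strip().lower() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: standardize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: one group of sampled completions for a single spatial question.
completions = [
    "<think>The chair is left of the table.</think><answer>left</answer>",
    "<think>Wait, let me recheck... it is right.</think><answer>right</answer>",
    "left",  # ignores the template, so it earns neither reward
]
rewards = [format_reward(c) + accuracy_reward(c, "left") for c in completions]
advantages = group_relative_advantages(rewards)
```

Normalizing within the group means the policy is pushed toward completions that beat their siblings on the same prompt, so no separate value model is needed.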

Model Features

Reproduction of Aha Moments
First to reproduce DeepSeek-R1's 'aha moment' on a 2B model without supervised fine-tuning
Vision-Centric Reasoning
Demonstrates that vision-centric tasks can also benefit from improved reasoning capabilities
Self-Reflection Capability
The model shows an emergent ability to rethink its reasoning and correct its own errors
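One rough way to monitor the behaviors listed above during training is to track the mean response length and the share of completions that contain reflective phrases. The sketch below assumes whitespace tokens as the length unit and a hypothetical list of cue phrases. It is a crude substring heuristic, not the detection method used by the authors.

```python
from statistics import mean

# Hypothetical cue phrases; a plain substring test, so it can over-match
# (e.g. "wait" inside "awaited").
REFLECTION_CUES = ("wait", "let me re", "recheck", "on second thought", "i made a mistake")


def is_reflective(completion: str) -> bool:
    """True if the completion contains any of the assumed reflection cues."""
    text = completion.lower()
    return any(cue in text for cue in REFLECTION_CUES)


def training_step_stats(completions: list[str]) -> dict[str, float]:
    """Mean length in whitespace tokens and fraction of completions with a cue."""
    lengths = [len(c.split()) for c in completions]
    return {
        "mean_length": mean(lengths),
        "reflection_rate": sum(is_reflective(c) for c in completions) / len(completions),
    }


# Usage: statistics for one batch of sampled completions.
batch = [
    "The cup is on the left.",
    "The cup is on the left. Wait, let me recheck the image. It is on the right.",
]
stats = training_step_stats(batch)
```

Logging these two numbers per training step makes a rise in response length or in self-correction visible as a trend rather than in individual samples.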

Model Capabilities

Multimodal Reasoning
Image Understanding
Text Generation
Vision-Centric Task Processing

Use Cases

Visual Reasoning
Object Position Analysis
Analyzes the spatial relationships between objects in images
Achieved 59.47% accuracy on CVBench
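An accuracy figure like this is the percentage of benchmark questions answered correctly. The sketch below shows one way a multiple-choice accuracy could be computed. It assumes the model writes a single option letter such as `(B)` inside `<answer>` tags, and the helper names are hypothetical. This is not CV-Bench's official scoring script.

```python
import re
from typing import Optional

# Assumed answer format (hypothetical): an option letter inside <answer>...</answer>.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
OPTIONS = {"A", "B", "C", "D"}


def extract_choice(output: str) -> Optional[str]:
    """Return the option letter from the answer tag (or the raw output), else None."""
    match = ANSWER_RE.search(output)
    text = (match.group(1) if match else output).strip().strip("()").upper()
    return text if text in OPTIONS else None


def accuracy(outputs: list[str], labels: list[str]) -> float:
    """Percentage of outputs whose extracted letter equals the gold label."""
    correct = sum(extract_choice(o) == label for o, label in zip(outputs, labels))
    return 100.0 * correct / len(labels)


# Usage: unparseable outputs count as wrong.
outputs = ["<think>...</think><answer>(B)</answer>", "<answer>A</answer>", "C", "no idea"]
labels = ["B", "B", "C", "A"]
score = accuracy(outputs, labels)
```

Counting unparseable outputs as wrong keeps the score conservative. A stricter evaluation might also require the full think/answer template.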