D

Deepeyes 7B

Developed by ChenShawn
DeepEyes is a vision-language model that encourages 'thinking with images' through reinforcement learning. It can directly integrate visual information into the reasoning chain and performs excellently in image-text processing tasks.
Downloads 383
Release Time : 5/20/2025

Model Overview

DeepEyes is trained through end-to-end reinforcement learning and can acquire the ability of 'thinking with images' without cold start or supervised fine-tuning. It demonstrates strong generalization ability in tasks such as visual localization, hallucination mitigation, and mathematical problem-solving.

Model Features

Ability to Think with Images
Acquired through end-to-end reinforcement learning, directly guided by the result reward signal without cold start or supervised fine-tuning
Improved Visual Localization Ability
During the reinforcement learning training phase, both the localization IoU and the accuracy of tool calls are improved
High-Resolution Processing Ability
Brings significant performance improvement in high-resolution benchmark tests
Intelligent Thinking Mode
Thinking modes such as visual search for small objects and cross-region visual comparison naturally emerge during the training process

Model Capabilities

Image Understanding and Analysis
Visual Reasoning
Visual Localization
Hallucination Mitigation
Mathematical Problem-Solving
High-Resolution Image Processing

Use Cases

Visual Question Answering
Complex Image Question Answering
Accurately answer questions about images containing complex visual information
Performs excellently in high-resolution benchmark tests
Visual Localization
Target Localization
Accurately locate specific targets in images
The localization IoU indicator is improved
Mathematical Problem-Solving
Visual Math Problems
Solve mathematical problems containing visual information
Demonstrates strong generalization ability
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase