TreeVGR 7B CI
TreeVGR-7B is a visual grounded reasoning model enhanced with traceable evidence. It uses reinforcement learning to supervise localization and reasoning jointly, which yields accurate localization and interpretable reasoning paths.
Downloads 115
Release Time: 7/3/2025
Model Overview
TreeVGR-7B is an open-source visual grounded reasoning model initialized from Qwen2.5-VL-7B. It performs well on multiple benchmarks, including TreeBench and V* Bench.
Model Features
Traceable evidence enhancement
Supervises localization and reasoning jointly through reinforcement learning, so that each answer comes with accurate localization and an interpretable reasoning path.
Complex scene processing
Handles complex scenes with dense objects and attends to small, subtle targets.
Second-order reasoning ability
Reasons about object interactions and spatial hierarchies, rather than only localizing individual objects.
Model Capabilities
Visual grounded reasoning
Complex scene analysis
Second-order reasoning
Interpretable reasoning path
Use Cases
Visual question answering
TreeBench evaluation
Evaluates visual question answering on TreeBench to test the model's visual perception and reasoning abilities.
Achieves an accuracy of 49.38% and a mean IoU of 43.3 on TreeBench.
Visual grounding
V* Bench evaluation
Evaluates visual grounding on V* Bench to test the model's localization ability.
On V* Bench, performance improves by 16.8%.
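The mean IoU reported for TreeBench measures how closely predicted boxes overlap the reference boxes. The sketch below shows one common way to compute it, assuming a single predicted box and a single reference box per sample in (x1, y1, x2, y2) format. The function names `box_iou` and `mean_iou` are illustrative, and the official TreeBench protocol may differ, for example in how it matches multiple boxes per question.

```python
from typing import Sequence

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Sequence[float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], targets: Sequence[Box]) -> float:
    """Average IoU over paired boxes, one prediction per reference box."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    return sum(box_iou(p, t) for p, t in zip(preds, targets)) / len(preds)


# Hypothetical example: one perfect match and one complete miss.
preds = [(0, 0, 2, 2), (0, 0, 1, 1)]
targets = [(0, 0, 2, 2), (2, 2, 3, 3)]
print(mean_iou(preds, targets))  # 0.5
```

A score such as 43.3 would correspond to this average expressed on a 0 to 100 scale.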
© 2025 AIbase