# Visual Question Answering Optimization

VL Reasoner 7B
Apache-2.0
VL-Reasoner-7B is a multimodal reasoning model trained with the GRPO-SSR method; it is reported to perform strongly on multiple multimodal reasoning benchmarks.
Image-to-Text Transformers English
TIGER-Lab
126
1
Qwen2.5 VL 7B Instruct Quantized.w4a16
Apache-2.0
A quantized version of Qwen2.5-VL-7B-Instruct that accepts vision and text input and produces text output, with weights quantized to INT4 and activations kept in FP16.
Image-to-Text Transformers English
RedHatAI
605
3
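The "w4a16" scheme named in the entry above (INT4 weights, FP16 activations) can be illustrated with a minimal sketch. This is not the pipeline used to produce the published checkpoint: the symmetric per-group scaling, the `group_size` value, and all function names below are assumptions chosen for illustration.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def quantize_int4(weights, group_size=4):
    """Symmetric per-group INT4 quantization (assumed scheme).

    Returns integer codes in [-8, 7] and one float scale per group.
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to code 7.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            codes.append(max(-8, min(7, round(w / scale))))
    return codes, scales

def dequantize(codes, scales, group_size=4):
    """Recover approximate float weights from codes and group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]

def dot_w4a16(activations, codes, scales, group_size=4):
    """Dot product with INT4-coded weights and FP16-rounded activations."""
    weights = dequantize(codes, scales, group_size)
    return sum(to_fp16(a) * w for a, w in zip(activations, weights))
```

Each weight is stored as a 4-bit code plus a shared per-group scale, so the reconstruction error per weight is bounded by half of that group's scale, while activations keep half-precision range and resolution.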
Reflectiva
Apache-2.0
ReflectiVA is a multimodal large language model that enhances visual question answering capabilities by integrating external knowledge sources and a reflection token mechanism.
Image-to-Text Transformers
aimagelab
46
2
Florence 2 VLM Doc VQA
A version of microsoft/Florence-2-base-ft fine-tuned for Visual Question Answering (VQA), capable of interpreting image content and answering questions about it.
Image-to-Text Transformers English
prithivMLmods
69
4
Xinyuan VL 2B
Apache-2.0
Xinyuan-VL-2B is a high-performance multimodal large model for edge-side applications from Cylingo Group, fine-tuned from Qwen/Qwen2-VL-2B-Instruct on over 5 million multimodal samples and a small amount of text-only data.
Image-to-Text Transformers Multilingual
Cylingo
94
7
Volcano 7b
Volcano-7b is a multimodal model that revises its answers guided by self-feedback, fine-tuned from vicuna-7b-v1.5 on a mixed visual instruction tuning dataset that includes multimodal feedback and revision data.
Image-to-Text Transformers English
kaist-ai
268
3