
# Visual Question Answering Optimization

VL-Reasoner-7B is a multimodal reasoning model trained with GRPO-SSR and is reported to perform strongly on several multimodal reasoning benchmarks.

Transformers English

Qwen2.5 VL 7B Instruct Quantized.w4a16

A quantized version of Qwen2.5-VL-7B-Instruct that accepts image and text input and produces text output. Weights are quantized to INT4, while activations remain in FP16 (hence "w4a16").

Transformers English
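To make the "w4a16" label above concrete, here is a minimal sketch of the general idea: weights stored as 4-bit signed integers with one floating-point scale per small group, dequantized on the fly and multiplied with higher-precision activations. It is a generic illustration, not the recipe used to produce this checkpoint; symmetric quantization, a group size of 4, and the helper names `quantize_group`, `quantize_row`, and `dequantized_dot` are all assumptions, and Python floats stand in for FP16 values.

```python
# Hypothetical w4a16 sketch: INT4 weight codes + per-group scale,
# activations kept in floating point (Python floats stand in for FP16).
# Assumptions: symmetric quantization, group size 4, Python's round()
# (round-to-nearest, ties to even). Not the actual recipe for this model.

INT4_MIN, INT4_MAX = -8, 7  # range of a signed 4-bit integer

def quantize_group(weights):
    """Quantize one group of float weights to INT4 codes plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    codes = [max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in weights]
    return codes, scale

def quantize_row(row, group_size=4):
    """Split a weight row into groups and quantize each group separately."""
    return [quantize_group(row[i:i + group_size])
            for i in range(0, len(row), group_size)]

def dequantized_dot(qrow, activations, group_size=4):
    """Dequantize INT4 codes on the fly and dot them with float activations."""
    total = 0.0
    for g, (codes, scale) in enumerate(qrow):
        start = g * group_size
        for j, code in enumerate(codes):
            total += (code * scale) * activations[start + j]
    return total

row = [0.8, -0.4, 0.2, 0.1, 1.2, -0.6, 0.3, 0.9]   # full-precision weights
x = [1.0, 0.5, -1.0, 2.0, 0.5, 1.0, -0.5, 0.25]     # activations

qrow = quantize_row(row)
exact = sum(w * a for w, a in zip(row, x))
approx = dequantized_dot(qrow, x)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

Because each group has its own scale, a single large weight only coarsens the resolution of its own group rather than the whole row, which is the usual motivation for group-wise quantization.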

ReflectiVA is a multimodal large language model that improves visual question answering by drawing on external knowledge sources and using reflection tokens.

Florence 2 VLM Doc VQA

A version of microsoft/Florence-2-base-ft fine-tuned for Visual Question Answering (VQA); it interprets image content and answers questions about it.

Transformers English

Xinyuan-VL-2B is a high-performance multimodal large model for edge (on-device) applications, released by Cylingo Group. It is fine-tuned from Qwen/Qwen2-VL-2B-Instruct on more than 5 million multimodal samples plus a small amount of text-only data.

Transformers Supports Multiple Languages

Volcano-7b is a multimodal model that revises its own answers using self-generated feedback. It is fine-tuned from vicuna-7b-v1.5 on a visual instruction tuning dataset mixed with multimodal feedback and revision data.

Transformers English



© 2025 AIbase