Q

Qwen2 VL 2B Instruct

Developed by FriendliAI
Qwen2-VL-2B-Instruct is a multimodal vision-language model that supports image-text-to-text tasks.
Downloads 24
Release Time : 3/17/2025

Model Overview

This model is a multimodal vision-language model based on Qwen2-VL-2B, capable of handling interactive tasks involving both images and text.

Model Features

Multimodal support
Capable of processing both image and text inputs simultaneously, enabling multimodal interaction.
Instruction following
Supports instruction-following tasks, generating corresponding text outputs based on user instructions.
Optimized token processing
Added missing `<|image_pad|>` and `<|video_pad|>` tokens in tokenizer.json, improving processing efficiency.

Model Capabilities

Image-text understanding
Multimodal interaction
Instruction following

Use Cases

Multimodal interaction
Image caption generation
Generates detailed textual descriptions based on input images.
Visual question answering
Answers questions about input images.
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025AIbase