Qwen2 VL 2B Instruct
Qwen2-VL-2B-Instruct is a multimodal vision-language model that supports image-text-to-text tasks.
Downloads 24
Release Time: 3/17/2025
Model Overview
An instruction-tuned vision-language model built on Qwen2-VL-2B, capable of handling interactive tasks that involve both images and text.
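A minimal loading sketch with Hugging Face transformers is shown below. It assumes a transformers version with Qwen2-VL support (4.45 or later) and uses the upstream `Qwen/Qwen2-VL-2B-Instruct` repo ID, which may differ from the checkpoint described on this page.

```python
# Minimal loading sketch (assumption: transformers >= 4.45 with Qwen2-VL support).
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model_id = "Qwen/Qwen2-VL-2B-Instruct"  # assumed repo ID; replace with the checkpoint you use

model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # switch to float16/float32 depending on hardware
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
```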
Model Features
Multimodal support
Capable of processing both image and text inputs simultaneously, enabling multimodal interaction.
Instruction following
Supports instruction-following tasks, generating text outputs that respond to the user's prompt and the provided image.
Optimized token processing
Adds the `<|image_pad|>` and `<|video_pad|>` tokens that were missing from tokenizer.json, so image and video inputs are tokenized correctly (see the check sketched below).
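To confirm that the pad tokens are present in the tokenizer you are loading, a quick check along these lines should work; the repo ID is again an assumption, so point it at the checkpoint you actually use.

```python
# Quick check that the image/video pad tokens exist in the tokenizer vocabulary.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")  # assumed repo ID

for tok in ("<|image_pad|>", "<|video_pad|>"):
    tok_id = tokenizer.convert_tokens_to_ids(tok)
    # A token present in the vocabulary gets its own id; a missing one maps to the
    # unknown-token id or None, depending on the tokenizer configuration.
    print(tok, "->", tok_id)
```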
Model Capabilities
Image-text understanding
Multimodal interaction
Instruction following
Use Cases
Multimodal interaction
Image caption generation
Generates detailed textual descriptions based on input images.
Visual question answering
Answers questions about input images.
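The sketch below illustrates both use cases through the processor's chat template: changing the question string switches between caption generation and visual question answering. The repo ID and image URL are placeholders, not part of this model card.

```python
# Sketch of image captioning / visual question answering with the chat template.
import requests
from PIL import Image
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model_id = "Qwen/Qwen2-VL-2B-Instruct"  # assumed repo ID
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder image URL; use "Describe this image." as the prompt for captioning.
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)
question = "What is the animal in this picture doing?"

messages = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": question}]}
]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens before decoding so only the model's answer remains.
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```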