Qwen2 VL 2B Instruct GGUF
Qwen2-VL-2B-Instruct is a 2B-parameter multimodal vision-language model, built on the Qwen2 architecture, that supports image-text generation tasks.
Release Date: 12/15/2024
Model Overview
This multimodal vision-language model processes image and text inputs and generates relevant text outputs. It is suited to applications that require combined visual and linguistic understanding.
Model Features
Multimodal Support
Capable of processing both image and text inputs to generate relevant text outputs.
Efficient Quantization
Multiple quantized versions are provided to suit different hardware and performance requirements.
Long Context Support
Supports context lengths of up to 32,000 tokens, suitable for handling complex, long-form tasks.
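Choosing among the quantized versions is mostly a memory/quality tradeoff: larger files are less aggressively quantized and generally higher quality. The sketch below illustrates that selection logic; the variant names follow common llama.cpp GGUF conventions, and the sizes are rough illustrative estimates, not measured values for this model.

```python
# Hypothetical GGUF quantization variants and approximate file sizes (GB).
# Names follow common llama.cpp conventions; sizes are illustrative only.
QUANT_VARIANTS = {
    "Q4_K_M": 1.0,  # small, good quality/size balance
    "Q5_K_M": 1.2,  # higher quality
    "Q8_0":   1.9,  # near-lossless
    "F16":    3.1,  # full half-precision
}

def pick_variant(available_gb: float) -> str:
    """Pick the highest-quality variant that fits the memory budget."""
    fitting = {name: size for name, size in QUANT_VARIANTS.items()
               if size <= available_gb}
    if not fitting:
        raise ValueError("No quantized variant fits the memory budget")
    # Larger file => less aggressive quantization => higher quality.
    return max(fitting, key=fitting.get)

print(pick_variant(2.0))  # picks the largest variant under 2 GB
```

In practice you would pick the largest variant that leaves headroom for the KV cache, which grows with the context length you configure.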
Model Capabilities
Image-Text Generation
Multimodal Understanding
Visual Question Answering
Use Cases
Image Caption Generation
Generates detailed textual descriptions based on input images.
Visual Question Answering
Answers questions about input images.
Multimodal Interaction
Image-Text Combined Tasks
Combines image and text inputs to generate relevant text outputs.
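For image-text combined tasks, llama.cpp-based runtimes commonly accept OpenAI-style chat messages whose content is a list mixing image and text parts. The sketch below shows that message shape; the exact fields a given runtime expects may differ, so treat this as an illustration rather than a definitive API.

```python
# A minimal sketch of an OpenAI-style multimodal chat message, the format
# commonly accepted by llama.cpp-based servers for vision-language models.
# Field names are assumptions based on that convention, not this model's docs.
def build_image_question(image_url: str, question: str) -> dict:
    """Build a single user message combining an image and a text question."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": question},
        ],
    }

msg = build_image_question("https://example.com/photo.png",
                           "What is shown in this image?")
print(msg["role"], len(msg["content"]))
```

A request body would typically wrap one or more such messages in a `messages` list, alongside sampling parameters.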