Qwen2 VL 7B Instruct GGUF
Qwen2-VL-7B-Instruct is a multimodal vision-language model that accepts image and text inputs and generates text, supporting joint image-text understanding tasks.
Downloads 195
Release Time: 12/15/2024
Model Overview
An instruction-tuned 7B-parameter vision-language model based on the Qwen2 architecture that processes image and text inputs and generates relevant textual output.
Model Features
Multimodal Understanding
Capable of processing both image and text inputs simultaneously, understanding the relationship between them
Large Context Window
Supports context lengths of up to 128,000 tokens
Quantization Support
Offers multiple quantized versions to accommodate different hardware requirements
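To make the hardware trade-off concrete, the sketch below estimates GGUF file sizes for a 7B-parameter model at common quantization levels. The bits-per-weight figures are approximate averages taken as assumptions (not official numbers for this model); real files also carry metadata and keep some tensors at higher precision, so treat the results as rough lower bounds.

```python
# Rough GGUF file-size estimate for a 7B-parameter model at common
# quantization levels. Bits-per-weight values are approximate averages
# (assumption), since mixed-precision quant types vary per tensor.
PARAMS = 7_000_000_000

QUANT_BITS = {  # approximate effective bits per weight
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q2_K": 2.6,
}

def estimate_gb(params: int, bits_per_weight: float) -> float:
    """Estimated file size in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

for name, bpw in QUANT_BITS.items():
    print(f"{name:>7}: ~{estimate_gb(PARAMS, bpw):.1f} GB")
```

Under these assumptions, a 4-bit-class quantization roughly halves the footprint of Q8_0, which is why lower-bit variants are offered for smaller GPUs and CPU-only setups.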
Model Capabilities
Image Understanding
Text Generation
Multimodal Reasoning
Visual Question Answering
Use Cases
Content Understanding
Image Caption Generation
Generates detailed textual descriptions based on input images
Visual Question Answering
Answers natural language questions about image content
Multimodal Interaction
Image-Based Dialogue
Engages in natural conversations combining images and text
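As a minimal sketch of how an image-plus-text turn is typically passed to a GGUF runtime, the helper below builds a user message in the OpenAI-style multimodal content format that chat-completion APIs such as llama-cpp-python's accept. The image URL and question are placeholder values (assumptions), not part of this model card.

```python
# Build one user turn combining an image and a text question, using the
# OpenAI-style list-of-content-parts message format. The URL and question
# below are hypothetical examples, not values from the model card.
def build_vqa_message(image_url: str, question: str) -> dict:
    """Return a single multimodal chat message for a VQA-style query."""
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": question},
        ],
    }

msg = build_vqa_message("https://example.com/cat.jpg", "What animal is shown?")
print(msg["role"], len(msg["content"]))
```

A runtime that supports this format would pass a list of such messages to its chat-completion call; follow-up text-only turns use the same structure with only a text part, which is how image-based dialogue continues after the first turn.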