Qwen2.5-VL-72B-Instruct-GGUF Open-source Multimodal Large Model - Supports Image and Text Generation and Long Text Multilingual Processing

Qwen2.5 VL 72B Instruct GGUF

Developed by lmstudio-community

A multimodal large model launched by Tongyi Qianwen, supporting image and text generation and 128k long context processing, with multilingual capabilities.

Image-to-Text EnglishOpen Source License:Other #Multimodal image analysis #128k long context #Visual agent reasoning

Downloads 668

Release Time : 5/8/2025

Model Overview

This is a multimodal instruction model that can process image and text inputs and generate text outputs. It supports long context, multilingual, and structured outputs, and is suitable for various AI tasks.

Model Features

Long context support

Supports a context length of 128k tokens, suitable for processing long documents and complex tasks.

Multimodal recognition

Can recognize objects, text, charts, icons, graphics, and layouts in images.

Visual intelligent agent

Can act as a visual agent for reasoning and dynamically call tools, with the ability to use computers and mobile phones.

Structured output

Can generate structured outputs and stable JSON formats.

Multilingual support

Supports input and output in multiple languages.

Model Capabilities

Image understanding

Text generation

Multimodal reasoning

Tool invocation

Structured data generation

Long document processing

Use Cases

Visual content analysis

Image description generation

Generate detailed text descriptions for the input images.

Accurately identify objects, scenes, and text content in the images.

Chart understanding

Analyze chart data in images and extract information.

Can understand common chart types and extract key data.

Intelligent agent

Computer - assisted operation

Guide computer operations based on visual input.

Can understand screen content and generate operation instructions.

Content generation

Structured report generation

Generate structured reports based on multimodal inputs.

Output stable JSON or other structured data.

🚀 Qwen2.5 VL 72B Instruct by Qwen

This model is part of the LM Studio Community models highlights program, which showcases new and remarkable models from the community. Join the discussion on Discord.

📋 Model Information

Property	Details
Quantized By	bartowski
Pipeline Tag	image-text-to-text
License Link	https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct/blob/main/LICENSE
Language	en
License	other
Base Model Relation	quantized
License Name	qwen
Base Model	Qwen/Qwen2.5-VL-72B-Instruct
Tags	multimodal

👨‍💻 Model Creators

Model creator: Qwen
Original model: Qwen2.5-VL-72B-Instruct
GGUF quantization: provided by bartowski based on llama.cpp release b5317

🔧 Technical Details

Supports a context length of 128k tokens.
Proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.
Capable of acting as a visual agent that can reason and dynamically direct tools, enabling computer and phone use.
Useful for generating structured outputs and stable JSON outputs.
Supports multiple languages.

🙏 Special Thanks

Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.

⚠️ Disclaimers

LM Studio is not the creator, originator, or owner of any Model featured in the Community Model Program. Each Community Model is created and provided by third parties. LM Studio does not endorse, support, represent or guarantee the completeness, truthfulness, accuracy, or reliability of any Community Model. You understand that Community Models can produce content that might be offensive, harmful, inaccurate or otherwise inappropriate, or deceptive. Each Community Model is the sole responsibility of the person or entity who originated such Model. LM Studio may not monitor or control the Community Models and cannot, and does not, take responsibility for any such Model. LM Studio disclaims all warranties or guarantees about the accuracy, reliability or benefits of the Community Models. LM Studio further disclaims any warranty that the Community Model will meet your requirements, be secure, uninterrupted or available at any time or location, or error - free, viruses - free, or that any errors will be corrected, or otherwise. You will be solely responsible for any damage resulting from your use of or access to the Community Models, your downloading of any Community Model, or use of any other Community Model provided by or through LM Studio.

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご