Qwen2.5-Omni-3B-GGUF Open-Source Multimodal Model - Support for Text, Audio, and Image Inputs

Qwen2.5 Omni 3B GGUF

Developed by ggml-org

Qwen2.5-Omni-3B is a multimodal model that supports text, audio, and image input, but does not support video input or audio generation.

Large Language Model EnglishOpen Source License:Other #Multimodal input #Lightweight 3B parameters #Audio image text processing

Downloads 126

Release Time : 5/26/2025

Model Overview

Qwen2.5-Omni-3B is a multimodal model capable of processing text, audio, and image inputs, suitable for various tasks such as text generation, image analysis, and speech recognition.

Model Features

Multimodal support

Supports text, audio, and image input, suitable for various tasks.

Efficient inference

With a parameter scale of 3B, it is suitable for efficient operation on various hardware.

Model Capabilities

Text generation

Image analysis

Speech recognition

Use Cases

Natural language processing

Text generation

Generate coherent text content, suitable for scenarios such as chatbots and content creation.

Computer vision

Image analysis

Analyze image content and extract key information, suitable for tasks such as image classification and object detection.

Speech processing

Speech recognition

Convert audio input into text, suitable for scenarios such as voice assistants and transcription services.

Property	Details
License Type	Other
License Name	qwen-research
License Link	https://huggingface.co/Qwen/Qwen2.5-Omni-3B/blob/main/LICENSE
Base Model	Qwen/Qwen2.5-Omni-3B
Pipeline Tag	any-to-any
Tags	multimodal

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

Qwen2.5 Omni 3B GGUF

Model Overview

Model Features

Model Capabilities

Use Cases

🚀 Qwen2.5-Omni-3B-GGUF

🚀 Quick Start

Modalities

Reference PR

📄 License