Pixtral-12b-GGUF Open-Source Multimodal Large Model - Process Images and Texts for Free, Supports Variable Sizes

Pixtral 12b GGUF

Developed by lmstudio-community

A multimodal large model launched by Mistral-Community, supporting image and text processing with 128k context length and variable image size handling capabilities.

Image-to-Text Open Source License:Apache-2.0 #Multimodal Image Understanding #128k Long Context #Variable Image Size Support

Downloads 611

Release Time : 4/25/2025

Model Overview

pixtral-12b is a native multimodal model capable of processing both image and text inputs to generate text outputs. Suitable for complex multimodal tasks.

Model Features

128k Long Context Support

Supports context processing up to 128k, suitable for handling long documents and complex tasks

Variable Image Size Processing

Capable of processing input images of different sizes, adapting to various visual task requirements

Native Multimodal

Natively supports joint processing of images and text, achieving true multimodal understanding

Model Capabilities

Image Understanding

Text Generation

Multimodal Reasoning

Long Document Processing

Use Cases

Multimodal Content Generation

Image Caption Generation

Generate detailed textual descriptions based on input images

Can produce accurate and rich image descriptions

Visual Question Answering

Answer complex questions about image content

Can understand image details and provide accurate answers

Long Document Processing

Long Document Summarization

Process documents with up to 128k context and generate summaries

Maintains contextual consistency in long documents

🚀 💫 Community Model: pixtral 12b by Mistral-Community

👾 This is part of the LM Studio Community models highlights program, which showcases new and remarkable models from the community. Join the discussion on Discord.

📦 Model Information

Property	Details
Quantized By	bartowski
Pipeline Tag	image-text-to-text
License	apache-2.0
Base Model Relation	quantized
Base Model	mistral-community/pixtral-12b

📚 Model Details

Model creator: mistral-community Original model: pixtral-12b GGUF quantization: provided by bartowski based on llama.cpp release b5173

🔧 Technical Details

Supports a context length of 128k
Supports variable image sizes
Natively multimodal

🙏 Special Thanks

Special thanks to Georgi Gerganov and the whole team working on llama.cpp for making all of this possible.

📄 Disclaimers

⚠️ Important Note

LM Studio is not the creator, originator, or owner of any Model featured in the Community Model Program. Each Community Model is created and provided by third parties. LM Studio does not endorse, support, represent or guarantee the completeness, truthfulness, accuracy, or reliability of any Community Model. You understand that Community Models can produce content that might be offensive, harmful, inaccurate or otherwise inappropriate, or deceptive. Each Community Model is the sole responsibility of the person or entity who originated such Model. LM Studio may not monitor or control the Community Models and cannot, and does not, take responsibility for any such Model. LM Studio disclaims all warranties or guarantees about the accuracy, reliability or benefits of the Community Models. LM Studio further disclaims any warranty that the Community Model will meet your requirements, be secure, uninterrupted or available at any time or location, or error-free, viruses-free, or that any errors will be corrected, or otherwise. You will be solely responsible for any damage resulting from your use of or access to the Community Models, your downloading of any Community Model, or use of any other Community Model provided by or through LM Studio.

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご