SmolVLM2-2.2B-Instruct Open-Source Vision-Language Model - Supports English Video Text-to-Text Tasks

SmolVLM2-2.2B-Instruct-i1-GGUF

Quantized by mradermacher (base model: HuggingFaceTB/SmolVLM2-2.2B-Instruct)

SmolVLM2-2.2B-Instruct is a 2.2B-parameter vision-language model for English video-text-to-text tasks.

English · Open Source · License: Apache-2.0 · Tags: #Video text generation #Multimodal instructions #Lightweight quantization

Downloads 285

Release Time: 4/25/2025

Model Overview

This model is a quantized version of SmolVLM2-2.2B-Instruct, a vision-language model trained on multiple video and text datasets. It is suited to video understanding and text generation tasks.

Model Features

Trained on multiple datasets

The base model was trained on multiple video and text datasets, including the_cauldron, Docmatix, and LLaVA-OneVision-Data.

Diverse quantization versions

Multiple quantization versions are provided, ranging from the very low-quality IQ1_S to the high-quality Q6_K, to suit different hardware and performance requirements.

Video understanding ability

Focuses on understanding video content and generating text, which suits tasks such as video subtitle generation and video content analysis.

Model Capabilities

Video content understanding

Text generation

Video subtitle generation

Multimodal reasoning

Use Cases

Video content analysis

Video subtitle generation

Generate descriptive subtitles for video content

Video content summarization

Extract key information from the video and generate a summary

Education

Educational video explanation

Generate explanatory text for educational videos

🚀 SmolVLM2-2.2B-Instruct Quantized Model

This project provides quantized versions of the SmolVLM2-2.2B-Instruct model, offering various options for different usage scenarios.

🚀 Quick Start

If you are unsure how to use GGUF files, refer to one of TheBloke's READMEs for more details, including on how to concatenate multi-part files.
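Every quant listed below is under 2 GB, so split downloads are unlikely for this model, but when a GGUF file does arrive in parts, joining them is a byte-level concatenation in part order. The sketch below assumes part names ending in `.partNofM` (a hypothetical convention; check the actual file names in the repository) and then sanity-checks the joined file by reading the fixed GGUF header: the 4-byte magic `GGUF`, a little-endian uint32 version, and two uint64 counts (tensors, then metadata key/value pairs), which is the layout used from GGUF version 2 onward.

```python
import re
import shutil
import struct
from pathlib import Path

# Hypothetical part naming: "<name>.gguf.part1of2", "<name>.gguf.part2of2", ...
PART_RE = re.compile(r"\.part(\d+)of(\d+)$")


def join_parts(part_paths, out_path):
    """Concatenate split GGUF parts into out_path, ordered by part number."""
    numbered = []
    for p in map(Path, part_paths):
        m = PART_RE.search(p.name)
        if m is None:
            raise ValueError(f"not a part file: {p.name}")
        numbered.append((int(m.group(1)), int(m.group(2)), p))
    if not numbered:
        raise ValueError("no parts given")
    numbered.sort()  # numeric sort, so part10 comes after part9
    total = numbered[0][1]
    indices = [n for n, _, _ in numbered]
    if any(t != total for _, t, _ in numbered) or indices != list(range(1, total + 1)):
        raise ValueError("missing, duplicate, or inconsistent parts")
    with open(out_path, "wb") as out:
        for _, _, p in numbered:
            with open(p, "rb") as src:
                shutil.copyfileobj(src, out)  # streams, so large parts are not loaded into RAM
    return Path(out_path)


def read_gguf_header(path):
    """Return version, tensor count and metadata KV count (GGUF v2+ header layout)."""
    with open(path, "rb") as f:
        head = f.read(24)
    if len(head) < 24 or head[:4] != b"GGUF":
        raise ValueError("not a GGUF file")
    version, n_tensors, n_kv = struct.unpack("<IQQ", head[4:24])
    return {"version": version, "tensor_count": n_tensors, "metadata_kv_count": n_kv}
```

Parts may be passed in any order, for example `join_parts(["m.gguf.part2of2", "m.gguf.part1of2"], "m.gguf")`, followed by `read_gguf_header("m.gguf")` to confirm the result is a GGUF file.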

✨ Features

Multiple Datasets: Trained on a diverse set of datasets, including HuggingFaceM4/the_cauldron, HuggingFaceM4/Docmatix, and many others.
Video-Text-to-Text: Capable of handling video and text input and generating text output.
Quantized Versions: Available in various quantized formats for different performance and quality requirements.

📦 Installation

No specific installation steps are provided in the original README.
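A minimal download sketch, assuming the `huggingface_hub` package (`pip install huggingface_hub`) and its `hf_hub_download` function. The repository id `mradermacher/SmolVLM2-2.2B-Instruct-i1-GGUF` and the file-name pattern `SmolVLM2-2.2B-Instruct.i1-<TYPE>.gguf` are assumptions inferred from the model name; verify both against the repository's file list before relying on them. The quant types are those in the "Provided Quants" table.

```python
# ASSUMPTION: repository id and file-name pattern are inferred, not confirmed.
REPO_ID = "mradermacher/SmolVLM2-2.2B-Instruct-i1-GGUF"
BASE_NAME = "SmolVLM2-2.2B-Instruct"

# Quant types listed in the "Provided Quants" table.
QUANT_TYPES = {
    "IQ1_S", "IQ1_M", "IQ2_XXS", "IQ2_S", "IQ2_M", "Q2_K_S", "Q2_K",
    "IQ3_XXS", "IQ3_XS", "IQ3_S", "Q3_K_S", "IQ3_M", "Q3_K_M", "Q3_K_L",
    "IQ4_XS", "IQ4_NL", "Q4_0", "Q4_K_S", "Q4_K_M", "Q4_1",
    "Q5_K_S", "Q5_K_M", "Q6_K",
}


def quant_filename(quant):
    """Build the assumed GGUF file name for a quant type ("Q4_K_M" or "i1-Q4_K_M")."""
    q = quant[3:] if quant.startswith("i1-") else quant
    if q not in QUANT_TYPES:
        raise ValueError(f"unknown quant type: {quant}")
    return f"{BASE_NAME}.i1-{q}.gguf"


def download(quant, cache_dir=None):
    """Fetch one quant file and return its local path (requires network access)."""
    from huggingface_hub import hf_hub_download  # imported lazily; optional dependency
    return hf_hub_download(repo_id=REPO_ID, filename=quant_filename(quant), cache_dir=cache_dir)
```

For example, `download("Q4_K_M")` would fetch the quant marked "fast, recommended" in the table. For image or video input, GGUF runtimes in the llama.cpp family may also need a matching multimodal projector (mmproj) file, which this sketch does not fetch.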

📚 Documentation

Model Information

Property	Details
Base Model	HuggingFaceTB/SmolVLM2-2.2B-Instruct
Datasets	HuggingFaceM4/the_cauldron, HuggingFaceM4/Docmatix, lmms-lab/LLaVA-OneVision-Data, etc.
Language	en
Library Name	transformers
License	apache-2.0
Quantized By	mradermacher
Tags	video-text-to-text

About

Weighted/imatrix quants of https://huggingface.co/HuggingFaceTB/SmolVLM2-2.2B-Instruct. Static quants are available at https://huggingface.co/mradermacher/SmolVLM2-2.2B-Instruct-GGUF.

Provided Quants

(Sorted by size, not necessarily by quality. IQ quants are often preferable to similar-sized non-IQ quants.)

Link	Type	Size/GB	Notes
GGUF	i1-IQ1_S	0.5	for the desperate
GGUF	i1-IQ1_M	0.6	mostly desperate
GGUF	i1-IQ2_XXS	0.6
GGUF	i1-IQ2_S	0.7
GGUF	i1-IQ2_M	0.8
GGUF	i1-Q2_K_S	0.8	very low quality
GGUF	i1-Q2_K	0.8	IQ3_XXS probably better
GGUF	i1-IQ3_XXS	0.8	lower quality
GGUF	i1-IQ3_XS	0.9
GGUF	i1-IQ3_S	0.9	beats Q3_K*
GGUF	i1-Q3_K_S	0.9	IQ3_XS probably better
GGUF	i1-IQ3_M	1.0
GGUF	i1-Q3_K_M	1.0	IQ3_S probably better
GGUF	i1-Q3_K_L	1.1	IQ3_M probably better
GGUF	i1-IQ4_XS	1.1
GGUF	i1-IQ4_NL	1.1	prefer IQ4_XS
GGUF	i1-Q4_0	1.2	fast, low quality
GGUF	i1-Q4_K_S	1.2	optimal size/speed/quality
GGUF	i1-Q4_K_M	1.2	fast, recommended
GGUF	i1-Q4_1	1.3
GGUF	i1-Q5_K_S	1.4
GGUF	i1-Q5_K_M	1.4
GGUF	i1-Q6_K	1.6	practically like static Q6_K
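The table above can be used to shortlist quants for a given storage or memory budget. The sketch below copies the type and size columns, returns the types with the largest listed size that still fits, and orders ties IQ-first, following the note that IQ quants are often preferable to similar-sized non-IQ ones. It is a size-only heuristic: it ignores the Notes column (for example, IQ4_NL is marked "prefer IQ4_XS"), and actual runtime memory use will be higher than the file size.

```python
# (type, size in GB) from the "Provided Quants" table, in table order.
QUANT_SIZES = [
    ("i1-IQ1_S", 0.5), ("i1-IQ1_M", 0.6), ("i1-IQ2_XXS", 0.6), ("i1-IQ2_S", 0.7),
    ("i1-IQ2_M", 0.8), ("i1-Q2_K_S", 0.8), ("i1-Q2_K", 0.8), ("i1-IQ3_XXS", 0.8),
    ("i1-IQ3_XS", 0.9), ("i1-IQ3_S", 0.9), ("i1-Q3_K_S", 0.9), ("i1-IQ3_M", 1.0),
    ("i1-Q3_K_M", 1.0), ("i1-Q3_K_L", 1.1), ("i1-IQ4_XS", 1.1), ("i1-IQ4_NL", 1.1),
    ("i1-Q4_0", 1.2), ("i1-Q4_K_S", 1.2), ("i1-Q4_K_M", 1.2), ("i1-Q4_1", 1.3),
    ("i1-Q5_K_S", 1.4), ("i1-Q5_K_M", 1.4), ("i1-Q6_K", 1.6),
]


def largest_fitting(budget_gb):
    """Return quant types with the largest listed size <= budget_gb.

    Ties list IQ types first; otherwise table order is kept (sorted() is stable).
    """
    fitting = [(name, size) for name, size in QUANT_SIZES if size <= budget_gb]
    if not fitting:
        return []
    best = max(size for _, size in fitting)
    names = [name for name, size in fitting if size == best]
    return sorted(names, key=lambda n: not n.startswith("i1-IQ"))
```

With a 1.1 GB budget this returns `["i1-IQ4_XS", "i1-IQ4_NL", "i1-Q3_K_L"]`, after which the Notes column points to IQ4_XS.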

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

And here are Artefact2's thoughts on the matter: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9

FAQ / Model Request

See https://huggingface.co/mradermacher/model_requests for some answers to questions you might have and/or if you want some other model quantized.

📄 License

This project is licensed under the apache-2.0 license.

👏 Thanks

I thank my company, nethype GmbH, for letting me use its servers and providing upgrades to my workstation to enable this work in my free time. Additional thanks to @nicoboss for giving me access to his private supercomputer, enabling me to provide many more imatrix quants, at much higher quality, than I would otherwise be able to.
