SmolVLM2-2.2B-Instruct-4bit Open-Source Vision-Language Model - Efficiently Handle Video-Text-to-Text Tasks

Smolvlm2 2.2B Instruct 4bit

Developed by smdesai

SmolVLM2-2.2B-Instruct-4bit is a vision-language model based on MLX format conversion, focusing on video text-to-text tasks.

Image-to-Text

Transformers

EnglishOpen Source License:Apache-2.0 #Video Text Generation #Multimodal Instruction Fine-tuning #4-bit Quantized Efficient Inference

Downloads 24

Release Time : 2/20/2025

Model Overview

This model is converted from HuggingFaceTB/SmolVLM2-2.2B-Instruct, supporting multimodal interaction between video and text, suitable for tasks like video description generation.

Model Features

Multimodal Support

Supports interaction between video and text, capable of processing video content and generating relevant textual descriptions.

Efficient Inference

Utilizes 4-bit quantization technology to reduce model resource requirements and improve inference efficiency.

Extensive Dataset Training

Trained on multiple high-quality datasets, including Docmatix, LLaVA-OneVision-Data, etc.

Model Capabilities

Video Content Understanding

Text Generation

Multimodal Interaction

Use Cases

Video Content Analysis

Video Description Generation

Generates detailed textual descriptions based on video content.

Produces accurate and coherent video description texts.

Education

Video-Assisted Learning

Generates supplementary text for educational videos to aid learners in better understanding the content.

Enhances learning experience and comprehension.

Property	Details
Library Name	transformers
Model Type	smdesai/SmolVLM2-2.2B-Instruct-4bit
Base Model	HuggingFaceTB/SmolVLM-Instruct
Pipeline Tag	video-text-to-text
Tags	mlx
Training Datasets	HuggingFaceM4/the_cauldron, HuggingFaceM4/Docmatix, lmms-lab/LLaVA-OneVision-Data, lmms-lab/M4-Instruct-Data, HuggingFaceFV/finevideo, MAmmoTH-VL/MAmmoTH-VL-Instruct-12M, lmms-lab/LLaVA-Video-178K, orrzohar/Video-STaR, Mutonix/Vript, TIGER-Lab/VISTA-400K, Enxin/MovieChat-1K_train, ShareGPT4Video/ShareGPT4Video

Featured Recommended AI Models

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

Smolvlm2 2.2B Instruct 4bit

Model Overview

Model Features

Model Capabilities

Use Cases

🚀 smdesai/SmolVLM2-2.2B-Instruct-4bit

🚀 Quick Start

📦 Installation

Install the required package

💻 Usage Examples

Basic Usage

📄 License

📚 Documentation

Model Information