SmolDocling-256M-preview-mlx-fp16 Open-Source Vision-Language Model

SmolDocling 256M Preview MLX FP16

Developed by ahishamm

This model is a conversion of ds4sd/SmolDocling-256M-preview to the MLX format and supports image-text-to-text tasks.

Image-to-Text

Transformers

English

Open Source License: Apache-2.0

#MLX optimization #Image-text generation #Lightweight model

Downloads 24

Release Time: 3/17/2025

Model Overview

SmolDocling-256M-preview-mlx-fp16 is a vision-language model based on the MLX framework, primarily used for image-text-to-text tasks. It is converted from the original model ds4sd/SmolDocling-256M-preview and optimized for efficient operation on Apple silicon.

Model Features

MLX format optimization

The model has been converted to the MLX format, making it particularly suitable for efficient operation on Apple silicon.

Vision-language processing

Supports image-text-to-text tasks: the model can interpret an image and generate text related to it.

Lightweight model

With 256M parameters, it is suitable for deployment in resource-constrained environments.
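
To put the parameter count in perspective, the following is a rough back-of-the-envelope sketch of the weight storage at 16-bit precision. It assumes a nominal 256,000,000 parameters and 2 bytes per parameter (the "fp16" in the model name); the exact parameter count and the runtime memory needed for activations and caches are not stated on this page, so the figure is only an estimate. The function name `weight_footprint_bytes` is hypothetical.

```python
def weight_footprint_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Estimate the storage needed for model weights alone.

    bytes_per_param defaults to 2, i.e. 16-bit floating point (assumption
    based on the "fp16" suffix in the model name).
    """
    return num_params * bytes_per_param


# Nominal parameter count taken from the "256M" in the model name (assumption).
NUM_PARAMS = 256_000_000

fp16_bytes = weight_footprint_bytes(NUM_PARAMS)
print(f"Approximate fp16 weight size: {fp16_bytes / 1e6:.0f} MB")
```

Under these assumptions the weights alone come to roughly half a gigabyte, which is consistent with the "lightweight" label.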

Model Capabilities

Image-text understanding

Text generation

Vision-language task processing

Use Cases

Document processing

Image document parsing

Extract text information from images and generate structured text.
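
As an illustration of the document-parsing use case, the sketch below only assembles a command line for the `mlx-vlm` package and prints it; it does not download or run the model. It assumes that `mlx-vlm` is installed, that it exposes a `mlx_vlm.generate` module entry point, and that the flags `--model`, `--prompt`, `--image`, and `--max-tokens` match the installed version; check these against that package's own documentation. The image path `page.png`, the prompt text, and the helper name `build_generate_command` are hypothetical.

```python
import shlex

MODEL_ID = "ahishamm/SmolDocling-256M-preview-mlx-fp16"


def build_generate_command(
    image_path: str,
    prompt: str = "Convert this page to docling.",  # assumed prompt, not from this page
    max_tokens: int = 512,  # arbitrary illustrative limit
) -> list[str]:
    """Build (but do not run) an mlx_vlm.generate invocation.

    Flag names are assumptions about the mlx-vlm CLI and may differ
    between versions.
    """
    return [
        "python", "-m", "mlx_vlm.generate",
        "--model", MODEL_ID,
        "--prompt", prompt,
        "--image", image_path,
        "--max-tokens", str(max_tokens),
    ]


cmd = build_generate_command("page.png")  # "page.png" is a placeholder path
print(shlex.join(cmd))
```

On an Apple silicon machine, the printed command could then be run in a shell, or passed to `subprocess.run(cmd, check=True)`.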

Multimodal applications

Image caption generation

Generate descriptive text based on input images.

Property	Details
Library Name	transformers
License	apache-2.0
Base Model	ds4sd/SmolDocling-256M-preview
Pipeline Tag	image-text-to-text
Tags	mlx, mlxvlm
