Fashion-Mnist-SigLIP2 Open-Source Fashion Image Classification Model - Accurately Classify Images in the Fashion-MNIST Dataset

Fashion Mnist SigLIP2

Developed by prithivMLmods

A fashion image classification model fine-tuned from SigLIP2 for the Fashion-MNIST dataset

Image Classification

Transformers

Language: English

Open Source License: Apache-2.0

Tags: #Fashion Image Classification #High-Precision Classification #E-commerce Product Tagging

Downloads 439

Release Time: 3/21/2025

Model Overview

This model is a vision-language encoder fine-tuned to classify fashion images into the 10 predefined Fashion-MNIST categories, such as T-shirts, trousers, and dresses.

Model Features

High-Precision Classification

Reaches 91.8% accuracy in the classification report below, which covers 60,000 images (6,000 per class), with F1 scores of roughly 98% to 99% for trousers, sandals, and bags

Based on SigLIP2 Architecture

Utilizes the google/siglip2-base-patch16-224 base model, featuring improved semantic understanding and localization capabilities

Lightweight Deployment

Supports rapid deployment via the Transformers library and is compatible with interactive demo tools like Gradio

Model Capabilities

Fashion Image Classification

Multi-Class Recognition

Visual Feature Extraction

Use Cases

E-Commerce

Product Auto-Classification

Automatically classify clothing products for online retail platforms

Optimizes product search and recommendation systems

Inventory Management

Automate the classification of fashion items in inventory

Improves inventory management efficiency

Education & Research

AI Teaching Example

Serves as a practical case for computer vision and machine learning courses

🚀 Fashion-Mnist-SigLIP2

Fashion-Mnist-SigLIP2 is an image classification model built on a vision-language encoder. It is fine-tuned from google/siglip2-base-patch16-224 for single-label classification and uses the SiglipForImageClassification architecture to assign each image to one of the 10 Fashion-MNIST categories.

🚀 Quick Start

Run with Transformers🤗

First, install the necessary libraries. The leading ! runs the command from a notebook cell; omit it in a regular shell:

!pip install -q transformers torch pillow gradio

Then, you can use the following code to classify fashion images:

import gradio as gr
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

# Load model and processor
model_name = "prithivMLmods/Fashion-Mnist-SigLIP2"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

def fashion_mnist_classification(image):
    """Predicts fashion category for an image."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()
    
    # Class index -> Fashion-MNIST label, in the order listed under "Classification Classes"
    labels = [
        "T-shirt / top", "Trouser", "Pullover", "Dress", "Coat",
        "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
    ]
    predictions = {label: round(prob, 3) for label, prob in zip(labels, probs)}
    
    return predictions

# Create Gradio interface
iface = gr.Interface(
    fn=fashion_mnist_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="Fashion MNIST Classification Labels",
    description="Upload an image to classify it into one of the 10 Fashion-MNIST categories."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
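
The post-processing step in the function above (softmax over the logits, then a label-to-score mapping) can be illustrated without loading the model. The following is a minimal sketch using only the standard library; the logits are made up for illustration and do not come from the model, and `top_k` is a hypothetical helper rather than part of the Transformers API.

```python
import math

# Class order follows the label mapping used in the Quick Start code.
LABELS = ["T-shirt / top", "Trouser", "Pullover", "Dress", "Coat",
          "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(LABELS, probs), key=lambda pair: pair[1], reverse=True)
    return [(label, round(prob, 3)) for label, prob in ranked[:k]]

# Made-up logits for one image (assumption, for illustration only).
example_logits = [2.0, -1.0, 0.5, 0.0, 0.3, -2.0, 1.5, -1.5, -0.5, -3.0]
print(top_k(example_logits))
```

Sorting by probability is handy when only the best few categories need to be shown; the Gradio Label component in the code above already does this for display.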

✨ Features

Accurate Classification: The model reaches 91.8% overall accuracy. Per-class F1 ranges from about 0.75 (Shirt) to 0.99 (Trouser), as shown in the following classification report:

Classification Report:
               precision    recall  f1-score   support

T-shirt / top     0.8142    0.9147    0.8615      6000
      Trouser     0.9935    0.9870    0.9902      6000
     Pullover     0.8901    0.8610    0.8753      6000
        Dress     0.9098    0.9300    0.9198      6000
         Coat     0.8636    0.8865    0.8749      6000
       Sandal     0.9857    0.9847    0.9852      6000
        Shirt     0.8076    0.6962    0.7478      6000
      Sneaker     0.9663    0.9695    0.9679      6000
          Bag     0.9779    0.9805    0.9792      6000
   Ankle boot     0.9698    0.9700    0.9699      6000

     accuracy                         0.9180     60000
    macro avg     0.9179    0.9180    0.9172     60000
 weighted avg     0.9179    0.9180    0.9172     60000
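
The figures in the report can be cross-checked by hand. F1 is the harmonic mean of precision and recall, and because every class has the same support (6,000 images), the macro-averaged recall equals the overall accuracy. The sketch below recomputes both from the precision and recall columns; the numbers are copied from the report, while the helper names are illustrative.

```python
# (precision, recall, reported F1) per class, copied from the report above.
REPORT = {
    "T-shirt / top": (0.8142, 0.9147, 0.8615),
    "Trouser":       (0.9935, 0.9870, 0.9902),
    "Pullover":      (0.8901, 0.8610, 0.8753),
    "Dress":         (0.9098, 0.9300, 0.9198),
    "Coat":          (0.8636, 0.8865, 0.8749),
    "Sandal":        (0.9857, 0.9847, 0.9852),
    "Shirt":         (0.8076, 0.6962, 0.7478),
    "Sneaker":       (0.9663, 0.9695, 0.9679),
    "Bag":           (0.9779, 0.9805, 0.9792),
    "Ankle boot":    (0.9698, 0.9700, 0.9699),
}

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Recompute F1 for each class from its precision and recall.
recomputed_f1 = {name: f1_score(p, r) for name, (p, r, _) in REPORT.items()}

# With equal support per class, macro recall matches overall accuracy.
macro_recall = sum(r for _, r, _ in REPORT.values()) / len(REPORT)

# The class with the lowest F1 is the one the model confuses most.
weakest_class = min(recomputed_f1, key=recomputed_f1.get)

for name, value in recomputed_f1.items():
    print(f"{name:>13}: {value:.4f}")
print(f"macro recall: {macro_recall:.4f}, weakest class: {weakest_class}")
```

The weakest class is Shirt, whose recall of 0.6962 pulls its F1 well below the other categories.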

Multiple Use Cases: It can be used in fashion recognition, e-commerce applications, automated fashion sorting, and educational purposes.

📚 Documentation

Model Information

Property	Details
Model Type	Image Classification Vision-Language Encoder Model
Base Model	google/siglip2-base-patch16-224
Pipeline Tag	image-classification
Library Name	transformers
Tags	fashion, mnist, siglip2
Training Data	zalando-datasets/fashion_mnist

Classification Classes

The model categorizes images into the following 10 classes:

Class 0: "T-shirt / top"
Class 1: "Trouser"
Class 2: "Pullover"
Class 3: "Dress"
Class 4: "Coat"
Class 5: "Sandal"
Class 6: "Shirt"
Class 7: "Sneaker"
Class 8: "Bag"
Class 9: "Ankle boot"

Intended Use

The Fashion-Mnist-SigLIP2 model is designed for fashion image classification. It helps categorize clothing and footwear items into predefined Fashion-MNIST classes. Potential use cases include:

Fashion Recognition: Classifying fashion images into common categories like shirts, sneakers, and dresses.
E-commerce Applications: Assisting online retailers in organizing and tagging clothing items for better search and recommendations.
Automated Fashion Sorting: Helping automated inventory management systems classify fashion items.
Educational Purposes: Supporting AI and ML research in vision-based fashion classification models.
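
For tagging or sorting pipelines like those listed above, one common pattern is to accept only confident predictions automatically and send the rest to a person. The sketch below assumes the label-to-score dict returned by `fashion_mnist_classification` in the Quick Start; the 0.80 threshold, the action names, and `route_prediction` itself are hypothetical choices for illustration, not part of the model or of any library.

```python
REVIEW_THRESHOLD = 0.80  # hypothetical cutoff; tune it on your own validation data

def route_prediction(predictions, threshold=REVIEW_THRESHOLD):
    """Pick the top label and decide whether it can be applied automatically.

    `predictions` maps label -> score, as returned by the Quick Start function.
    Returns a (label, score, action) tuple.
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    label, score = max(predictions.items(), key=lambda item: item[1])
    action = "auto_tag" if score >= threshold else "manual_review"
    return label, score, action

# Made-up score dicts (assumption, for illustration only).
confident = {"Trouser": 0.97, "Dress": 0.02, "Coat": 0.01}
ambiguous = {"Shirt": 0.46, "T-shirt / top": 0.41, "Pullover": 0.13}

print(route_prediction(confident))
print(route_prediction(ambiguous))
```

A review step like this is most useful for classes the report shows as harder to separate, such as Shirt versus T-shirt / top.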

Reference: SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features. https://arxiv.org/pdf/2502.14786

📄 License

This project is licensed under the Apache-2.0 license.
