vit_base_patch32_clip_224.laion400m_e31
Vision Transformer model trained on the LAION-400M dataset, compatible with both OpenCLIP and timm frameworks
Downloads: 10.90k
Release Time: 10/23/2024
Model Overview
This is a vision-language model based on the Vision Transformer architecture, primarily used for zero-shot image classification. The model uses a 32x32 patch size with a 224x224 input resolution and was trained with the QuickGELU activation function.
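For context, QuickGELU is the sigmoid-based approximation of GELU used in CLIP-style transformers, and the 32x32 patch size at 224x224 resolution fixes the vision tower's sequence length. A minimal PyTorch sketch of both (the function and variable names are illustrative):

```python
import torch

def quick_gelu(x: torch.Tensor) -> torch.Tensor:
    # QuickGELU: a fast sigmoid-based approximation of GELU.
    return x * torch.sigmoid(1.702 * x)

# Patch arithmetic for this configuration:
# a 224x224 image split into 32x32 patches yields (224 // 32) ** 2 = 49
# patches, plus 1 class token, for a sequence length of 50.
num_patches = (224 // 32) ** 2  # 49
seq_len = num_patches + 1       # 50
```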
Model Features
Dual Framework Compatibility
Supports both the OpenCLIP and timm frameworks, offering flexible usage options (see the loading sketch at the end of this section)
QuickGELU Activation Function
Uses the QuickGELU activation function, a faster sigmoid-based approximation of GELU (sketched in the overview above), during training
Large-scale Training Data
Trained on the LAION-400M dataset of roughly 400 million image-text pairs
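A sketch of the dual-framework usage mentioned above. The OpenCLIP architecture and pretrained tag names are assumptions based on OpenCLIP's pretrained registry; verify them with open_clip.list_pretrained():

```python
import open_clip
import timm

# OpenCLIP: loads the full image+text CLIP model.
# 'ViT-B-32-quickgelu' / 'laion400m_e31' are assumed registry names.
clip_model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32-quickgelu', pretrained='laion400m_e31'
)
tokenizer = open_clip.get_tokenizer('ViT-B-32-quickgelu')

# timm: loads the image tower only, usable as a vision backbone.
vision_tower = timm.create_model(
    'vit_base_patch32_clip_224.laion400m_e31',
    pretrained=True,
    num_classes=0,  # no classifier head -> pooled image embeddings
)
```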
Model Capabilities
Zero-shot Image Classification
Image Feature Extraction (see the sketch after this list)
Cross-modal Representation Learning
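A minimal image feature extraction sketch via timm, assuming a recent timm version; 'img.jpg' is a placeholder path:

```python
import timm
import torch
from PIL import Image

model = timm.create_model(
    'vit_base_patch32_clip_224.laion400m_e31', pretrained=True, num_classes=0
)
model = model.eval()

# Build the preprocessing pipeline from the model's pretrained config.
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

image = Image.open('img.jpg').convert('RGB')  # placeholder path
with torch.no_grad():
    features = model(transform(image).unsqueeze(0))
print(features.shape)  # (1, 768) for a ViT-Base tower
```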
Use Cases
Computer Vision
Image Classification
Classify images into novel categories without task-specific training, using text prompts as class labels
Image Retrieval
Retrieve relevant images based on text descriptions
Multimodal Applications
Image-Text Matching
Evaluate the alignment between images and text descriptions
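A sketch of zero-shot image-text matching with OpenCLIP, under the same naming assumptions as above; the image path and candidate labels are placeholders:

```python
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32-quickgelu', pretrained='laion400m_e31'  # assumed registry names
)
tokenizer = open_clip.get_tokenizer('ViT-B-32-quickgelu')
model.eval()

labels = ['a photo of a dog', 'a photo of a cat', 'a photo of a car']
image = preprocess(Image.open('photo.jpg')).unsqueeze(0)  # placeholder path
text = tokenizer(labels)

with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text)
    # L2-normalize so the dot product is cosine similarity.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    # 100.0 approximates CLIP's learned logit scale.
    probs = (100.0 * img_emb @ txt_emb.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f'{label}: {p:.3f}')
```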