CLIP-ViT-L-14-CommonPool.XL.laion-s13B-b90K Open-source Model - Supports Zero-shot Image Classification

CLIP-ViT-L-14-CommonPool.XL.laion-s13B-b90K

Developed by LAION

A CLIP ViT-L/14 vision-language model for zero-shot image classification. As the model name indicates, it was trained on the DataComp CommonPool XL pool filtered with the LAION-2B filtering approach.

Text-to-Image Open Source License: MIT #Zero-shot Image Classification #Multimodal Understanding #Large-scale Pretraining

Downloads 176

Release Time: 4/26/2023

Model Overview

This model is a CLIP variant that pairs a Vision Transformer (ViT-L/14) image encoder with a text encoder. Both encoders map their inputs into a shared embedding space, so the model can score how well an image matches a piece of text. This makes it suitable for cross-modal tasks such as zero-shot image classification.
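
As a sketch of how CLIP-style models usually score the image-text relationship (this is the general CLIP recipe, not something stated on this page): an image x and the prompt t_k for each of K candidate classes are embedded, compared by cosine similarity, and turned into class probabilities with a softmax. The symbols f (image encoder), g (text encoder), and τ (logit scale) are notation chosen here for illustration.

```latex
s_k = \frac{f(x)^\top g(t_k)}{\lVert f(x) \rVert \, \lVert g(t_k) \rVert},
\qquad
p(k \mid x) = \frac{\exp(\tau\, s_k)}{\sum_{j=1}^{K} \exp(\tau\, s_j)}
```

The predicted class is the k with the largest p(k | x). No classifier weights are trained for the task; only the text prompts change.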

Model Features

Zero-shot Learning Capability

Can perform image classification tasks without task-specific training

Cross-modal Understanding

Capable of processing and understanding both visual and textual information

Large-scale Pretraining

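Trained on CommonPool XL data filtered with the LAION-2B filtering approach. In the OpenCLIP naming scheme, "s13B" denotes roughly 13 billion training samples seen and "b90K" a global batch size of about 90K; neither is a dataset name.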
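
The features above can be tried out with the OpenCLIP library. The following is a minimal sketch, not an official quick start: it assumes the checkpoint is published on the Hugging Face Hub under the id shown in `MODEL_ID`, that `open_clip_torch`, `torch`, and `Pillow` are installed, and that the prompt template "a photo of a {}" is a reasonable default. The helper names `build_prompts` and `classify` are made up for this example.

```python
# Assumed Hugging Face Hub id; adjust if the checkpoint lives elsewhere.
MODEL_ID = "hf-hub:laion/CLIP-ViT-L-14-CommonPool.XL.laion-s13B-b90K"


def build_prompts(labels, template="a photo of a {}"):
    """Turn class names into natural-language prompts for the text encoder."""
    return [template.format(label) for label in labels]


def classify(image_path, labels):
    """Return {label: probability} for one image, with no task-specific training."""
    # Heavy dependencies are imported lazily so build_prompts works without them.
    import torch
    import open_clip
    from PIL import Image

    model, preprocess = open_clip.create_model_from_pretrained(MODEL_ID)
    tokenizer = open_clip.get_tokenizer(MODEL_ID)
    model.eval()

    # Preprocess the image and add a batch dimension; tokenize one prompt per label.
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    text = tokenizer(build_prompts(labels))

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        # L2-normalize so the dot product below is a cosine similarity.
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)
        # 100.0 is the logit scale conventionally used in CLIP examples.
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)[0]

    return dict(zip(labels, probs.tolist()))
```

Calling `classify("example.jpg", ["cat", "dog", "bicycle"])` (with `example.jpg` standing in for any local image) would download the weights on first use and return one probability per label.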

Model Capabilities

Image Classification

Cross-modal Retrieval

Image-Text Matching

Use Cases

Content Management

Automatic Image Tagging

Assigns descriptive tags to unlabeled images by scoring each image against a list of candidate tags

E-commerce

Visual Search

Searches for relevant product images via text queries
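
Both use cases above reduce to comparing embeddings. The sketch below illustrates the idea with toy 3-dimensional vectors standing in for real encoder outputs; the function names, file names, vectors, and the 0.5 tagging threshold are all illustrative assumptions, and a real system would obtain the embeddings from the model's image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_embedding, image_index, top_k=2):
    """Visual search: rank indexed images by similarity to a text query embedding."""
    scored = [(name, cosine(query_embedding, emb)) for name, emb in image_index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def tag(image_embedding, tag_embeddings, threshold=0.5):
    """Tagging: keep every candidate tag whose similarity clears the threshold."""
    return sorted(
        name
        for name, emb in tag_embeddings.items()
        if cosine(image_embedding, emb) >= threshold
    )


# Toy image embeddings (hypothetical product catalog).
image_index = {
    "red_shoe.jpg": [0.9, 0.1, 0.0],
    "blue_shirt.jpg": [0.1, 0.9, 0.1],
    "red_dress.jpg": [0.7, 0.2, 0.6],
}

# Pretend text embedding of the query "red shoes".
query = [1.0, 0.0, 0.1]
results = search(query, image_index)

# Pretend text embeddings of two candidate tags.
tag_embeddings = {"footwear": [1.0, 0.0, 0.0], "shirt": [0.0, 1.0, 0.0]}
tags = tag(image_index["red_shoe.jpg"], tag_embeddings)
```

Because image embeddings can be computed once and stored, only the short text query needs to be encoded at search time.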

