
Chinese-CLIP-ViT-Base-Patch16

Developed by OFA-Sys
The base version of Chinese CLIP, using ViT-B/16 as the image encoder and RoBERTa-wwm-base as the text encoder, trained on a large-scale dataset of approximately 200 million Chinese image-text pairs.
Downloads: 49.02k
Released: 11/9/2022

Model Overview

Chinese CLIP is a vision-language model capable of computing image and text embeddings and their similarity, supporting Chinese image-text retrieval and classification tasks.
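As a minimal sketch of computing image-text similarity, the snippet below uses the Hugging Face `transformers` Chinese-CLIP classes with this checkpoint. The blank test image and the candidate texts are placeholders for illustration, not from the model card.

```python
import torch
from PIL import Image
from transformers import ChineseCLIPModel, ChineseCLIPProcessor

ckpt = "OFA-Sys/chinese-clip-vit-base-patch16"
model = ChineseCLIPModel.from_pretrained(ckpt)
processor = ChineseCLIPProcessor.from_pretrained(ckpt)

# Placeholder image; a real application would load a photo here.
image = Image.new("RGB", (224, 224), "yellow")
texts = ["皮卡丘", "妙蛙种子", "小火龙"]  # example Chinese candidate captions

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds one similarity score per (image, text) pair;
# softmax over texts turns the scores into matching probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
```

The same `outputs` object also exposes `logits_per_text` for the text-to-image direction.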

Model Features

Chinese Optimization
Specifically optimized for Chinese language and scenarios, supporting Chinese image-text retrieval and classification tasks.
Large-Scale Training
Trained on a large-scale dataset of approximately 200 million Chinese image-text pairs, with strong generalization capabilities.
Multi-Task Support
Supports various vision-language tasks, including image-text retrieval and image classification.

Model Capabilities

Compute image and text embeddings
Compute image-text similarity
Chinese image-text retrieval
Zero-shot image classification
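Zero-shot classification can be sketched by embedding the image and a set of Chinese label prompts separately, normalizing both, and picking the label with the highest cosine similarity. The labels and the blank image below are illustrative assumptions.

```python
import torch
from PIL import Image
from transformers import ChineseCLIPModel, ChineseCLIPProcessor

ckpt = "OFA-Sys/chinese-clip-vit-base-patch16"
model = ChineseCLIPModel.from_pretrained(ckpt)
processor = ChineseCLIPProcessor.from_pretrained(ckpt)

image = Image.new("RGB", (224, 224), "white")  # placeholder input image
labels = ["猫", "狗", "汽车"]  # hypothetical Chinese class names

with torch.no_grad():
    img_feats = model.get_image_features(**processor(images=image, return_tensors="pt"))
    txt_feats = model.get_text_features(
        **processor(text=labels, return_tensors="pt", padding=True)
    )

# L2-normalize so the dot product equals cosine similarity.
img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
txt_feats = txt_feats / txt_feats.norm(dim=-1, keepdim=True)

similarity = img_feats @ txt_feats.T  # shape: (1, num_labels)
pred = labels[similarity.argmax().item()]
```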

Use Cases

E-commerce
Product Search
Search for relevant product images using text descriptions
Achieves R@1 of 63.0 on the MUGE dataset
Content Moderation
Inappropriate Content Detection
Detect inappropriate images using text descriptions
Social Media
Image-Text Matching
Automatically match images with the most relevant text descriptions from a candidate set
Achieves image-to-text R@1 of 81.6 on the Flickr30K-CN dataset
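The product-search use case above amounts to text-to-image retrieval: encode the query text once, encode the candidate images, and rank by similarity. A sketch under assumed inputs (solid-color placeholder images standing in for catalogue photos, and a hypothetical query string):

```python
import torch
from PIL import Image
from transformers import ChineseCLIPModel, ChineseCLIPProcessor

ckpt = "OFA-Sys/chinese-clip-vit-base-patch16"
model = ChineseCLIPModel.from_pretrained(ckpt)
processor = ChineseCLIPProcessor.from_pretrained(ckpt)

# Placeholder catalogue; in practice these would be product photos.
catalog = [Image.new("RGB", (224, 224), c) for c in ("red", "green", "blue")]
query = "红色连衣裙"  # hypothetical search query ("red dress")

with torch.no_grad():
    img_feats = model.get_image_features(**processor(images=catalog, return_tensors="pt"))
    txt_feats = model.get_text_features(
        **processor(text=[query], return_tensors="pt", padding=True)
    )

img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
txt_feats = txt_feats / txt_feats.norm(dim=-1, keepdim=True)

scores = (txt_feats @ img_feats.T).squeeze(0)       # one score per image
ranking = scores.argsort(descending=True).tolist()  # best match first
```

In production the image embeddings would be precomputed and served from a vector index, since only the query side changes per search.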