
Chinese-CLIP ViT-Large-Patch14

Developed by OFA-Sys
Chinese CLIP model based on the ViT architecture; supports Chinese vision-language tasks
Downloads 2,333
Release Date: 11/9/2022

Model Overview

This is a Chinese CLIP model based on the Vision Transformer architecture. It learns joint representations of images and text, making it suitable for cross-modal retrieval and classification tasks.
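The core mechanism behind these tasks can be illustrated without the model itself: CLIP-style models embed an image and candidate text labels into a shared space, and zero-shot classification is a softmax over their cosine similarities. The sketch below uses toy numpy vectors in place of real model embeddings (the actual model produces higher-dimensional vectors); all function names and the temperature value are illustrative assumptions, not part of the model's API.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project embeddings onto the unit sphere so a dot product equals cosine similarity
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def zero_shot_probs(image_emb, text_embs, temperature=0.01):
    # Cosine similarity between one image and each candidate text label,
    # scaled by a temperature and turned into probabilities via softmax
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_embs)
    logits = txt @ img / temperature
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

# Toy stand-ins for encoder outputs: three orthogonal "label" embeddings
text_embs = np.eye(3, 8)
# An "image" embedding that lies closest to label 1
image_emb = text_embs[1] + 0.05 * np.ones(8)

probs = zero_shot_probs(image_emb, text_embs)
print(probs.argmax())  # label 1 has the highest probability
```

In practice the embeddings would come from the model's image and text encoders; the softmax-over-similarities step shown here is what makes classification from free-form text descriptions possible without任何 task-specific training data.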

Model Features

Chinese Cross-modal Understanding
A vision-language joint representation model optimized specifically for Chinese scenarios
Efficient Visual Encoding
Based on ViT architecture, capable of efficiently processing image inputs
Zero-shot Classification Capability
Supports zero-shot image classification based on text descriptions

Model Capabilities

Image-text matching
Cross-modal retrieval
Zero-shot image classification
Chinese vision-language understanding
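Cross-modal retrieval builds on the same shared embedding space: a text query is embedded once, and a gallery of precomputed image embeddings is ranked by cosine similarity. A minimal sketch, again with toy numpy vectors standing in for real encoder outputs (function and variable names are hypothetical):

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Unit-normalize so dot products equal cosine similarities
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def retrieve_top_k(query_emb, gallery_embs, k=3):
    # Rank gallery images by cosine similarity to the text query embedding
    q = l2_normalize(query_emb)
    g = l2_normalize(gallery_embs)
    sims = g @ q
    order = np.argsort(-sims)  # indices sorted by descending similarity
    return order[:k], sims[order[:k]]

# Toy stand-ins: five orthogonal "image" embeddings
gallery = np.eye(5, 16)
# A "text query" embedding that is closest to image 3
query = gallery[3] + 0.05 * np.ones(16)

idx, sims = retrieve_top_k(query, gallery, k=2)
print(idx)  # image 3 ranks first
```

For a real deployment the gallery embeddings would be computed offline and indexed, so a query requires only one text-encoder pass plus a similarity search.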

Use Cases

Content Moderation
Inappropriate Content Detection
Detect inappropriate image content through text descriptions
Can identify specific types of inappropriate content
E-commerce
Product Search
Search for related product images through text descriptions
Improves product search accuracy
Social Media
Content Recommendation
Recommend related image-text content based on user interests
Enhances user engagement