
Chinese-CLIP ViT-Huge-Patch14

Developed by OFA-Sys
Chinese CLIP is a multimodal model that pairs a Vision Transformer image encoder with a Chinese text encoder, supporting Chinese vision-language tasks.
Downloads: 623
Release Time: 11/9/2022

Model Overview

This model combines visual and language processing capabilities, enabling the understanding of associations between Chinese text and images, suitable for cross-modal retrieval and classification tasks.
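As a minimal sketch of this image-text association, the following Python code loads the checkpoint through the Hugging Face transformers classes ChineseCLIPModel and ChineseCLIPProcessor and compares an image embedding with Chinese caption embeddings in the shared space. The image path and captions are illustrative assumptions, not part of this model card.

import torch
from PIL import Image
from transformers import ChineseCLIPModel, ChineseCLIPProcessor

model_name = "OFA-Sys/chinese-clip-vit-huge-patch14"
model = ChineseCLIPModel.from_pretrained(model_name)
processor = ChineseCLIPProcessor.from_pretrained(model_name)

image = Image.open("example.jpg")  # assumed local example image
captions = ["一只橘猫趴在沙发上", "一辆红色跑车"]  # illustrative Chinese captions

image_inputs = processor(images=image, return_tensors="pt")
text_inputs = processor(text=captions, return_tensors="pt", padding=True)

with torch.no_grad():
    image_emb = model.get_image_features(**image_inputs)
    text_emb = model.get_text_features(**text_inputs)

# Normalize and compare in the shared embedding space
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
print(image_emb @ text_emb.T)  # cosine similarity of the image to each caption

A higher similarity score indicates a closer semantic match between the image and the caption.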

Model Features

Chinese Multimodal Understanding
Optimized specifically for Chinese scenarios, capable of processing both image and Chinese text inputs
Vision Transformer Architecture
Adopts the ViT-Huge structure with 14x14 image patch processing, as indicated by the model name
Zero-shot Classification Capability
Can perform image classification tasks via text prompts without fine-tuning
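The sketch below illustrates this zero-shot classification feature: each candidate label is wrapped in a Chinese prompt template and the image is assigned to the highest-scoring prompt. The prompt template, label set, and image path are assumptions chosen for illustration.

import torch
from PIL import Image
from transformers import ChineseCLIPModel, ChineseCLIPProcessor

model_name = "OFA-Sys/chinese-clip-vit-huge-patch14"
model = ChineseCLIPModel.from_pretrained(model_name)
processor = ChineseCLIPProcessor.from_pretrained(model_name)

labels = ["猫", "狗", "汽车", "飞机"]  # hypothetical label set
prompts = [f"一张{label}的照片" for label in labels]  # "a photo of a {label}"

image = Image.open("example.jpg")  # assumed local example image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # shape: (1, num_labels)

probs = logits.softmax(dim=-1).squeeze(0)
for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.3f}")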

Model Capabilities

Image-Text Matching
Cross-modal Retrieval (sketched after this list)
Zero-shot Image Classification
Chinese Scene Understanding
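A hedged sketch of the cross-modal retrieval capability: a small gallery of images is embedded once, then ranked against a Chinese text query by cosine similarity. The gallery file names and the query string are hypothetical.

import torch
from PIL import Image
from transformers import ChineseCLIPModel, ChineseCLIPProcessor

model_name = "OFA-Sys/chinese-clip-vit-huge-patch14"
model = ChineseCLIPModel.from_pretrained(model_name)
processor = ChineseCLIPProcessor.from_pretrained(model_name)

image_paths = ["shoe.jpg", "dress.jpg", "phone.jpg"]  # hypothetical gallery
images = [Image.open(p) for p in image_paths]

with torch.no_grad():
    gallery = model.get_image_features(**processor(images=images, return_tensors="pt"))
    gallery = gallery / gallery.norm(dim=-1, keepdim=True)

    query = "红色连衣裙"  # hypothetical query: "red dress"
    query_emb = model.get_text_features(**processor(text=[query], return_tensors="pt", padding=True))
    query_emb = query_emb / query_emb.norm(dim=-1, keepdim=True)

# Rank gallery images by cosine similarity to the query text
scores = (query_emb @ gallery.T).squeeze(0)
for path, score in sorted(zip(image_paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{path}: {score:.3f}")

The same ranking pattern applies to the product-search use case described below.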

Use Cases

Content Moderation
Inappropriate Content Detection
Detects problematic image content by scoring images against descriptive text prompts
Helps flag sensitive material in targeted moderation scenarios
E-commerce
Product Search
Retrieves matching product images from natural-language queries
Improves search accuracy and user experience