
CLIPfa Vision

Developed by SajjadAyoubi
CLIPfa is the Persian version of OpenAI's CLIP model, connecting Persian text and image representations through contrastive learning
Downloads: 43
Release Time: 3/2/2022

Model Overview

A multimodal model based on contrastive learning that maps Persian text and images to a shared vector space for cross-modal retrieval and matching
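
The two-tower setup is typically used through the Transformers library. Below is a minimal sketch of embedding a Persian caption and an image into the shared space, assuming the Hugging Face checkpoints SajjadAyoubi/clip-fa-vision and SajjadAyoubi/clip-fa-text; the checkpoint names, caption, and image path are illustrative assumptions, not details taken from this page.

```python
# Minimal sketch: embed a Persian caption and an image into the shared space.
# Checkpoint names, the caption, and the image path are illustrative assumptions.
import torch
from PIL import Image
from transformers import AutoTokenizer, CLIPFeatureExtractor, CLIPVisionModel, RobertaModel

vision_encoder = CLIPVisionModel.from_pretrained('SajjadAyoubi/clip-fa-vision')
preprocessor = CLIPFeatureExtractor.from_pretrained('SajjadAyoubi/clip-fa-vision')
text_encoder = RobertaModel.from_pretrained('SajjadAyoubi/clip-fa-text')
tokenizer = AutoTokenizer.from_pretrained('SajjadAyoubi/clip-fa-text')

text = 'یک روز آفتابی در ساحل'   # "a sunny day at the beach"
image = Image.open('beach.jpg')  # any local image

with torch.no_grad():
    # Each encoder projects its input into the shared vector space
    text_emb = text_encoder(**tokenizer(text, return_tensors='pt')).pooler_output
    image_emb = vision_encoder(**preprocessor(image, return_tensors='pt')).pooler_output

print(text_emb.shape, image_emb.shape)   # both embeddings are [1, 768]
```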

Model Features

Persian adaptation
Uses Farahani's RoBERTa-fa as the text encoder, specifically optimized for Persian text understanding
Lightweight training
Effectively trained with only 400,000 image-text pairs, roughly 1/1000 of the ~400 million pairs used for the original CLIP
Dual-modal alignment
The vision and text encoders both map into a shared 768-dimensional vector space (see the sketch below)
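
Because both towers land in the same 768-dimensional space, matching reduces to cosine similarity between L2-normalized embeddings. The self-contained sketch below uses random tensors as stand-ins for the encoder outputs from the earlier snippet.

```python
# Sketch: scoring image-text pairs in the shared 768-d space.
# Random tensors stand in for the encoder outputs shown earlier.
import torch
import torch.nn.functional as F

text_emb = torch.randn(1, 768)    # stand-in for text_encoder(...).pooler_output
image_emb = torch.randn(4, 768)   # stand-in for a small batch of image embeddings

text_emb = F.normalize(text_emb, dim=-1)    # L2-normalize both sides
image_emb = F.normalize(image_emb, dim=-1)

scores = text_emb @ image_emb.T             # cosine similarities, shape [1, 4]
best = scores.argmax(dim=-1)                # index of the best-matching image
print(scores, best)
```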

Model Capabilities

Persian image-text matching
Cross-modal vector retrieval
Image semantic search
Text-guided image classification (see the sketch after this list)
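
Text-guided classification can be sketched as embedding one Persian prompt per class and choosing the class whose embedding lies closest to the image embedding. The checkpoints, class prompts, and image path below are illustrative assumptions.

```python
# Sketch: zero-shot classification of one image against Persian class prompts.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import AutoTokenizer, CLIPFeatureExtractor, CLIPVisionModel, RobertaModel

vision_encoder = CLIPVisionModel.from_pretrained('SajjadAyoubi/clip-fa-vision')
preprocessor = CLIPFeatureExtractor.from_pretrained('SajjadAyoubi/clip-fa-vision')
text_encoder = RobertaModel.from_pretrained('SajjadAyoubi/clip-fa-text')
tokenizer = AutoTokenizer.from_pretrained('SajjadAyoubi/clip-fa-text')

class_prompts = ['عکس یک گربه', 'عکس یک سگ', 'عکس یک ماشین']  # photo of a cat / dog / car

with torch.no_grad():
    # Embed and normalize one prompt per class
    tokens = tokenizer(class_prompts, padding=True, return_tensors='pt')
    class_embs = F.normalize(text_encoder(**tokens).pooler_output, dim=-1)

    # Embed and normalize the image to classify
    image = Image.open('pet.jpg')
    image_emb = F.normalize(
        vision_encoder(**preprocessor(image, return_tensors='pt')).pooler_output, dim=-1)

probs = (image_emb @ class_embs.T).softmax(dim=-1)   # one score per class prompt
print(class_prompts[probs.argmax().item()])
```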

Use Cases

Multimedia retrieval
Persian image search
Search for related images using Persian descriptions
Demonstrates retrieval effectiveness in a 25,000-image gallery (see the retrieval sketch at the end of this section)
Content moderation
Multilingual inappropriate content detection
Detect inappropriate images by matching them against Persian text descriptions
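
For gallery-scale search such as the 25,000-image demo, a common pattern is to embed and index all gallery images once, then rank them against each incoming Persian query. The rough sketch below follows that pattern under the same checkpoint assumptions as above; the folder path and query string are illustrative.

```python
# Sketch: offline-index-then-query retrieval over a local image folder.
import glob
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import AutoTokenizer, CLIPFeatureExtractor, CLIPVisionModel, RobertaModel

vision_encoder = CLIPVisionModel.from_pretrained('SajjadAyoubi/clip-fa-vision')
preprocessor = CLIPFeatureExtractor.from_pretrained('SajjadAyoubi/clip-fa-vision')
text_encoder = RobertaModel.from_pretrained('SajjadAyoubi/clip-fa-text')
tokenizer = AutoTokenizer.from_pretrained('SajjadAyoubi/clip-fa-text')

paths = sorted(glob.glob('gallery/*.jpg'))   # assumes a non-empty local gallery

with torch.no_grad():
    # Offline step: embed and normalize every gallery image once.
    images = [Image.open(p) for p in paths]
    pixel = preprocessor(images, return_tensors='pt')
    index = F.normalize(vision_encoder(**pixel).pooler_output, dim=-1)

    # Online step: embed the Persian query and rank the gallery by similarity.
    query = 'غروب آفتاب کنار دریا'           # "sunset by the sea"
    q_emb = F.normalize(
        text_encoder(**tokenizer(query, return_tensors='pt')).pooler_output, dim=-1)

scores = (q_emb @ index.T).squeeze(0)
for i in scores.topk(k=min(5, len(paths))).indices.tolist():
    print(paths[i], float(scores[i]))
```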