Clip Finetuned Csu P14 336 E3l57 L

Developed by kevinoli
This model is a fine-tuned version of openai/clip-vit-large-patch14-336, primarily used for image-text matching tasks.
Downloads 31
Release Time: 8/21/2024

Model Overview

A vision-language model based on the CLIP architecture, fine-tuned for cross-modal tasks such as image classification and image retrieval.

Model Features

Cross-modal understanding
Capable of processing both visual and textual information to establish semantic connections between them
High-resolution processing
Supports an input resolution of 336x336 pixels, higher than the 224x224 input of the standard CLIP ViT-L/14 model
Fine-tuning optimization
Fine-tuned for 3 epochs on a specific dataset, with validation loss reduced to 0.47
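
To make the resolution figure concrete, the arithmetic below relates input size to the number of tokens the image encoder processes. It is a minimal sketch that assumes the patch size of 14 implied by the base model name (patch14) and one extra class token, as in the usual CLIP ViT layout; the function name is illustrative.

```python
def vit_token_count(image_size: int, patch_size: int = 14) -> int:
    """Tokens seen by a ViT image encoder: one per patch plus one class token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side + 1


tokens_336 = vit_token_count(336)  # 24 * 24 patches + 1 class token
tokens_224 = vit_token_count(224)  # 16 * 16 patches + 1 class token
print(tokens_336, tokens_224)
```

Under these assumptions the 336x336 input yields a bit more than twice as many tokens as a 224x224 input, which is where the extra visual detail (and the extra compute) comes from.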

Model Capabilities

Image-text matching
Zero-shot image classification
Cross-modal retrieval
Image feature extraction
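
As a rough illustration of the matching and zero-shot classification capabilities listed above, the sketch below uses the Hugging Face transformers CLIP classes. The repository id, the prompt template, and the function names are assumptions made for illustration and are not confirmed by this listing; if the fine-tuned repository does not ship preprocessing files, loading the processor from openai/clip-vit-large-patch14-336 may be a workable fallback.

```python
def build_prompts(labels):
    """Wrap each class name in a simple caption template (an illustrative choice)."""
    return [f"a photo of a {label}" for label in labels]


def classify(image_path, labels,
             model_id="kevinoli/clip-finetuned-csu-p14-336-e3l57-l"):
    """Return {label: probability} for one image.

    model_id is an assumed repository id; adjust it to the actual location.
    """
    # Imported lazily so build_prompts works without the heavy dependencies.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(model_id)
    processor = CLIPProcessor.from_pretrained(model_id)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompts(labels), images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (1, num_labels); softmax gives probabilities.
    probs = outputs.logits_per_image.softmax(dim=-1)[0]
    return {label: float(p) for label, p in zip(labels, probs)}
```

A call such as classify("photo.jpg", ["cat", "dog"]) would return one probability per label, where photo.jpg is a placeholder path. For feature extraction alone, the same model object exposes get_image_features and get_text_features.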

Use Cases

Content retrieval
Text-based image search
Retrieve relevant images using natural language descriptions
Content moderation
Inappropriate content detection
Detect non-compliant image content through text descriptions
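
Both use cases reduce to comparing embeddings. The sketch below shows one way this could look once image and text embeddings have been computed: ranking images against a query for search, and thresholding similarity against policy descriptions for moderation. The toy 3-dimensional vectors, the function names, and the 0.8 threshold are all placeholders; real embeddings would come from the model, and a usable threshold would need calibrating on real data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(query_vec, image_vecs, top_k=2):
    """Rank image ids by similarity to the query; return top_k (id, score) pairs."""
    scored = [(image_id, cosine(query_vec, vec))
              for image_id, vec in image_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


def flag(image_vec, policy_vecs, threshold=0.8):
    """Return the policy names whose similarity to the image meets the threshold."""
    return [name for name, vec in policy_vecs.items()
            if cosine(image_vec, vec) >= threshold]


# Toy vectors standing in for real image and text embeddings.
images = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "city.jpg": [0.1, 0.9, 0.1],
    "forest.jpg": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]  # stands in for the embedding of a text query
results = search(query, images, top_k=2)

policies = {"crowd": [0.0, 1.0, 0.0], "water": [1.0, 0.0, 0.0]}
flags = flag(images["city.jpg"], policies)
print(results, flags)
```

For larger collections the image embeddings would typically be computed once and stored, so that only the text query needs encoding at search time.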