LongCLIP GmP ViT-L/14

Developed by zer0int
A CLIP model fine-tuned from BeichenZhang/LongCLIP-L. It supports long-text input (248 tokens), and its performance is enhanced by Geometric Parametrization (GmP).
Downloads 4,859
Release Date: 6/15/2024

Model Overview

An improved CLIP model that lifts the 77-token input limit of standard CLIP and is optimized for long-text comprehension. It can serve as a text encoder for generative models such as SDXL/Stable Diffusion.
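To make the token limit concrete, here is a minimal sketch of what a fixed context length does to a long prompt. The two context lengths come from this page; the helper `fit_to_context`, the padding ID, and the placeholder token IDs are assumptions made for illustration. Real CLIP tokenizers also add start and end tokens, which this sketch ignores.

```python
# Context lengths as stated on this page.
CLIP_CONTEXT = 77
LONGCLIP_CONTEXT = 248

def fit_to_context(token_ids, context_length, pad_id=0):
    """Truncate or right-pad a list of token IDs to exactly context_length."""
    kept = token_ids[:context_length]
    return kept + [pad_id] * (context_length - len(kept))

# A hypothetical 200-token prompt; the IDs are placeholders, not real vocabulary.
prompt_ids = list(range(1, 201))

short = fit_to_context(prompt_ids, CLIP_CONTEXT)
long = fit_to_context(prompt_ids, LONGCLIP_CONTEXT)

# Count how many prompt tokens never reach the encoder in each case.
dropped_by_clip = len(prompt_ids) - sum(1 for t in short if t != 0)
dropped_by_longclip = len(prompt_ids) - sum(1 for t in long if t != 0)
print(dropped_by_clip, dropped_by_longclip)
```

In this toy case the 77-token window discards more than half of the prompt, while the 248-token window keeps all of it and pads the remainder.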

Model Features

Long-text support
Accepts inputs of up to 248 tokens (standard CLIP accepts only 77), significantly improving comprehension of long text descriptions
Geometric parameterization (GmP)
Decomposes the weights so that the geometric properties of the pre-trained knowledge are preserved, which makes fine-tuning more stable
Label smoothing loss
Uses a custom loss function with label smoothing, which is particularly suited to small-batch or narrow-domain fine-tuning
Generative model compatibility
Can serve as a drop-in replacement for the text encoder of generative models such as Stable Diffusion and Flux.1
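
The weight decomposition behind GmP can be illustrated with a small sketch. This page only states that weights are decomposed; the version below assumes the common form in which each row of a weight matrix is split into a magnitude `r` and a unit direction `theta`, then rebuilt as `r * theta`. The function names and the toy matrix are hypothetical, and this is not the model's training code.

```python
import math

def decompose(weight):
    """Split each row of a weight matrix into a magnitude r and a unit direction theta.

    Assumes no row is all zeros (the norm would be 0).
    """
    r, theta = [], []
    for row in weight:
        norm = math.sqrt(sum(x * x for x in row))
        r.append(norm)
        theta.append([x / norm for x in row])
    return r, theta

def reconstruct(r, theta):
    """Rebuild each row as r_i * theta_i / ||theta_i||.

    Re-normalizing theta keeps the direction a unit vector even after
    an optimizer step has changed its length.
    """
    weight = []
    for r_i, row in zip(r, theta):
        norm = math.sqrt(sum(x * x for x in row))
        weight.append([r_i * x / norm for x in row])
    return weight

# Toy 2x2 weight matrix, chosen only for illustration.
W = [[3.0, 4.0], [0.0, 2.0]]
r, theta = decompose(W)
W_back = reconstruct(r, theta)
print(r, theta, W_back)
```

Under this parameterization, magnitude and direction are separate trainable quantities, so a fine-tuning step can rotate a weight vector without also rescaling it, and vice versa.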

Model Capabilities

Long-text image matching
Generative model text encoding
Cross-modal retrieval
Zero-shot classification
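
The capabilities above all reduce to comparing an image embedding with text embeddings. As a rough sketch, zero-shot classification can be written as a softmax over scaled cosine similarities. The three-dimensional embeddings, the label prompts, and the `scale` value of 100 are assumptions for illustration; real embeddings come from the model's image and text encoders and have far more dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot(image_emb, label_embs, scale=100.0):
    """Return softmax probabilities over labels from scaled cosine similarities."""
    logits = {label: scale * cosine(image_emb, emb) for label, emb in label_embs.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Toy embeddings; placeholders standing in for encoder outputs.
image_emb = [0.9, 0.1, 0.2]
label_embs = {
    "a photo of a cat": [1.0, 0.0, 0.1],
    "a photo of a dog": [0.0, 1.0, 0.1],
    "a photo of a car": [0.1, 0.1, 1.0],
}

probs = zero_shot(image_emb, label_embs)
best = max(probs, key=probs.get)
print(best)
```

Cross-modal retrieval uses the same similarity scores but ranks a collection of images against one text query instead of ranking labels against one image.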

Use Cases

AI-generated content
SDXL text encoding enhancement
Serves as the text encoder for Stable Diffusion XL, supporting longer and more detailed prompts
With the full 248-token input, cosine similarity improves by approximately 29% over the same text truncated to 77 tokens
Cross-modal retrieval
E-commerce product search
Matches images to detailed product descriptions
After narrow-domain fine-tuning, ImageNet accuracy reaches 0.89
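
The narrow-domain fine-tuning mentioned above is where the label smoothing loss listed under Model Features comes in. This page does not specify the custom loss, so the following is only a sketch of one common variant: a symmetric contrastive loss in which each matching image-text pair gets a target of `1 - smoothing` and the remaining mass is spread evenly over the other pairs in the batch. The function names, the smoothing value, and the toy similarity matrix are hypothetical.

```python
import math

def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against a smoothed one-hot target.

    The target class gets weight 1 - smoothing; the rest is shared
    equally among the other classes. Requires at least two classes.
    """
    n = len(logits)
    top = max(logits)
    log_z = top + math.log(sum(math.exp(v - top) for v in logits))
    log_probs = [v - log_z for v in logits]
    off_weight = smoothing / (n - 1)
    loss = 0.0
    for i, lp in enumerate(log_probs):
        weight = 1.0 - smoothing if i == target else off_weight
        loss -= weight * lp
    return loss

def contrastive_loss(sim, smoothing=0.1):
    """Average of image-to-text and text-to-image losses over a square similarity matrix."""
    n = len(sim)
    i2t = sum(smoothed_cross_entropy(sim[i], i, smoothing) for i in range(n)) / n
    cols = [[sim[r][c] for r in range(n)] for c in range(n)]
    t2i = sum(smoothed_cross_entropy(cols[c], c, smoothing) for c in range(n)) / n
    return (i2t + t2i) / 2

# Toy scaled similarities for a batch of 3 pairs; matches sit on the diagonal.
sim = [[5.0, 1.0, 0.0],
       [1.0, 4.0, 0.0],
       [0.0, 1.0, 6.0]]

hard = contrastive_loss(sim, smoothing=0.0)
soft = contrastive_loss(sim, smoothing=0.1)
print(hard, soft)
```

With small batches, near-duplicate captions are likely to appear as false negatives, and smoothed targets penalize the model less for assigning them some probability.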