So400m Long

Developed by fancyfeast
A vision-language model fine-tuned from SigLIP 2, with the maximum text length increased from 64 to 256 tokens.
Downloads 27
Release Date: 4/14/2025

Model Overview

This model is a fine-tuned version of SigLIP 2. The fine-tuning extends the text context length and adapts the model to several text types, while preserving the original embedding space and improving long-text handling.

Model Features

Extended Context Length
Maximum text length increased from 64 tokens in the base model to 256 tokens
Preserved Original Features
Key components such as the vision encoder tower are frozen, so the original embedding space is retained
Multi-type Text Adaptation
Training data pairs images with several kinds of text, such as descriptive captions, gallery tags, and prompts
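
To make the context-length change concrete, here is a minimal sketch of what a 64-token versus a 256-token limit means for a long caption. It is not the model's tokenizer: the token ids are placeholder integers, and the helper `fit_to_context` and the pad id `0` are hypothetical names chosen for illustration.

```python
def fit_to_context(token_ids, max_len=256, pad_id=0):
    """Truncate or right-pad a list of token ids to exactly max_len entries.

    pad_id=0 is an assumption for this sketch; a real tokenizer defines its own.
    """
    clipped = list(token_ids[:max_len])
    return clipped + [pad_id] * (max_len - len(clipped))


# Placeholder "captions": plain integers standing in for real token ids.
short_caption = list(range(1, 41))    # 40 tokens
long_caption = list(range(1, 301))    # 300 tokens

base_view = fit_to_context(long_caption, max_len=64)    # base model limit
long_view = fit_to_context(long_caption, max_len=256)   # extended limit

print(len(base_view), len(long_view))
```

With the 64-token limit, everything after the 64th token of the long caption is dropped before the text encoder sees it. The 256-token limit keeps four times as much of the same caption.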

Model Capabilities

Image-text matching
Cross-modal retrieval
Short-text preference recognition
Multi-type text processing
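
As a rough illustration of image-text matching, SigLIP-style models score a pair by passing a scaled, shifted similarity between the image and text embeddings through a sigmoid. The sketch below uses toy 3-dimensional vectors. The `logit_scale` and `logit_bias` values are assumptions made for the example, not this model's learned parameters, and `match_probability` is a hypothetical helper name.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_probability(image_vec, text_vec, logit_scale=10.0, logit_bias=-5.0):
    """Sigmoid of (scale * cosine similarity + bias).

    The scale and bias defaults are illustrative assumptions only.
    """
    logit = logit_scale * cosine(image_vec, text_vec) + logit_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy embeddings: not produced by any real model.
image_vec = [1.0, 0.0, 0.0]
matching_text = [0.9, 0.1, 0.0]
unrelated_text = [0.0, 1.0, 0.0]

p_match = match_probability(image_vec, matching_text)
p_unrelated = match_probability(image_vec, unrelated_text)
print(p_match > p_unrelated)
```

Because each pair is scored independently, the probabilities for several candidate texts do not need to sum to one, unlike a softmax over candidates.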

Use Cases

Content Retrieval
Gallery Tag Matching
Match relevant tag lists based on image content
Recognition capability for realistic images still has room for improvement
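
Gallery tag matching can be sketched as a nearest-neighbor search: embed the image, embed each candidate tag list, and rank the candidates by cosine similarity. The example below assumes the embeddings have already been computed and uses made-up 3-dimensional vectors. `rank_tag_lists` is a hypothetical helper, not part of any library.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_tag_lists(image_vec, candidates, top_k=2):
    """Return the top_k (tag_list, score) pairs by cosine similarity."""
    query = normalize(image_vec)
    scored = []
    for tags, vec in candidates.items():
        score = sum(a * b for a, b in zip(query, normalize(vec)))
        scored.append((score, tags))
    scored.sort(reverse=True)  # highest similarity first
    return [(tags, score) for score, tags in scored[:top_k]]


# Made-up text embeddings for three candidate tag lists.
candidates = {
    "beach, sunset, palm tree": [0.8, 0.6, 0.0],
    "cat, indoors, sofa": [0.0, 0.2, 0.9],
    "landscape, mountain, snow": [0.5, 0.1, 0.4],
}
image_vec = [0.9, 0.5, 0.1]  # made-up image embedding

ranked = rank_tag_lists(image_vec, candidates)
for tags, score in ranked:
    print(f"{score:.3f}  {tags}")
```

In practice the vectors would come from the model's image and text encoders, and the caveat above about realistic images would apply to the quality of the ranking.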
Multimodal Applications
Image-Text Pair Generation
Generate descriptive text or prompts for images
Tends to generate shorter text descriptions