Cogflorence 2.2 Large
Developed by thwri
A fine-tuned version of microsoft/Florence-2-large for image-to-text tasks. It was trained on a 40,000-image subset of the Ejafa/ye-pop dataset, using annotation texts generated by THUDM/cogvlm2-llama3-chat-19B.
Downloads 20.64k
Release Time: 8/23/2024
Model Overview
A fine-tuned vision-language model focused on generating detailed image descriptions and annotations.
Model Features
High-Quality Image Annotation
Generates detailed, accurate image descriptions that capture both the fine visual detail and the emotional tone of an image
Multi-Stage Annotation Processing
Annotation texts are generated by CogVLM2 and then post-processed by Gemma to improve clarity of expression
Optimized Visual Encoding
The visual encoder's parameters stay frozen during training, which keeps the visual features stable
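Freezing an encoder during fine-tuning usually comes down to switching off gradients for its parameters. The helper below is a minimal sketch of that idea, not the author's training code. It works on any iterable of (name, parameter) pairs, such as the output of a PyTorch model's named_parameters(). The function name freeze_by_prefix and the default prefix "vision_tower" are assumptions made for illustration; check the parameter names of the actual checkpoint before relying on them.

```python
def freeze_by_prefix(named_parameters, prefix="vision_tower"):
    """Disable gradients for every parameter whose name starts with `prefix`.

    `named_parameters` is any iterable of (name, param) pairs where `param`
    has a boolean `requires_grad` attribute, e.g. model.named_parameters()
    on a PyTorch module. The default prefix is an assumption about how the
    visual encoder is named, not something the model card states.

    Returns the list of parameter names that were frozen.
    """
    frozen = []
    for name, param in named_parameters:
        if name.startswith(prefix):
            param.requires_grad = False  # excluded from gradient updates
            frozen.append(name)
    return frozen


# Typical use with a PyTorch model (not executed here):
#   frozen = freeze_by_prefix(model.named_parameters())
#   trainable = [p for p in model.parameters() if p.requires_grad]
#   optimizer = torch.optim.AdamW(trainable, lr=1e-5)  # hypothetical settings
```

Passing only the parameters that still require gradients to the optimizer keeps the frozen weights out of the update step entirely.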
Model Capabilities
Image Description Generation
Image Content Analysis
Visual Scene Understanding
Detailed Image Annotation
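The capabilities above can be exercised through the Hugging Face transformers library in the same way as the base Florence-2 model. The sketch below rests on several assumptions: the repository id thwri/CogFlorence-2.2-Large is inferred from the model name and developer listed here, the <MORE_DETAILED_CAPTION> task prompt is carried over from base Florence-2, and trust_remote_code=True is needed because Florence-2 ships custom modeling code. The helper names load_model and describe_image are illustrative only.

```python
MODEL_ID = "thwri/CogFlorence-2.2-Large"  # assumed repository id
TASK = "<MORE_DETAILED_CAPTION>"          # task prompt from base Florence-2


def load_model(model_id=MODEL_ID):
    """Load model and processor on CPU; imports are lazy so that defining
    this function does not require transformers to be installed."""
    from transformers import AutoModelForCausalLM, AutoProcessor

    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True
    ).eval()
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor


def describe_image(model, processor, image, task=TASK, max_new_tokens=1024):
    """Return a detailed caption for a PIL image."""
    if image.mode != "RGB":
        image = image.convert("RGB")  # the processor expects 3-channel input

    inputs = processor(text=task, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=max_new_tokens,
        num_beams=3,
    )
    text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Florence-2 post-processing returns a dict keyed by the task prompt.
    parsed = processor.post_process_generation(
        text, task=task, image_size=(image.width, image.height)
    )
    return parsed[task]


# Typical use (downloads the checkpoint, so it is not executed here):
#   from PIL import Image
#   model, processor = load_model()
#   print(describe_image(model, processor, Image.open("photo.jpg")))
```

The sketch runs on CPU for simplicity. To run on a GPU, move both the model and the processor outputs to the same device before calling generate.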
Use Cases
Content Creation
Automatic Image Annotation
Automatically generate detailed descriptions for images in a library
Improves image retrieval efficiency and enhances accessibility
Assistive Technology
Visual Impairment Assistance
Provide detailed image descriptions for visually impaired users
Helps in understanding visual content
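As an illustration of the automatic annotation use case, the following sketch captions every image in a folder and stores the results in a JSON index that can then be searched by keyword. It is a minimal example under stated assumptions: caption_fn is a hypothetical callable that takes an image path and returns a description (for instance a wrapper around this model), and the names annotate_library, search, and IMAGE_SUFFIXES are invented for the example.

```python
import json
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}  # illustrative selection


def annotate_library(image_dir, caption_fn, index_path):
    """Caption every image under `image_dir` and write a JSON index.

    `caption_fn` is a caller-supplied callable: Path -> str.
    Keys of the returned dict are paths relative to `image_dir`.
    """
    root = Path(image_dir)
    index = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            index[path.relative_to(root).as_posix()] = caption_fn(path)
    Path(index_path).write_text(json.dumps(index, indent=2), encoding="utf-8")
    return index


def search(index, query):
    """Return image paths whose caption contains every word of `query`.

    Matching is a naive case-insensitive substring test, so "red" also
    matches "hundred"; a real retrieval system would tokenize or embed.
    """
    words = query.lower().split()
    return [
        path
        for path, caption in index.items()
        if all(word in caption.lower() for word in words)
    ]
```

The same stored descriptions can double as alt text for screen readers, which covers the accessibility use case with no extra inference cost.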