Cockatiel 8B
A video caption generation model based on VILA-v1.5-8B, capable of generating detailed and human-preference-aligned captions for input videos.
Downloads 19
Release Time: 3/12/2025
Model Overview
This model generates fine-grained video captions by combining synthetic data with human preference training. It is suited to scenarios that require high-quality video descriptions.
Model Features
Fine-grained Video Caption Generation
Generates detailed captions for input videos that align with human preferences.
Synthetic Data and Human Preference Training
Combines synthetic data with human preference training to improve caption quality.
Built on VILA-v1.5-8B
Uses VILA-v1.5-8B as its base model and delivers competitive performance.
Model Capabilities
Video Caption Generation
Multimodal Understanding
Detailed Description Generation
Use Cases
Video Content Understanding
Video Caption Generation
Generates detailed and human-preference-aligned captions for input videos.
Produces high-quality video descriptions suitable for video content understanding and retrieval.
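As a rough illustration of the retrieval use case, the sketch below ranks videos by how many query words appear in their captions. It is a minimal example under stated assumptions, not part of Cockatiel 8B: the file names and caption strings are hypothetical placeholders standing in for model output, and the word-overlap score is a deliberately simple stand-in for whatever retrieval method a real system would use.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, caption):
    """Count query tokens that also occur in the caption."""
    query_counts = Counter(tokenize(query))
    caption_counts = Counter(tokenize(caption))
    return sum(min(n, caption_counts[tok]) for tok, n in query_counts.items())


def search(query, captions, top_k=3):
    """Return up to top_k (video_id, score) pairs with a non-zero score, best first."""
    scored = [(vid, overlap_score(query, cap)) for vid, cap in captions.items()]
    matches = [pair for pair in scored if pair[1] > 0]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches[:top_k]


# Hypothetical captions; in practice these would be generated by the captioning model.
CAPTIONS = {
    "clip_001.mp4": "A brown dog runs across a sandy beach and jumps into the waves.",
    "clip_002.mp4": "A chef slices red onions on a wooden board in a bright kitchen.",
    "clip_003.mp4": "Two children build a sand castle on a beach at sunset.",
}

if __name__ == "__main__":
    for video_id, score in search("dog beach waves", CAPTIONS):
        print(video_id, score)
```

Detailed captions help here because each extra described object or action gives a query more words to match; a production system would more likely use embedding-based search over the same captions.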
Multimodal Applications
Video Content Analysis
Performs content analysis by combining video and textual information.
Improves the accuracy and level of detail of video content understanding.