Cockatiel - An 8B open-source video captioning model that generates detailed captions aligned with human preferences.

Cockatiel 8B

Developed by Fr0zencr4nE

A video captioning model based on VILA-v1.5-8B that generates detailed, human-preference-aligned captions for input videos.

Video-to-Text

Transformers

#Detailed Video Caption Generation #Human Preference Optimization #Multimodal Understanding

Downloads 19

Release Time: 3/12/2025

Model Overview

This model generates fine-grained video captions by combining synthetic data with human preference training. It suits scenarios that require high-quality video descriptions.

Model Features

Fine-grained Video Caption Generation

Generates detailed, human-preference-aligned captions for input videos.

Synthetic Data and Human Preference Training

Combines synthetic data with human preference training to produce high-quality captions.

Built on VILA-v1.5-8B

Uses VILA-v1.5-8B as its base model, delivering competitive performance.

Model Capabilities

Video Caption Generation

Multimodal Understanding

Detailed Description Generation

Use Cases

Video Content Understanding

Video Caption Generation

Generates detailed and human-preference-aligned captions for input videos.

Produces high-quality video descriptions suitable for video content understanding and retrieval.
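
As a rough illustration of the retrieval use case, the sketch below ranks videos by how many query words appear in their captions. It does not call the model: the file names and caption strings are hypothetical placeholders standing in for generated captions, and the helper names tokenize and rank_videos are made up for this example. A real system would more likely use embeddings than word overlap.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_videos(query, captions):
    """Return video ids ordered by word overlap between query and caption.

    `captions` maps a video id to its caption text. Ties are broken by
    video id so the ordering is deterministic.
    """
    query_tokens = set(tokenize(query))
    scores = {
        video_id: len(query_tokens & set(tokenize(caption)))
        for video_id, caption in captions.items()
    }
    return sorted(scores, key=lambda video_id: (-scores[video_id], video_id))


# Hypothetical captions, written by hand for this example (not model output).
captions = {
    "clip_a.mp4": "A grey cockatiel perches on a wooden branch and whistles.",
    "clip_b.mp4": "A chef slices onions in a bright kitchen.",
}

ranking = rank_videos("cockatiel whistles", captions)
print(ranking[0])
```

More detailed captions give this kind of search more words to match against, which is one reason fine-grained descriptions can help retrieval.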

Multimodal Applications

Video Content Analysis

Performs content analysis by combining video and textual information.

Improves the accuracy and level of detail of video content understanding.
