Tarsier 7b
Tarsier-7b is an open-source large-scale video-language model from the Tarsier series, specializing in generating high-quality video descriptions with excellent general video understanding capabilities.
Downloads 635
Release Time: 7/4/2024
Model Overview
Tarsier-7b is an open-source large-scale video-language model in the Tarsier series, built on the liuhaotian/llava-v1.6-vicuna-7b model. It is designed to generate high-quality video descriptions and also has strong general video understanding capabilities.
Model Features
High-Quality Video Description Generation
Generates high-quality descriptions for a wide range of video content.
General Video Understanding Capabilities
Possesses excellent general video understanding capabilities, performing well across multiple benchmarks.
Two-Stage Training Strategy
Adopts a two-stage training strategy to enhance model performance: multi-task pre-training followed by multi-granularity instruction fine-tuning.
Model Capabilities
Video Description Generation
Video Question Answering
Multi-Granularity Video Understanding
Open-Ended Video Question Answering
Video Caption Generation
Use Cases
Video Content Analysis
Video Description Generation
Generates detailed textual descriptions for videos, suitable for video content indexing and retrieval.
High-quality video descriptions
Video Question Answering
Answers complex questions about video content, applicable in fields like education and entertainment.
Accurate video question answering results
Video Caption Generation
Automatic Caption Generation
Automatically generates captions for videos to enhance accessibility.
High-quality caption content
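The indexing and retrieval use case listed under Video Description Generation could be sketched roughly as follows. This is a minimal illustration in plain Python, not part of Tarsier: the helper names (tokenize, build_index, search), the clip IDs, and the description strings are all hypothetical placeholders, and no model is called. It assumes the descriptions have already been generated and stored as text.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a description and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(descriptions):
    """Map each token to the set of video IDs whose description contains it."""
    index = defaultdict(set)
    for video_id, text in descriptions.items():
        for token in tokenize(text):
            index[token].add(video_id)
    return index


def search(index, query):
    """Return the sorted video IDs whose descriptions contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)


# Hypothetical placeholder descriptions, not real model output.
descriptions = {
    "clip_001": "A man rides a bicycle along a beach at sunset.",
    "clip_002": "A woman slices vegetables in a kitchen.",
    "clip_003": "Two children ride a bicycle through a park.",
}

index = build_index(descriptions)
results = search(index, "bicycle")  # ["clip_001", "clip_003"]
```

Exact keyword matching is used here only to keep the sketch short; a real retrieval system would more likely rank results or use text embeddings.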