
Tarsier 7b

Developed by omni-research
Tarsier-7b is an open-source, large-scale video-language model from the Tarsier series. It specializes in generating high-quality video descriptions and also has strong general video understanding capabilities.
Release Time: 7/4/2024

Model Overview

Tarsier-7b is an open-source, large-scale video-language model designed to generate high-quality video descriptions, and it also has strong general video understanding capabilities. It is a member of the Tarsier series and is built on the liuhaotian/llava-v1.6-vicuna-7b model.
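Video-language models built on image-language models such as LLaVA commonly pass the language model a small, fixed number of frames sampled from a clip rather than every frame. The sketch below shows one common way to pick those frames: split the clip into equal segments and take the middle frame of each. It is an illustration under assumptions only. The function name `uniform_frame_indices` is hypothetical, and this page does not state how many frames Tarsier-7b uses or which sampling scheme it applies.

```python
def uniform_frame_indices(num_total_frames: int, num_samples: int) -> list[int]:
    """Pick `num_samples` frame indices spread evenly across a clip.

    Hypothetical helper: the clip is split into `num_samples` equal-length
    segments and the middle frame of each segment is chosen, so both the
    start and the end of the clip are represented.
    """
    if num_total_frames <= 0:
        raise ValueError("video must contain at least one frame")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    # Never ask for more frames than the clip actually has.
    num_samples = min(num_samples, num_total_frames)
    segment = num_total_frames / num_samples
    # Middle of each segment, truncated to an integer frame index.
    # The largest value is num_total_frames - segment / 2, so every index is valid.
    return [int(segment * i + segment / 2) for i in range(num_samples)]


if __name__ == "__main__":
    # A 10-second clip at 30 fps has 300 frames; sample 8 of them.
    print(uniform_frame_indices(300, 8))  # prints [18, 56, 93, 131, 168, 206, 243, 281]
```

Segment-centred sampling avoids always landing on the very first frame, which is often a fade-in or a black frame.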

Model Features

High-Quality Video Description Generation
Generates high-quality descriptions for a wide range of video content.
General Video Understanding Capabilities
Shows strong general video understanding, performing well across multiple benchmarks.
Two-Stage Training Strategy
Uses a two-stage training strategy, multi-task pre-training followed by multi-granularity instruction fine-tuning, to improve model performance.

Model Capabilities

Video Description Generation
Video Question Answering
Multi-Granularity Video Understanding
Open-Ended Video Question Answering
Video Caption Generation
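Once video descriptions have been generated, one simple way to make them searchable for content indexing and retrieval is an inverted index that maps each word to the videos whose descriptions contain it. The sketch below is a generic keyword index, not part of Tarsier itself. The class `DescriptionIndex`, the video IDs, and the sample descriptions are all hypothetical stand-ins for real model output.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters and digits. A production system
    # would also stem words and drop stop words.
    return re.findall(r"[a-z0-9]+", text.lower())


class DescriptionIndex:
    """Hypothetical inverted index: word -> set of video IDs whose description contains it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, video_id: str, description: str) -> None:
        # Record this video under every word of its generated description.
        for token in tokenize(description):
            self._postings[token].add(video_id)

    def search(self, query: str) -> set[str]:
        """Return the IDs of videos whose descriptions contain every query word."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        # .get() does not insert missing keys into the defaultdict.
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


if __name__ == "__main__":
    index = DescriptionIndex()
    # Hypothetical descriptions standing in for text generated by the model.
    index.add("clip_001", "A man in a red jacket rides a bicycle down a snowy street.")
    index.add("clip_002", "A woman pours coffee into a mug and then sits down at a table.")
    index.add("clip_003", "Two children ride bicycles across a park while a dog runs beside them.")
    print(sorted(index.search("bicycle street")))  # prints ['clip_001']
```

Exact-word matching is deliberately simple: "bicycles" does not match "bicycle" here. Embedding-based search over the same descriptions is a common alternative when fuzzier matching is needed.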

Use Cases

Video Content Analysis
Video Description Generation
Generates detailed textual descriptions of videos, suitable for video content indexing and retrieval.
Expected result: High-quality video descriptions
Video Question Answering
Answers complex questions about video content, with applications in fields such as education and entertainment.
Expected result: Accurate video question answering
Video Caption Generation
Automatic Caption Generation
Automatically generates captions for videos to improve accessibility.
Expected result: High-quality captions
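For the accessibility use case, generated captions usually have to be delivered in a standard subtitle format that video players understand. The sketch below renders timed caption segments as SubRip (SRT): numbered cues, each with an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range and the caption text, separated by blank lines. It assumes the caption text for each time segment has already been produced, for example by running the model on short consecutive clips. This page does not say whether Tarsier-7b outputs timestamps itself, so `CaptionSegment`, `format_timestamp`, and `to_srt` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class CaptionSegment:
    start_s: float  # segment start time in seconds
    end_s: float    # segment end time in seconds
    text: str       # caption text for this segment (assumed to come from the model)


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[CaptionSegment]) -> str:
    """Render caption segments as an SRT document."""
    cues = []
    for number, seg in enumerate(segments, start=1):
        if seg.end_s <= seg.start_s:
            raise ValueError(f"cue {number} must end after it starts")
        # Each cue: index line, time-range line, caption text.
        cues.append(
            f"{number}\n"
            f"{format_timestamp(seg.start_s)} --> {format_timestamp(seg.end_s)}\n"
            f"{seg.text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


if __name__ == "__main__":
    segments = [
        CaptionSegment(0.0, 4.5, "A dog runs across a lawn."),
        CaptionSegment(4.5, 9.0, "It catches a frisbee in mid-air."),
    ]
    print(to_srt(segments))
```

SRT is used here because it is plain text and widely supported; WebVTT is a close alternative for web players and differs mainly in its header line and in using a period instead of a comma before the milliseconds.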
© 2025 AIbase