Kangaroo

Developed by KangarooGroup
Kangaroo is a powerful multimodal large language model specifically designed for long video understanding, supporting bilingual dialogue (Chinese-English) and long video inputs.
Downloads 163
Release Time: 7/11/2024

Model Overview

The Kangaroo model specializes in video understanding tasks, including video captioning, Q&A, and dialogue, with exceptional capability in processing long video inputs (up to 160 frames).
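As a rough illustration of what a 160-frame input limit means for a long video, the sketch below picks evenly spaced frame indices so that a clip of any length fits within the limit. The function name `sample_frame_indices` and the uniform-sampling strategy are assumptions made for this example; the listing does not say how Kangaroo actually selects frames.

```python
def sample_frame_indices(total_frames: int, max_frames: int = 160) -> list[int]:
    """Pick at most max_frames evenly spaced frame indices from a video.

    Hypothetical helper: uniform sampling is an assumption, not the
    documented Kangaroo preprocessing.
    """
    if total_frames <= 0:
        return []
    if total_frames <= max_frames:
        # Short clip: keep every frame.
        return list(range(total_frames))
    # Split the video into max_frames equal-width segments and take
    # the frame at the midpoint of each segment.
    step = total_frames / max_frames
    return [int(step * i + step / 2) for i in range(max_frames)]


if __name__ == "__main__":
    indices = sample_frame_indices(1600)
    print(len(indices), indices[:3], indices[-1])
```

Midpoint sampling keeps the selected frames spread across the whole clip rather than clustered at the start, which is one common way to fit a long video into a fixed frame budget.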

Model Features

Long Video Input Support
Handles videos with varying frame counts and aspect ratios, with input capacity extended to 160 frames
Outstanding Performance
Matches or surpasses state-of-the-art (SOTA) results on multiple video understanding benchmarks
Video Annotation System
Includes a data filtering and auto-annotation system developed to generate large-scale video-text datasets
Bilingual Dialogue Capability
Supports single/multi-turn video dialogues in both Chinese and English

Model Capabilities

Video Content Description
Video Q&A
Video Dialogue
Long Video Understanding
Bilingual Processing (Chinese-English)

Use Cases

Video Content Analysis
Video Summarization
Automatically generates textual summaries of video content
Accurately captures key video content
Intelligent Customer Service
Video Product Q&A
Answers user questions about products in videos
Provides accurate product information