VideoChat Flash Qwen2 7B Res448
Apache-2.0
VideoChat-Flash-7B is a multimodal model built on UMT-L (300M) and Qwen2-7B. It uses only 16 tokens per frame and supports input sequences of up to approximately 10,000 frames.
Video-to-Text
Transformers English
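
To make the figures in the description concrete, the sketch below works out the visual token budget implied by 16 tokens per frame. The function name `visual_token_count` is hypothetical and not part of the model's API. The count covers visual tokens only and assumes no text prompt or special tokens are included.

```python
# Tokens per frame, as stated in the model description.
TOKENS_PER_FRAME = 16


def visual_token_count(num_frames: int, tokens_per_frame: int = TOKENS_PER_FRAME) -> int:
    """Return the number of visual tokens for a video with `num_frames` frames.

    Hypothetical helper for illustration only: it multiplies the frame count
    by the per-frame token count and ignores text and special tokens.
    """
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    return num_frames * tokens_per_frame


if __name__ == "__main__":
    # Roughly the largest input mentioned in the description.
    print(visual_token_count(10_000))  # 160000
```

At about 10,000 frames this comes to roughly 160,000 visual tokens, which shows why a small per-frame token count matters for long videos.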