# Multimodal Input Support
DAM-3B
Other
DAM-3B is a 3-billion-parameter vision-language model capable of generating fine-grained local descriptions for user-specified image regions.
Image-to-Text
Safetensors English
nvidia
1,417
81
NABLA VL
Apache-2.0
A Japanese Vision-Language Model (VLM) developed by NABLAS, supporting image, multi-image, and video inputs, suitable for various multimodal tasks.
Image-to-Text
Transformers Japanese

nablasinc
1,673
2
Hunyuan GGUF
Other
Tencent Hunyuan Community Edition's text-to-video model, capable of generating high-quality video content from text prompts.
Text-to-Video English
calcuis
1,871
61
Featured Recommended AI Models