AIbase

# Multimodal Input Support

DAM 3B
Other
DAM-3B is a 3-billion-parameter vision-language model capable of generating fine-grained local descriptions for user-specified image regions.
Image-to-Text Safetensors English
nvidia
1,417
81
NABLA VL
Apache-2.0
A Japanese vision-language model (VLM) developed by NABLAS that accepts image, multi-image, and video inputs for multimodal tasks.
Image-to-Text Transformers Japanese
nablasinc
1,673
2
Hunyuan Gguf
Other
Tencent Hunyuan Community Edition's text-to-video model, capable of generating high-quality video content from text prompts.
Text-to-Video English
calcuis
1,871
61
© 2025 AIbase