Model Selection

Multimodal Input Support

# Multimodal Input Support

DAM-3B is a 3-billion-parameter vision-language model capable of generating fine-grained local descriptions for user-specified image regions.

Safetensors English

A Japanese Vision-Language Model (VLM) developed by NABLAS, supporting image, multi-image, and video inputs, suitable for various multimodal tasks.

Transformers Japanese

Tencent Hunyuan Community Edition's text-to-video model, capable of generating high-quality video content from text prompts.

Text-to-Video English

Featured Recommended AI Models

AIbase

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

© 2025AIbase