
DAM-3B

Developed by NVIDIA
DAM-3B is a 3-billion-parameter vision-language model capable of generating fine-grained local descriptions for user-specified image regions.
Release date: 4/21/2025

Model Overview

Given an image and a user-specified region, supplied as points, boxes, scribbles, or masks, the model generates a fine-grained description of that region. It combines global image context with fine-grained local detail through a focus prompt mechanism and a local visual backbone network that uses gated cross-attention.
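The card does not spell out how the focus prompt is built. The following is a minimal sketch, assuming the prompt pairs the full image with an enlarged crop around the selected region: the crop is the region's bounding box scaled by a context factor, given a minimum size, and shifted so it stays inside the image. The names `mask_bbox` and `focus_crop_box`, and the defaults `context_scale=3.0` and `min_size=48`, are hypothetical choices for illustration, not values taken from the model.

```python
from typing import List, Tuple

Mask = List[List[int]]          # binary mask, indexed as mask[y][x]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1 and y1 exclusive


def mask_bbox(mask: Mask) -> Box:
    """Return the tight bounding box of the nonzero pixels in a binary mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        raise ValueError("mask is empty")
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


def focus_crop_box(mask: Mask, context_scale: float = 3.0, min_size: int = 48) -> Box:
    """Hypothetical focus-prompt crop: enlarge the region's box to add local context."""
    image_h, image_w = len(mask), len(mask[0])
    x0, y0, x1, y1 = mask_bbox(mask)
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    # Scale width and height by context_scale, never below min_size or above the image size.
    w = min(max((x1 - x0) * context_scale, min_size), image_w)
    h = min(max((y1 - y0) * context_scale, min_size), image_h)
    # Shift the window so it stays inside the image instead of being truncated.
    left = min(max(cx - w / 2, 0), image_w - w)
    top = min(max(cy - h / 2, 0), image_h - h)
    return int(left), int(top), int(left + w), int(top + h)
```

For a 10 x 10 region near the top-left corner of a 100 x 100 mask, the 30-pixel enlarged box falls below `min_size`, so the sketch returns the 48 x 48 window (0, 0, 48, 48). Shifting rather than truncating the window keeps the crop size fixed near image borders.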

Model Features

Fine-grained Local Description
Capable of generating detailed descriptions for any user-specified image region
Multi-form Region Specification
Supports various forms of region specification including points, boxes, scribbles, and masks
Focus Prompt Mechanism
Combines global image context with fine-grained detail from the specified region
Gated Cross-Attention
A local visual backbone network that uses gated cross-attention to improve description quality
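The card names gated cross-attention without describing the layer. A minimal single-head sketch of the generic technique follows, assuming that local-region tokens act as queries over global-context tokens (keys and values), and that the attended result is scaled by tanh(gate) and added back residually. For brevity it omits the learned query, key, and value projections a real layer would have; the function name `gated_cross_attention` is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_cross_attention(
    local_tokens: List[Vector], global_tokens: List[Vector], gate: float
) -> List[Vector]:
    """Simplified single-head gated cross-attention without learned projections.

    Each local token (query) attends over the global tokens (keys and values);
    the attended vector is scaled by tanh(gate) and added to the query.
    """
    dim = len(local_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    g = math.tanh(gate)
    out = []
    for q in local_tokens:
        # Scaled dot-product scores between this query and every global token.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in global_tokens]
        weights = softmax(scores)
        # Weighted sum of global tokens, one output coordinate at a time.
        attended = [sum(w * v[i] for w, v in zip(weights, global_tokens)) for i in range(dim)]
        # Gated residual: with gate == 0, tanh(0) == 0 and the query passes through unchanged.
        out.append([qi + g * ai for qi, ai in zip(q, attended)])
    return out
```

Starting the gate at zero makes the layer an identity when it is first added, so a pretrained backbone's behavior is preserved until training opens the gate; this is the usual motivation for gating cross-attention.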

Model Capabilities

Image region description generation
Multi-form region input processing
Fine-grained visual understanding
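The card lists points, boxes, scribbles, and masks as region inputs but does not say how they are normalized. One plausible approach, assumed here, converts every form into a binary mask of the image size before further processing. The names `to_mask`, `mask_from_box`, and `mask_from_points` and the default point radius are hypothetical; a real pipeline might instead turn points and boxes into masks with a promptable segmentation model, which this sketch does not attempt.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask, indexed as mask[y][x]


def empty_mask(width: int, height: int) -> Mask:
    return [[0] * width for _ in range(height)]


def mask_from_box(box: Tuple[int, int, int, int], width: int, height: int) -> Mask:
    """Fill the box (x0, y0, x1, y1), with x1 and y1 exclusive, clipped to the image."""
    x0, y0, x1, y1 = box
    mask = empty_mask(width, height)
    for y in range(max(y0, 0), min(y1, height)):
        for x in range(max(x0, 0), min(x1, width)):
            mask[y][x] = 1
    return mask


def mask_from_points(
    points: List[Tuple[int, int]], width: int, height: int, radius: int = 2
) -> Mask:
    """Mark a small disk around each (x, y) point; a scribble is a dense list of points."""
    mask = empty_mask(width, height)
    for px, py in points:
        for y in range(max(py - radius, 0), min(py + radius + 1, height)):
            for x in range(max(px - radius, 0), min(px + radius + 1, width)):
                if (x - px) ** 2 + (y - py) ** 2 <= radius ** 2:
                    mask[y][x] = 1
    return mask


def to_mask(region_type: str, region, width: int, height: int) -> Mask:
    """Normalize a point, scribble, box, or mask region into a binary mask."""
    if region_type == "point":
        return mask_from_points([region], width, height)
    if region_type == "scribble":
        return mask_from_points(region, width, height)
    if region_type == "box":
        return mask_from_box(region, width, height)
    if region_type == "mask":
        # Binarize an existing mask (any truthy value counts as foreground).
        return [[1 if v else 0 for v in row] for row in region]
    raise ValueError(f"unsupported region type: {region_type}")
```

Reducing every input form to one mask representation lets the rest of a pipeline, such as a focus-crop step, handle a single region type.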

Use Cases

Computer Vision Research
Fine-grained Image Understanding
Studying how well models understand local image details
Assistive Technology
Visual Assistance Description
Provides detailed descriptions of specific image regions for visually impaired individuals