# Multimodal Adaptation
Sam2 Hiera Small.fb R896
Apache-2.0
SAM2 model based on the HieraDet image encoder, focused on image feature extraction tasks.
Image Segmentation
Transformers

timm
142
0
Resnet101 Clip.yfcc15m
MIT
CLIP-style dual-modal (image and text) model trained on the YFCC-15M dataset, compatible with both the open_clip and timm frameworks.
Image Classification
timm
134
0
Mambavision B 1K
Apache-2.0
PAVE is a model for patching and adapting video large language models, aiming to improve video-to-text conversion.
Video-to-Text
Transformers

nvidia
1,082
11