# CLIP Image Encoding
Git Large R Textcaps
MIT
GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens, designed for tasks such as image captioning and visual question answering.
Image-to-Text
Transformers, Supports Multiple Languages
microsoft
51
10
Git Base Vqav2
MIT
GIT is a Transformer decoder-based vision-language model conditioned on both CLIP image tokens and text tokens, suitable for tasks such as image captioning and visual question answering.
Image-to-Text
Transformers, Supports Multiple Languages
microsoft
199
19
Git Base
MIT
GIT is a Transformer decoder conditioned on both CLIP image tokens and text tokens, designed for image-to-text generation tasks.
Image-to-Text
Transformers, Supports Multiple Languages
microsoft
365.74k
93
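
The GIT entries above describe a decoder conditioned on both CLIP image tokens and text tokens. A minimal sketch of what that conditioning can look like at the attention-mask level follows. It assumes the pattern described for GIT, in which image tokens attend to each other bidirectionally while each text token attends to all image tokens and only to itself and earlier text tokens. The function name `git_attention_mask` and the token counts are hypothetical and chosen for illustration; this is not the model's actual implementation.

```python
from __future__ import annotations


def git_attention_mask(num_image_tokens: int, num_text_tokens: int) -> list[list[bool]]:
    """Return mask[i][j] == True when position i may attend to position j.

    Assumed layout (illustrative): positions 0..num_image_tokens-1 hold image
    tokens, and the remaining positions hold text tokens.
    """
    total = num_image_tokens + num_text_tokens
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if j < num_image_tokens:
                # Every position, image or text, may attend to all image tokens.
                mask[i][j] = True
            elif i >= num_image_tokens and j <= i:
                # Text tokens attend causally: themselves and earlier text tokens.
                mask[i][j] = True
    return mask


# Print the mask for 2 image tokens followed by 3 text tokens (1 = may attend).
for row in git_attention_mask(2, 3):
    print("".join("1" if allowed else "0" for allowed in row))
```

In the printed grid, the image rows never see text columns, while each text row sees every image column plus a growing causal prefix of text columns. That is why the same decoder can caption an image (text generated from image tokens alone) or answer a question (text generated from image tokens plus the question tokens).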