# ViT-RoBERTa architecture
ViT-RoBERTa Fa Image Captioning Flickr30k
A Persian image captioning model based on the ViT + RoBERTa architecture, designed to generate Persian text descriptions from images.
Image-to-Text Other
hezarai
CLIPfa Vision
CLIPfa is the Persian version of OpenAI's CLIP model. It connects Persian text and image representations through contrastive learning.
Text-to-Image
Transformers

SajjadAyoubi