
# ViT-RoBERTa architecture

Vit Roberta Fa Image Captioning Flickr30k
A Persian image captioning model based on the ViT+RoBERTa architecture, designed to generate Persian text descriptions from images.
Image-to-Text Other
hezarai
85
1
Clip Fa Vision
CLIPfa is a Persian version of OpenAI's CLIP model that connects Persian text and image representations through contrastive learning.
Text-to-Image Transformers
SajjadAyoubi
43
5
© 2025 AIbase