# ViT-RoBERTa architecture

Vit Roberta Fa Image Captioning Flickr30k

A Persian image captioning model built on a ViT+RoBERTa architecture and designed to generate Persian text descriptions of images.

Image-to-Text Other

CLIPfa is a Persian version of OpenAI's CLIP model that aligns Persian text and image representations through contrastive learning.
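As a rough illustration of what contrastive learning between text and image representations means, the sketch below computes a CLIP-style symmetric cross-entropy loss over a toy batch of paired embeddings. It is not CLIPfa's training code: the function names (`clip_style_loss`, `log_softmax`, `normalize`), the toy vectors, and the fixed temperature of 0.07 are assumptions chosen for illustration only.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over the image-text similarity matrix.

    Pair i (image_embs[i], text_embs[i]) is treated as the only positive;
    every other combination in the batch acts as a negative.
    The temperature value is an assumption for this sketch.
    """
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    n = len(images)

    # logits[i][j]: temperature-scaled cosine similarity of image i and text j
    logits = [[dot(im, tx) / temperature for tx in texts] for im in images]

    # Image-to-text direction: row i should put its mass on column i.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: column j should put its mass on row j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return (loss_i2t + loss_t2i) / 2


# Toy batch: two image embeddings and two caption embeddings (made-up values).
image_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_style_loss(image_batch, [[1.0, 0.1], [0.1, 1.0]])
shuffled = clip_style_loss(image_batch, [[0.1, 1.0], [1.0, 0.1]])
print(aligned < shuffled)
```

Correctly paired captions yield a lower loss than the same captions in swapped order, which is the signal that pulls matching text and image embeddings together during training.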


© 2025 AIbase