Swin Aragpt2 Image Captioning V3
Developed by AsmaMassad
An image captioning model built on the Swin Transformer and AraGPT2 architectures that generates Arabic textual descriptions of input images.
Downloads 18
Release Time: 6/6/2023
Model Overview
This model is a vision-language model that combines the image encoding capability of Swin Transformer with the text generation capability of AraGPT2, specifically designed for image captioning tasks.
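The encoder-decoder pairing described above can be sketched with the Hugging Face transformers library. This is a minimal sketch under stated assumptions, not the author's documented usage: the repository identifier MODEL_ID is hypothetical, and it is assumed that the checkpoint loads as a VisionEncoderDecoderModel and ships its own image processor and tokenizer files. The helper names generation_kwargs and caption_image are illustrative only.

```python
from typing import Any, Dict

# Hypothetical Hub identifier; replace with the actual repository name.
MODEL_ID = "AsmaMassad/swin-aragpt2-image-captioning-v3"


def generation_kwargs(max_length: int = 32, num_beams: int = 4) -> Dict[str, Any]:
    """Decoding settings passed to generate(); the values are illustrative defaults."""
    return {"max_length": max_length, "num_beams": num_beams, "early_stopping": True}


def caption_image(image_path: str, model_id: str = MODEL_ID) -> str:
    """Load the captioning model and return a caption for one image file."""
    # Imported lazily so the module can be imported without the heavy dependencies.
    from PIL import Image
    from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel

    # Assumption: the checkpoint is stored in VisionEncoderDecoderModel format.
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    processor = AutoImageProcessor.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # The Swin encoder consumes pixel values; the AraGPT2 decoder emits token ids.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    output_ids = model.generate(pixel_values, **generation_kwargs())
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
```

Calling caption_image("photo.jpg") would download the weights on first use and return a single caption string, provided the assumptions above hold for the published checkpoint.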
Model Features
Multimodal Architecture
Combines a vision Transformer (Swin) with a language model (AraGPT2) to perform image-to-text conversion
End-to-End Training
The entire model is fine-tuned end-to-end to jointly optimize image understanding and text generation
Cross-Modal Understanding
Capable of understanding image content and generating coherent descriptive text
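One way the multimodal architecture and end-to-end fine-tuning listed above could be set up is sketched below. The page does not state which base checkpoints or training procedure were used, so ENCODER_ID and DECODER_ID are assumptions chosen for illustration, and build_captioner and training_loss are hypothetical helper names rather than the author's code.

```python
# Assumed base checkpoints; the source does not say which ones were used.
ENCODER_ID = "microsoft/swin-base-patch4-window7-224"
DECODER_ID = "aubmindlab/aragpt2-base"


def build_captioner(encoder_id: str = ENCODER_ID, decoder_id: str = DECODER_ID):
    """Pair a pretrained vision encoder with a pretrained text decoder."""
    # Imported lazily so the module can be imported without the heavy dependencies.
    from transformers import AutoTokenizer, VisionEncoderDecoderModel

    # Adds cross-attention layers to the decoder so it can attend to image features.
    model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(encoder_id, decoder_id)
    tokenizer = AutoTokenizer.from_pretrained(decoder_id)

    # GPT-2 style tokenizers may lack a pad token; reuse EOS in that case.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model.config.decoder_start_token_id = tokenizer.bos_token_id
    model.config.pad_token_id = tokenizer.pad_token_id
    return model, tokenizer


def training_loss(model, pixel_values, labels):
    """Forward pass whose loss backpropagates through both decoder and encoder."""
    # Passing labels makes the model compute the caption cross-entropy loss.
    return model(pixel_values=pixel_values, labels=labels).loss
```

Because no parameters are frozen in this sketch, an optimizer step on the returned loss updates the encoder and decoder together, which is what end-to-end fine-tuning means here.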
Model Capabilities
Image Content Understanding
Arabic Text Generation
Image-to-Text Conversion
Use Cases
Assistive Technology
Visual Impairment Assistance
Generates image descriptions for visually impaired users
Content Generation
Social Media Content Auto-Generation
Automatically generates descriptive text for uploaded images
© 2025 AIbase