# Visual Question Answering Fine-tuning
## Vilt Finetuned 100
A vision-language model based on ViLT-B32-MLM, fine-tuned on VQA datasets.

- License: Apache-2.0
- Tags: Text-to-Image, Transformers
- Author: bangbrecho
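ViLT-based VQA models are usually set up as classifiers rather than text generators: `ViltForQuestionAnswering` in Transformers scores a fixed answer vocabulary. The plain-Python sketch below shows how soft training targets and the final prediction can be derived for such a head. It assumes the common VQAv2 scoring convention (`min(1, count / 3)`) and a hypothetical four-answer vocabulary. The helper names are illustrative, and the listing does not say how this checkpoint was actually trained.

```python
from collections import Counter


def vqa_soft_score(count: int) -> float:
    # Common VQAv2 convention (assumed here): an answer given by 3+ annotators scores 1.0.
    return min(1.0, count / 3)


def build_soft_target(annotator_answers, label2id):
    """Turn raw annotator answers into a soft-label vector over a fixed vocabulary.

    Answers missing from the vocabulary are skipped: a classifier head cannot output them.
    """
    target = [0.0] * len(label2id)
    for answer, count in Counter(annotator_answers).items():
        idx = label2id.get(answer)
        if idx is not None:
            target[idx] = vqa_soft_score(count)
    return target


def decode_answer(logits, id2label):
    # At inference, the predicted answer is the highest-scoring vocabulary entry.
    best = max(range(len(logits)), key=logits.__getitem__)
    return id2label[best]


# Hypothetical toy vocabulary; real VQA heads use thousands of answers.
label2id = {"yes": 0, "no": 1, "2": 2, "3": 3}
id2label = {i: a for a, i in label2id.items()}

answers = ["2"] * 6 + ["3"] * 2 + ["two"] * 2  # ten annotator answers
print(build_soft_target(answers, label2id))  # → [0.0, 0.0, 1.0, 0.6666666666666666]
print(decode_answer([0.1, -2.0, 3.5, 1.0], id2label))  # → 2
```

Soft targets like these are typically paired with a binary cross-entropy loss, which lets partially agreed answers ("3" above) still contribute signal instead of being treated as wrong.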
## Blip Gqa Ft
A vision-language model based on Salesforce/blip2-opt-2.7b, fine-tuned for visual question answering.

- License: MIT
- Tags: Text-to-Image, Transformers
- Author: phucd
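A BLIP-2 model built on OPT-2.7b generates its answer as free text instead of picking from a fixed vocabulary, so answers have to be compared as strings. The sketch below is a minimal illustration under two assumptions: it uses the `Question: ... Answer:` prompt template shown in the Transformers BLIP-2 examples (not confirmed for this checkpoint), and it applies a simplified answer normalization loosely modeled on VQA-style evaluation. `build_vqa_prompt`, `normalize_answer`, and `is_exact_match` are hypothetical helpers.

```python
import re
import string

# Matches standalone English articles after lowercasing.
_ARTICLES = re.compile(r"\b(a|an|the)\b")


def build_vqa_prompt(question: str) -> str:
    # Assumed template, taken from the Transformers BLIP-2 examples.
    return f"Question: {question.strip()} Answer:"


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = _ARTICLES.sub(" ", text)
    return " ".join(text.split())


def is_exact_match(generated: str, reference: str) -> bool:
    # Exact match after normalization; number words ("two" vs "2") are NOT unified here.
    return normalize_answer(generated) == normalize_answer(reference)


print(build_vqa_prompt("how many cats are there?"))  # → Question: how many cats are there? Answer:
print(is_exact_match(" The red car.", "red car"))  # → True
```

Keeping normalization separate from matching makes it easy to swap in a stricter evaluator (for example, one that maps number words to digits) without touching the prompt code.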
## Vilt Finetuned 200
A vision-language model based on the ViLT architecture, fine-tuned on VQA datasets for visual question answering.

- License: Apache-2.0
- Tags: Text-to-Image, Transformers
- Author: MariaK