git-base-textvqa: Open-Source Visual Question Answering Model for Images Containing Text

Git Base Textvqa

Developed by Hellraiser24

A visual question answering model based on microsoft/git-base-textvqa and fine-tuned on the TextVQA dataset. It is aimed at question answering over images that contain text.

Large Language Model

Transformers

Open Source License: MIT

Tags: Text Visual Question Answering, Image Text Understanding, Multimodal Learning

Downloads 19

Release Time: 6/4/2023

Model Overview

This model is a GIT (GenerativeImage2Text) model fine-tuned on the TextVQA dataset. It targets visual question answering tasks that require understanding both an image and the text that appears in it.
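As an illustration of how a GIT checkpoint is typically queried for question answering with the Transformers library, here is a minimal sketch. It assumes `torch`, `transformers`, and `Pillow` are installed. The hub id of this fine-tuned checkpoint is not stated on this page, so the sketch defaults to the base checkpoint microsoft/git-base-textvqa named above. The helper names `prepend_cls` and `answer_question` are hypothetical and not part of any library.

```python
from typing import List


def prepend_cls(question_ids: List[int], cls_token_id: int) -> List[int]:
    """Return the question token ids with the [CLS] id placed in front.

    GIT is prompted for VQA with [CLS] followed by the question tokens;
    the model then generates the answer as a continuation.
    """
    return [cls_token_id] + list(question_ids)


def answer_question(
    image_path: str,
    question: str,
    model_id: str = "microsoft/git-base-textvqa",  # assumption: swap in the fine-tuned checkpoint id
) -> str:
    """Generate an answer to `question` about the image at `image_path`."""
    # Heavy dependencies are imported here so that prepend_cls stays usable
    # without them.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Encode the image into pixel values.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # Tokenize the question without special tokens, then add [CLS] manually.
    question_ids = processor(text=question, add_special_tokens=False).input_ids
    input_ids = torch.tensor(
        prepend_cls(question_ids, processor.tokenizer.cls_token_id)
    ).unsqueeze(0)

    with torch.no_grad():
        generated_ids = model.generate(
            pixel_values=pixel_values, input_ids=input_ids, max_length=50
        )

    # The decoded string contains the question followed by the generated answer.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

A call such as `answer_question("sign.jpg", "what does the sign say?")` (with a hypothetical local image file) would download the checkpoint on first use and return the question text followed by the model's answer.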

Model Features

Joint Text-Image Understanding

Capable of processing both visual information and textual content in images simultaneously

End-to-End Training

Uses a unified Transformer architecture for end-to-end training

Efficient Fine-tuning

Reaches a validation loss of 0.0472 on the TextVQA dataset after three epochs of fine-tuning

Model Capabilities

Text recognition in images

Image-text based question answering

Multimodal understanding

Vision-language joint reasoning

Use Cases

Intelligent Assistance

Scene Text Question Answering

Answering questions about text content appearing in images

Achieved a validation loss of 0.0472 on the TextVQA evaluation set

Accessibility Technology

Image Text Description

Describing text content in images for visually impaired individuals

Property	Details
Model Type	git-base-textvqa
Training Data	textvqa
License	MIT

Training Loss	Epoch	Step	Validation Loss
0.9764	0.2	500	0.0499
0.0524	0.4	1000	0.0492
0.0525	0.6	1500	0.0494
0.0531	0.8	2000	0.0480
0.0515	1.0	2500	0.0477
0.0473	1.2	3000	0.0483
0.0479	1.4	3500	0.0477
0.0473	1.6	4000	0.0476
0.0486	1.8	4500	0.0472
0.0471	2.0	5000	0.0473
0.0454	2.2	5500	0.0473
0.0452	2.4	6000	0.0476
0.0438	2.6	6500	0.0475
0.0463	2.8	7000	0.0474
0.0449	3.0	7500	0.0472
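
The training log above can be summarized with a few lines of Python. The sketch below copies the table rows verbatim and derives two figures from them: the number of optimizer steps per epoch and the checkpoints with the lowest validation loss. The function names are illustrative only.

```python
# (training_loss, epoch, step, validation_loss) rows copied from the table above
ROWS = [
    (0.9764, 0.2, 500, 0.0499),
    (0.0524, 0.4, 1000, 0.0492),
    (0.0525, 0.6, 1500, 0.0494),
    (0.0531, 0.8, 2000, 0.0480),
    (0.0515, 1.0, 2500, 0.0477),
    (0.0473, 1.2, 3000, 0.0483),
    (0.0479, 1.4, 3500, 0.0477),
    (0.0473, 1.6, 4000, 0.0476),
    (0.0486, 1.8, 4500, 0.0472),
    (0.0471, 2.0, 5000, 0.0473),
    (0.0454, 2.2, 5500, 0.0473),
    (0.0452, 2.4, 6000, 0.0476),
    (0.0438, 2.6, 6500, 0.0475),
    (0.0463, 2.8, 7000, 0.0474),
    (0.0449, 3.0, 7500, 0.0472),
]


def steps_per_epoch(rows):
    """Estimate steps per epoch from the first logged row (step / epoch)."""
    _, epoch, step, _ = rows[0]
    return round(step / epoch)


def best_checkpoints(rows):
    """Return the lowest validation loss and every step that reached it."""
    best = min(row[3] for row in rows)
    return best, [row[2] for row in rows if row[3] == best]


print(steps_per_epoch(ROWS))   # 2500
print(best_checkpoints(ROWS))  # (0.0472, [4500, 7500])
```

The lowest validation loss, 0.0472, is reached at step 4500 and again at the final step 7500, while the training loss keeps decreasing slightly, so most of the validation improvement happens within the first two epochs.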
