Finetuned ViT Image Text Classifier
An image classification model based on the ViT architecture, designed to identify whether an image contains text and, if so, which script it uses (Latin, Chinese, or Arabic).
Downloads 45
Release Time: 2/8/2023
Model Overview
This model is a fine-tuned image classifier based on google/vit-base-patch16-224-in21k, specifically designed for document text classification tasks. It can identify text types (Latin, Chinese, Arabic) and non-text images.
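A minimal sketch of how a checkpoint like this might be used for inference through the Hugging Face `transformers` image-classification pipeline. This page does not give the repository ID, so `model_id` is a placeholder the caller must supply. The helper names `top_prediction` and `classify_image` are illustrative, and the label strings returned depend on the checkpoint's own configuration.

```python
from typing import Dict, List


def top_prediction(predictions: List[Dict]) -> Dict:
    """Return the highest-scoring entry from pipeline-style output.

    The image-classification pipeline returns a list of
    {"label": str, "score": float} dictionaries.
    """
    if not predictions:
        raise ValueError("no predictions to choose from")
    return max(predictions, key=lambda p: p["score"])


def classify_image(image_path: str, model_id: str) -> Dict:
    """Classify one image with a fine-tuned ViT checkpoint.

    model_id is an assumption: pass the Hub repository ID or a local
    path of the checkpoint, since it is not stated in this description.
    """
    # Imported lazily so top_prediction works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=model_id)
    return top_prediction(classifier(image_path))
```

Calling `classify_image("scan.png", model_id)` would return a single dictionary with the predicted label and its score, assuming the checkpoint can be downloaded or is available locally.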
Model Features
High-Accuracy Text Classification
Achieves 90.3% accuracy on the test set, effectively distinguishing between different text types.
ViT-Based Architecture
Utilizes the Vision Transformer architecture with powerful image feature extraction capabilities.
Multi-Category Recognition
A single model distinguishes Latin, Chinese, and Arabic text as well as non-text images.
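To illustrate how a four-way classification head turns raw scores into one of these categories, here is a small sketch in plain Python. It assumes a single-label setup (one class per image) and a hypothetical label order; the real order and label names are defined by the checkpoint's `id2label` mapping, which this page does not list.

```python
import math
from typing import List, Sequence, Tuple

# Assumed label order, for illustration only.
LABELS = ["latin", "chinese", "arabic", "no_text"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities, shifting by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(
    logits: Sequence[float], labels: Sequence[str] = LABELS
) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

Because softmax yields one probability distribution over all classes, each image receives exactly one category under this assumption; a page that mixes several scripts would still be assigned only the most probable one.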
Model Capabilities
Image Classification
Text Type Recognition
Document Image Analysis
Use Cases
Document Processing
Multilingual Document Classification
Automatically classify scanned documents containing text in different languages.
Accurately distinguishes between Latin, Chinese, and Arabic documents.
Image Content Filtering
Select the images in a collection that contain text in a specific script.
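The filtering use case could be sketched as follows. The function takes any callable that maps an image path to a predicted label, so it is independent of how the model is loaded; `filter_by_script` and the label strings are hypothetical names chosen for this example.

```python
from typing import Callable, Iterable, List


def filter_by_script(
    image_paths: Iterable[str],
    classify: Callable[[str], str],
    wanted: str,
) -> List[str]:
    """Keep only the paths whose predicted label equals `wanted`.

    `classify` is any function returning a label for a path, for example
    a thin wrapper around the model's prediction call.
    """
    return [path for path in image_paths if classify(path) == wanted]


if __name__ == "__main__":
    # Stub classifier standing in for the real model (illustrative data).
    fake_labels = {"a.png": "arabic", "b.png": "latin", "c.png": "arabic"}
    print(filter_by_script(list(fake_labels), fake_labels.get, "arabic"))
```

Passing the classifier in as a parameter keeps the filtering logic easy to test with a stub, as the usage lines show.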
OCR Preprocessing
OCR Language Identification
Identify the text type in documents before OCR processing.
Improves the accuracy of subsequent OCR processing.
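One way the predicted text type might feed an OCR step is a simple lookup from label to OCR language code. The sketch below uses Tesseract language codes (`eng`, `chi_sim`, `ara`) as an example target. The label strings are assumptions, and mapping the Latin class to English is a simplification, since Latin script covers many languages.

```python
from typing import Optional

# Assumed classifier labels mapped to Tesseract language codes.
# "latin" -> "eng" is a simplification; pick the language your documents use.
OCR_LANG_BY_LABEL = {
    "latin": "eng",
    "chinese": "chi_sim",
    "arabic": "ara",
}


def ocr_language_for(label: str) -> Optional[str]:
    """Return the OCR language code for a predicted label.

    Returns None for labels with no mapping (such as a non-text class),
    which the caller can treat as "skip OCR for this image".
    """
    return OCR_LANG_BY_LABEL.get(label)
```

Returning `None` for non-text images lets a pipeline skip OCR entirely for them, which avoids spending OCR time on images where no text is expected.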