InternViT-6B-448px-V1-2

Developed by OpenGVLab
InternViT-6B-448px-V1-2 is a vision foundation model that serves as a feature backbone. It has roughly 5.5 billion parameters and processes images at 448x448 pixels.
Downloads 19
Release Time: 2/11/2024

Model Overview

This vision foundation model is used primarily for image feature extraction. It processes input at 448x448 pixels and has received additional training to strengthen its OCR capability.

Model Features

High-resolution processing
Supports high-resolution image processing at 448x448 pixels.
OCR capability
Enhanced OCR capability through additional training, suitable for text recognition tasks.
Parameter optimization
Parameters reduced from 5.9B to 5.5B by discarding the last 3 blocks, saving GPU memory.
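
The reduction from 5.9B to 5.5B can be sanity-checked with a rough estimate. The sketch below is only a back-of-the-envelope calculation: the hidden width (3200) and MLP width (12800) are assumptions not stated on this page, and biases, normalization layers, and embeddings are ignored.

```python
# Rough estimate of the parameters removed by discarding 3 transformer blocks.
# ASSUMPTIONS (not stated on this page): hidden width 3200, MLP width 12800.
# Biases, normalization layers, and embeddings are ignored.
HIDDEN = 3200
MLP = 12800
DROPPED_BLOCKS = 3
ORIGINAL_PARAMS = 5.9e9  # figure quoted above


def block_params(hidden: int, mlp: int) -> int:
    """Approximate weight count of one transformer block."""
    attention = 4 * hidden * hidden  # Q, K, V, and output projections
    feed_forward = 2 * hidden * mlp  # up and down projections
    return attention + feed_forward


removed = DROPPED_BLOCKS * block_params(HIDDEN, MLP)
remaining = ORIGINAL_PARAMS - removed
saved_gb = removed * 2 / 1e9  # 2 bytes per parameter in bf16/fp16

print(f"removed ~{removed / 1e9:.2f}B, remaining ~{remaining / 1e9:.2f}B")
print(f"weight memory saved in 16-bit precision: ~{saved_gb:.2f} GB")
```

Under these assumptions the three blocks account for about 0.37B parameters, which is consistent with the 5.9B to 5.5B figure above.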

Model Capabilities

Image feature extraction
High-resolution image processing
OCR text recognition

Use Cases

Computer vision
Image feature extraction
Used to extract high-dimensional features from images, supporting subsequent vision tasks.
OCR
Text recognition
Recognizes text content in images, suitable for scenarios like document digitization.
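
The feature-extraction use case might look like the following minimal sketch, assuming the weights are published on the Hugging Face Hub under the id `OpenGVLab/InternViT-6B-448px-V1-2`, that the model loads through `transformers` with `trust_remote_code=True`, and that a CUDA GPU with enough memory is available. The patch size of 14 and the helper names `expected_patch_tokens` and `extract_features` are illustrative assumptions, not part of this page.

```python
def expected_patch_tokens(image_size: int = 448, patch_size: int = 14) -> int:
    """Number of patch tokens a ViT produces for a square input.

    patch_size=14 is an assumption; check the model configuration.
    """
    per_side = image_size // patch_size
    return per_side * per_side


def extract_features(image_path: str,
                     model_id: str = "OpenGVLab/InternViT-6B-448px-V1-2"):
    """Run one image through the backbone and return the raw model outputs.

    The Hub id is an assumption. Heavy dependencies are imported lazily so
    the helper above can be used without them installed.
    """
    import torch
    from PIL import Image
    from transformers import AutoModel, CLIPImageProcessor

    # trust_remote_code is needed when the architecture ships with the repo.
    model = AutoModel.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    ).cuda().eval()
    processor = CLIPImageProcessor.from_pretrained(model_id)

    # Resize and normalize the image to the model's expected input.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    pixel_values = pixel_values.to(torch.bfloat16).cuda()

    with torch.no_grad():
        return model(pixel_values)


# With the assumed patch size, a 448x448 image maps to a 32x32 grid of patches.
print(expected_patch_tokens())
```

The returned object's fields depend on the model's own code, so inspect it before relying on a particular attribute for downstream tasks.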