InternViT-6B-224px

Developed by OpenGVLab
InternViT-6B-224px is a foundation vision model for image feature extraction. It has 5903 million (about 5.9 billion) parameters and accepts image inputs of 224x224 pixels.
Downloads 160
Release Time: 12/22/2023

Model Overview

This is a foundation vision model used primarily for image feature extraction. The extracted features can serve a range of downstream visual tasks, such as image classification and building vision-language models.

Model Features

Large-scale pre-training
The model is pre-trained on multiple large-scale datasets, including LAION-en, LAION-COCO, COYO, etc.
High-performance feature extraction
Performs strongly on image classification benchmarks such as ImageNet-1K (IN-1K) and ImageNet-ReaL (IN-ReaL).
Fourth-to-last block features
Using the output of the fourth-to-last block gives the best results for VLLM, which makes these features well suited to building vision-language models.
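
The block selection described above can be sketched in a few lines. This is only an illustration, not the authors' code: the helper name select_block_output is hypothetical, and it assumes the per-block outputs are available as a sequence in which index 0 holds the embedding output and index i holds the output of block i (so a model with N blocks yields N + 1 entries). The 48-block depth used in the example is likewise an assumption.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def select_block_output(hidden_states: Sequence[T], blocks_from_end: int = 4) -> T:
    """Return the output of the block that sits `blocks_from_end` from the end.

    Assumption: hidden_states[0] is the embedding output and
    hidden_states[i] is the output of block i, so the sequence has
    (number of blocks + 1) entries.
    """
    num_blocks = len(hidden_states) - 1
    if not 1 <= blocks_from_end <= num_blocks:
        raise ValueError(
            f"blocks_from_end must be between 1 and {num_blocks}, got {blocks_from_end}"
        )
    # Negative indexing counts back from the last block's output.
    return hidden_states[-blocks_from_end]


# Placeholder strings stand in for real feature tensors.
# The depth of 48 blocks is an assumption made for this example.
states = ["embeddings"] + [f"block_{i}" for i in range(1, 49)]
print(select_block_output(states))  # -> block_45
```

With real features, each entry would be a tensor of token embeddings rather than a string, and the selected entry would be passed on to the language-model side of a vision-language model.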

Model Capabilities

Image feature extraction
Visual task support
Large-scale image processing

Use Cases

Image classification
ImageNet classification
Evaluated via linear probing on the ImageNet-1K dataset.
88.2% accuracy
Vision-language models
VLLM construction
Uses features from the fourth-to-last block to build vision-language models.
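
Linear probing, the evaluation method named under ImageNet classification, trains only a linear classifier on top of frozen features. The sketch below shows the idea on toy data and makes no attempt to reproduce the reported accuracy. Everything in it is an assumption made for illustration: the function names are hypothetical, the two-dimensional toy vectors stand in for features that would come from the frozen model, and plain softmax regression with stochastic gradient descent stands in for whatever training recipe the authors used.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _logits(weights: List[List[float]], biases: List[float], x: Vector) -> List[float]:
    """Linear layer: one score per class."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b for w, b in zip(weights, biases)]


def train_linear_probe(
    features: Sequence[Vector],
    labels: Sequence[int],
    num_classes: int,
    epochs: int = 200,
    lr: float = 0.5,
) -> Tuple[List[List[float]], List[float]]:
    """Fit a softmax classifier on fixed features; the features are never updated."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            logits = _logits(weights, biases, x)
            peak = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(v - peak) for v in logits]
            total = sum(exps)
            for c in range(num_classes):
                # Gradient of cross-entropy with respect to logit c.
                grad = exps[c] / total - (1.0 if c == y else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * x[j]
                biases[c] -= lr * grad
    return weights, biases


def predict(weights: List[List[float]], biases: List[float], x: Vector) -> int:
    """Return the index of the highest-scoring class."""
    logits = _logits(weights, biases, x)
    return max(range(len(logits)), key=logits.__getitem__)


def top1_accuracy(weights, biases, features, labels) -> float:
    """Fraction of samples whose predicted class matches the label."""
    correct = sum(predict(weights, biases, x) == y for x, y in zip(features, labels))
    return correct / len(labels)


# Toy stand-ins for frozen image features (hypothetical values).
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = [0, 0, 1, 1]

probe_w, probe_b = train_linear_probe(toy_features, toy_labels, num_classes=2)
print(top1_accuracy(probe_w, probe_b, toy_features, toy_labels))
```

In a real evaluation the probe would be trained on features from the training split and scored on a held-out validation split; the toy example scores on its own training data only to keep the sketch short.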