# High-resolution visual feature extraction
InternViT-300M-448px
MIT
InternViT-300M-448px is an efficient vision foundation model distilled from InternViT-6B-448px-V1-5. It supports dynamic input resolution built from 448×448 patches and can process 1 to 40 patches per image.
Text-to-Image
Transformers

OpenGVLab
7,506
57
InternViT-6B-448px-V1-2
MIT
InternViT-6B-448px-V1-2 is a vision foundation model that serves as a feature backbone. It has about 5.5 billion parameters (5,540M) and processes images at 448×448 pixels.
Text-to-Image
Transformers

OpenGVLab
19
27
Featured Recommended AI Models