
InternVL-14B-224px

Developed by OpenGVLab
InternVL-14B-224px is a 14B-parameter vision-language foundation model supporting various vision-language tasks.
Downloads 521
Release Date: 12/22/2023

Model Overview

This model is a vision-language foundation model for zero-shot image and video classification, text-to-image and text-to-video retrieval, image captioning, and related tasks.

Model Features

Multi-task Support
Supports various vision-language tasks including zero-shot image/video classification, text-image/video retrieval, and image captioning.
Multilingual Support
Capable of processing text inputs in multiple languages such as English, Chinese, and Japanese.
High Performance
Performs strongly on multiple benchmarks, particularly in zero-shot settings.

Model Capabilities

Zero-shot image classification
Zero-shot video classification
Text-image retrieval
Text-video retrieval
Image captioning
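
As a rough illustration of how zero-shot classification generally works in contrastive vision-language models, the sketch below scores one image embedding against text embeddings of candidate label prompts and applies a softmax. It is not InternVL's actual API: the function names, the toy 3-dimensional vectors, and the temperature value are all assumptions made for illustration. In practice the embeddings would come from the model's image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_emb, label_embs, temperature=0.01):
    """Return a {label: probability} dict for one image embedding.

    image_emb and label_embs are hypothetical stand-ins for encoder outputs;
    temperature is an assumed scaling factor, not a value from the model.
    """
    image = l2_normalize(image_emb)
    labels = list(label_embs)
    logits = []
    for label in labels:
        text = l2_normalize(label_embs[label])
        cosine = sum(a * b for a, b in zip(image, text))
        logits.append(cosine / temperature)
    return dict(zip(labels, softmax(logits)))


# Toy embeddings (assumed values, for illustration only).
image_emb = [0.9, 0.1, 0.0]
label_embs = {
    "a photo of a cat": [1.0, 0.0, 0.0],
    "a photo of a dog": [0.0, 1.0, 0.0],
    "a photo of a car": [0.0, 0.0, 1.0],
}

probs = zero_shot_classify(image_emb, label_embs)
best = max(probs, key=probs.get)
print(best)
```

No fine-tuning is involved: changing the set of classes only means changing the label prompts whose embeddings are compared against the image.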

Use Cases

Content Understanding
Image Classification
Classify images without fine-tuning
Performs well on multiple datasets
Image Captioning
Generate natural language descriptions for input images
Produces accurate and fluent descriptions
Information Retrieval
Cross-modal Retrieval
Retrieve relevant images or videos based on text queries
High retrieval accuracy
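
Cross-modal retrieval of the kind described above is typically implemented by ranking a gallery of image or video embeddings by cosine similarity to a text query embedding. The following is a minimal sketch under that assumption. The gallery file names, the embedding values, and the `rank_gallery` helper are hypothetical and do not come from the InternVL codebase.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_gallery(query_emb, gallery, top_k=2):
    """Return the top_k (item_id, similarity) pairs, most similar first.

    query_emb stands in for a text-encoder output and gallery maps item ids
    to image or video embeddings; both are assumed inputs for illustration.
    """
    query = l2_normalize(query_emb)
    scored = []
    for item_id, emb in gallery.items():
        item = l2_normalize(emb)
        similarity = sum(a * b for a, b in zip(query, item))
        scored.append((similarity, item_id))
    scored.sort(reverse=True)  # highest similarity first
    return [(item_id, similarity) for similarity, item_id in scored[:top_k]]


# Toy gallery and query embedding (assumed values, for illustration only).
gallery = {
    "beach.jpg": [0.8, 0.2, 0.1],
    "city.mp4": [0.1, 0.9, 0.2],
    "forest.jpg": [0.2, 0.1, 0.9],
}
query_emb = [0.7, 0.3, 0.0]  # e.g. the embedding of "a sunny beach"

results = rank_gallery(query_emb, gallery, top_k=2)
for item_id, similarity in results:
    print(item_id, round(similarity, 3))
```

For large galleries the item embeddings would normally be normalized once and stored, so that each query needs only a batch of dot products.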
© 2025 AIbase