
Yi-VL-6B

Developed by 01-ai
Yi-VL is an open-source multimodal vision-language model from 01.AI. It supports image-text dialogue in Chinese and English and performs strongly on the MMMU and CMMMU benchmarks.
Downloads 336
Release Time: 12/25/2023

Model Overview

Yi-VL is the multimodal version of the Yi large language model series. It understands image content, holds multi-turn dialogues about images, and accepts input images at 448×448 resolution.

Model Features

Bilingual multimodal understanding
Supports image-text dialogue in both Chinese and English, including recognition of text in images
High-resolution image processing
Supports 448×448 high-resolution image understanding
Three-stage training process
Optimizes the fusion of visual and language features through progressive training strategies
Open-source for commercial use
Fully open for academic research and free for commercial use; the commercial license is granted automatically upon application
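
As a rough illustration of what the 448×448 input resolution means for a vision-transformer image encoder, the sketch below counts the image patches at two resolutions. The patch size of 14 and the helper name patch_grid are assumptions made for illustration; this card does not state the encoder's patch size.

```python
def patch_grid(image_size: int, patch_size: int):
    """Return (patches per side, total patches) for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


# Patch size 14 is an assumed value, not taken from this card.
ASSUMED_PATCH_SIZE = 14

for size in (224, 448):
    per_side, total = patch_grid(size, ASSUMED_PATCH_SIZE)
    print(f"{size}x{size}: {per_side}x{per_side} grid, {total} patches")
```

Under that assumption, doubling the side length from 224 to 448 quadruples the number of patches the encoder sees, which is why higher input resolution helps with fine detail such as small text.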

Model Capabilities

Visual question answering
Image content understanding
Multi-turn image-text dialogue
Chinese-English bilingual processing
Image text recognition
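
To make multi-turn image-text dialogue concrete, the following is a minimal sketch of how a client might keep the conversation history. The Turn and Dialogue classes, their fields, and the file name sign.jpg are hypothetical; they are not part of Yi-VL's interface, which this card does not describe.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    role: str                          # "user" or "assistant"
    text: str
    image_path: Optional[str] = None   # set only on turns that attach an image


@dataclass
class Dialogue:
    turns: List[Turn] = field(default_factory=list)

    def add_user(self, text: str, image_path: Optional[str] = None) -> None:
        self.turns.append(Turn("user", text, image_path))

    def add_assistant(self, text: str) -> None:
        self.turns.append(Turn("assistant", text))

    def images(self) -> List[str]:
        """Paths of all images attached so far, in order."""
        return [t.image_path for t in self.turns if t.image_path]


# Hypothetical bilingual exchange about a single image.
dialogue = Dialogue()
dialogue.add_user("What is written on the sign?", image_path="sign.jpg")
dialogue.add_assistant("(model reply would go here)")
dialogue.add_user("Translate it into Chinese.")
```

Here the image is attached once and later turns refer back to it, which is the pattern a follow-up question in a multi-turn dialogue relies on.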

Use Cases

Education
Multidisciplinary visual question answering
Answering image-related questions across multiple disciplines
Ranked first among open-source models on the MMMU and CMMMU benchmarks (based on data available up to January 2024)
Content analysis
Image information extraction
Extracting, organizing, and summarizing information from images
Capable of recognizing complex visual details