
Trillion LLaVA 7B

Developed by trillionlabs
Trillion-LLaVA-7B is a vision-language model (VLM) capable of understanding images, built on the Trillion-7B-preview foundation model.
Downloads: 199
Release Date: 4/20/2025

Model Overview

This vision-language model understands and processes tasks that combine images and text, and it is particularly strong at cross-lingual visual reasoning.
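As a quick reference, below is a minimal inference sketch using the Hugging Face transformers LLaVA classes. The repository id, model class, chat-template conventions, and image URL are assumptions rather than details from this page; consult the model's Hugging Face card for the exact loading code.

```python
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Assumed repository id -- verify against the actual Hugging Face model page.
model_id = "trillionlabs/Trillion-LLaVA-7B"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Any RGB image works; the URL here is a placeholder.
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)

# Build a prompt via the processor's chat template (assumes the model ships
# one; otherwise use the prompt format documented on the model card).
conversation = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What animal is in this picture?"},
    ]},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```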

Model Features

Cross-lingual Visual Reasoning Ability
Although trained only on English vision-language instruction pairs, the model performs strongly on Korean visual reasoning tasks.
Multilingual Foundation
The model's strong multilingual foundation enables effective cross-lingual transfer of visual reasoning capabilities without requiring language-specific vision training data.
Two-stage Training Strategy
Follows the same dataset and two-stage training strategy as LLaVA: a feature-alignment stage that trains only the vision-to-language projector, followed by end-to-end visual instruction tuning (sketched below). Reusing this proven recipe keeps training stable and the results reliable.
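For readers unfamiliar with the LLaVA recipe, here is an illustrative sketch of the stage-wise parameter freezing it implies. The attribute names follow the transformers LlavaForConditionalGeneration class; the authors' actual training scripts may differ.

```python
def configure_trainable_params(model, stage: int) -> None:
    """Illustrative LLaVA-style two-stage schedule (not the authors' exact code).

    Stage 1: feature alignment -- train only the vision-to-language projector.
    Stage 2: visual instruction tuning -- also unfreeze the language model.
    The vision encoder stays frozen in both stages.
    """
    for p in model.vision_tower.parameters():
        p.requires_grad = False
    for p in model.multi_modal_projector.parameters():
        p.requires_grad = True
    for p in model.language_model.parameters():
        p.requires_grad = (stage == 2)
```

Aligning the projector first, with everything else frozen, is what lets the second stage fine-tune the language model on instruction data without destabilizing the visual features.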

Model Capabilities

Image Understanding
Visual Question Answering
Multilingual Visual Reasoning

Use Cases

Visual Question Answering
Multilingual Visual Question Answering
Answers English and Korean visual questions accurately.
Performs strongly on benchmarks such as MMBench, SEED-I, MMStar, and K-DTCB.
Cross-lingual Visual Reasoning
Korean Visual Reasoning
Although trained only on English data, the model achieves strong Korean visual reasoning performance (see the sketch below).
Scores 0.61 on the Korean MMBench test, outperforming comparable models.
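Continuing the inference sketch from the Model Overview, a hypothetical Korean-language query illustrates the cross-lingual transfer; the prompt text below is an example, not taken from the model card.

```python
# Reuses `processor`, `model`, and `image` from the earlier sketch.
conversation_ko = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "이 사진에 어떤 동물이 있나요?"},  # "What animal is in this picture?"
    ]},
]
prompt_ko = processor.apply_chat_template(conversation_ko, add_generation_prompt=True)
inputs = processor(images=image, text=prompt_ko, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```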