Trillion LLaVA 7B FP16

Developed by trillionlabs
Trillion-LLaVA-7B is a vision-language model for image understanding. It was trained only on English vision-language instruction pairs, yet shows strong cross-lingual visual reasoning abilities.
Downloads 14
Release Time: 4/20/2025

Model Overview

This model is built on Trillion-7B-preview and adopts the same architecture and training strategy as LLaVA. It focuses on vision-language understanding tasks and performs particularly well on Korean visual reasoning tasks.

Model Features

Cross-lingual Visual Reasoning Ability
Trained only on English vision-language pairs, yet performs well on Korean visual reasoning tasks
Two-stage Training Strategy
Uses the same two-stage training method as LLaVA
Multilingual Foundation
The base model's strong multilingual capabilities allow visual reasoning learned in English to transfer to other languages
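
The two-stage strategy mentioned above can be sketched roughly as follows. This is a minimal illustration based on the published LLaVA recipe (stage 1 trains only the projector for feature alignment, stage 2 trains the projector and the language model for visual instruction tuning, and the vision encoder stays frozen throughout). The component names, the Stage class, and the data descriptions are hypothetical labels for illustration. Whether Trillion-LLaVA-7B matches every detail of this recipe is an assumption.

```python
from dataclasses import dataclass

# Illustrative component labels (assumed, not taken from the Trillion-LLaVA code).
COMPONENTS = ("vision_encoder", "projector", "language_model")


@dataclass(frozen=True)
class Stage:
    """One training stage: a name, the kind of data used, and what is updated."""
    name: str
    data: str
    trainable: frozenset


# Stage layout following the published LLaVA recipe (assumed to apply here).
LLAVA_STAGES = (
    # Stage 1: align image features with the language model's embedding space.
    Stage("feature_alignment", "image-caption pairs", frozenset({"projector"})),
    # Stage 2: teach the model to follow visual instructions.
    Stage(
        "visual_instruction_tuning",
        "vision-language instruction pairs",
        frozenset({"projector", "language_model"}),
    ),
)


def frozen_components(stage: Stage) -> list:
    """Return the components whose weights are not updated in this stage."""
    return [c for c in COMPONENTS if c not in stage.trainable]


for stage in LLAVA_STAGES:
    print(
        f"{stage.name}: train {sorted(stage.trainable)}, "
        f"freeze {frozen_components(stage)}"
    )
```

In a real training script, the same split would typically be expressed by disabling gradient updates for the frozen components before each stage.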

Model Capabilities

Image Understanding
Visual Question Answering
Cross-lingual Visual Reasoning
Multimodal Understanding

Use Cases

Visual Question Answering Systems
Multilingual Visual Question Answering
Supports answering image-related questions in English and Korean
Scored 0.61 on the Korean MMBench test
Educational Assistance
Multilingual Learning Aid
Helps learners understand different languages through visual content