Trillion LLaVA 7B FP16
Developed by trillionlabs
Trillion-LLaVA-7B is a vision-language model for image understanding. It was trained on English vision-language instruction pairs and shows strong cross-lingual visual reasoning.
Downloads 14
Release date: 4/20/2025
Model Overview
The model is built on Trillion-7B-preview and follows the same architecture and training strategy as LLaVA. It targets vision-language understanding tasks and performs particularly well on Korean visual reasoning tasks.
Model Features
Cross-lingual Visual Reasoning Ability
Trained only on English vision-language pairs, yet performs well on Korean visual reasoning tasks
Two-stage Training Strategy
Uses the same two-stage training method as LLaVA
Multilingual Foundation
The base model's strong multilingual capabilities allow visual reasoning to transfer across languages
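The two-stage strategy mentioned above can be sketched as a schedule of which components are updated in each stage. This card does not spell out the stages, so the sketch below assumes the recipe described for LLaVA itself: a first stage that trains only the projector connecting the vision encoder to the language model, and a second stage that tunes the projector and the language model while the vision encoder stays frozen. The component names and the trainable_flags helper are hypothetical labels chosen for illustration, not identifiers from the released checkpoint.

```python
# Illustrative component labels (assumed, not taken from the released model).
COMPONENTS = ("vision_encoder", "projector", "language_model")

# Assumed LLaVA-style schedule: components that receive gradient updates per stage.
TRAINABLE_BY_STAGE = {
    1: {"projector"},                    # stage 1: feature alignment, projector only
    2: {"projector", "language_model"},  # stage 2: visual instruction tuning
}


def trainable_flags(stage: int) -> dict[str, bool]:
    """Return a component -> trainable flag mapping for the given stage."""
    if stage not in TRAINABLE_BY_STAGE:
        raise ValueError(f"unknown training stage: {stage}")
    trainable = TRAINABLE_BY_STAGE[stage]
    return {name: name in trainable for name in COMPONENTS}


if __name__ == "__main__":
    for stage in (1, 2):
        print(f"stage {stage}: {trainable_flags(stage)}")
```

In a real training script these flags would typically be applied by toggling gradient tracking on each component's parameters before building the optimizer. Under this assumed schedule the language model is only updated in the second stage.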
Model Capabilities
Image Understanding
Visual Question Answering
Cross-lingual Visual Reasoning
Multimodal Understanding
Use Cases
Visual Question Answering Systems
Multilingual Visual Question Answering
Answers image-related questions in English and Korean
Scored 0.61 on the Korean MMBench test
Educational Assistance
Multilingual Learning Aid
Helps learners understand different languages through visual content