Tiny Llava V1 Hf

Developed by bczhou
TinyLLaVA is a framework of small-scale large multimodal models for vision-language tasks. It keeps the parameter count low while still delivering strong benchmark performance.
Downloads 2,372
Release Time: 1/11/2024

Model Overview

TinyLLaVA is an efficient multimodal model for image-to-text generation. It supports both English and Chinese and performs strongly on several benchmarks, including VQA-v2 and LLaVA-Bench-Wild.

Model Features

High-Performance Small-Scale Model
The 3.1B-parameter TinyLLaVA achieves better overall performance than 7B-parameter models such as LLaVA-1.5 and Qwen-VL
Multimodal Capabilities
Supports image understanding and text generation, capable of handling complex vision-language tasks
Efficient Inference
Small parameter size enables faster inference speed and lower resource consumption
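
To make the resource claim concrete, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone. It assumes 16-bit weights (2 bytes per parameter) and ignores activations, the KV cache, and framework overhead, so real usage will be higher. The helper name weight_memory_gib is hypothetical and chosen only for this illustration.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Assumption: every parameter is stored at the same precision
    (2 bytes per parameter corresponds to fp16/bf16).
    """
    return num_params * bytes_per_param / 1024**3


# Parameter counts taken from the feature description above.
tiny = weight_memory_gib(3.1e9)   # TinyLLaVA, 3.1B parameters
large = weight_memory_gib(7e9)    # a 7B-parameter model for comparison

print(f"3.1B parameters at 16-bit: {tiny:.1f} GiB")
print(f"7B parameters at 16-bit:   {large:.1f} GiB")
```

Under these assumptions the smaller model needs less than half the weight memory of a 7B model, which is the main reason it fits on more modest hardware.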

Model Capabilities

Image understanding
Visual question answering
Image caption generation
Multimodal reasoning
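
As an illustration of the visual question answering capability, the sketch below builds a single-turn prompt and passes it to the model through the Hugging Face transformers library. Several things here are assumptions rather than facts stated on this page: the repository ID bczhou/tiny-llava-v1-hf, the LLaVA-1.5-style "USER: <image> ... ASSISTANT:" prompt template, and a transformers release whose image-to-text pipeline accepts a prompt argument. The function names build_vqa_prompt and answer_question are hypothetical.

```python
def build_vqa_prompt(question: str) -> str:
    # Assumption: the checkpoint expects the LLaVA-1.5-style template,
    # with an <image> placeholder marking where the image features go.
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return f"USER: <image>\n{question}\nASSISTANT:"


def answer_question(image_path: str, question: str,
                    model_id: str = "bczhou/tiny-llava-v1-hf") -> str:
    # Heavy imports stay local so build_vqa_prompt works without them.
    from PIL import Image
    from transformers import pipeline

    # Assumption: this transformers version still supports prompts
    # in the "image-to-text" pipeline; newer releases may differ.
    pipe = pipeline("image-to-text", model=model_id)
    image = Image.open(image_path)
    outputs = pipe(
        image,
        prompt=build_vqa_prompt(question),
        generate_kwargs={"max_new_tokens": 200},
    )
    return outputs[0]["generated_text"]


# Building the prompt needs no model download.
example_prompt = build_vqa_prompt("What is shown in this image?")
print(example_prompt)
```

Calling answer_question downloads the model weights on first use, so it is kept separate from the prompt helper, which can be checked on its own.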

Use Cases

Visual Question Answering
Image content Q&A
Answer various questions about image content
Achieves 79.9% accuracy on the VQA-v2 dataset
Image Captioning
Automatic image annotation
Generate detailed descriptive text for images
Scores 75.8 on LLaVA-Bench-Wild
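
For readers unfamiliar with how a VQA accuracy figure is typically computed, the sketch below shows the commonly cited simplified form of the metric: an answer counts as fully correct if at least three human annotators gave it, and receives partial credit otherwise. This is an assumption about the general metric, not a description of how the 79.9% figure above was produced. The official evaluation also normalizes answers and averages over annotator subsets, which this sketch omits. The function name vqa_accuracy is hypothetical.

```python
from collections import Counter


def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQA accuracy: min(number of matching annotators / 3, 1)."""
    # Only trivial normalization (case and surrounding whitespace) is applied.
    counts = Counter(a.strip().lower() for a in human_answers)
    return min(counts[predicted.strip().lower()] / 3.0, 1.0)


# Hypothetical annotations for one question (ten annotators).
answers = ["red"] * 2 + ["dark red"] * 8

score_red = vqa_accuracy("red", answers)        # two annotators agree
score_dark = vqa_accuracy("dark red", answers)  # eight annotators agree
score_blue = vqa_accuracy("blue", answers)      # no annotator agrees

print(score_red, score_dark, score_blue)
```

A dataset-level score is then the mean of these per-question values, expressed as a percentage.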