
LLaVA-JP 1.3B v1.1

Developed by toshi456
LLaVA-JP is a multimodal vision-language model that supports Japanese, capable of understanding and generating descriptions and dialogues about input images.
Downloads 90
Release Time: 4/17/2024

Model Overview

This model combines a visual encoder with a text decoder, accepts high-resolution image input, and is optimized for Japanese vision-language tasks.
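
How an encoder and a decoder are typically joined in LLaVA-style models can be illustrated with a toy example. The sketch below is not the model's actual code: it assumes a single linear layer as the projector (the real projector may be an MLP), uses made-up toy dimensions, and all function and variable names are hypothetical. It only shows the data flow: patch features from the visual encoder are projected into the decoder's embedding space and placed in front of the text embeddings.

```python
from typing import List

Vector = List[float]


def project(features: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Apply a linear projector to each visual feature.

    `weight` has shape (d_text, d_vision); each output vector has d_text entries.
    A single linear layer is an assumption made for illustration.
    """
    return [
        [sum(w * x for w, x in zip(row, feat)) for row in weight]
        for feat in features
    ]


def build_decoder_input(image_embeds: List[Vector], text_embeds: List[Vector]) -> List[Vector]:
    """Place projected image embeddings in front of the text embeddings."""
    return image_embeds + text_embeds


# Toy sizes (hypothetical): 3 image patches with 4-dim features,
# and a decoder embedding dimension of 2.
patch_features = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
projector_weight = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, 0.5, 0.5, 0.5],
]
text_embeds = [[0.1, 0.2], [0.3, 0.4]]

image_embeds = project(patch_features, projector_weight)
decoder_input = build_decoder_input(image_embeds, text_embeds)
```

After projection, every image patch has the same dimensionality as a text token embedding, so the decoder can treat both as one sequence.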

Model Features

High-Resolution Support
Supports high-resolution 768x768 image input via scaling_on_scales
Japanese Optimization
Trained and optimized specifically for Japanese vision-language tasks
Two-Stage Training
Pre-trains the visual projector first, then applies instruction fine-tuning
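
The high-resolution feature can be illustrated with a small sketch of the multi-scale idea behind scaling_on_scales: the image is viewed at a base scale and at a larger scale, and the larger view is split into tiles of the base size so that a fixed-resolution encoder can process each tile. The code below does not call the scaling_on_scales library; it only computes the tile layout. The base size of 384 is an assumption not stated in this listing, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def tile_boxes(image_size: int, tile_size: int) -> List[Box]:
    """Split a square image into non-overlapping square tiles, row by row."""
    if image_size % tile_size != 0:
        raise ValueError("image_size must be a multiple of tile_size")
    n = image_size // tile_size
    return [
        (col * tile_size, row * tile_size, (col + 1) * tile_size, (row + 1) * tile_size)
        for row in range(n)
        for col in range(n)
    ]


def multiscale_plan(base_size: int, scales: List[int]) -> Dict[int, List[Box]]:
    """For each scale, list the base-sized tiles the encoder would see."""
    return {scale: tile_boxes(scale, base_size) for scale in scales}


# Assumed configuration: encoder input of 384, scales of 384 and 768.
plan = multiscale_plan(base_size=384, scales=[384, 768])
for scale, boxes in plan.items():
    print(scale, len(boxes), boxes)
```

Under these assumptions the 768x768 view yields four 384x384 tiles. In the multi-scale approach, the per-tile features are then merged, pooled back to the base-scale grid, and combined with the base-scale features before being passed to the projector.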

Model Capabilities

Image understanding
Japanese image caption generation
Japanese visual question answering
Multimodal dialogue

Use Cases

Assistive Technology
Visual Assistance
Provides image content descriptions for visually impaired individuals
Content Analysis
Social Media Analysis
Automatically analyzes images posted on social media and generates descriptions