Heron NVILA Lite 33B

Developed by turing-motors
Heron-NVILA-Lite-33B is a vision-language model built on the NVILA-Lite architecture. It is trained specifically for Japanese and supports multimodal tasks in both Japanese and English.
Downloads 99
Release date: 5/12/2025

Model Overview

This model pairs a visual encoder with a large language model to handle tasks that mix images and text, and it is tuned in particular for Japanese-language use.

Model Features

Japanese optimization
Trained specifically for Japanese, with an emphasis on Japanese vision-language tasks
Multimodal capabilities
Accepts image and text inputs together, so a prompt can combine both
High-performance architecture
Combines a visual encoder with a large language model for inference over images and text

Model Capabilities

Image description generation
Visual question answering
Multi-turn image-text dialogue
Cross-lingual understanding
Image content analysis

Use Cases

Content understanding
Image description generation: generates detailed text descriptions of input images. Scored 3.85/5.0 on the Japanese visual question answering 500 test.
Customer service
Multi-turn image-text dialogue: supports multi-turn conversations grounded in images. Scored 4.0/5.0 on the Japanese VLM wild benchmark test.
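
To make the dialogue use case above more concrete, here is a minimal Python sketch of how an application might keep the history of a multi-turn image-text conversation before handing it to a vision-language model. Everything in it is an assumption made for illustration: the `Turn` and `Dialogue` classes, the `<image>` placeholder, and the file name `photo.jpg` are hypothetical and are not taken from the Heron-NVILA-Lite-33B model card or its API. The sketch does not load or call the model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical placeholder marking where an image sits in the prompt;
# the real model may use a different convention.
IMAGE_TOKEN = "<image>"


@dataclass
class Turn:
    role: str                         # "user" or "assistant"
    text: str
    image_path: Optional[str] = None  # set only on turns that attach an image


@dataclass
class Dialogue:
    turns: List[Turn] = field(default_factory=list)

    def add(self, role: str, text: str, image_path: Optional[str] = None) -> None:
        """Append one turn, rejecting roles other than user/assistant."""
        if role not in ("user", "assistant"):
            raise ValueError(f"unknown role: {role}")
        self.turns.append(Turn(role, text, image_path))

    def images(self) -> List[str]:
        """Image paths in the order they appear in the conversation."""
        return [t.image_path for t in self.turns if t.image_path is not None]

    def render(self) -> str:
        """Flatten the history into one string, marking image positions."""
        lines = []
        for t in self.turns:
            prefix = f"{IMAGE_TOKEN}\n" if t.image_path else ""
            lines.append(f"{t.role}: {prefix}{t.text}")
        return "\n".join(lines)


# A Japanese first question with an image, then an English follow-up
# that refers back to the same image.
dialogue = Dialogue()
dialogue.add("user", "この画像を説明してください。", image_path="photo.jpg")
dialogue.add("assistant", "(model reply would go here)")
dialogue.add("user", "What color is the car?")
print(dialogue.render())
```

Keeping the image reference on the turn that introduced it lets later text-only questions refer back to it, which is what distinguishes a multi-turn dialogue from a series of independent visual questions.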
© 2025 AIbase