Heron-NVILA-Lite-15B is a vision-language model based on the NVILA-Lite architecture. It is trained specifically for Japanese and supports both Japanese and English for image-text understanding and generation.
Image-to-Text
Safetensors
Supports Multiple Languages