
Asagi-14B

Developed by MIL-UT
Asagi-14B is a large-scale Japanese Vision and Language Model (VLM) trained on an extensive collection of Japanese datasets drawn from diverse sources.
Release Time: 1/9/2025

Model Overview

Asagi-14B is a vision-language model that combines a visual encoder with a large language model. It is primarily used for image-text-to-text tasks and supports Japanese.
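
For reference, below is a minimal inference sketch. It assumes Asagi-14B follows the standard Hugging Face trust_remote_code loading pattern with an AutoProcessor; the repository ID MIL-UT/Asagi-14B, the Alpaca-style Japanese prompt template, the image filename, and the generation settings are assumptions to verify against the official model card, not confirmed API details.

```python
# Minimal inference sketch (assumed standard Hugging Face
# trust_remote_code pattern; verify details on the official
# MIL-UT/Asagi-14B model card).
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_path = "MIL-UT/Asagi-14B"  # Hugging Face repository ID
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,      # the model ships custom modeling code
    torch_dtype=torch.bfloat16,  # half precision to fit a 14B model
    device_map="auto",
)

image = Image.open("example.jpg")  # hypothetical local photo

# Alpaca-style Japanese instruction prompt with an <image> placeholder
# (assumed template; confirm against the model card).
# Translation: "Below is an instruction describing a task. Write a
# response that appropriately fulfills the request. / Instruction:
# <image> Please describe this image. / Response:"
prompt = (
    "以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。\n\n"
    "### 指示:\n<image>\nこの画像について説明してください。\n\n"
    "### 応答:\n"
)

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```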

Model Features

Multimodal Capability
Combines a visual encoder with a language model to understand images and generate textual descriptions of them.
Japanese Optimization
Specifically trained and optimized for Japanese, suitable for vision and language tasks in Japanese contexts.
Data Synthesis
Much of the training data is model-synthesized, deliberately avoiding outputs from LLMs whose license terms restrict reuse.

Model Capabilities

Image Caption Generation
Visual Question Answering
Multimodal Reasoning
Japanese Text Generation

Use Cases

Image Understanding and Description
Scene Recognition
Recognizes scenes in images and generates detailed descriptions.
For example, it can identify that a photo was taken at a famous location in Tokyo and describe the scene in detail.
Object Recognition
Recognizes objects in images and generates descriptions.
For example, it can identify buildings, natural landscapes, and other objects in the image.
Visual Question Answering
Image Content Q&A
Answers detailed questions about image content.
For example, it can answer where a photo was taken or what objects appear in it; see the sketch below.
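
Building on the loading pattern in the sketch above, a visual question answering call might look like the following. The question wording and filename are illustrative, and the prompt template is the same assumed format as before; check the official model card for the exact expected template.

```python
# Visual question answering sketch: ask where a photo was taken.
# Same assumptions as the loading example above (hypothetical prompt
# wording; verify the exact template on the official model card).
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_path = "MIL-UT/Asagi-14B"
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path, trust_remote_code=True,
    torch_dtype=torch.bfloat16, device_map="auto",
)

# "Where was this photo taken? Please also describe what appears in it."
question = "この写真はどこで撮影されたものですか？写っているものについても説明してください。"
prompt = (
    "以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。\n\n"
    f"### 指示:\n<image>\n{question}\n\n### 応答:\n"
)

image = Image.open("tokyo_photo.jpg")  # hypothetical local file
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
answer = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
print(answer)
```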