A vision-language model built on Bllossom/llama-3.2-Korean-Bllossom-AICA-5B. It supports Korean and English and specializes in image-to-text and text classification tasks in the fashion domain.
Image-to-Text
Transformers
Supports Multiple Languages