mBLIP mT0-XL

Developed by Gregor
mBLIP is a multilingual vision-language model based on the BLIP-2 architecture. It supports image captioning and visual question answering in 96 languages.
Downloads 374
Release date: 7/10/2023

Model Overview

mBLIP is a BLIP-2 model composed of a Vision Transformer (ViT), a Query Transformer (Q-Former), and a large language model (LLM). It is realigned to a multilingual LLM (mt0-xl) by training on a multilingual task mixture, and it supports image captioning and visual question answering.
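Because the model follows the BLIP-2 layout, it can plausibly be loaded with the BLIP-2 classes from the Hugging Face transformers library. The following is a minimal sketch under stated assumptions, not an official usage recipe: the Hub identifier "Gregor/mblip-mt0-xl" is inferred from the developer and model name on this page, and load_mblip and generate_text are hypothetical helper names introduced here for illustration.

```python
# Assumption: the checkpoint is published on the Hugging Face Hub under this id.
MODEL_ID = "Gregor/mblip-mt0-xl"


def load_mblip(model_id=MODEL_ID):
    """Load the processor and model (hypothetical helper).

    transformers is imported lazily so the rest of this file can be
    imported without the library or the multi-gigabyte weights.
    """
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    processor = Blip2Processor.from_pretrained(model_id)
    model = Blip2ForConditionalGeneration.from_pretrained(model_id)
    return processor, model


def generate_text(processor, model, image, prompt, max_new_tokens=40):
    """Run one image + text prompt through the model and return the decoded string."""
    # The processor prepares pixel values for the ViT and token ids for the LLM.
    inputs = processor(images=image, text=prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) generated sequence.
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()


# Example use (downloads the weights, so it is left commented out):
#   from PIL import Image
#   processor, model = load_mblip()
#   image = Image.open("photo.jpg").convert("RGB")
#   print(generate_text(processor, model, image, "Describe the image in German."))
```

The image passes through the ViT and Q-Former, and the LLM generates text conditioned on both the visual features and the prompt, so the same call covers captioning and question answering.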

Model Features

Multilingual Support
Supports image understanding and text generation tasks in 96 languages
Efficient Alignment
Realigns the vision and language components through a multilingual task mixture
Zero-shot Capability
Capable of conditional text generation in zero-shot settings

Model Capabilities

Image-to-text
Multilingual image caption generation
Visual question answering
Multilingual understanding

Use Cases

Content Generation
Multilingual Image Captioning
Generate image descriptions in any of the 96 supported languages
Q&A Systems
Multilingual Visual QA
Answer questions about image content in any of the 96 supported languages
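
Both use cases come down to pairing an image with a text instruction that names the task and the target language. The sketch below shows one way to build such instructions. The function names and the exact prompt wording are illustrative assumptions and are not taken from the model's documentation, so the phrasing that works best for the model may differ.

```python
from typing import Optional


def caption_prompt(language: str) -> str:
    """Build a captioning instruction for the given language (illustrative wording)."""
    return f"Describe the image in {language}."


def vqa_prompt(question: str, answer_language: Optional[str] = None) -> str:
    """Build a visual question answering instruction.

    The question may be written in any supported language. If
    answer_language is given, a request to answer in that language
    is appended (hypothetical template).
    """
    question = question.strip()
    if answer_language is None:
        return question
    return f"{question} Answer in {answer_language}."


# Build one captioning prompt per target language.
languages = ["English", "German", "Swahili"]
prompts = [caption_prompt(lang) for lang in languages]
```

Each resulting string would be passed to the model as the text input alongside the image.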