
mBLIP BLOOMZ-7B

Developed by Gregor
mBLIP is a multilingual vision-language model based on the BLIP-2 architecture that supports image captioning and visual question answering in 96 languages.
Downloads: 21
Release Time: 9/21/2023

Model Overview

mBLIP is an efficient vision-language model composed of a Vision Transformer (ViT), Query Transformer (Q-Former), and a large language model (BLOOMZ-7B), supporting multilingual image understanding and generation tasks.
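Since mBLIP reuses the BLIP-2 architecture, it can be driven through the standard BLIP-2 classes in Hugging Face transformers. The snippet below is a minimal captioning sketch rather than the card's official usage: the hub id "Gregor/mblip-bloomz-7b", the sample image URL, and the prompt wording are assumptions for illustration.

```python
import requests
import torch
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

# Assumed Hugging Face hub id for the checkpoint.
MODEL_ID = "Gregor/mblip-bloomz-7b"

processor = Blip2Processor.from_pretrained(MODEL_ID)
model = Blip2ForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# Placeholder image; any RGB image works.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# mBLIP is prompted with a natural-language instruction; the exact wording is illustrative.
prompt = "Describe the image in German."
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device, torch.float16)

generated_ids = model.generate(**inputs, max_new_tokens=40)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip())
```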

Model Features

Multilingual Support: supports image understanding and generation tasks in 96 languages
Efficient Alignment: aligns the visual components with the multilingual LLM through a mixture of multilingual tasks
Versatile Applications: handles tasks such as image captioning and visual question answering
Flexible Deployment: supports full-precision, half-precision, and low-precision (8-bit/4-bit) inference (see the sketch below)
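A minimal low-precision loading sketch, assuming the bitsandbytes-backed quantization path in transformers and the same assumed hub id as above; switching load_in_8bit to load_in_4bit gives 4-bit inference, while omitting the quantization config and passing torch_dtype selects full or half precision.

```python
from transformers import BitsAndBytesConfig, Blip2ForConditionalGeneration, Blip2Processor

MODEL_ID = "Gregor/mblip-bloomz-7b"  # assumed Hugging Face hub id

# 8-bit weights via bitsandbytes; use BitsAndBytesConfig(load_in_4bit=True) for 4-bit.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

processor = Blip2Processor.from_pretrained(MODEL_ID)
model = Blip2ForConditionalGeneration.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    device_map="auto",
)
```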

Model Capabilities

Multilingual image caption generation
Multilingual visual question answering
Cross-modal understanding
Multilingual text generation

Use Cases

Content Generation
Multilingual Image Captioning: generates descriptive text for images in different languages; can produce accurate captions in 96 languages
Education
Multilingual Visual Question Answering: answers questions about image content in different languages; supports visual question answering in 96 languages
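To illustrate the visual question answering use case, the sketch below asks one image questions in several languages. The questions, prompt style, image URL, and hub id are assumptions for illustration only.

```python
import requests
import torch
from PIL import Image
from transformers import Blip2ForConditionalGeneration, Blip2Processor

MODEL_ID = "Gregor/mblip-bloomz-7b"  # assumed Hugging Face hub id
processor = Blip2Processor.from_pretrained(MODEL_ID)
model = Blip2ForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Illustrative questions; the prompt tells the model which language to answer in.
questions = [
    "What animals are in the picture? Answer in English.",
    "Was machen die Tiere auf dem Bild? Antworte auf Deutsch.",
    "¿Cuántos animales hay en la imagen? Responde en español.",
]

for question in questions:
    inputs = processor(images=image, text=question, return_tensors="pt").to(model.device, torch.float16)
    output_ids = model.generate(**inputs, max_new_tokens=30)
    answer = processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
    print(f"{question} -> {answer}")
```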