Mengzi-Oscar-Base: An Open-Source Chinese Multimodal Pretrained Model Trained on 3.7 Million Image-Text Pairs

Mengzi Oscar Base

Developed by Langboat

A Chinese multimodal pretrained model built on the Oscar framework, initialized from Mengzi-Bert base and trained on 3.7 million image-text pairs.

Image-to-Text

Transformers

Chinese

Open Source License: Apache-2.0

#Chinese Multimodal Understanding #Image-Text Pretraining #Lightweight Model

Downloads 20

Release Time: 3/2/2022

Model Overview

Mengzi-Oscar is a Chinese multimodal pretrained model for joint image-text understanding. It is suited to tasks such as image-text matching and visual question answering.

Model Features

Multimodal Pretraining

Processes image and text inputs jointly to achieve cross-modal understanding

Chinese Optimization

Optimized for Chinese-language scenarios, with Mengzi-Bert as the base model

Large-scale Training Data

Trained on 3.7 million Chinese image-text pairs, covering a wide range of scenarios
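To make the joint image-text processing listed above more concrete, here is a minimal sketch of how an Oscar-style input can be assembled from caption tokens, detected object tags, and one feature vector per image region. It is not Langboat's code. The names OscarStyleInput and build_input, the character-level tokenization, the segment-id convention (0 for the caption, 1 for the tags), and the 4-dimensional toy region features are all assumptions made for illustration; a real pipeline would use the model's own tokenizer and region features from an object detector.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OscarStyleInput:
    """Hypothetical container for one image-text example (illustration only)."""
    tokens: List[str]                   # [CLS] caption [SEP] object tags [SEP]
    segment_ids: List[int]              # assumed: 0 = caption part, 1 = tag part
    region_features: List[List[float]]  # one feature vector per detected region


def build_input(
    caption_tokens: List[str],
    object_tags: List[str],
    region_features: List[List[float]],
    max_text_len: int = 32,
) -> OscarStyleInput:
    """Concatenate caption tokens and object-tag tokens into one text sequence.

    Tokenization is simplified to single characters, which is only an
    approximation of how a Chinese BERT tokenizer splits text.
    """
    # Reserve two positions for [CLS] and [SEP] around the caption.
    text_part = ["[CLS]"] + caption_tokens[: max_text_len - 2] + ["[SEP]"]
    # Split every tag into characters and close the tag segment with [SEP].
    tag_part = [ch for tag in object_tags for ch in tag] + ["[SEP]"]
    return OscarStyleInput(
        tokens=text_part + tag_part,
        segment_ids=[0] * len(text_part) + [1] * len(tag_part),
        region_features=region_features,
    )


# Toy usage: a caption, two object tags, and two placeholder region vectors.
example = build_input(
    caption_tokens=list("一只猫坐在沙发上"),
    object_tags=["猫", "沙发"],
    region_features=[[0.0] * 4, [0.0] * 4],
)
print(example.tokens)
print(example.segment_ids)
```

In this sketch the object tags act as a textual bridge between the caption and the image regions, which is the central idea of the Oscar framework; everything else is simplified.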

Model Capabilities

Image-text matching

Visual question answering

Cross-modal understanding

Chinese multimodal task processing

Use Cases

Intelligent Customer Service

Image-based Customer Service Q&A

Answer users' questions about the images they provide

Content Moderation

Image-Text Consistency Review

Check whether an image's content matches its accompanying description text
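As a rough sketch of the image-text consistency review use case, the snippet below flags image-text pairs whose matching score falls under a threshold. The scoring function is a placeholder: toy_score, TOY_TAGS, the image ids, and the 0.5 threshold are hypothetical and exist only so the example runs on its own. In practice the score would come from an image-text matching model such as a fine-tuned Mengzi-Oscar, and the threshold would be tuned on labeled data.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class ReviewResult(NamedTuple):
    image_id: str
    text: str
    score: float
    consistent: bool


# Hypothetical hand-written tags standing in for what a real model would "see".
TOY_TAGS: Dict[str, List[str]] = {
    "img_001": ["猫", "沙发"],
    "img_002": ["汽车"],
}


def toy_score(image_id: str, text: str) -> float:
    """Placeholder scorer: fraction of the image's tags that appear in the text."""
    tags = TOY_TAGS.get(image_id, [])
    if not tags:
        return 0.0
    return sum(tag in text for tag in tags) / len(tags)


def review_pairs(
    pairs: List[Tuple[str, str]],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[ReviewResult]:
    """Score each (image_id, text) pair and mark it consistent if score >= threshold."""
    results = []
    for image_id, text in pairs:
        score = score_fn(image_id, text)
        results.append(ReviewResult(image_id, text, score, score >= threshold))
    return results


# Toy usage: the same description is checked against two different images.
for result in review_pairs(
    [("img_001", "一只猫坐在沙发上"), ("img_002", "一只猫坐在沙发上")],
    toy_score,
):
    status = "ok" if result.consistent else "flag for review"
    print(result.image_id, result.score, status)
```

Keeping the scorer behind a plain function argument means the review logic does not change when the placeholder is swapped for a real model.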


