Mengzi Oscar Base
A Chinese multimodal pretraining model built on the Oscar framework, initialized from the base version of Mengzi-Bert and trained on 3.7 million image-text pairs.
Downloads: 20
Release date: 3/2/2022
Model Overview
Mengzi-Oscar is a Chinese multimodal pretraining model for tasks that require joint understanding of images and text, such as image-text matching and visual question answering.
Model Features
Multimodal Pretraining
Processes image and text inputs jointly to support cross-modal understanding
Chinese Optimization
Optimized for Chinese, with Mengzi-Bert as the base model
Large-scale Training Data
Trained on 3.7 million Chinese image-text pairs, covering a wide range of scenarios
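The Oscar framework, as described in its original paper, represents each image-text pair as a triple: the caption's word tokens, object tags detected in the image, and image region features. The sketch below assembles such an input in pure Python to show the layout. It is not Mengzi-Oscar's preprocessing code: the character-level `toy_tokenize` is a hypothetical stand-in for the real Mengzi-Bert tokenizer, `OscarInput` and `build_oscar_input` are illustrative names, and the segment-id convention (0 for the caption span, 1 for the tag span) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OscarInput:
    tokens: List[str]                   # [CLS] caption [SEP] tags [SEP]
    segment_ids: List[int]              # assumed: 0 = caption span, 1 = tag span
    region_features: List[List[float]]  # one feature vector per detected image region
    attention_mask: List[int]           # text tokens followed by image regions


def toy_tokenize(text: str) -> List[str]:
    # Hypothetical stand-in for the real tokenizer: one token per non-space character.
    return [ch for ch in text if not ch.isspace()]


def build_oscar_input(caption: str, object_tags: List[str],
                      region_features: List[List[float]]) -> OscarInput:
    caption_tokens = toy_tokenize(caption)
    tag_tokens = [tok for tag in object_tags for tok in toy_tokenize(tag)]
    tokens = ["[CLS]"] + caption_tokens + ["[SEP]"] + tag_tokens + ["[SEP]"]
    # Segment A covers [CLS], the caption and the first [SEP]; segment B covers tags and the final [SEP].
    segment_ids = [0] * (len(caption_tokens) + 2) + [1] * (len(tag_tokens) + 1)
    # No padding in this sketch, so every text token and every region is attended to.
    attention_mask = [1] * (len(tokens) + len(region_features))
    return OscarInput(tokens, segment_ids, region_features, attention_mask)


if __name__ == "__main__":
    # Toy 4-dimensional region features, for illustration only.
    example = build_oscar_input("一只猫坐在沙发上", ["猫", "沙发"], [[0.1] * 4, [0.2] * 4])
    print(len(example.tokens), len(example.attention_mask))  # 14 16
```

Putting detected object tags next to the caption is the core idea of Oscar: the tags act as text-side anchors that help align words with image regions.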
Model Capabilities
Image-text matching
Visual question answering
Cross-modal understanding
Chinese multimodal task processing
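In the Oscar framework, image-text matching is typically fine-tuned as binary classification on the [CLS] representation, and visual question answering as multi-label classification over a fixed answer vocabulary. The sketch below covers only the post-processing of such classifier outputs. It assumes the model has already produced logits. `match_probability`, `rank_captions` and `top_answer` are hypothetical helper names, not part of any published Mengzi-Oscar API, and the logits in the usage lines are made up.

```python
import math
from typing import Dict, List, Tuple


def match_probability(logits: Tuple[float, float]) -> float:
    # Two-class softmax over hypothetical (no-match, match) logits from a matching head.
    no_match, match = logits
    m = max(no_match, match)  # subtract the max for numerical stability
    e_no, e_yes = math.exp(no_match - m), math.exp(match - m)
    return e_yes / (e_no + e_yes)


def rank_captions(scores: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    # Image-text retrieval: sort candidate captions by match probability, best first.
    ranked = [(caption, match_probability(logits)) for caption, logits in scores.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def top_answer(answer_logits: Dict[str, float]) -> Tuple[str, float]:
    # Multi-label VQA: an independent sigmoid per candidate answer, then take the highest.
    probs = {answer: 1.0 / (1.0 + math.exp(-z)) for answer, z in answer_logits.items()}
    best = max(probs, key=probs.get)
    return best, probs[best]


if __name__ == "__main__":
    # Made-up logits, for illustration only.
    candidates = {"一只猫坐在沙发上": (-1.0, 2.5), "一辆红色的汽车": (2.0, -1.5)}
    print(rank_captions(candidates)[0][0])        # 一只猫坐在沙发上
    print(top_answer({"是": 1.2, "否": -0.8})[0])  # 是
```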
Use Cases
Intelligent Customer Service
Image-based Customer Service Q&A
Answer user questions about the images they upload
Content Moderation
Image-Text Consistency Review
Check whether the image content matches its accompanying text
© 2025 AIbase