xgen-mm-phi3-mini-instruct-dpo-r-v1.5

Developed by Salesforce
xGen-MM is a series of multimodal foundation models developed by Salesforce AI Research. It builds on the BLIP series and is trained on high-quality image captions and interleaved image-text data.
Downloads 305
Release Time: 8/9/2024

Model Overview

This model is the DPO (Direct Preference Optimization) version in the xGen-MM series. It is tuned to improve multimodal understanding and safety, and is suited to image-text generation and interactive tasks.
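For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO objective for a single preference pair. It is not Salesforce's training code, and this page does not describe the training setup: the function name, argument names, and the beta value of 0.1 are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    All inputs are summed log-probabilities of a full response.
    The names and the default beta are hypothetical, for illustration only.
    """
    # How much more the policy likes each response than the frozen reference does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the chosen response.
    margin = beta * (chosen_ratio - rejected_ratio)

    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# A policy identical to the reference gives a loss of log(2);
# shifting probability toward the chosen response lowers it.
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(baseline, improved)
```

In this formulation, safety-oriented preference data would pair a safe response (chosen) with a harmful one (rejected), so minimizing the loss pushes the model away from the harmful response relative to the reference model.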

Model Features

Multimodal Understanding
Performs strongly on single-image and multi-image benchmarks and supports complex multimodal interactive tasks.
Safety Optimization
DPO training substantially reduces the probability of generating harmful content (VLGuard score of 5.2, where lower is better, outperforming baseline models).
Comprehensive Performance
Surpasses peer models on multiple benchmarks, including POPE, MMBench, and SEED-IMG.

Model Capabilities

Image Caption Generation
Multi-image Reasoning
Safe Content Filtering
Visual Question Answering
Cross-modal Understanding

Use Cases

Content Moderation
Harmful Content Detection
Automatically identifies potentially harmful content in images and text
VLGuard score of 5.2 (lower is better)
Education
Multimodal Learning Assistant
Parses and explains image-text content in educational materials
MMBench development set score of 76.4
© 2025 AIbase