X

Xgen Mm Phi3 Mini Instruct R V1

Developed by Salesforce
xGen-MM is the latest foundational large multimodal model series developed by Salesforce AI Research, based on improvements to the BLIP series, featuring powerful image understanding and text generation capabilities.
Downloads 804
Release Time : 5/6/2024

Model Overview

xGen-MM is a large multimodal model series developed by Salesforce AI Research, supporting joint processing of images and text, suitable for various vision-language tasks.

Model Features

Powerful Multimodal Capabilities
Supports joint processing of images and text, with exceptional image understanding and text generation capabilities.
Efficient Visual Token Sampling
Supports flexible high-resolution image encoding with efficient visual token sampling capabilities.
Contextual Learning Ability
The pre-trained foundational model demonstrates strong contextual learning capabilities.

Model Capabilities

Image Caption Generation
Visual Question Answering
Multimodal Reasoning
Joint Image-Text Processing

Use Cases

Visual Question Answering
Image Content Question Answering
Answer natural language questions about image content.
Performs excellently on multiple benchmarks.
Image Caption Generation
Automatic Image Annotation
Generate detailed natural language descriptions for images.
Performs excellently on datasets like COCO.
Featured Recommended AI Models
ยฉ 2025AIbase