
XGen-MM Phi3 Mini Base R V1

Developed by Salesforce
XGen-MM is the latest series of large multimodal models developed by Salesforce AI Research. It continues the BLIP series and makes fundamental enhancements to its design, giving the models a more robust and capable foundation.
Downloads: 240
Release date: 5/7/2024

Model Overview

This model is trained on large-scale, high-quality image caption datasets and on interleaved image-text data. It supports image-text-to-text tasks and has strong in-context learning ability.
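In-context learning with interleaved image-text data means the prompt can hold several image/answer demonstrations before the query image. The sketch below shows how such a few-shot prompt might be assembled. It is a minimal illustration under assumptions, not the model's documented prompt format: the `<image>` placeholder, the instruction wording, and the helper names (`Example`, `build_few_shot_prompt`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical image placeholder; the real model's image token may differ.
IMAGE_TOKEN = "<image>"


@dataclass
class Example:
    image_path: str  # path to a demonstration image
    caption: str     # the answer we want the model to imitate


def build_few_shot_prompt(
    demos: List[Example],
    query_image: str,
    instruction: str = "Describe the image:",  # hypothetical wording
) -> Tuple[List[str], str]:
    """Interleave demonstration images and captions, then append the query.

    Returns the ordered list of image paths and a text prompt in which each
    IMAGE_TOKEN marks where the corresponding image would be inserted.
    """
    images: List[str] = []
    parts: List[str] = []
    for demo in demos:
        images.append(demo.image_path)
        parts.append(f"{IMAGE_TOKEN} {instruction} {demo.caption}")
    # The query repeats the pattern but leaves the answer empty,
    # so the model is expected to continue it.
    images.append(query_image)
    parts.append(f"{IMAGE_TOKEN} {instruction}")
    return images, "\n".join(parts)


if __name__ == "__main__":
    imgs, prompt = build_few_shot_prompt(
        [Example("dog.jpg", "A dog sitting on the beach.")], "cat.jpg"
    )
    print(imgs)
    print(prompt)
```

The image list and the placeholders are kept in the same order so that a processor can pair the i-th image with the i-th placeholder; how the actual model performs that pairing should be checked against its official usage example.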

Model Features

Powerful pre-trained base model
Achieves state-of-the-art performance among models under 5B parameters and demonstrates strong in-context learning ability.
Flexible instruction fine-tuning
The instruction-tuned variant achieves state-of-the-art performance among both open-source and closed-source vision-language models under 5B parameters.
High-resolution image encoding
Supports flexible high-resolution image encoding and efficient visual token sampling.
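One common way to encode high-resolution images is to split them into a grid of fixed-size tiles, encode each tile plus a downscaled global view, and have a token sampler compress every view to a fixed number of visual tokens. The sketch below estimates the resulting token budget. It is a hypothetical illustration: the 384 px tile size, the 9-tile cap, the 128 tokens per view, and the grid-shrinking rule are assumptions, not values confirmed by this page.

```python
import math
from typing import Tuple


def tile_grid(width: int, height: int, base: int = 384, max_tiles: int = 9) -> Tuple[int, int]:
    """Choose a (cols, rows) grid of base x base tiles that covers the image.

    Assumption: if the natural grid exceeds max_tiles, the image is treated as
    downscaled, shrinking the larger grid dimension first so the aspect ratio
    is roughly preserved.
    """
    cols = max(1, math.ceil(width / base))
    rows = max(1, math.ceil(height / base))
    while cols * rows > max_tiles:
        if cols >= rows:
            cols -= 1
        else:
            rows -= 1
    return cols, rows


def visual_token_count(
    width: int,
    height: int,
    base: int = 384,
    tokens_per_view: int = 128,  # hypothetical output size of the token sampler
    max_tiles: int = 9,
) -> int:
    """Total visual tokens: every tile plus one global view, each sampled
    down to a fixed number of tokens."""
    cols, rows = tile_grid(width, height, base, max_tiles)
    return (cols * rows + 1) * tokens_per_view


if __name__ == "__main__":
    for size in [(300, 200), (1024, 768), (4000, 3000)]:
        print(size, tile_grid(*size), visual_token_count(*size))
```

Because every view is compressed to a fixed size, the token count grows with the number of tiles rather than with raw pixel count, which is what keeps high-resolution inputs affordable under this scheme.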

Model Capabilities

Image description generation
Visual question answering
Multimodal in-context learning
High-resolution image processing

Use Cases

Image understanding and description
Image content description
Generate a detailed description of the image content
Example output: The dog is sitting on the beach waving to its owner.
Visual question answering
Image-based question answering
Answer natural language questions about the image content
Performs well on benchmarks such as OKVQA and TextVQA
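OKVQA and TextVQA are typically scored with the VQA accuracy metric, under which a predicted answer counts as fully correct if at least 3 of the 10 human annotators gave it. The sketch below implements the widely used simplified form, min(matches / 3, 1). It is an approximation: the official evaluation averages this score over subsets of 9 annotators and applies more answer normalization (numbers, articles, punctuation) than the light `normalize` helper here, which is a hypothetical stand-in.

```python
from typing import List


def normalize(answer: str) -> str:
    """Light normalization: trim, lowercase, drop a trailing period.
    (Simplified; the official VQA script applies more rules.)"""
    return answer.strip().lower().rstrip(".")


def vqa_accuracy(prediction: str, human_answers: List[str]) -> float:
    """Simplified VQA accuracy: min(#human answers matching the prediction / 3, 1)."""
    pred = normalize(prediction)
    matches = sum(1 for a in human_answers if normalize(a) == pred)
    return min(matches / 3.0, 1.0)


def mean_vqa_accuracy(predictions: List[str], answer_sets: List[List[str]]) -> float:
    """Average the per-question score over a dataset."""
    scores = [vqa_accuracy(p, a) for p, a in zip(predictions, answer_sets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    humans = ["beach"] * 4 + ["sand"] * 2 + ["ocean"] * 4
    print(vqa_accuracy("Beach.", humans))  # 4 matches, capped at 1.0
    print(vqa_accuracy("sand", humans))    # 2 matches -> 2/3
```

Capping at 3 matches gives partial credit for answers that only some annotators agreed on, which suits open-ended questions where several phrasings are reasonable.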