# Interleaved Image-Text Processing

xgen-mm-phi3-mini-instruct-interleave-r-v1.5

xGen-MM is the latest series of foundational large multimodal models (LMMs) developed by Salesforce AI Research. It builds on the design of the BLIP series, with foundational enhancements that make the models more robust.

Tags: Safetensors, English

xgen-mm-phi3-mini-instruct-singleimg-r-v1.5

xGen-MM is the latest series of foundational large multimodal models developed by Salesforce AI Research. It improves on the design of the BLIP series and offers stronger multimodal processing capabilities.

Tags: Safetensors, English

xgen-mm-phi3-mini-instruct-r-v1

xGen-MM is the latest series of foundational large multimodal models developed by Salesforce AI Research. It improves on the BLIP series and offers strong image understanding and text generation capabilities.

Tags: Transformers, English

Idefics2 is an open-source multimodal model that accepts arbitrary sequences of image and text inputs and generates text outputs. Compared with its predecessor, IDEFICS, it shows significant improvements in OCR, document understanding, and visual reasoning.

Tags: Transformers, English

idefics-9b-instruct

IDEFICS is an open-source reproduction of DeepMind's proprietary visual language model Flamingo. It is a multimodal model that can accept arbitrary sequences of images and text as input and generate text output.

Tags: Transformers, English


© 2025 AIbase