# Interleaved Image-Text Processing

Xgen Mm Phi3 Mini Instruct Interleave R V1.5
Apache-2.0
xGen-MM is a series of the latest foundational large multimodal models (LMMs) developed by Salesforce AI Research. It builds on the design of the BLIP series, with foundational enhancements intended to make the base model more robust.
Image-to-Text Safetensors English
Salesforce
7,373
51
Xgen Mm Phi3 Mini Instruct Singleimg R V1.5
Apache-2.0
xGen-MM is a series of the latest foundational large multimodal models developed by Salesforce AI Research. It improves on the design of the BLIP series to provide stronger multimodal processing capabilities.
Image-to-Text Safetensors English
Salesforce
313
15
Xgen Mm Phi3 Mini Instruct R V1
xGen-MM is the latest series of foundational large multimodal models developed by Salesforce AI Research. It improves on the BLIP series and offers strong image understanding and text generation capabilities.
Image-to-Text Transformers English
Salesforce
804
186
Idefics2 8b
Apache-2.0
Idefics2 is an open-source multimodal model that accepts arbitrary sequences of image and text inputs and generates text outputs. Compared with its predecessor, Idefics1, it shows significant improvements in OCR, document understanding, and visual reasoning.
Image-to-Text Transformers English
HuggingFaceM4
14.99k
603
Idefics 9b Instruct
Other
IDEFICS is an open-source reproduction of DeepMind's proprietary visual language model Flamingo. It is a multimodal model that can accept arbitrary sequences of images and text as input and generate text output.
Image-to-Text Transformers English
HuggingFaceM4
28.34k
104
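Several entries above describe models that accept "arbitrary sequences of images and text". As a rough illustration of what such an interleaved input can look like, the sketch below builds one chat-style message whose content alternates image placeholders and text segments. The `ImageRef` class and `build_interleaved_message` helper are hypothetical names introduced for this example, and the `{"type": "image"}` / `{"type": "text"}` layout is assumed here because it resembles the message format used in Hugging Face examples for Idefics2; other models may expect a different structure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class ImageRef:
    """Hypothetical stand-in for an image (a path or URL to load later)."""
    source: str


Segment = Union[str, ImageRef]


def build_interleaved_message(
    segments: List[Segment], role: str = "user"
) -> Tuple[Dict, List[ImageRef]]:
    """Turn a mixed sequence of text and images into one chat message.

    Returns the message plus the images in the order their placeholders
    appear, so both can later be handed to a model-specific processor.
    """
    content: List[Dict[str, str]] = []
    images: List[ImageRef] = []
    for seg in segments:
        if isinstance(seg, ImageRef):
            # The message holds only a placeholder; the image travels separately.
            content.append({"type": "image"})
            images.append(seg)
        elif isinstance(seg, str):
            if seg.strip():  # skip empty or whitespace-only text
                content.append({"type": "text", "text": seg})
        else:
            raise TypeError(f"unsupported segment type: {type(seg).__name__}")
    return {"role": role, "content": content}, images


# Example: two images, each followed by a question about it.
message, images = build_interleaved_message([
    ImageRef("cat.jpg"),
    "What is in this image?",
    ImageRef("dog.jpg"),
    "And how does this one differ?",
])
print(message["content"])
print([img.source for img in images])
```

Keeping the images in a separate list, ordered to match the placeholders, is a design choice of this sketch: it lets the text side stay plain JSON-like data while the actual image loading is deferred to whatever processor a given model provides.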