InternLM-XComposer2-VL-1_8b Open-Source Vision-Language Model - Efficient Image-Text Understanding and Creation

Developed by internlm

A vision-language large model based on InternLM2 with outstanding image-text understanding and creation capabilities

Text-to-Image

Transformers

Open Source License: Other
Tags: #Image-text understanding and creation #Multimodal large model #Vision-language interaction

Release Time: 4/9/2024

Model Overview

InternLM-XComposer2 is a vision-language large model (VLLM) based on InternLM2. It performs strongly on multiple multimodal benchmarks and supports both image-text understanding and image-text creation.

Model Features

Multimodal understanding capability

Capable of processing and understanding both image and text information simultaneously

Image-text creation capability

Supports free-form interleaved image-text creation tasks

High performance

Outstanding performance in multiple multimodal benchmarks

Model Capabilities

Image understanding

Visual question answering

Image-text description generation

Multimodal content creation

Use Cases

Content creation

Image-text content generation

Generate detailed descriptions or create related textual content based on images

Examples demonstrate the model's ability to accurately describe image content and interpret text information within images

Visual question answering

Image understanding and analysis

Answer various questions about image content

🚀 InternLM-XComposer2

InternLM-XComposer2 is a vision-language large model (VLLM) based on InternLM2. It excels in advanced text-image comprehension and composition, offering powerful capabilities for multimodal tasks.

[💻Github Repo](https://github.com/InternLM/InternLM-XComposer) [Paper](https://arxiv.org/abs/2401.16420)

We release the InternLM-XComposer2 series in two versions:

InternLM-XComposer2-VL: A pretrained VLLM initialized with InternLM2, delivering strong performance on various multimodal benchmarks.
InternLM-XComposer2: A finetuned VLLM for Free-form Interleaved Text-Image Composition.

🚀 Quick Start

We provide a simple example to show how to use InternLM-XComposer with 🤗 Transformers.

import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)

# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2-vl-1_8b', trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2-vl-1_8b', trust_remote_code=True)

query = '<ImageHere>Please describe this image in detail.'
image = './image1.webp'
with torch.cuda.amp.autocast():
  response, _ = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False)
print(response)
# The image is a captivating photograph of a sunset over a mountainous landscape. The sky, painted in hues of orange and pink,
# serves as a backdrop for two silhouetted figures standing on the mountain. The text on the image, written in white, is a quote 
# from Oscar Wilde, which reads, "Live life with no excuses, travel with no regret." This quote, combined with the serene setting,
# serves as a powerful reminder to embrace life's journey without hesitation or regret.
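
The call above can be wrapped in a small helper for single-turn visual question answering. This is a minimal sketch, not part of the model's API: `IMAGE_TOKEN`, `build_query`, and `ask_about_image` are hypothetical names. It assumes only what the example above shows, namely that `model.chat` accepts `tokenizer`, `query`, `image`, `history`, and `do_sample`, and returns a 2-tuple whose first element is the answer text.

```python
IMAGE_TOKEN = "<ImageHere>"  # placeholder used in the query of the example above


def build_query(question: str) -> str:
    """Prefix the question with the image placeholder unless it is already there."""
    question = question.strip()
    if IMAGE_TOKEN in question:
        return question
    return IMAGE_TOKEN + question


def ask_about_image(model, tokenizer, image_path: str, question: str) -> str:
    """Ask one question about one image (hypothetical helper).

    Assumes model.chat behaves as in the Quick Start example: it takes the
    keyword arguments below and returns (response, second_value).
    """
    response, _ = model.chat(
        tokenizer,
        query=build_query(question),
        image=image_path,
        history=[],       # start a fresh conversation for every question
        do_sample=False,  # no sampling, as in the example above
    )
    return response
```

With the `model` and `tokenizer` from the example above, a call such as `ask_about_image(model, tokenizer, './image1.webp', 'What does the text in the image say?')` would go inside the same `torch.cuda.amp.autocast()` context.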

💻 Usage Examples

Basic Usage

To load the InternLM-XComposer2-VL-1.8B model using Transformers, use the following code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
ckpt_path = "internlm/internlm-xcomposer2-vl-1_8b"
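# The tokenizer runs on the CPU and has no .cuda() method; only the model is moved to the GPU.
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load the model in float16; otherwise it is loaded as float32 and might cause an out-of-memory (OOM) error.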
model = AutoModelForCausalLM.from_pretrained(ckpt_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
model = model.eval()
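
The float16 advice in the comment above comes down to simple arithmetic: each parameter takes 4 bytes in float32 and 2 bytes in float16. The sketch below estimates the memory needed for the weights alone. The parameter count of about 1.8 billion is an assumption read off the `1_8b` in the checkpoint name; the real count may differ, and activations and the CUDA context need additional memory on top. `weight_memory_gib` is a hypothetical helper name.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Assumption: ~1.8e9 parameters, inferred from "1_8b" in the checkpoint name.
NUM_PARAMS = 1.8e9

for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_gib(NUM_PARAMS, dtype):.1f} GiB")
```

Under this assumption, loading in float16 halves the weight footprint compared with float32, which is why it is the safer choice on GPUs with limited memory.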

📄 License

The code is licensed under Apache 2.0. The model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English) / application form (Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn.
