InternLM-XComposer2-VL-7B-4Bit Open-Source Vision-Language Model - Achieve Image-Text Understanding and Creation for Free

Developed by internlm

A vision-language large model based on InternLM2, with outstanding image-text understanding and creation capabilities

Image-to-Text

Transformers

Open Source License: Other

Tags: #Interleaved Image-Text Creation #Multimodal Understanding #Vision-Language Model

Downloads 1,635

Release Time: 2/6/2024

Model Overview

InternLM-XComposer2-VL is a pretrained vision-language model that uses InternLM2 as its large language model foundation and performs strongly on multimodal benchmarks.

Model Features

Multimodal Understanding and Creation

Understands and creates image-text content, including free-form interleaved image-text composition

Quantized Version

Provides a 4-bit quantized version to reduce computational resource requirements
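
As a rough illustration of why 4-bit quantization lowers resource requirements, the sketch below estimates the memory needed for the weights alone at 16 and 4 bits per weight. The parameter count is an assumption: a nominal 7e9 taken from the "7B" in the model name. The estimate ignores activations, the KV cache, quantization metadata such as scales and zero points, and any modules left unquantized, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumption: a nominal 7e9 parameters, read off the "7B" in the model name.
NOMINAL_PARAMS = 7e9

fp16_gib = weight_memory_gib(NOMINAL_PARAMS, 16)
int4_gib = weight_memory_gib(NOMINAL_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
print(f"4-bit weights:  ~{int4_gib:.1f} GiB")
print(f"Reduction factor: {fp16_gib / int4_gib:.0f}x")
```

Under these assumptions the weights shrink by a factor of four, which is the upper bound on the saving; the actual footprint depends on which layers are quantized.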

High Performance

Demonstrates excellent performance in multimodal benchmarks

Model Capabilities

Image-Text Understanding

Image-Text Creation

Multimodal Interaction

Text Generation

Use Cases

Content Creation

Image Caption Generation

Generates detailed descriptions based on input images

Produces accurate and detailed image descriptions

Interleaved Image-Text Creation

Supports free-form interleaved image-text content creation

Creates content rich in both images and text

Visual Question Answering

Image Content Q&A

Answers various questions about image content

Accurately understands image content and answers questions

🚀 InternLM-XComposer2

InternLM-XComposer2 is a vision-language large model (VLLM) based on InternLM2, designed for advanced text-image comprehension and composition.

[💻Github Repo](https://github.com/InternLM/InternLM-XComposer) [Paper](https://arxiv.org/abs/2401.16420)

We release the InternLM-XComposer2 series in two versions:

InternLM-XComposer2-VL: The pretrained VLLM, with InternLM2 as the initialization of the LLM, achieving strong performance on various multimodal benchmarks.
InternLM-XComposer2: The finetuned VLLM for Free-form Interleaved Text-Image Composition.

This is the 4-bit version of InternLM-XComposer2-VL. Install the latest version of auto_gptq before using it.
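
Before loading the model, it can help to confirm that the dependency is actually present. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper, and the distribution name `auto-gptq` (the package that provides the `auto_gptq` module) is an assumption about how the package is published.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the auto_gptq module is distributed under the name "auto-gptq".
version = installed_version("auto-gptq")
if version is None:
    print("auto-gptq is not installed; install it before loading the 4-bit model.")
else:
    print(f"auto-gptq {version} is installed.")
```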

🚀 Quick Start

We provide a simple example to show how to use InternLM-XComposer with 🤗 Transformers.

Basic Usage

import torch, auto_gptq
from transformers import AutoModel, AutoTokenizer 
from auto_gptq.modeling import BaseGPTQForCausalLM

auto_gptq.modeling._base.SUPPORTED_MODELS = ["internlm"]
torch.set_grad_enabled(False)

class InternLMXComposer2QForCausalLM(BaseGPTQForCausalLM):
    layers_block_name = "model.layers"
    outside_layer_modules = [
        'vit', 'vision_proj', 'model.tok_embeddings', 'model.norm', 'output', 
    ]
    inside_layer_modules = [
        ["attention.wqkv.linear"],
        ["attention.wo.linear"],
        ["feed_forward.w1.linear", "feed_forward.w3.linear"],
        ["feed_forward.w2.linear"],
    ]
 
# init model and tokenizer
model = InternLMXComposer2QForCausalLM.from_quantized(
  'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True, device="cuda:0").eval()
tokenizer = AutoTokenizer.from_pretrained(
  'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True)

# the prompt is passed to model.chat as `query`
query = '<ImageHere>Please describe this image in detail.'
image = 'examples/image1.webp'
with torch.cuda.amp.autocast():
  response, _ = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False)
print(response)
#The image features a quote by Oscar Wilde, "Live life with no excuses, travel with no regrets." 
#The quote is displayed in white text against a dark background. In the foreground, there are two silhouettes of people standing on a hill at sunset. 
#They appear to be hiking or climbing, as one of them is holding a walking stick. 
#The sky behind them is painted with hues of orange and purple, creating a beautiful contrast with the dark figures.
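
The example above relies on two inputs being well formed: a prompt that starts with the `<ImageHere>` placeholder and an image path that exists. The sketch below shows one way to prepare both before calling `model.chat`. The helpers `build_query` and `check_image` are hypothetical and not part of the model's API; they only mirror the prompt format used in the example.

```python
from pathlib import Path

# Placeholder string as it appears in the example prompt above.
IMAGE_PLACEHOLDER = "<ImageHere>"


def build_query(prompt: str) -> str:
    """Prefix the prompt with the image placeholder unless it is already there."""
    prompt = prompt.strip()
    if prompt.startswith(IMAGE_PLACEHOLDER):
        return prompt
    return IMAGE_PLACEHOLDER + prompt


def check_image(path: str) -> str:
    """Return the path unchanged if it points to a file, otherwise raise."""
    if not Path(path).is_file():
        raise FileNotFoundError(f"Image not found: {path}")
    return path


query = build_query("Please describe this image in detail.")
print(query)
```

With these helpers, `query` and `check_image('examples/image1.webp')` can be passed as the `query` and `image` arguments in the call shown above, so a missing file fails early with a clear message.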

📄 License

The code is licensed under Apache-2.0. The model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English) / application form (Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn.
