InternLM-XComposer2.5-7B
Model Overview
Trained on 24K interleaved image-text contexts, the model can be extended to a 96K long-context window via RoPE extrapolation, and performs especially well in scenarios that demand extensive input and output context.
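The extrapolation claim can be illustrated with a minimal sketch of standard RoPE (rotary position embedding), assuming the usual formulation theta_i = base^(-2i/dim); this is illustrative only, not the model's actual implementation:

```python
def rope_angles(pos: int, dim: int = 128, base: float = 10000.0):
    """Rotation angles for one position across a head dimension.

    Standard RoPE: theta_i = base^(-2i/dim); position p is encoded by
    rotating each 2-D pair of features by p * theta_i.  Extrapolation
    simply feeds positions beyond the training window (here 24K) into
    the same formula -- no new parameters are needed.
    """
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]

# Angles remain well defined for positions far past the 24K training window.
trained = rope_angles(24_000)
extrapolated = rope_angles(96_000)
# Every component scales linearly with position, so 96K / 24K = 4.0.
print(extrapolated[-1] / trained[-1])  # 4.0
```

Because the rotation is a closed-form function of position, no retraining is required to evaluate longer sequences; the practical quality at 96K comes from how the model was trained and extrapolated, not from extra parameters.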
Model Highlights
Strong image-text understanding
Achieves GPT-4V-level image-text understanding with only 7B parameters
Long-context processing
Extends to a 96K long-context window via RoPE extrapolation
Multimodal support
Understands and analyzes images, videos, and other media formats
Webpage generation
Generates complete webpage code from instructions, resumes, or screenshots
Model Capabilities
Video content understanding
Multi-image multi-turn dialogue
High-resolution image understanding
Instruction-to-webpage generation
Resume-to-webpage conversion
Screenshot-to-webpage conversion
Use Cases
Content understanding
Video content analysis
Analyzes video frames and describes the video content in detail
Accurately identifies athletes, race scenes, and key details in the video
Multi-image comparison
Compares multiple images and offers recommendations
Analyzes the strengths and weaknesses of different vehicles in detail and offers purchase advice
Webpage generation
Instruction-to-webpage
Generates complete webpage code from natural-language instructions
Produces HTML for a research-institution website that meets the stated requirements
Resume-to-webpage
Converts a Markdown-format resume into a personal webpage
Produces a polished personal resume webpage
🚀 InternLM-XComposer-2.5
InternLM-XComposer-2.5 excels in various text-image understanding and composition applications, achieving GPT-4V-level capabilities with merely a 7B LLM backend. IXC-2.5 is trained with 24K interleaved image-text contexts and can seamlessly extend to 96K long contexts via RoPE extrapolation. This long-context capability allows IXC-2.5 to excel in tasks requiring extensive input and output contexts.
[💻 GitHub Repo](https://github.com/InternLM/InternLM-XComposer)
[Online Demo](https://huggingface.co/spaces/Willow123/InternLM-XComposer)
[Paper](https://huggingface.co/papers/2407.03320)
🚀 Quick Start
Importing the model with Transformers
To load the InternLM-XComposer-2.5 model with Transformers, use the following code:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
ckpt_path = "internlm/internlm-xcomposer2d5-7b"
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
# Set `torch_dtype=torch.bfloat16` to load the model in bfloat16; otherwise it is loaded as float32 and may cause an OOM error.
model = AutoModelForCausalLM.from_pretrained(ckpt_path, torch_dtype=torch.bfloat16, trust_remote_code=True).cuda()
model = model.eval()
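The dtype choice matters because of the raw weight footprint alone. A back-of-the-envelope estimate (the 7B parameter count is approximate, and activations and KV cache are ignored):

```python
# Rough VRAM needed just to hold the weights of a 7B-parameter model.
params = 7e9  # approximate parameter count
for dtype, nbytes in {"float32": 4, "bfloat16": 2}.items():
    print(f"{dtype}: ~{params * nbytes / 2**30:.1f} GiB")
```

bfloat16 needs roughly 13 GiB versus roughly 26 GiB for float32, which is why omitting `torch_dtype` can exhaust a 24 GB GPU before inference even starts.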
💻 Usage Examples
Basic Usage
We provide a simple example showing how to call InternLM-XComposer-2.5 with 🤗 Transformers.
Video Understanding
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
query = 'Here are some frames of a video. Describe this video in detail'
image = ['./examples/liuxiang.mp4',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, his = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
#The video opens with a shot of an athlete, dressed in a red and yellow uniform with the word "CHINA" emblazoned across the front, preparing for a race.
#The athlete, Liu Xiang, is seen in a crouched position, focused and ready, with the Olympic rings visible in the background, indicating the prestigious setting of the Olympic Games. As the race commences, the athletes are seen sprinting towards the hurdles, their determination evident in their powerful strides.
#The camera captures the intensity of the competition, with the athletes' numbers and times displayed on the screen, providing a real-time update on their performance. The race reaches a climax as Liu Xiang, still in his red and yellow uniform, triumphantly crosses the finish line, his arms raised in victory.
#The crowd in the stands erupts into cheers, their excitement palpable as they witness the athlete's success. The video concludes with a close-up shot of Liu Xiang, still basking in the glory of his victory, as the Olympic rings continue to symbolize the significance of the event.
query = 'tell me the athlete code of Liu Xiang'
image = ['./examples/liuxiang.mp4',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, history=his, do_sample=False, num_beams=3, use_meta=True)
print(response)
#The athlete code of Liu Xiang, as displayed on his uniform in the video, is "1363".
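The model's remote code handles frame extraction from the `.mp4` internally. For reference, the uniform frame sampling commonly used by video LLMs can be sketched as follows (illustrative; not the model's exact logic):

```python
def sample_frame_indices(total_frames: int, num_samples: int):
    """Uniformly spaced frame indices, including the first and last frame."""
    if num_samples >= total_frames:
        return list(range(total_frames))
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]

# Pick 8 representative frames out of a 300-frame clip.
print(sample_frame_indices(300, 8))
# [0, 43, 85, 128, 171, 214, 256, 299]
```

Uniform sampling keeps the frame count (and thus the image-token budget) fixed regardless of clip length, at the cost of possibly skipping short events between samples.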
Multi-Image Multi-Turn Dialogue
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
query = 'Image1 <ImageHere>; Image2 <ImageHere>; Image3 <ImageHere>; I want to buy a car from the three given cars, analyze their advantages and weaknesses one by one'
image = ['./examples/cars1.jpg',
'./examples/cars2.jpg',
'./examples/cars3.jpg',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, his = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
#To analyze the advantages and disadvantages of each car, we need to consider factors such as brand reputation, performance, design, cost, and maintenance.
#1. Mercedes-Benz: - Advantages: Known for its luxury and reliability, the Mercedes-Benz brand offers a high level of comfort, advanced technology, and superior craftsmanship. The vehicle in the image appears to be an SUV, which is versatile for both city driving and off-road conditions. - Disadvantages: Typically, Mercedes-Benz vehicles are more expensive compared to other brands, and they may require more frequent maintenance due to their luxury status.
#2. Bugatti: - Advantages: Bugatti is renowned for producing some of the fastest and most powerful cars in the world. The vehicle in the image is a sports car, likely offering an exhilarating driving experience with its high-performance engine and advanced aerodynamics. - Disadvantages: Bugatti cars are extremely expensive, making them less accessible to the average consumer. They also require specialized knowledge for maintenance and may not be suitable for everyday driving due to their high-performance nature.
#3. BMW: - Advantages: BMW is a well-respected manufacturer known for its innovative engineering and design. The car in the image appears to be an electric vehicle (EV), which offers advantages such as lower emissions, reduced noise pollution, and potentially lower operating costs. - Disadvantages: EVs like the one in the image may have a shorter range compared to traditional gasoline-powered vehicles, limiting their suitability for long-distance travel.
#Additionally, the initial cost of purchasing an EV can be higher than that of a conventional car. In conclusion, each car has its own set of advantages and disadvantages. The Mercedes-Benz offers luxury and versatility, the Bugatti provides high performance and exclusivity, and the BMW represents innovation in electric vehicle technology. The choice ultimately depends on the buyer's priorities, such as comfort, performance, environmental impact, and budget.
query = 'Image4 <ImageHere>; How about the car in Image4'
image.append('./examples/cars4.jpg')
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, history=his, use_meta=True)
print(response)
#The car in Image4 is a red sports car, which appears to be a Ferrari. Ferrari is a renowned Italian manufacturer known for producing some of the most iconic and high-performance sports cars in the world. - Advantages: Ferrari vehicles are synonymous with speed, luxury, and engineering excellence.
#The car in the image likely offers an exhilarating driving experience with its powerful engine, advanced aerodynamics, and high-quality craftsmanship. The red color adds to the car's aesthetic appeal, making it stand out on the road. - Disadvantages: Ferrari cars are extremely expensive, making them less accessible to the average consumer.
#They also require specialized knowledge for maintenance and may not be suitable for everyday driving due to their high-performance nature. In conclusion, the Ferrari in Image4 represents a pinnacle of automotive engineering and design, offering unmatched performance and luxury.
#However, its high cost and specialized maintenance requirements make it less practical for everyday use compared to the other vehicles in the images.
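The `Image1 <ImageHere>; Image2 <ImageHere>; ...` prefix in the multi-image query can be generated programmatically. A small helper (hypothetical; not part of the model's API) that reproduces the pattern:

```python
def build_multi_image_query(question: str, num_images: int) -> str:
    """Prefix a question with one numbered <ImageHere> slot per image,
    matching the placeholder pattern shown in the multi-image example."""
    slots = "; ".join(f"Image{i + 1} <ImageHere>" for i in range(num_images))
    return f"{slots}; {question}"

q = build_multi_image_query("compare the three cars", 3)
print(q)
# Image1 <ImageHere>; Image2 <ImageHere>; Image3 <ImageHere>; compare the three cars
```

The number of `<ImageHere>` placeholders must match the length of the image list passed to `model.chat`, including images appended in later turns.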
High-Resolution Image Understanding
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
query = 'Analyze the given image in a detail manner'
image = ['./examples/dubai.png']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
#The infographic is a visual representation of various facts about Dubai. It begins with a statement about Palm Jumeirah, highlighting it as the largest artificial island visible from space. It then provides a historical context, noting that in 1968, there were only a few cars in Dubai, contrasting this with the current figure of more than 1.5 million vehicles.
#The infographic also points out that Dubai has the world's largest Gold Chain, with 7 of the top 10 tallest hotels located there. Additionally, it mentions that the crime rate is near 0%, and the income tax rate is also 0%, with 20% of the world's total cranes operating in Dubai. Furthermore, it states that 17% of the population is Emirati, and 83% are immigrants.
#The Dubai Mall is highlighted as the largest shopping mall in the world, with 1200 stores. The infographic also notes that Dubai has no standard address system, with no zip codes, area codes, or postal services. It mentions that the Burj Khalifa is so tall that its residents on top floors need to wait longer to break fast during Ramadan.
#The infographic also includes information about Dubai's climate-controlled City, with the Royal Suite at Burj Al Arab costing $24,000 per night. Lastly, it notes that the net worth of the four listed billionaires is roughly equal to the GDP of Honduras.
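High-resolution understanding in models of this family typically works by cutting the image into fixed-size tiles whose grid matches the image's aspect ratio. A simplified sketch of such grid selection (illustrative only; the model's actual preprocessing ships with its `trust_remote_code` implementation):

```python
def choose_grid(width: int, height: int, max_tiles: int = 25):
    """Pick the tile grid (cols, rows) whose aspect ratio best matches
    the image, subject to a total tile budget."""
    best, best_err = (1, 1), float("inf")
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles // cols + 1):  # cols * rows <= max_tiles
            err = abs(cols / rows - width / height)
            if err < best_err:
                best, best_err = (cols, rows), err
    return best

# A 4K (16:9) infographic gets a wide grid under a 25-tile budget.
print(choose_grid(3840, 2160))  # (5, 3)
```

Matching the grid to the aspect ratio avoids stretching the image before tiling, which is what preserves small text such as the statistics in the infographic above.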
Instruction-Aware Webpage Generation
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
query = 'A website for Research institutions. The name is Shanghai AI lab. Top Navigation Bar is blue.Below left, an image shows the logo of the lab. In the right, there is a passage of text below that describes the mission of the laboratory.There are several images to show the research projects of Shanghai AI lab.'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.write_webpage(query, seed=202, task='Instruction-aware Webpage Generation', repetition_penalty=3.0)
print(response)
# see the Instruction-aware Webpage Generation.html
See the results of instruction-aware webpage generation.
Resume-to-Webpage Generation
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
## the input should be a resume in markdown format
query = './examples/resume.md'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.resume_2_webpage(query, seed=202, repetition_penalty=3.0)
print(response)
See the results of resume-to-webpage generation.
Screenshot-to-Webpage Generation
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
query = 'Generate the HTML code of this web image with Tailwind CSS.'
image = ['./examples/screenshot.jpg']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.screen_2_webpage(query, image, seed=202, repetition_penalty=3.0)
print(response)
See the results of screenshot-to-webpage generation.
Article Writing
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
query = '閱讀下面的材料,根據要求寫作。 電影《長安三萬裡》的出現讓人感慨,影片並未將重點全落在大唐風華上,也展現了恢弘氣象的陰暗面,即舊門閥的資源壟斷、朝政的日益衰敗與青年才俊的壯志難酬。高適仕進無門,只能回鄉沉潛修行。李白雖得玉真公主舉薦,擢入翰林,但他只是成為唐玄宗的御用文人,不能真正實現有益於朝政的志意。然而,片中高潮部分《將進酒》一節,人至中年、掛著肚腩的李白引眾人乘仙鶴上天,一路從水面、瀑布飛昇至銀河進入仙宮,李白狂奔著與仙人們碰杯,最後大家縱身飛向漩渦般的九重天。肉身的微賤、世路的坎坷,拘不住精神的高蹈。“天生我材必有用,千金散盡還復來。” 古往今來,身處閒頓、遭受挫折、被病痛折磨,很多人都曾經歷了人生的“失意”,卻反而成就了他們“詩意”的人生。對正在追求人生價值的當代青年來說,如何對待人生中的缺憾和困頓?詩意人生中又有怎樣的自我堅守和自我認同?請結合“失意”與“詩意”這兩個關鍵詞寫一篇文章。 要求:選準角度,確定立意,明確文體,自擬標題;不要套作,不得抄襲;不得洩露個人信息;不少於 800 字。'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.write_artical(query, seed=8192)
print(response)
#詩意人生,貴在堅守
#《菜根譚》有云:“閒時要有吃緊的心思,忙裡要留吃閒工夫。”人生在世,總有失意之時,當面對缺憾和困頓,詩意地生活著才能為人生增添一抹亮色。何謂詩意地生活? 所謂詩意地生活,便是在於堅守本心、直面遺憾、超越自我,在失意中尋找人生價值。
#詩意地生活,需堅守本心,淡然處之。
#陶淵明曾執意辭去彭澤縣令,歸隱田園,“採菊東籬下,悠然見南山”,在山水間寄情自娛;王維面對仕途失意,終日沉醉於詩酒之中,“興來每獨往,勝事空自知”,在詩酒中閒逸自如;李白仕途不順,被賜金放還,但他依舊豪氣干雲,“天生我才必有用,千金散盡還復來”,在失意中坦然豁達。堅守本心,便能在遭遇失意之時守住自己的精神家園,讓生活充滿詩意。反之,若不能堅守本心,而只是一味迎合世俗以求得升遷,那縱使身居高位,亦會喪失生活的樂趣。
#詩意地生活,需直面遺憾,超越自我。
#“西塞山前白鷺飛,桃花流水鱖魚肥。青箬笠,綠柳枝,半斤酒,一綸絲。五湖四海皆如此,何妨到此處歸。”白居易的《漁歌子》寫出了多少人的願望:沒有權勢紛擾,沒有貧困淒涼,只有青山綠水、白鷺鷗鳥作伴,如此自由自在的生活令人神往。然而,白居易卻並沒有因此真的歸隱山林,而是直麵人生,超越自我,寫下了一首首詩意而富有現實關懷的作品。如果白居易只顧逃避人生,那又怎會擁有“大弦嘈嘈如急雨,小弦切切如私語”的絕美比喻呢?如果白居易只顧歸隱山林,那又怎會寫出“此曲只應天上有,人間哪得配白居易”這樣的詩句呢?
#詩意地生活,需直面遺憾,堅守本心。
#李文波患有漸凍症,醫生說他活不過五年,但他沒有因此放棄對音樂的熱愛,而是與病魔作鬥爭,演奏出美妙的樂曲;孫家林自幼患有腦癱,但他不甘於命運的捉弄,終成全國最美教師;史鐵生飽受疾病折磨,但他仍能發出“我常常在我的心頭清點,我有什麼?”的叩問,並由此走上文學道路,為後世留下豐厚的文化遺產。這些人沒有逃避,而是選擇直麵人生的缺憾,在堅守本心的同時超越自我,最終實現了自己的價值。
#詩意地生活,是於失意中堅守本心,於缺憾中超越自我。當面對人生的缺憾與挫折,堅守本心、超越自我的同時,也必將書寫屬於自己的輝煌篇章。
#願你我都能詩意地生活著!
query = 'Please write a blog based on the title: French Pastries: A Sweet Indulgence'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.write_artical(query, seed=8192)
print(response)
#French Pastries: A Sweet Indulgence
#The French are well known for their love of pastries, and it’s a love that is passed down through generations. When one visits France, they are treated to an assortment of baked goods that can range from the delicate macaron to the rich and decadent chocolate mousse. While there are many delicious types of pastries found in France, five stand out as being the most iconic. Each of these pastries has its own unique qualities that make it special.
#1. Croissant
#One of the most famous pastries from France is the croissant. It is a buttery, flaky pastry that is best enjoyed fresh from the bakery. The dough is laminated with butter, giving it its signature layers. Croissants are typically eaten for breakfast or brunch, often accompanied by coffee or hot chocolate.
#2. Macaron
#The macaron is a small, delicate French confection made from almond flour, powdered sugar, and egg whites. The macaron itself is sandwiched with a ganache or jam filling. They come in a variety of colors and flavors, making them a popular choice for both casual snacking and upscale desserts.
#3. Madeleine
#The madeleine is a small shell-shaped cake that is light and sponge-like. It is often flavored with lemon or orange zest and sometimes dipped in chocolate. Madeleines are perfect for an afternoon snack with tea or coffee.
#4. Éclair
#The éclair is a long, thin pastry filled with cream and topped with chocolate glaze. It is a classic French treat that is both sweet and satisfying. Éclairs can be found in bakeries all over France and are often enjoyed with a cup of hot chocolate.
#5. Tarte Tatin
#The tarte Tatin is an apple tart that is known for its caramelized apples and puff pastry crust. It is named after the Tatin sisters who created the recipe in the late 19th century. Tarte Tatin is best served warm with a scoop of vanilla ice cream.
#These pastries are just a few of the many delicious treats that France has to offer. Whether you are a seasoned traveler or a first-time visitor, indulging in French pastries is a must-do activity. So go ahead, treat yourself—you deserve it!
📄 License
The code is licensed under Apache 2.0, while the model weights are fully open for academic research, and free commercial use is also permitted. To apply for a commercial license, please fill out the application form (English/Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn.