InternLM-XComposer2.5-7B
Model Overview
Trained on 24K interleaved image-text contexts, the model can be extended to a 96K long-context window via RoPE extrapolation, and performs especially well in scenarios that demand extensive input and output context.
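The extrapolation claim can be illustrated with a minimal sketch of standard RoPE (rotary position embedding), assuming the usual formulation theta_i = base^(-2i/dim); this is illustrative only, not the model's actual implementation:

```python
def rope_angles(pos: int, dim: int = 128, base: float = 10000.0):
    """Rotation angles for one position across a head dimension.

    Standard RoPE: theta_i = base^(-2i/dim); position p is encoded by
    rotating each 2-D pair of features by p * theta_i.  Extrapolation
    simply feeds positions beyond the training window (here 24K) into
    the same formula -- no new parameters are needed.
    """
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]

# Angles remain well defined for positions far past the 24K training window.
trained = rope_angles(24_000)
extrapolated = rope_angles(96_000)
# Every component scales linearly with position, so 96K / 24K = 4.0.
print(extrapolated[-1] / trained[-1])  # 4.0
```

Because the rotation is a closed-form function of position, no retraining is required to evaluate longer sequences; the practical quality at 96K comes from how the model was trained and extrapolated, not from extra parameters.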
Model Highlights
Strong image-text understanding
Achieves GPT-4V-level image-text understanding with only 7B parameters
Long-context processing
Extends to a 96K long-context window via RoPE extrapolation
Multimodal support
Understands and analyzes images, videos, and other media formats
Webpage generation
Generates complete webpage code from instructions, resumes, or screenshots
Model Capabilities
Video content understanding
Multi-image multi-turn dialogue
High-resolution image understanding
Instruction-to-webpage generation
Resume-to-webpage conversion
Screenshot-to-webpage conversion
Use Cases
Content understanding
Video content analysis
Analyzes video frames and describes the video content in detail
Accurately identifies athletes, race scenes, and key details in the video
Multi-image comparison
Compares multiple images and offers recommendations
Analyzes the strengths and weaknesses of different vehicles in detail and offers purchase advice
Webpage generation
Instruction-to-webpage
Generates complete webpage code from natural-language instructions
Produces HTML for a research-institution website that meets the stated requirements
Resume-to-webpage
Converts a Markdown-format resume into a personal webpage
Produces a polished personal resume webpage
🚀 InternLM-XComposer-2.5
InternLM-XComposer-2.5 excels in various text-image understanding and composition applications, achieving GPT-4V-level capabilities with merely a 7B LLM backend. IXC-2.5 is trained with 24K interleaved image-text contexts and can seamlessly extend to 96K long contexts via RoPE extrapolation. This long-context capability allows IXC-2.5 to excel in tasks requiring extensive input and output contexts.
[💻 GitHub Repo](https://github.com/InternLM/InternLM-XComposer)
[Online Demo](https://huggingface.co/spaces/Willow123/InternLM-XComposer)
[Paper](https://huggingface.co/papers/2407.03320)
🚀 Quick Start
Importing the model with Transformers
To load the InternLM-XComposer-2.5 model with Transformers, use the following code:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
ckpt_path = "internlm/internlm-xcomposer2d5-7b"
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
# Set `torch_dtype=torch.bfloat16` to load the model in bfloat16; otherwise it is loaded as float32 and may cause an OOM error.
model = AutoModelForCausalLM.from_pretrained(ckpt_path, torch_dtype=torch.bfloat16, trust_remote_code=True).cuda()
model = model.eval()
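The dtype choice matters because of the raw weight footprint alone. A back-of-the-envelope estimate (the 7B parameter count is approximate, and activations and KV cache are ignored):

```python
# Rough VRAM needed just to hold the weights of a 7B-parameter model.
params = 7e9  # approximate parameter count
for dtype, nbytes in {"float32": 4, "bfloat16": 2}.items():
    print(f"{dtype}: ~{params * nbytes / 2**30:.1f} GiB")
```

bfloat16 needs roughly 13 GiB versus roughly 26 GiB for float32, which is why omitting `torch_dtype` can exhaust a 24 GB GPU before inference even starts.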
💻 Usage Examples
Basic Usage
We provide a simple example showing how to call InternLM-XComposer-2.5 with 🤗 Transformers.
Video Understanding
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
query = 'Here are some frames of a video. Describe this video in detail'
image = ['./examples/liuxiang.mp4',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, his = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
#The video opens with a shot of an athlete, dressed in a red and yellow uniform with the word "CHINA" emblazoned across the front, preparing for a race.
#The athlete, Liu Xiang, is seen in a crouched position, focused and ready, with the Olympic rings visible in the background, indicating the prestigious setting of the Olympic Games. As the race commences, the athletes are seen sprinting towards the hurdles, their determination evident in their powerful strides.
#The camera captures the intensity of the competition, with the athletes' numbers and times displayed on the screen, providing a real-time update on their performance. The race reaches a climax as Liu Xiang, still in his red and yellow uniform, triumphantly crosses the finish line, his arms raised in victory.
#The crowd in the stands erupts into cheers, their excitement palpable as they witness the athlete's success. The video concludes with a close-up shot of Liu Xiang, still basking in the glory of his victory, as the Olympic rings continue to symbolize the significance of the event.
query = 'tell me the athlete code of Liu Xiang'
image = ['./examples/liuxiang.mp4',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, history=his, do_sample=False, num_beams=3, use_meta=True)
print(response)
#The athlete code of Liu Xiang, as displayed on his uniform in the video, is "1363".
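The model's remote code handles frame extraction from the `.mp4` internally. For reference, the uniform frame sampling commonly used by video LLMs can be sketched as follows (illustrative; not the model's exact logic):

```python
def sample_frame_indices(total_frames: int, num_samples: int):
    """Uniformly spaced frame indices, including the first and last frame."""
    if num_samples >= total_frames:
        return list(range(total_frames))
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]

# Pick 8 representative frames out of a 300-frame clip.
print(sample_frame_indices(300, 8))
# [0, 43, 85, 128, 171, 214, 256, 299]
```

Uniform sampling keeps the frame count (and thus the image-token budget) fixed regardless of clip length, at the cost of possibly skipping short events between samples.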
Multi-Image Multi-Turn Dialogue
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
query = 'Image1 <ImageHere>; Image2 <ImageHere>; Image3 <ImageHere>; I want to buy a car from the three given cars, analyze their advantages and weaknesses one by one'
image = ['./examples/cars1.jpg',
'./examples/cars2.jpg',
'./examples/cars3.jpg',]
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, his = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
#To analyze the advantages and disadvantages of each car, we need to consider factors such as brand reputation, performance, design, cost, and maintenance.
#1. Mercedes-Benz: - Advantages: Known for its luxury and reliability, the Mercedes-Benz brand offers a high level of comfort, advanced technology, and superior craftsmanship. The vehicle in the image appears to be an SUV, which is versatile for both city driving and off-road conditions. - Disadvantages: Typically, Mercedes-Benz vehicles are more expensive compared to other brands, and they may require more frequent maintenance due to their luxury status.
#2. Bugatti: - Advantages: Bugatti is renowned for producing some of the fastest and most powerful cars in the world. The vehicle in the image is a sports car, likely offering an exhilarating driving experience with its high-performance engine and advanced aerodynamics. - Disadvantages: Bugatti cars are extremely expensive, making them less accessible to the average consumer. They also require specialized knowledge for maintenance and may not be suitable for everyday driving due to their high-performance nature.
#3. BMW: - Advantages: BMW is a well-respected manufacturer known for its innovative engineering and design. The car in the image appears to be an electric vehicle (EV), which offers advantages such as lower emissions, reduced noise pollution, and potentially lower operating costs. - Disadvantages: EVs like the one in the image may have a shorter range compared to traditional gasoline-powered vehicles, limiting their suitability for long-distance travel.
#Additionally, the initial cost of purchasing an EV can be higher than that of a conventional car. In conclusion, each car has its own set of advantages and disadvantages. The Mercedes-Benz offers luxury and versatility, the Bugatti provides high performance and exclusivity, and the BMW represents innovation in electric vehicle technology. The choice ultimately depends on the buyer's priorities, such as comfort, performance, environmental impact, and budget.
query = 'Image4 <ImageHere>; How about the car in Image4'
image.append('./examples/cars4.jpg')
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, history=his, use_meta=True)
print(response)
#The car in Image4 is a red sports car, which appears to be a Ferrari. Ferrari is a renowned Italian manufacturer known for producing some of the most iconic and high-performance sports cars in the world. - Advantages: Ferrari vehicles are synonymous with speed, luxury, and engineering excellence.
#The car in the image likely offers an exhilarating driving experience with its powerful engine, advanced aerodynamics, and high-quality craftsmanship. The red color adds to the car's aesthetic appeal, making it stand out on the road. - Disadvantages: Ferrari cars are extremely expensive, making them less accessible to the average consumer.
#They also require specialized knowledge for maintenance and may not be suitable for everyday driving due to their high-performance nature. In conclusion, the Ferrari in Image4 represents a pinnacle of automotive engineering and design, offering unmatched performance and luxury.
#However, its high cost and specialized maintenance requirements make it less practical for everyday use compared to the other vehicles in the images.
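The `Image1 <ImageHere>; Image2 <ImageHere>; ...` prefix in the multi-image query can be generated programmatically. A small helper (hypothetical; not part of the model's API) that reproduces the pattern:

```python
def build_multi_image_query(question: str, num_images: int) -> str:
    """Prefix a question with one numbered <ImageHere> slot per image,
    matching the placeholder pattern shown in the multi-image example."""
    slots = "; ".join(f"Image{i + 1} <ImageHere>" for i in range(num_images))
    return f"{slots}; {question}"

q = build_multi_image_query("compare the three cars", 3)
print(q)
# Image1 <ImageHere>; Image2 <ImageHere>; Image3 <ImageHere>; compare the three cars
```

The number of `<ImageHere>` placeholders must match the length of the image list passed to `model.chat`, including images appended in later turns.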
High-Resolution Image Understanding
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
query = 'Analyze the given image in a detail manner'
image = ['./examples/dubai.png']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response, _ = model.chat(tokenizer, query, image, do_sample=False, num_beams=3, use_meta=True)
print(response)
#The infographic is a visual representation of various facts about Dubai. It begins with a statement about Palm Jumeirah, highlighting it as the largest artificial island visible from space. It then provides a historical context, noting that in 1968, there were only a few cars in Dubai, contrasting this with the current figure of more than 1.5 million vehicles.
#The infographic also points out that Dubai has the world's largest Gold Chain, with 7 of the top 10 tallest hotels located there. Additionally, it mentions that the crime rate is near 0%, and the income tax rate is also 0%, with 20% of the world's total cranes operating in Dubai. Furthermore, it states that 17% of the population is Emirati, and 83% are immigrants.
#The Dubai Mall is highlighted as the largest shopping mall in the world, with 1200 stores. The infographic also notes that Dubai has no standard address system, with no zip codes, area codes, or postal services. It mentions that the Burj Khalifa is so tall that its residents on top floors need to wait longer to break fast during Ramadan.
#The infographic also includes information about Dubai's climate-controlled City, with the Royal Suite at Burj Al Arab costing $24,000 per night. Lastly, it notes that the net worth of the four listed billionaires is roughly equal to the GDP of Honduras.
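High-resolution understanding in models of this family typically works by cutting the image into fixed-size tiles whose grid matches the image's aspect ratio. A simplified sketch of such grid selection (illustrative only; the model's actual preprocessing ships with its `trust_remote_code` implementation):

```python
def choose_grid(width: int, height: int, max_tiles: int = 25):
    """Pick the tile grid (cols, rows) whose aspect ratio best matches
    the image, subject to a total tile budget."""
    best, best_err = (1, 1), float("inf")
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles // cols + 1):  # cols * rows <= max_tiles
            err = abs(cols / rows - width / height)
            if err < best_err:
                best, best_err = (cols, rows), err
    return best

# A 4K (16:9) infographic gets a wide grid under a 25-tile budget.
print(choose_grid(3840, 2160))  # (5, 3)
```

Matching the grid to the aspect ratio avoids stretching the image before tiling, which is what preserves small text such as the statistics in the infographic above.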
Instruction-Aware Webpage Generation
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
query = 'A website for Research institutions. The name is Shanghai AI lab. Top Navigation Bar is blue.Below left, an image shows the logo of the lab. In the right, there is a passage of text below that describes the mission of the laboratory.There are several images to show the research projects of Shanghai AI lab.'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.write_webpage(query, seed=202, task='Instruction-aware Webpage Generation', repetition_penalty=3.0)
print(response)
# see the Instruction-aware Webpage Generation.html
See the results of instruction-aware webpage generation.
Resume-to-Webpage Generation
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
## the input should be a resume in markdown format
query = './examples/resume.md'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.resume_2_webpage(query, seed=202, repetition_penalty=3.0)
print(response)
See the results of resume-to-webpage generation.
Screenshot-to-Webpage Generation
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
query = 'Generate the HTML code of this web image with Tailwind CSS.'
image = ['./examples/screenshot.jpg']
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.screen_2_webpage(query, image, seed=202, repetition_penalty=3.0)
print(response)
See the results of screenshot-to-webpage generation.
Article Writing
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('internlm/internlm-xcomposer2d5-7b', torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('internlm/internlm-xcomposer2d5-7b', trust_remote_code=True)
model.tokenizer = tokenizer
query = '閱讀下面的材料,根據要求寫作。 電影《長安三萬裡》的出現讓人感慨,影片並未將重點全落在大唐風華上,也展現了恢弘氣象的陰暗面,即舊門閥的資源壟斷、朝政的日益衰敗與青年才俊的壯志難酬。高適仕進無門,只能回鄉沉潛修行。李白雖得玉真公主舉薦,擢入翰林,但他只是成為唐玄宗的御用文人,不能真正實現有益於朝政的志意。然而,片中高潮部分《將進酒》一節,人至中年、掛著肚腩的李白引眾人乘仙鶴上天,一路從水面、瀑布飛昇至銀河進入仙宮,李白狂奔著與仙人們碰杯,最後大家縱身飛向漩渦般的九重天。肉身的微賤、世路的坎坷,拘不住精神的高蹈。“天生我材必有用,千金散盡還復來。” 古往今來,身處閒頓、遭受挫折、被病痛折磨,很多人都曾經歷了人生的“失意”,卻反而成就了他們“詩意”的人生。對正在追求人生價值的當代青年來說,如何對待人生中的缺憾和困頓?詩意人生中又有怎樣的自我堅守和自我認同?請結合“失意”與“詩意”這兩個關鍵詞寫一篇文章。 要求:選準角度,確定立意,明確文體,自擬標題;不要套作,不得抄襲;不得洩露個人信息;不少於 800 字。'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.write_artical(query, seed=8192)
print(response)
#詩意人生,貴在堅守
#《菜根譚》有云:“閒時要有吃緊的心思,忙裡要留吃閒工夫。”人生在世,總有失意之時,當面對缺憾和困頓,詩意地生活著才能為人生增添一抹亮色。何謂詩意地生活? 所謂詩意地生活,便是在於堅守本心、直面遺憾、超越自我,在失意中尋找人生價值。
#詩意地生活,需堅守本心,淡然處之。
#陶淵明曾執意辭去彭澤縣令,歸隱田園,“採菊東籬下,悠然見南山”,在山水間寄情自娛;王維面對仕途失意,終日沉醉於詩酒之中,“興來每獨往,勝事空自知”,在詩酒中閒逸自如;李白仕途不順,被賜金放還,但他依舊豪氣干雲,“天生我才必有用,千金散盡還復來”,在失意中坦然豁達。堅守本心,便能在遭遇失意之時守住自己的精神家園,讓生活充滿詩意。反之,若不能堅守本心,而只是一味迎合世俗以求得升遷,那縱使身居高位,亦會喪失生活的樂趣。
#詩意地生活,需直面遺憾,超越自我。
#“西塞山前白鷺飛,桃花流水鱖魚肥。青箬笠,綠柳枝,半斤酒,一綸絲。五湖四海皆如此,何妨到此處歸。”白居易的《漁歌子》寫出了多少人的願望:沒有權勢紛擾,沒有貧困淒涼,只有青山綠水、白鷺鷗鳥作伴,如此自由自在的生活令人神往。然而,白居易卻並沒有因此真的歸隱山林,而是直麵人生,超越自我,寫下了一首首詩意而富有現實關懷的作品。如果白居易只顧逃避人生,那又怎會擁有“大弦嘈嘈如急雨,小弦切切如私語”的絕美比喻呢?如果白居易只顧歸隱山林,那又怎會寫出“此曲只應天上有,人間哪得配白居易”這樣的詩句呢?
#詩意地生活,需直面遺憾,堅守本心。
#李文波患有漸凍症,醫生說他活不過五年,但他沒有因此放棄對音樂的熱愛,而是與病魔作鬥爭,演奏出美妙的樂曲;孫家林自幼患有腦癱,但他不甘於命運的捉弄,終成全國最美教師;史鐵生飽受疾病折磨,但他仍能發出“我常常在我的心頭清點,我有什麼?”的叩問,並由此走上文學道路,為後世留下豐厚的文化遺產。這些人沒有逃避,而是選擇直麵人生的缺憾,在堅守本心的同時超越自我,最終實現了自己的價值。
#詩意地生活,是於失意中堅守本心,於缺憾中超越自我。當面對人生的缺憾與挫折,堅守本心、超越自我的同時,也必將書寫屬於自己的輝煌篇章。
#願你我都能詩意地生活著!
query = 'Please write a blog based on the title: French Pastries: A Sweet Indulgence'
with torch.autocast(device_type='cuda', dtype=torch.float16):
    response = model.write_artical(query, seed=8192)
print(response)
#French Pastries: A Sweet Indulgence
#The French are well known for their love of pastries, and it’s a love that is passed down through generations. When one visits France, they are treated to an assortment of baked goods that can range from the delicate macaron to the rich and decadent chocolate mousse. While there are many delicious types of pastries found in France, five stand out as being the most iconic. Each of these pastries has its own unique qualities that make it special.
#1. Croissant
#One of the most famous pastries from France is the croissant. It is a buttery, flaky pastry that is best enjoyed fresh from the bakery. The dough is laminated with butter, giving it its signature layers. Croissants are typically eaten for breakfast or brunch, often accompanied by coffee or hot chocolate.
#2. Macaron
#The macaron is a small, delicate French confection made from almond flour, powdered sugar, and egg whites. The macaron itself is sandwiched with a ganache or jam filling. They come in a variety of colors and flavors, making them a popular choice for both casual snacking and upscale desserts.
#3. Madeleine
#The madeleine is a small shell-shaped cake that is light and sponge-like. It is often flavored with lemon or orange zest and sometimes dipped in chocolate. Madeleines are perfect for an afternoon snack with tea or coffee.
#4. Éclair
#The éclair is a long, thin pastry filled with cream and topped with chocolate glaze. It is a classic French treat that is both sweet and satisfying. Éclairs can be found in bakeries all over France and are often enjoyed with a cup of hot chocolate.
#5. Tarte Tatin
#The tarte Tatin is an apple tart that is known for its caramelized apples and puff pastry crust. It is named after the Tatin sisters who created the recipe in the late 19th century. Tarte Tatin is best served warm with a scoop of vanilla ice cream.
#These pastries are just a few of the many delicious treats that France has to offer. Whether you are a seasoned traveler or a first-time visitor, indulging in French pastries is a must-do activity. So go ahead, treat yourself—you deserve it!
📄 License
The code is licensed under Apache 2.0, while the model weights are fully open for academic research, and free commercial use is also permitted. To apply for a commercial license, please fill out the application form (English/Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn.