InternLM-XComposer2.5-Reward: an open-source multimodal reward model that provides reward scores aligned with human preferences


Developed by internlm

InternLM-XComposer2.5-Reward is a multimodal reward model trained on top of InternLM-XComposer2.5. It provides reward scores that are aligned with human preferences.

Multimodal fusion

Transformers

Multilingual

Open-source license: Other

#Multimodal reward model #Human preference scoring #Text, image, and video evaluation

Downloads: 767

Released: 1/21/2025

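Model Overview

The model is trained on preference samples spanning the text, image, and video domains, and it can evaluate output quality for tasks such as dialogue and image analysis.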

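Model Features

Multimodal evaluation

Processes text and image inputs together to produce a single overall evaluation

Human preference alignment

Trained on preference samples so that its evaluations agree with human preferences

High performance

Scores 70.0 (macro) on VLRewardBench and 88.6 on RewardBench; full results appear in the performance section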

Model Capabilities

Dialogue quality evaluation

Image analysis evaluation

Multimodal content evaluation

Preference ranking

Use Cases

Content evaluation

Dialogue quality scoring

Evaluates the quality of dialogue responses generated by AI assistants

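Returns a scalar reward score for each response, where higher means better (the usage example yields values such as 5.76 and -2.84, so the score is not confined to a 0-10 range)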

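Multimodal content ranking

Ranks multiple responses that contain images and text by quality

Returns the ranking ordered from highest to lowest quality

Model training

Reinforcement learning reward model

Serves as the source of the reward signal in reinforcement learning

Supports training AI models that are aligned with human preferences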
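
One way to picture the reward-signal role described above is best-of-N selection: score several candidate responses and keep the highest-scoring one, or keep the best and worst as a chosen/rejected pair for preference training. The sketch below is only an illustration under stated assumptions. `best_of_n`, `preference_pair`, and `toy_score` are hypothetical names, and `toy_score` is a stand-in scorer; in practice `score_fn` could wrap a call such as `model.get_score(chat, image, hd_num=hd_num)` from the usage example.

```python
from typing import Callable, Sequence, Tuple


def best_of_n(candidates: Sequence[str],
              score_fn: Callable[[str], float]) -> Tuple[str, float]:
    """Return the candidate with the highest reward, plus that reward."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    scored = [(score_fn(c), c) for c in candidates]
    # max() keeps the first candidate when two scores tie.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def preference_pair(candidates: Sequence[str],
                    score_fn: Callable[[str], float]) -> Tuple[str, str]:
    """Return (chosen, rejected): the highest- and lowest-scored candidates."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates")
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[0], ranked[-1]


def toy_score(response: str) -> float:
    """Hypothetical stand-in for a reward model: counts distinct words."""
    return float(len(set(response.lower().split())))


candidates = ["a a a", "a b c", "a b"]
print(best_of_n(candidates, toy_score))
print(preference_pair(candidates, toy_score))
```

Passing the scorer in as a function keeps the selection logic independent of any particular reward model, so the same code works with a stub in tests and with a real model in a training loop.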

🚀 InternLM-XComposer-2.5-Reward

InternLM-XComposer-2.5-Reward is a multimodal reward model trained on top of internlm/internlm-xcomposer2d5-7b. It is trained on preference samples from the text, image, and video domains and assigns reward scores that match human preferences.


💻Github Repo

Paper

🚀 Quick Start

Overview

InternLM-XComposer2.5-Reward is a multimodal reward model built on internlm/internlm-xcomposer2d5-7b. It is trained with preference samples from the text, image, and video domains and assigns reward scores in line with human preferences.

Performance Evaluation

Results on VLRewardBench

| Model | General | Hallucination | Reasoning | Overall | Macro |
| --- | --- | --- | --- | --- | --- |
| InternLM-XComposer2.5-7B-Reward | 84.7 | 62.5 | 62.9 | 65.8 | 70.0 |

Results on RewardBench

| Model | Score | Chat | Chat Hard | Safety | Reasoning |
| --- | --- | --- | --- | --- | --- |
| InternLM-XComposer2.5-7B-Reward | 88.6 | 90.8 | 83.8 | 87.8 | 90.0 |

Results on RM-Bench

| Model | Chat | Math | Code | Safety | Easy | Normal | Hard | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| InternLM-XComposer2.5-7B-Reward | 65.5 | 55.9 | 51.7 | 93.8 | 87.5 | 71.3 | 47.4 | 68.8 |
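
The Macro column in the VLRewardBench results can be sanity-checked from the three category scores. The sketch below assumes that "Macro" is the unweighted mean of the General, Hallucination, and Reasoning accuracies, and that "Overall" pools all samples and is therefore weighted by category size. Both readings are assumptions, not something this page states. The helper name `macro_average` is hypothetical.

```python
# Category accuracies copied from the VLRewardBench row above.
vlrewardbench = {"General": 84.7, "Hallucination": 62.5, "Reasoning": 62.9}


def macro_average(per_category: dict) -> float:
    """Unweighted mean over categories (assumed meaning of 'Macro')."""
    return sum(per_category.values()) / len(per_category)


macro = macro_average(vlrewardbench)
print(round(macro, 1))  # 70.0, matching the reported Macro column
```

Under this assumption the reported 70.0 is reproduced, while the lower Overall value of 65.8 would suggest that the two harder categories contribute more samples than the General category.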

💻 Usage Examples

Basic Usage

The example below uses the model to get the reward score of a single chat, compare two chats, and rank multiple chats.

import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "internlm/internlm-xcomposer2d5-7b-reward", 
    device_map="cuda", 
    torch_dtype=torch.float16, 
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-xcomposer2d5-7b-reward", trust_remote_code=True)
model.tokenizer = tokenizer

chat_1 = [
    {"role": "user", "content": 'I want to buy a car from the input image, analyze the advantages and weaknesses.'},
    {"role": "assistant", "content": "The car in the image is a Mercedes-Benz G-Class. This is a luxury SUV known for its ruggedness, off-road capability, and iconic boxy design. Here's an analysis of its advantages and weaknesses: Advantages: Off-road capability: The G-Class is built on a robust ladder frame chassis and has a powerful engine, making it highly capable in off-road conditions. It features a low-range transfer case, locking differentials, and a high ground clearance. Luxury and comfort: Despite its rugged exterior, the G-Class offers a luxurious interior with high-quality materials, comfortable seats, and advanced technology features. Iconic design: The distinctive boxy shape of the G-Class has made it a cult classic. It's instantly recognizable and has a timeless appeal. Strong resale value: Due to its exclusivity and desirability, the G-Class tends to hold its value well over time. Weaknesses: Fuel efficiency: The G-Class is not known for its fuel efficiency, especially with the larger engine options. Ride quality: The firm suspension is designed for off-road performance, which can make the ride a bit bumpy on paved roads. Price: The G-Class is a very expensive vehicle, even for a luxury SUV. Size and maneuverability: The large size of the G-Class can make it challenging to park and maneuver in tight spaces."}
]
chat_2 = [
    {"role": "user", "content": 'I want to buy a car from the input image, analyze the advantages and weaknesses.'},
    {"role": "assistant", "content": "Based on the image, it appears to be a Ferrari F8 Tributo. Advantages: Performance: Ferraris are renowned for their exceptional performance. The F8 Tributo is no exception, boasting a powerful twin-turbocharged V8 engine that delivers exhilarating acceleration and top speed. Handling: Ferraris are known for their precise handling and agility. The F8 Tributo continues this tradition, with advanced aerodynamics and sophisticated suspension systems ensuring a thrilling driving experience. Design: Ferrari is synonymous with iconic and head-turning designs. The F8 Tributo is a testament to this, featuring sleek lines, aggressive styling, and a captivating presence. Exclusivity: Owning a Ferrari is a status symbol. It represents luxury, exclusivity, and a passion for high-performance automobiles. Weaknesses: Price: Ferraris come with a hefty price tag. The F8 Tributo is no exception, making it an investment for those with significant financial resources. Fuel Efficiency: High-performance sports cars like the F8 Tributo are not known for their fuel efficiency. You can expect lower miles per gallon compared to everyday vehicles. Maintenance Costs: Owning a Ferrari comes with associated maintenance costs, which can be higher than those for regular cars. Practicality: The F8 Tributo is primarily a two-seater sports car, making it less practical for everyday use or carrying passengers. Ride Comfort: While the F8 Tributo offers a thrilling driving experience, its stiff suspension might not be ideal for long-distance comfort."}
]
image = ['./examples/cars1.jpg']
hd_num = 9


# get reward score for a single chat
with torch.autocast(device_type='cuda', dtype=torch.float16):
    score1 = model.get_score(chat_1, image, hd_num=hd_num)
    score2 = model.get_score(chat_2, image, hd_num=hd_num)
print("score1: ", score1)
print("score2: ", score2)
# >>> score1:  5.76
# >>> score2:  -2.84375


# batch inference, get multiple scores at once
with torch.autocast(device_type='cuda', dtype=torch.float16):
    scores = model.get_scores([chat_1, chat_2], [image, image], hd_num=hd_num)
print("scores: ", scores)
# >>> scores:  [5.76171875, -2.845703125]


# compare whether chat_1 is better than chat_2
with torch.autocast(device_type='cuda', dtype=torch.float16):
    compare_res = model.compare(chat_1, image, chat_2, image, hd_num=hd_num)
print("compare_res: ", compare_res)
# >>> compare_res:  True


# rank multiple chats, it will return the ranking index of each chat
# the chat with the highest score will have ranking index as 0
with torch.autocast(device_type='cuda', dtype=torch.float16):
    rank_res = model.rank([chat_1, chat_2], [image, image], hd_num=hd_num)
print("rank_res: ", rank_res)  # lower index means higher score
# >>> rank_res:  [0, 1]
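
To make the relationship between the outputs above concrete, the following sketch derives a comparison result and ranking indices from plain score values. It does not reproduce the model's own code: `is_better` and `rank_indices` are hypothetical helpers that only mirror the behavior described in the comments above (a higher score wins, and the highest score gets ranking index 0).

```python
from typing import List, Sequence


def is_better(score_a: float, score_b: float) -> bool:
    """True when the first chat has the higher reward score."""
    return score_a > score_b


def rank_indices(scores: Sequence[float]) -> List[int]:
    """For each score, return its position when sorted from high to low."""
    # Indices of the chats, ordered from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, chat_index in enumerate(order):
        ranks[chat_index] = position
    return ranks


scores = [5.76171875, -2.845703125]  # values printed by the batch example
print(is_better(scores[0], scores[1]))  # True
print(rank_indices(scores))             # [0, 1]
print(rank_indices([0.5, 2.0, -1.0]))   # [1, 0, 2]
```

The result is aligned with the input order, so `ranks[i]` is the rank of the i-th chat rather than a sorted list of chat positions.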

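📄 License

The code is licensed under Apache-2.0. The model weights are fully open for academic research, and free commercial use is also permitted. To apply for a commercial license, please fill in the application form (English) / application form (Japanese). For other questions or collaborations, please contact internlm@pjlab.org.cn.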