InternLM-XComposer2.5-Reward Open-Source Multi-Modal Reward Model - Provides Reward Scores Aligned with Human Preferences

Internlm Xcomposer2d5 7b Reward

Developed by internlm

InternLM-XComposer2.5-Reward is a multimodal reward model based on InternLM-XComposer2.5, capable of providing reward scores that align with human preferences.

Multimodal Fusion

Transformers

Supports Multiple Languages

Open Source License: Other

#Multimodal Reward Model #Human Preference Scoring #Text-Image-Video Evaluation

Downloads 767

Release Time: 1/21/2025

Model Overview

This model is trained on preference samples from text, image, and video domains, enabling it to evaluate the output quality of tasks such as dialogue and image analysis.
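The card states that the model is trained on preference samples but does not give the training objective here. A common recipe for scalar reward models is the Bradley-Terry pairwise loss, which pushes the model to score the human-preferred response above the rejected one and maps the score difference to a preference probability through a sigmoid. The sketch below assumes that recipe for illustration only; it is not confirmed as this model's exact loss, and the function names are hypothetical.

```python
import math

def preference_probability(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry probability that the chosen response beats the rejected one."""
    d = r_chosen - r_rejected
    # branch on the sign of d so math.exp never overflows
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)

def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)), computed stably."""
    d = r_chosen - r_rejected
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))

# Equal rewards give a 50/50 preference and a loss of log(2);
# a larger margin for the chosen response drives the loss toward 0.
print(preference_probability(1.0, 1.0))  # 0.5
print(pairwise_loss(3.0, -1.0) < pairwise_loss(0.5, 0.0))  # True
```

Under this assumption only differences between scores carry meaning, so a negative score need not indicate a bad response in absolute terms.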

Model Features

Multimodal Evaluation

Accepts image inputs alongside the text of a dialogue, so it can score answers to visual questions as well as text-only conversations.

Human Preference Alignment

Because it is trained on human preference samples, its scores are intended to align with human preferences.

High Performance

Reports strong results on VLRewardBench, RewardBench, and RM-Bench (see Performance Evaluation below).

Model Capabilities

Dialogue Quality Evaluation

Image Analysis Evaluation

Multimodal Content Scoring

Preference Ranking

Use Cases

Content Evaluation

Dialogue Quality Scoring

Evaluates the quality of dialogue responses generated by AI assistants.

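Returns a scalar reward score where higher means better. The score is not confined to a fixed range such as 0 to 10 (the usage example below yields 5.76 and -2.84).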

Multimodal Content Ranking

Ranks the quality of multiple responses containing images and text.

Returns a rank index for each response, where 0 marks the highest-scoring response.

Model Training

Reinforcement Learning Reward Model

Provides the reward signal during reinforcement learning.

Helps train AI models that better align with human preferences.
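Besides supplying rewards during RL training, a reward model's scores can drive simple best-of-N selection: sample several candidate answers, score each one, and keep the highest-scoring answer. The sketch below is an illustration under assumptions: `score_fn` stands in for a batch scorer such as the `get_scores` call in the usage example, `build_chats` is a hypothetical helper that produces the same user/assistant chat format, and the toy scorer exists only so the sketch runs without the model.

```python
from typing import Callable, List, Sequence, Tuple

Chat = List[dict]

def build_chats(question: str, answers: Sequence[str]) -> List[Chat]:
    """Wrap one question and several candidate answers into user/assistant chats."""
    return [
        [{"role": "user", "content": question}, {"role": "assistant", "content": a}]
        for a in answers
    ]

def best_of_n(question: str, answers: Sequence[str],
              score_fn: Callable[[List[Chat]], List[float]]) -> Tuple[int, str]:
    """Return the index and text of the answer with the highest reward score."""
    chats = build_chats(question, answers)
    scores = score_fn(chats)
    best = max(range(len(answers)), key=lambda i: scores[i])
    return best, answers[best]

# Toy scorer for illustration only: longer answers score higher.
def toy_score_fn(chats: List[Chat]) -> List[float]:
    return [float(len(chat[-1]["content"])) for chat in chats]

idx, answer = best_of_n("Describe the car.",
                        ["A car.", "A red two-seater sports car."], toy_score_fn)
print(idx, answer)  # 1 A red two-seater sports car.
```

With the real model, `score_fn` could be something like `lambda chats: model.get_scores(chats, [image] * len(chats), hd_num=hd_num)`, reusing the `model`, `image`, and `hd_num` objects from the usage example; that wiring is untested here.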

🚀 InternLM-XComposer-2.5-Reward

InternLM-XComposer-2.5-Reward is a multi-modal reward model built on internlm/internlm-xcomposer2d5-7b. It is trained on preference samples spanning text, image, and video domains and assigns reward scores that align with human preferences.


[💻 GitHub Repo](https://github.com/InternLM/InternLM-XComposer) [Paper](https://huggingface.co/papers/2501.12368)


📊 Performance Evaluation

Results on VLRewardBench

| Model | General | Hallucination | Reasoning | Overall | Macro Avg. |
|---|---|---|---|---|---|
| InternLM-XComposer2.5-7B-Reward | 84.7 | 62.5 | 62.9 | 65.8 | 70.0 |
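The Macro column is consistent with an unweighted mean of the three category scores, whereas Overall is presumably computed over all samples and therefore weighted by category size. A quick check of the macro arithmetic using the numbers from the table:

```python
# Category scores from the VLRewardBench table (General, Hallucination, Reasoning)
category_scores = {"General": 84.7, "Hallucination": 62.5, "Reasoning": 62.9}

# Macro average: every category counts equally, regardless of how many samples it has
macro = sum(category_scores.values()) / len(category_scores)
print(f"Macro average: {macro:.1f}")  # Macro average: 70.0
```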

Results on RewardBench

| Model | Score | Chat | Chat Hard | Safety | Reasoning |
|---|---|---|---|---|---|
| InternLM-XComposer2.5-7B-Reward | 88.6 | 90.8 | 83.8 | 87.8 | 90.0 |

Results on RM-Bench

| Model | Chat | Math | Code | Safety | Easy | Normal | Hard | Average |
|---|---|---|---|---|---|---|---|---|
| InternLM-XComposer2.5-7B-Reward | 65.5 | 55.9 | 51.7 | 93.8 | 87.5 | 71.3 | 47.4 | 68.8 |

💻 Usage Examples

Basic Usage

The following example shows how to score a single chat, score a batch of chats, compare two chats, and rank multiple chats.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the reward model in half precision on the GPU.
# trust_remote_code is required because the scoring methods used below
# (get_score, get_scores, compare, rank) come from the model repository's custom code.
model = AutoModel.from_pretrained(
    "internlm/internlm-xcomposer2d5-7b-reward",
    device_map="cuda",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm-xcomposer2d5-7b-reward", trust_remote_code=True)
model.tokenizer = tokenizer

# Each chat is a list of {"role", "content"} messages; both chats answer the same question about the same image.
chat_1 = [
    {"role": "user", "content": 'I want to buy a car from the input image, analyze the advantages and weaknesses.'},
    {"role": "assistant", "content": "The car in the image is a Mercedes-Benz G-Class. This is a luxury SUV known for its ruggedness, off-road capability, and iconic boxy design. Here's an analysis of its advantages and weaknesses: Advantages: Off-road capability: The G-Class is built on a robust ladder frame chassis and has a powerful engine, making it highly capable in off-road conditions. It features a low-range transfer case, locking differentials, and a high ground clearance. Luxury and comfort: Despite its rugged exterior, the G-Class offers a luxurious interior with high-quality materials, comfortable seats, and advanced technology features. Iconic design: The distinctive boxy shape of the G-Class has made it a cult classic. It's instantly recognizable and has a timeless appeal. Strong resale value: Due to its exclusivity and desirability, the G-Class tends to hold its value well over time. Weaknesses: Fuel efficiency: The G-Class is not known for its fuel efficiency, especially with the larger engine options. Ride quality: The firm suspension is designed for off-road performance, which can make the ride a bit bumpy on paved roads. Price: The G-Class is a very expensive vehicle, even for a luxury SUV. Size and maneuverability: The large size of the G-Class can make it challenging to park and maneuver in tight spaces."}
]
chat_2 = [
    {"role": "user", "content": 'I want to buy a car from the input image, analyze the advantages and weaknesses.'},
    {"role": "assistant", "content": "Based on the image, it appears to be a Ferrari F8 Tributo. Advantages: Performance: Ferraris are renowned for their exceptional performance. The F8 Tributo is no exception, boasting a powerful twin-turbocharged V8 engine that delivers exhilarating acceleration and top speed. Handling: Ferraris are known for their precise handling and agility. The F8 Tributo continues this tradition, with advanced aerodynamics and sophisticated suspension systems ensuring a thrilling driving experience. Design: Ferrari is synonymous with iconic and head-turning designs. The F8 Tributo is a testament to this, featuring sleek lines, aggressive styling, and a captivating presence. Exclusivity: Owning a Ferrari is a status symbol. It represents luxury, exclusivity, and a passion for high-performance automobiles. Weaknesses: Price: Ferraris come with a hefty price tag. The F8 Tributo is no exception, making it an investment for those with significant financial resources. Fuel Efficiency: High-performance sports cars like the F8 Tributo are not known for their fuel efficiency. You can expect lower miles per gallon compared to everyday vehicles. Maintenance Costs: Owning a Ferrari comes with associated maintenance costs, which can be higher than those for regular cars. Practicality: The F8 Tributo is primarily a two-seater sports car, making it less practical for everyday use or carrying passengers. Ride Comfort: While the F8 Tributo offers a thrilling driving experience, its stiff suspension might not be ideal for long-distance comfort."}
]
image = ['./examples/cars1.jpg']  # list of image paths for one chat; adjust to a local file
hd_num = 9  # controls how many high-resolution image crops the vision encoder uses


# get reward score for a single chat
with torch.autocast(device_type='cuda', dtype=torch.float16):
    score1 = model.get_score(chat_1, image, hd_num=hd_num)
    score2 = model.get_score(chat_2, image, hd_num=hd_num)
print("score1: ", score1)
print("score2: ", score2)
# >>> score1:  5.76
# >>> score2:  -2.84375


# batch inference: get multiple scores at once (one image list per chat)
with torch.autocast(device_type='cuda', dtype=torch.float16):
    scores = model.get_scores([chat_1, chat_2], [image, image], hd_num=hd_num)
print("scores: ", scores)
# >>> scores:  [5.76171875, -2.845703125]


# compare whether chat_1 is better than chat_2
with torch.autocast(device_type='cuda', dtype=torch.float16):
    compare_res = model.compare(chat_1, image, chat_2, image, hd_num=hd_num)
print("compare_res: ", compare_res)
# >>> compare_res:  True


# rank multiple chats: returns the ranking index of each chat,
# where the highest-scoring chat gets index 0
with torch.autocast(device_type='cuda', dtype=torch.float16):
    rank_res = model.rank([chat_1, chat_2], [image, image], hd_num=hd_num)
print("rank_res: ", rank_res)  # lower index means higher score
# >>> rank_res:  [0, 1]
```
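Assuming `compare` and `rank` are derived from the same scalar scores that `get_scores` returns (consistent with the outputs above, though the card does not state it explicitly), their results can be reproduced from a list of scores. The helpers below are hypothetical and meant only to make the return conventions concrete; ties keep their original order because Python's sort is stable.

```python
from typing import List

def compare_from_scores(score_a: float, score_b: float) -> bool:
    """True if response A scores strictly higher than response B."""
    return score_a > score_b

def rank_from_scores(scores: List[float]) -> List[int]:
    """Rank index for each response: 0 for the highest score, 1 for the next, and so on."""
    # indices of the responses, sorted from best to worst score
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks

scores = [5.76171875, -2.845703125]  # batch scores from the example above
print(compare_from_scores(scores[0], scores[1]))  # True
print(rank_from_scores(scores))                   # [0, 1]
```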

📄 License

The code is licensed under Apache-2.0, while the model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (available in English and Chinese). For other questions or collaborations, please contact internlm@pjlab.org.cn.
