# 🚀 TBAC-VLR1-3B-preview

TBAC-VLR1-3B-preview is a multimodal language model fine-tuned by the Basic Algorithm Center of Tencent PCG. Built on Qwen2.5-VL-3B-Instruct, it uses Group Relative Policy Optimization (GRPO) to strengthen multimodal reasoning and achieves state-of-the-art results among models of the same size on several multimodal reasoning benchmarks.
## 🚀 Quick Start

TBAC-VLR1-3B-preview is a fine-tuned multimodal language model, further optimized on top of Qwen2.5-VL-3B-Instruct, that performs strongly on multimodal reasoning tasks.
## ✨ Key Features

- Multimodal fusion: processes both image and text inputs, enabling cross-modal understanding and reasoning.
- Strong reasoning: trained with Group Relative Policy Optimization (GRPO), reaching leading results among same-size models on multiple multimodal reasoning benchmarks.
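The core of GRPO is a group-relative advantage: several responses are sampled per prompt, and each response's reward is normalized against the group's mean and standard deviation, so no separate value model is needed. A minimal sketch of that normalization step (the reward values here are hypothetical; the card does not describe the actual training setup):

```python
def grpo_advantages(rewards, eps=1e-6):
    """Normalize a group of per-response rewards to zero mean / unit std.

    In GRPO, all responses in the group are sampled from the same prompt,
    and this normalized score replaces a learned value baseline.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for G=4 sampled responses to one prompt, e.g. 1.0 if
# the final answer matched the ground truth and 0.0 otherwise:
group_rewards = [1.0, 0.0, 1.0, 0.0]
print(grpo_advantages(group_rewards))  # correct answers get positive advantage
```

Responses scoring above the group mean receive a positive advantage and are reinforced; those below are suppressed.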
## 📊 Performance
| Model | Average | MathVista | MathVision | MathVerse | DynaMath | WeMath | LogicVista |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen2-VL-2B | 20.5 | 48.0 | 16.1 | 17.5 | 3.8 | 10.8 | 26.6 |
| InternVL2.5-2B | 21.2 | 51.1 | 14.0 | 22.3 | 4.4 | 8.0 | 27.3 |
| InternVL3-2B | 29.1 | 57.6 | 20.2 | 24.5 | 14.8 | 22.9 | 40.3 |
| Qwen2.5-VL-3B | 31.8 | 61.2 | 21.9 | 31.2 | 13.2 | 22.9 | 40.3 |
| VLM-R1-3B-Math-0305 | 33.4 | 62.7 | 21.9 | 32.2 | 13.0 | 30.0 | 40.5 |
| Taichu-VLR-3B | 33.6 | 64.9 | 23.1 | 32.1 | 12.6 | 30.4 | 38.7 |
| VLAA-Thinker-Qwen2.5VL-3B | 35.4 | 61.0 | 24.4 | 36.4 | 18.2 | 33.8 | 38.5 |
| TBAC-VLR1-3B-preview | 35.7 | 64.8 | 25.0 | 33.2 | 17.7 | 32.4 | 40.8 |

Comparison results are taken from https://opencompass.org.cn. Results for this model were obtained through offline evaluation on each benchmark.
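The Average column is the arithmetic mean of the six per-benchmark scores. As a quick check against the table, for TBAC-VLR1-3B-preview:

```python
# Per-benchmark scores for TBAC-VLR1-3B-preview, in table order:
# MathVista, MathVision, MathVerse, DynaMath, WeMath, LogicVista
scores = [64.8, 25.0, 33.2, 17.7, 32.4, 40.8]
avg = sum(scores) / len(scores)
print(f"{avg:.2f}")  # 35.65, reported in the table as 35.7
```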
## 💻 Usage Example

### Basic Usage
```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the model and processor
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "TencentBAC/TBAC-VLR1-3B-preview", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("TencentBAC/TBAC-VLR1-3B-preview")

image_path = "path/to/your/image.png"  # replace with your image file
query = "Your question about the image"  # replace with your question

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. The user asks a question, and you solve it. You need first think about the reasoning process in the mind and then provides the user with the answer. The answer are enclosed within \\boxed{} tags i.e., reasoning process here \\boxed{ answer here }."
    },
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image_path,
            },
            {"type": "text", "text": query},
        ],
    }
]

# Build the chat-formatted prompt and collect the vision inputs
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Greedy decoding; raise max_new_tokens for longer reasoning chains
generated_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Strip the prompt tokens so only the generated answer is decoded
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
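The system prompt above instructs the model to wrap its final answer in `\boxed{}` tags, so a small regex helper (an illustrative addition, not part of the official API) can pull the answer out of the decoded text:

```python
import re

def extract_boxed(text):
    """Return the content of the first \\boxed{...} span, or None if absent."""
    m = re.search(r"\\boxed\{([^{}]*)\}", text)
    return m.group(1).strip() if m else None

sample = r"The area is base * height / 2 = 12. \boxed{ 12 }"
print(extract_boxed(sample))  # -> 12
```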
## 📚 Documentation

### Citation

If you find this model useful in your research, please consider giving it a ❤️ and citing it. Thank you!
```bibtex
@misc{Xu2025tbacvlr1,
  title={TBAC-VLR1-3B-preview},
  author={Junzhe Xu and Yuyang Yin},
  url={https://huggingface.co/TencentBAC/TBAC-VLR1-3B-preview},
  year={2025},
}
```
### About

This model was created by the Basic Algorithm Center of Tencent PCG. All rights reserved.
## 📄 License

This project is licensed under the Apache-2.0 License.