🚀 MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
MathCoder-VL is a series of open-source large multimodal models (LMMs) tailored for general math problem solving. Alongside it, we release FigCodifier-8B, an image-to-code model.
Repository Link
Paper Link
🚀 Quick Start
Model Information
| Attribute | Details |
|---|---|
| Model type | image-text-to-text |
| Evaluation metric | accuracy |
| Tags | mathematics, reasoning, multi-modal-qa, math-qa, figure-qa, geometry-qa, math-word-problem, textbook-qa, vqa, geometry-diagram, synthetic-scene, chart, plot, scientific-figure, table, function-plot, abstract-scene, puzzle-test, document-image, science |
| Library | transformers |
| Base model | OpenGVLab/InternVL2-8B |
| Dataset | MathLLMs/MM-MathInstruct |
| License | apache-2.0 |
Model Comparison
Usage Examples
For training and inference code, please refer to InternVL. A minimal inference sketch is shown below.
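The sketch below is a hedged example of running inference with 🤗 Transformers. It assumes the released checkpoint keeps the `.chat()` interface of its InternVL2-8B base (loaded with `trust_remote_code=True`); the checkpoint name `MathLLMs/MathCoder-VL-8B`, the input image path, and the single-tile 448×448 preprocessing are illustrative simplifications of the dynamic-tiling pipeline documented in the InternVL repository.

```python
import torch
import torchvision.transforms as T
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Hypothetical checkpoint name; substitute the actual released MathCoder-VL weights.
MODEL_PATH = "MathLLMs/MathCoder-VL-8B"

# MathCoder-VL is built on InternVL2-8B, so we assume the same remote-code
# model class and `.chat()` helper as the InternVL2 base.
model = AutoModel.from_pretrained(
    MODEL_PATH, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True, use_fast=False)

# Simplified single-tile preprocessing (the InternVL repository uses dynamic tiling).
transform = T.Compose([
    T.Resize((448, 448)),
    T.ToTensor(),
    T.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
image = Image.open("geometry_problem.png").convert("RGB")  # any local math figure
pixel_values = transform(image).unsqueeze(0).to(torch.bfloat16).cuda()

question = "<image>\nSolve the problem shown in the figure. Explain each step."
generation_config = dict(max_new_tokens=1024, do_sample=False)
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(response)
```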
Basic Usage
```python
from io import BytesIO

from datasets import load_dataset
from PIL import Image

# Load MM-MathInstruct from the Hugging Face Hub.
mm_mathinstruct = load_dataset("MathLLMs/MM-MathInstruct")
print(mm_mathinstruct)

# Decode the raw image bytes of the last training sample and display it.
img = Image.open(BytesIO(mm_mathinstruct['train'][-1]['image']))
img.show()
```
Running the code above should produce the following output:
```
DatasetDict({
    train: Dataset({
        features: ['id', 'image', 'question', 'solution', 'image_path'],
        num_rows: 2871988
    })
})
```
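The text fields of a sample can be inspected in the same way; the field names below are taken directly from the `features` list printed above.

```python
# Look at the question and reference solution of the last training sample.
sample = mm_mathinstruct['train'][-1]
print(sample['question'])
print(sample['solution'])
```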
📚 Detailed Documentation
Motivation
Construction of FigCodifier
Construction of MathCoder-VL
Performance
📄 License
This project is licensed under the apache-2.0 license.
📖 Citation
If you use our data, models, or code, please cite the following papers:
```bibtex
@inproceedings{wang2025mathcodervl,
  title={MathCoder-{VL}: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning},
  author={Ke Wang and Junting Pan and Linda Wei and Aojun Zhou and Weikang Shi and Zimu Lu and Han Xiao and Yunqiao Yang and Houxing Ren and Mingjie Zhan and Hongsheng Li},
  booktitle={The 63rd Annual Meeting of the Association for Computational Linguistics},
  year={2025},
  url={https://openreview.net/forum?id=nuvtX1imAb}
}

@inproceedings{lu2025mathcoder2,
  title={MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code},
  author={Zimu Lu and Aojun Zhou and Ke Wang and Houxing Ren and Weikang Shi and Junting Pan and Mingjie Zhan and Hongsheng Li},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=1Iuw1jcIrf}
}

@inproceedings{wang2024mathcoder,
  title={MathCoder: Seamless Code Integration in {LLM}s for Enhanced Mathematical Reasoning},
  author={Ke Wang and Houxing Ren and Aojun Zhou and Zimu Lu and Sichun Luo and Weikang Shi and Renrui Zhang and Linqi Song and Mingjie Zhan and Hongsheng Li},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=z8TW0ttBPp}
}
```