# InternLM-Math-Plus
InternLM-Math-Plus is a state-of-the-art bilingual open-source math reasoning large language model. It serves as a solver, prover, verifier, and augmentor, offering high-performance math reasoning capabilities.
## Quick Start
For quick access to the model, checkpoints for all sizes are available on Hugging Face, ModelScope, and OpenXLab.
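The snippet below is a minimal inference sketch with Hugging Face Transformers. It assumes the checkpoint id `internlm/internlm2-math-plus-7b` and the `chat()` helper that InternLM2 checkpoints ship via `trust_remote_code`; swap the model id for other sizes.

```python
# Minimal inference sketch (assumes the Hugging Face checkpoint
# "internlm/internlm2-math-plus-7b"; swap the id for other sizes).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/internlm2-math-plus-7b"

# InternLM2 checkpoints ship custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
).eval()

# The remote code exposes a chat() helper that formats the prompt and decodes.
response, history = model.chat(tokenizer, "Solve x^2 - 5x + 6 = 0.", history=[])
print(response)
```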
## Features
### State-of-the-art Performance
- Formal Math Reasoning: Achieves excellent results on the MiniF2F-test benchmark, outperforming many existing models.
- Informal Math Reasoning: Performs well on benchmarks such as MATH, GSM8K, and MathBench-A, competing with top-tier models such as Claude 3 Opus.
### Multiple Sizes
Available in four sizes (1.8B, 7B, 20B, and 8x22B) to meet different application requirements.
### Bilingual Support
Supports both English and Chinese, serving a wider range of users.
## Documentation
### News
- [2024.05.24] Released the updated version InternLM2-Math-Plus in four sizes (1.8B, 7B, 20B, and 8x22B) with state-of-the-art performance. Significantly improved informal math reasoning (chain-of-thought and code interpreter) and formal math reasoning (LEAN 4 translation and LEAN 4 theorem proving).
- [2024.02.10] Added tech reports and citation reference.
- [2024.01.31] Added MiniF2F results with evaluation code.
- [2024.01.29] Added checkpoints on ModelScope. Updated results on majority voting and Code Interpreter. The tech report is on the way.
- [2024.01.26] Added checkpoints on OpenXLab, which makes downloading easier for Chinese users.
### Performance
#### Formal Math Reasoning
We evaluate the performance of InternLM2-Math-Plus on the formal math reasoning benchmark MiniF2F-test. The evaluation setting is the same as Llemma with LEAN 4.
| Models | MiniF2F-test |
|--------|--------------|
| ReProver | 26.5 |
| LLMStep | 27.9 |
| GPT-F | 36.6 |
| HTPS | 41.0 |
| Llemma-7B | 26.2 |
| Llemma-34B | 25.8 |
| InternLM2-Math-7B-Base | 30.3 |
| InternLM2-Math-20B-Base | 29.5 |
| InternLM2-Math-Plus-1.8B | 38.9 |
| InternLM2-Math-Plus-7B | 43.4 |
| InternLM2-Math-Plus-20B | 42.6 |
| InternLM2-Math-Plus-Mixtral8x22B | 37.3 |
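To make the task concrete, here is an illustrative MiniF2F-style statement and proof in Lean 4 (a toy example written for this README, not taken from the benchmark). Given only the theorem statement, the prover must generate the tactic block after `by`.

```lean
-- Illustrative MiniF2F-style goal (toy example, not from the benchmark):
-- show that for every natural number n, 2 * n = n + n.
theorem two_mul_eq_add (n : Nat) : 2 * n = n + n := by
  -- `omega` closes linear-arithmetic goals over Nat/Int in Lean 4.
  omega
```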
#### Informal Math Reasoning
We evaluate the performance of InternLM2-Math-Plus on the informal math reasoning benchmarks MATH and GSM8K (the MATH-Python column reports MATH accuracy when the model may execute Python code).
| Model | MATH | MATH-Python | GSM8K |
|-------|------|-------------|-------|
| MiniCPM-2B | 10.2 | - | 53.8 |
| InternLM2-Math-Plus-1.8B | 37.0 | 41.5 | 58.8 |
| InternLM2-Math-7B | 34.6 | 50.9 | 78.1 |
| Deepseek-Math-7B-RL | 51.7 | 58.8 | 88.2 |
| InternLM2-Math-Plus-7B | 53.0 | 59.7 | 85.8 |
| InternLM2-Math-20B | 37.7 | 54.3 | 82.6 |
| InternLM2-Math-Plus-20B | 53.8 | 61.8 | 87.7 |
| Mixtral8x22B-Instruct-v0.1 | 41.8 | - | 78.6 |
| Eurux-8x22B-NCA | 49.0 | - | - |
| InternLM2-Math-Plus-Mixtral8x22B | 58.1 | 68.5 | 91.8 |
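As an illustration of the code-interpreter setting behind the MATH-Python column, the solver writes and runs a short program instead of doing the arithmetic in natural language. The snippet below is a hand-written example of such a solution, not actual model output or the evaluation harness.

```python
# Illustrative code-interpreter solution for a grade-school-style problem
# (hand-written example, not actual model output):
# "A store sells pencils at 3 for $2. How much do 24 pencils cost?"
from fractions import Fraction

price_per_pencil = Fraction(2, 3)   # dollars per pencil
total = price_per_pencil * 24       # exact arithmetic avoids float error
print(f"${total}")                  # -> $16
```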
We also evaluate models on [MathBench-A](https://github.com/open-compass/MathBench).
| Model | Arithmetic | Primary | Middle | High | College | Average |
|-------|------------|---------|--------|------|---------|---------|
| GPT-4o-0513 | 77.7 | 87.7 | 76.3 | 59.0 | 54.0 | 70.9 |
| Claude 3 Opus | 85.7 | 85.0 | 58.0 | 42.7 | 43.7 | 63.0 |
| Qwen-Max-0428 | 72.3 | 86.3 | 65.0 | 45.0 | 27.3 | 59.2 |
| Qwen-1.5-110B | 70.3 | 82.3 | 64.0 | 47.3 | 28.0 | 58.4 |
| Deepseek-V2 | 82.7 | 89.3 | 59.0 | 39.3 | 29.3 | 59.9 |
| Llama-3-70B-Instruct | 70.3 | 86.0 | 53.0 | 38.7 | 34.7 | 56.5 |
| InternLM2-Math-Plus-Mixtral8x22B | 77.5 | 82.0 | 63.6 | 50.3 | 36.8 | 62.0 |
| InternLM2-Math-20B | 58.7 | 70.0 | 43.7 | 24.7 | 12.7 | 42.0 |
| InternLM2-Math-Plus-20B | 65.8 | 79.7 | 59.5 | 47.6 | 24.8 | 55.5 |
| Llama3-8B-Instruct | 54.7 | 71.0 | 25.0 | 19.0 | 14.0 | 36.7 |
| InternLM2-Math-7B | 53.7 | 67.0 | 41.3 | 18.3 | 8.0 | 37.7 |
| Deepseek-Math-7B-RL | 68.0 | 83.3 | 44.3 | 33.0 | 23.0 | 50.3 |
| InternLM2-Math-Plus-7B | 61.4 | 78.3 | 52.5 | 40.5 | 21.7 | 50.9 |
| MiniCPM-2B | 49.3 | 51.7 | 18.0 | 8.7 | 3.7 | 26.3 |
| InternLM2-Math-Plus-1.8B | 43.0 | 43.3 | 25.4 | 18.9 | 4.7 | 27.1 |
### Citation and Tech Report
```bibtex
@misc{ying2024internlmmath,
      title={InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning},
      author={Huaiyuan Ying and Shuo Zhang and Linyang Li and Zhejian Zhou and Yunfan Shao and Zhaoye Fei and Yichuan Ma and Jiawei Hong and Kuikun Liu and Ziyi Wang and Yudong Wang and Zijian Wu and Shuaibin Li and Fengzhe Zhou and Hongwei Liu and Songyang Zhang and Wenwei Zhang and Hang Yan and Xipeng Qiu and Jiayu Wang and Kai Chen and Dahua Lin},
      year={2024},
      eprint={2402.06332},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
## License
The model is released under a custom license (listed as `other`).