đ MetaMath-Llemma-7B
MetaMath-Llemma-7B is a model fine - tuned on MetaMathQA datasets, based on the Llemma - 7B model, which significantly improves performance on mathematical tasks.
đ Quick Start
Check out our paper at https://arxiv.org/abs/2309.12284.
View the project page: https://meta-math.github.io/
⨠Features
- All MetaMathQA data are augmented from the training sets of GSM8K and MATH, with none of the augmented data from the testing set.
- MetaMath-Llemma-7B is fully fine - tuned on the MetaMathQA datasets and based on the powerful Llemma - 7B model, boosting the MATH Pass@1 from 19.8 to 30.0.
đĻ Installation
pip install transformers==4.35.0
pip install torch==2.0.1
pip install sentencepiece==0.1.99
pip install tokenizers==0.13.3
pip install accelerate==0.21.0
pip install bitsandbytes==0.40.0
pip install vllm
pip install fraction
pip install protobuf
đģ Usage Examples
Basic Usage
The prompting template is as follows:
"Below is an instruction that describes a task. "
"Write a response that appropriately completes the request.\n\n"
"### Instruction:\n{instruction}\n\n### Response: Let's think step by step."
You need to use your query question to replace the {instruction}
.
đ Documentation
Important Note
All MetaMathQA data are augmented from the training sets of GSM8K and MATH.
None of the augmented data is from the testing set.
You can check the original_question
in meta-math/MetaMathQA
, each item is from the GSM8K or MATH train set.
Experiments
Model |
GSM8k Pass@1 |
MATH Pass@1 |
MPT-7B |
6.8 |
3.0 |
Falcon-7B |
6.8 |
2.3 |
LLaMA-1-7B |
11.0 |
2.9 |
LLaMA-2-7B |
14.6 |
2.5 |
MPT-30B |
15.2 |
3.1 |
LLaMA-1-13B |
17.8 |
3.9 |
GPT-Neo-2.7B |
19.5 |
-- |
Falcon-40B |
19.6 |
2.5 |
Baichuan-chat-13B |
23.9 |
-- |
Vicuna-v1.3-13B |
27.6 |
-- |
LLaMA-2-13B |
28.7 |
3.9 |
InternLM-7B |
31.2 |
-- |
ChatGLM-2-6B |
32.4 |
-- |
GPT-J-6B |
34.9 |
-- |
LLaMA-1-33B |
35.6 |
3.9 |
LLaMA-2-34B |
42.2 |
6.24 |
RFT-7B |
50.3 |
-- |
LLaMA-1-65B |
50.9 |
10.6 |
Qwen-7B |
51.6 |
-- |
WizardMath-7B |
54.9 |
10.7 |
LLaMA-2-70B |
56.8 |
13.5 |
WizardMath-13B |
63.9 |
14.0 |
MAmmoTH-7B (COT) |
50.5 |
10.4 |
MAmmoTH-7B (POT+COT) |
53.6 |
31.5 |
Arithmo-Mistral-7B |
74.7 |
25.3 |
MetaMath-7B |
66.5 |
19.8 |
MetaMath-13B |
72.3 |
22.4 |
đĨ MetaMath-Llemma-7B |
69.2 |
30.0 |
đĨ MetaMath-Mistral-7B |
77.7 |
28.2 |
Information Table
Property |
Details |
Model Type |
MetaMath-Llemma-7B |
Training Data |
meta-math/MetaMathQA |
License |
llama2 |
đ License
The model is under the llama2 license.
đ§ Technical Details
MetaMath-Llemma-7B is fully fine - tuned on the MetaMathQA datasets and based on the powerful Llemma - 7B model. It is glad to see using MetaMathQA datasets and changing the base model from llama - 2 - 7B to Llemma - 7B can boost the MATH performance from 19.8 to 30.0.
đ Citation
@article{yu2023metamath,
title={MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models},
author={Yu, Longhui and Jiang, Weisen and Shi, Han and Yu, Jincheng and Liu, Zhengying and Zhang, Yu and Kwok, James T and Li, Zhenguo and Weller, Adrian and Liu, Weiyang},
journal={arXiv preprint arXiv:2309.12284},
year={2023}
}