Llemma
Llemma is a language model designed for mathematics. It addresses the challenge of accurate mathematical reasoning and computation in natural language processing. By leveraging advanced training techniques and high-quality datasets, it delivers strong results on mathematical tasks such as chain-of-thought reasoning and tool-assisted problem solving.
✨ Features
- Strong Mathematical Reasoning: Particularly proficient in chain-of-thought mathematical reasoning.
- Tool Utilization: Capable of using computational tools like Python and formal theorem provers for mathematics.
- Multiple Parameter Versions: Available in 7B and 34B parameter versions to suit different application scenarios.
🚀 Quick Start
Llemma 7B is initialized with Code Llama 7B weights and trained on [Proof-Pile-2](https://huggingface.co/datasets/EleutherAI/proof-pile-2) for 200B tokens. A 34B parameter version, Llemma 34B, is also available.
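As a minimal loading-and-generation sketch with Hugging Face Transformers, assuming the 7B weights are published under the model id `EleutherAI/llemma_7b` (swap in `EleutherAI/llemma_34b` for the larger model):

```python
# Minimal sketch: load Llemma and generate a completion.
# Assumes the model id "EleutherAI/llemma_7b"; adjust if your checkpoint differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/llemma_7b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/llemma_7b",
    torch_dtype=torch.float16,  # half precision to fit on a single GPU
    device_map="auto",
)

prompt = "Problem: What is the derivative of x^2 * sin(x)?\nSolution:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```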
📚 Documentation
Evaluations
Llemma models show excellent performance in various mathematical evaluations.
Chain-of-thought Math
On chain-of-thought mathematics tasks, Llemma models outperform Llama-2 and Code Llama, and outperform Minerva when compared at roughly the same model size.
| Model      | Size | GSM8k | OCW   | MMLU-STEM | SAT   | MATH  |
|------------|------|-------|-------|-----------|-------|-------|
| Llama 2    | 7B   | 11.8% | 3.7%  | 29.9%     | 25%   | 3.2%  |
| Code Llama | 7B   | 10.5% | 4.4%  | 25.1%     | 9.4%  | 4.5%  |
| LLEMMA     | 7B   | 36.4% | 7.7%  | 37.7%     | 53.1% | 18.0% |
| Minerva    | 8B   | 16.2% | 7.7%  | 35.6%     | -     | 14.1% |
| Code Llama | 34B  | 29.6% | 7.0%  | 40.5%     | 40.6% | 12.2% |
| LLEMMA     | 34B  | 51.5% | 11.8% | 49.0%     | 71.9% | 25.0% |
| Minerva    | 62B  | 52.4% | 12.0% | 53.9%     | -     | 27.6% |
| Minerva    | 540B | 58.8% | 17.6% | 63.9%     | -     | 33.6% |
Further performance can be extracted with majority voting (maj@k): sample k solutions per problem and keep the most common final answer (a small sketch of the scoring rule follows the table below).
| Model   | Size | GSM8k maj@100 | OCW maj@100 | MMLU-STEM maj@16 | SAT maj@16 | MATH maj@256 |
|---------|------|---------------|-------------|------------------|------------|--------------|
| LLEMMA  | 7B   | 54.0%         | 14.3%       | 49.9%            | 78.1%      | 33.5%        |
| Minerva | 8B   | 28.4%         | 12.5%       | 43.4%            | -          | 25.4%        |
| LLEMMA  | 34B  | 69.3%         | 18.4%       | 59.7%            | 81.3%      | 43.1%        |
| Minerva | 62B  | 68.5%         | 23.5%       | 63.5%            | -          | 43.4%        |
| Minerva | 540B | 78.5%         | 30.8%       | 75.0%            | -          | 50.3%        |
Tool Use and Theorem Proving
In addition to chain-of-thought reasoning, Llemma has strong capabilities in computational mathematics tasks. For the tool-use and formal theorem-proving evaluations, see our paper; a rough illustration of tool-assisted solving appears below.
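The following is an illustrative sketch only, not the paper's evaluation harness: it assumes a workflow in which the model is prompted to emit Python code whose execution yields the answer. The helper name `solve_with_python` and the prompt format are hypothetical.

```python
# Illustrative sketch, NOT the paper's harness: prompt the model to write
# Python for a problem, then execute the emitted code to read off the answer.
def solve_with_python(model, tokenizer, problem: str) -> str:
    # Hypothetical prompt format: steer the model toward assigning `answer`.
    prompt = (
        "# Solve the problem below with Python; store the result in `answer`.\n"
        f"# Problem: {problem}\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    code = tokenizer.decode(output[0], skip_special_tokens=True)
    namespace: dict = {}
    exec(code, namespace)  # caution: runs model-generated code; sandbox in practice
    return str(namespace.get("answer"))
```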
Citation
```bibtex
@misc{azerbayev2023llemma,
  title={Llemma: An Open Language Model For Mathematics},
  author={Zhangir Azerbayev and Hailey Schoelkopf and Keiran Paster and Marco Dos Santos and Stephen McAleer and Albert Q. Jiang and Jia Deng and Stella Biderman and Sean Welleck},
  year={2023},
  eprint={2310.10631},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
📄 License
This project is released under the Llama 2 license.
Additional Information
- Datasets:
  - EleutherAI/proof-pile-2
  - open-web-math/open-web-math
- Language: en
- Tags: math, reasoning

ArXiv | Models | [Data](https://huggingface.co/datasets/EleutherAI/proof-pile-2) | [Code](https://github.com/EleutherAI/math-lm) | Blog | [Sample Explorer](https://llemma-demo.github.io/)
Authors: [Zhangir Azerbayev](https://zhangir-azerbayev.github.io/), Hailey Schoelkopf, Keiran Paster, Marco Dos Santos, Stephen McAleer, Albert Q. Jiang, Jia Deng, Stella Biderman, Sean Welleck