🚀 Bonsai: A Small Ternary-Weight Language Model
Bonsai is a small ternary-weight language model developed by deepgrove. It uses the Llama architecture and the Mistral tokenizer, with modified linear layers that hold ternary weights. Trained on fewer than 5 billion tokens, it is notably training-efficient for its size.
🚀 Quick Start
Bonsai can be used via the Hugging Face Transformers library. Here is a quick example:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("deepgrove/Bonsai", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepgrove/Bonsai", trust_remote_code=True)

# Tokenize a prompt and generate up to 100 tokens.
text = "What is the capital of France?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
⚠️ Important Note
Bonsai is not instruction-tuned. Fine-tuning the model before using it on a downstream task is strongly recommended; a sketch of such a run is shown below.
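As one way such fine-tuning could look, the snippet below runs a standard causal-language-modeling pass with the Hugging Face `Trainer`. The dataset (`wikitext`), sequence length, and hyperparameters are illustrative placeholders, not settings recommended by the Bonsai authors.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "deepgrove/Bonsai"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# The tokenizer may not define a pad token; reuse EOS for padding.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder dataset: any text dataset with a "text" column works the same way.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Illustrative hyperparameters only; tune them for your task.
args = TrainingArguments(
    output_dir="bonsai-finetuned",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=2e-5,
    logging_steps=50,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator)
trainer.train()
```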
✨ Features
Model Details
Bonsai is a small 500-million-parameter ternary-weight language model developed by deepgrove. It follows the Llama architecture and uses the Mistral tokenizer, as in Danube 3, with modified linear layers that hold ternary weights. It is trained mainly on DCLM-Pro and Fineweb-Edu, using fewer than 5 billion tokens in total.
| Property | Details |
|---|---|
| Developed by | deepgrove |
| Language(s) (NLP) | English |
| License | Apache 2.0 |
| Repository | https://github.com/deepgrove-ai/Bonsai |
| Paper | https://github.com/deepgrove-ai/Bonsai/tree/main/paper/Bonsai.pdf |
📚 Documentation
Usage
Bonsai can be used through the Hugging Face Transformers library. Currently, all operations are performed in 16-bit precision; the team is working on integrating the model design with custom mixed-precision kernels.
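For example, the model can be loaded with 16-bit weights via the standard `torch_dtype` argument; the choice of `bfloat16` below is illustrative rather than an official recommendation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Keep weights and activations in 16-bit (bfloat16) precision.
model = AutoModelForCausalLM.from_pretrained(
    "deepgrove/Bonsai",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("deepgrove/Bonsai", trust_remote_code=True)
```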
Evaluation
Bonsai achieves competitive performance among its peers and is one of the first ternary models to do so. The evaluation results are summarized below; for more detailed results and comparisons with other ternary models, please refer to the paper linked above. lm-eval is used for all benchmarks except MMLU, which uses lighteval's cloze formulation. A sketch of a typical harness invocation follows the table.
| Model | ARC-c | ARC-e | HS. | OBQA | PiQA | Wino. | MMLU | Avg |
|---|---|---|---|---|---|---|---|---|
| MobiLlama 0.5B | 26.62 | 46.68 | 51.66 | 30.00 | 71.65 | 54.50 | 28.61 | 44.25 |
| Qwen 2 0.5B | 28.84 | 50.29 | 49.12 | 33.00 | 69.26 | 56.99 | 31.78 | 45.61 |
| MobileLLM 600M | 29.01 | 56.65 | 55.35 | 34.00 | 71.65 | 59.75 | 31.40 | 48.13 |
| Qwen 2.5 0.5B | 32.25 | 58.29 | 52.18 | 35.40 | 69.91 | 56.12 | 33.40 | 48.22 |
| Bonsai | 33.36 | 57.95 | 48.04 | 34.00 | 70.24 | 54.85 | 30.28 | 46.96 |
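As a rough sketch of how such numbers can be reproduced with the EleutherAI lm-evaluation-harness, the snippet below calls its `simple_evaluate` API on the non-MMLU tasks. The task names and zero-shot setting are assumptions and may differ from the exact configuration used in the paper; MMLU, in particular, was evaluated with lighteval's cloze formulation instead.

```python
import lm_eval

# Evaluate Bonsai with the lm-evaluation-harness; the task list and settings
# here are assumptions, not the paper's exact configuration.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=deepgrove/Bonsai,trust_remote_code=True",
    tasks=["arc_challenge", "arc_easy", "hellaswag", "openbookqa", "piqa", "winogrande"],
)
print(results["results"])
```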