# Model Card for MediaTek Research Breeze-7B-Instruct-v1_0
MediaTek Research Breeze-7B is a language model family based on Mistral-7B and designed specifically for Traditional Chinese. It comprises two main models, Breeze-7B-Base and Breeze-7B-Instruct; the current release, v1.0, shows significant performance improvements in both English and Traditional Chinese.
## Quick Start
For details of this model, please read our paper. You can also try the demo here.
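As a minimal usage sketch (not official quick-start code), the model can be loaded through Hugging Face `transformers`. This assumes the Hub id `MediaTek-Research/Breeze-7B-Instruct-v1_0`, an installed `torch`/`transformers`/`accelerate` stack, and that the tokenizer ships a chat template:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MediaTek-Research/Breeze-7B-Instruct-v1_0"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit a single modern GPU
    device_map="auto",           # requires `accelerate`
)

# Format the prompt with the tokenizer's chat template (assumed to be bundled).
messages = [{"role": "user", "content": "請介紹台北的三個景點。"}]  # "Introduce three sights in Taipei."
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```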
## Features
**Breeze-7B-Base-v1_0**
- Expanded the vocabulary size from 32k to 62k tokens to better support Traditional Chinese.
- 8k-token context length.
**Breeze-7B-Instruct-v1_0**
- Expanded the vocabulary size from 32k to 62k tokens to better support Traditional Chinese.
- 8k-token context length.
- Supports multi-turn dialogue (without special handling for harmfulness); see the sketch after this list.
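The sketch below illustrates multi-turn use, continuing from the Quick Start snippet above (so `tokenizer` and `model` are already loaded); the conversation content is purely illustrative:

```python
# Multi-turn dialogue: prior assistant turns are passed back in as history.
messages = [
    {"role": "user", "content": "你好，請問你能做什麼？"},              # "Hello, what can you do?"
    {"role": "assistant", "content": "你好！我可以回答問題、寫作與翻譯。"},  # earlier model reply, kept as history
    {"role": "user", "content": "請用三句話介紹台灣。"},                # follow-up question
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```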
## Documentation
### Model Details
**Breeze-7B-Base-v1_0**
- Finetuned from: mistralai/Mistral-7B-v0.1
- Model type: Causal decoder-only transformer language model
- Language: English and Traditional Chinese (zh-tw)
**Breeze-7B-Instruct-v1_0**
- Finetuned from: MediaTek-Research/Breeze-7B-Base-v1_0
- Model type: Causal decoder-only transformer language model
- Language: English and Traditional Chinese (zh-tw)
### Base Model Performance
We compared Breeze-7B-Base-v1_0 with other open-source base language models of similar parameter size. The evaluation used code revised from EleutherAI/lm-evaluation-harness; an illustrative harness invocation is sketched after the table.
| Models | #Parameters | ↑ TMMLU+ (ACC)<br>TC, Knowledge<br>5-shot | DRCD (EM)<br>TC, Reasoning<br>3-shot | Table (ACC)<br>TC, Reasoning<br>5-shot | MMLU (ACC)<br>EN, Knowledge<br>5-shot |
|---|---|---|---|---|---|
| Yi-6B | 6B | 49.63 | 76.61 | 34.72 | 65.35 |
| Qwen1.5-7B | 7B | 46.59 | 74.41 | 30.56 | 63.07 |
| Breeze-7B-Base-v1_0 | 7B | 42.67 | 80.61 | 31.99 | 61.24 |
| Mistral-7B-v0.1 | 7B | 36.93 | 79.27 | 27.78 | 64.89 |
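The revised evaluation code itself is not reproduced here. As a rough illustration only, a comparable 5-shot MMLU run with the stock lm-evaluation-harness (v0.4+ Python API) might look like the following; note that the TMMLU+, DRCD, and Table tasks come from the revised code and are not part of the stock harness:

```python
# Illustrative stock-harness run, NOT the revised code used for the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=MediaTek-Research/Breeze-7B-Base-v1_0,dtype=bfloat16",
    tasks=["mmlu"],   # stock task; TMMLU+/DRCD/Table require the revised code
    num_fewshot=5,    # matches the 5-shot setting in the table
    batch_size=8,
)
print(results["results"]["mmlu"])
```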
### Instruction-tuned Model Performance
We compared Breeze-7B-Instruct-v1_0 with other open-source instruction-tuned language models of similar parameter size. Evaluation code differed across benchmarks.
* Taiwan-LLM models respond to multi-turn questions (English) in Traditional Chinese.
Details on MT-Bench-tw (0 shot):

| Models | STEM | Extraction | Reasoning | Math | Coding | Roleplay | Writing | Humanities | AVG |
|---|---|---|---|---|---|---|---|---|---|
| GPT-3.5-Turbo | 7.8 | 6.1 | 5.1 | 6.4 | 6.2 | 8.7 | 7.4 | 9.3 | 7.1 |
| Qwen1.5-7B-Chat | 9.0 | 5.6 | 4.7 | 2.8 | 3.7 | 8.0 | 8.0 | 9.4 | 6.4 |
| Breeze-7B-Instruct-v1_0 | 7.8 | 5.2 | 4.2 | 4.2 | 4.1 | 7.6 | 5.9 | 9.1 | 6.0 |
| Mistral-7B-v0.2-Instruct | 6.9 | 4.6 | 4.3 | 3.3 | 4.4 | 7.2 | 6.2 | 7.8 | 5.6 |
| Yi-6B-Chat | 7.3 | 2.7 | 3.1 | 3.3 | 2.3 | 7.2 | 5.2 | 8.8 | 5.0 |
| Taiwan-LLM-13B-v2.0-chat | 6.1 | 3.4 | 4.1 | 2.3 | 3.1 | 7.4 | 6.6 | 6.8 | 5.0 |
| Taiwan-LLM-7B-v2.1-chat | 5.2 | 2.6 | 2.3 | 1.2 | 3.4 | 6.6 | 5.7 | 6.8 | 4.2 |
Details on TMMLU+ (0 shot):

| Model | STEM | Social Science | Humanities | Other | AVG |
|---|---|---|---|---|---|
| GPT-3.5-Turbo | 41.58 | 48.52 | 40.96 | 43.18 | 43.56 |
| Qwen1.5-7B-Chat | 41.48 | 51.66 | 44.05 | 45.40 | 45.65 |
| Breeze-7B-Instruct-v1_0 | 36.46 | 48.38 | 45.11 | 40.75 | 42.67 |
| Mistral-7B-v0.2-Instruct | 32.79 | 38.05 | 34.89 | 34.04 | 34.94 |
| Yi-6B-Chat | 37.80 | 51.74 | 45.36 | 44.25 | 44.79 |
| Taiwan-LLM-13B-v2.0-chat | 27.74 | 33.69 | 27.03 | 29.43 | 29.47 |
| Taiwan-LLM-7B-v2.1-chat | 25.58 | 31.76 | 27.36 | 27.61 | 28.08 |
### Inference Performance
In this test, we used the first 700 characters of a web article as the input and asked the model to rewrite the same article. All inference was run on 2 RTX A6000 GPUs; a rough reproduction sketch follows the table.
| Models | ↓ Inference Time (sec) | Estimated Max Input Length (Char) |
|---|---|---|
| Qwen1.5-7B-Chat | 9.35 | 38.9k |
| Yi-6B-Chat | 10.62 | 5.2k |
| Breeze-7B-Instruct-v1_0 | 10.74 | 11.1k |
| Mistral-7B-Instruct-v0.2 | 20.48 | 5.1k |
| Taiwan-LLM-7B-v2.1-chat | 26.26 | 2.2k |
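A rough sketch of how such a timing measurement could be reproduced is shown below. The exact article, prompt wording, and decoding settings used for the table are not specified in this card, so the file path `article.txt` and the generation budget here are assumptions:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MediaTek-Research/Breeze-7B-Instruct-v1_0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"  # shards across available GPUs
)

# First 700 characters of a web article (hypothetical local copy).
article = open("article.txt", encoding="utf-8").read()[:700]
messages = [{"role": "user", "content": f"請改寫以下文章：{article}"}]  # "Rewrite the following article:"
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

start = time.perf_counter()
output_ids = model.generate(input_ids, max_new_tokens=700)  # assumed generation budget
torch.cuda.synchronize()  # ensure all GPU work finished before reading the clock
print(f"inference time: {time.perf_counter() - start:.2f} s")
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```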
## License
This model is released under the Apache-2.0 license.