🚀 Breeze-7B-Base-v0.1-GGUF
This repository contains GGUF format model files for MediaTek Research's Breeze-7B-Base-v0.1, which is optimized for Traditional Chinese tasks.
🚀 Quick Start
Model Information
- Model creator: MediaTek Research
- Original model: Breeze-7B-Base-v0.1
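To get started, download one of the quantized files from this repo and load it with llama-cpp-python (one of the compatible libraries listed below). A minimal sketch follows; the repo id placeholder and the Q4_K_M filename are illustrative assumptions, so substitute this repository's actual id and whichever .gguf file it provides.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download a quantized model file. Both repo_id and filename are
# illustrative placeholders; use this repository's actual id and
# one of the .gguf files it lists.
model_path = hf_hub_download(
    repo_id="<this-repo-id>",
    filename="breeze-7b-base-v0.1.Q4_K_M.gguf",
)

llm = Llama(
    model_path=model_path,
    n_ctx=8192,       # Breeze-7B-Base supports an 8k-token context
    n_gpu_layers=-1,  # offload all layers to GPU; set to 0 for CPU-only
)

# Breeze-7B-Base is a base (non-instruct) model, so use plain completion.
output = llm("人工智慧是", max_tokens=64)
print(output["choices"][0]["text"])
```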
✨ Features
Breeze-7B-Base-v0.1
- Expanded vocabulary from 32k to 62k entries to better support Traditional Chinese (see the tokenizer comparison sketch at the end of this section)
- 8k-token context length
Breeze-7B-Instruct-v0.1
- Expanded vocabulary from 32k to 62k entries to better support Traditional Chinese
- 8k-token context length
- Multi-turn dialogue (without special handling for harmful content)
Breeze-7B-Instruct-64k-v0.1
- Expanded vocabulary from 32k to 62k entries to better support Traditional Chinese
- 64k-token context length
- Multi-turn dialogue (without special handling for harmful content)
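Because the vocabulary grows from 32k to 62k entries, Traditional Chinese text tokenizes into considerably fewer tokens. A quick way to see this is to compare token counts between the original Breeze and Mistral checkpoints via transformers; the sample sentence is arbitrary and exact counts will vary.

```python
from transformers import AutoTokenizer

text = "今天天氣很好,我們一起去公園散步吧。"

breeze = AutoTokenizer.from_pretrained("MediaTek-Research/Breeze-7B-Base-v0.1")
mistral = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# The expanded 62k vocabulary should yield roughly half as many tokens
# for Traditional Chinese text as Mistral-7B's original 32k vocabulary.
print("Breeze tokens: ", len(breeze.tokenize(text)))
print("Mistral tokens:", len(mistral.tokenize(text)))
```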
📚 Documentation
About GGUF
GGUF is a new format introduced by the llama.cpp team on August 21st 2023. It is a replacement for GGML, which is no longer supported by llama.cpp.
Here is an incomplete list of clients and libraries that are known to support GGUF:
- llama.cpp. The source project for GGUF. Offers a CLI and a server option.
- text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration.
- KoboldCpp, a fully featured web UI, with GPU acceleration across all platforms and GPU architectures. Especially good for storytelling.
- GPT4All, a free and open-source GUI that runs locally, supporting Windows, Linux, and macOS with full GPU acceleration.
- LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. Linux available, in beta as of 27/11/2023.
- LoLLMS Web UI, a great web UI with many interesting and unique features, including a full model library for easy model selection.
- Faraday.dev, an attractive and easy-to-use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration.
- llama-cpp-python, a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server (see the server sketch after this list).
- candle, a Rust ML framework with a focus on performance, including GPU support, and ease of use.
- ctransformers, a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server. Note: as of the time of writing (November 27th, 2023), ctransformers has not been updated in a long time and does not support many recent models.
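For example, llama-cpp-python's OpenAI-compatible server (referenced above) can serve a local GGUF file and be queried with the standard openai client. This is a minimal sketch assuming the server was started with `python -m llama_cpp.server --model <path-to-gguf>` on its default port 8000; the model name string is an arbitrary label for a single-model server.

```python
from openai import OpenAI

# Start the server separately, e.g.:
#   python -m llama_cpp.server --model ./breeze-7b-base-v0.1.Q4_K_M.gguf
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Breeze-7B-Base is a base model, so use the completions endpoint
# rather than chat completions.
response = client.completions.create(
    model="breeze-7b-base",  # arbitrary label when the server hosts one model
    prompt="人工智慧是",
    max_tokens=64,
)
print(response.choices[0].text)
```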
Original model card
Breeze-7B is a language model family that builds on top of Mistral-7B, specifically intended for Traditional Chinese use.
- Breeze-7B-Base is the base model for the Breeze-7B series. It is suitable for use if you have substantial fine-tuning data to tune it for your specific use case.
- Breeze-7B-Instruct derives from the base model Breeze-7B-Base and can be used as-is for commonly seen tasks.
- Breeze-7B-Instruct-64k is a slightly modified version of Breeze-7B-Instruct to enable a 64k-token context length. Roughly speaking, that is equivalent to 88k Traditional Chinese characters.
The current release version of Breeze-7B is v0.1.
Practicality
- Breeze-7B-Base expands the original vocabulary with an additional 30,000 Traditional Chinese tokens. With the expanded vocabulary, and everything else being equal, Breeze-7B operates at twice the inference speed for Traditional Chinese compared to Mistral-7B and Llama 7B. [See Inference Performance.]
- Breeze-7B-Instruct can be used as-is for common tasks such as Q&A, RAG, multi-round chat, and summarization (see the sketch after this list).
- In particular, Breeze-7B-Instruct-64k can perform tasks at a document level rather than a chapter level.
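As an illustration of the multi-round chat use case above, here is a minimal sketch using the original (non-GGUF) Instruct checkpoint with transformers, assuming its tokenizer ships a chat template; consult the Instruct model card for the exact prompt format.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MediaTek-Research/Breeze-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Multi-turn messages; apply_chat_template assumes the tokenizer
# defines Breeze's chat format.
messages = [{"role": "user", "content": "請用三句話介紹台灣的夜市文化。"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```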
Performance
- Breeze-7B-Instruct demonstrates impressive performance in benchmarks for Traditional Chinese when compared to similarly sized open-source contemporaries such as Taiwan-LLM-7B/13B-chat, QWen-7B-Chat, and Yi-6B-Chat. [See Chat Model Performance.]
- Breeze-7B-Instruct shows comparable results to Mistral-7B-Instruct-v0.1 on the MMLU and MT-Bench benchmarks. [See Chat Model Performance.]
A project by the members (in alphabetical order): Chan-Jan Hsu 許湛然, Chang-Le Liu 劉昶樂, Feng-Ting Liao 廖峰挺, Po-Chun Hsu 許博竣, Yi-Chang Chen 陳宜昌, and the supervisor Da-Shan Shiu 許大山.
Model Details
| Property | Breeze-7B-Base-v0.1 | Breeze-7B-Instruct-v0.1 | Breeze-7B-Instruct-64k-v0.1 |
|---|---|---|---|
| Finetuned from | mistralai/Mistral-7B-v0.1 | MediaTek-Research/Breeze-7B-Base-v0.1 | MediaTek-Research/Breeze-7B-Instruct-v0.1 |
| Model Type | Causal decoder-only transformer language model | Causal decoder-only transformer language model | Causal decoder-only transformer language model |
| Language | English and Traditional Chinese (zh-tw) | English and Traditional Chinese (zh-tw) | English and Traditional Chinese (zh-tw) |
Base Model Performance
TMMLU+, DRCD, and Table are sourced from MediaTek-Research/TCEval-v2, which derives from TCEval-v1 and ikala/tmmluplus. MMLU is sourced from hails/mmlu_no_train. We use code revised from EleutherAI/lm-evaluation-harness to evaluate TMMLU+, DRCD, Table, and MMLU (a sketch of a comparable harness invocation follows the table).
| Models | Size | ↑ TMMLU+ (ACC)<br>TC, Knowledge, 5-shot | DRCD (EM)<br>TC, Reasoning, 3-shot | Table (ACC)<br>TC, Reasoning, 5-shot | MMLU (ACC)<br>EN, Knowledge, 5-shot |
|---|---|---|---|---|---|
| Yi-34B | 34B | 63.10 | 84.57 | 49.31 | 77.42 |
| Qwen-14B | 14B | 51.30 | 16.95 * | 50.69 | 68.83 |
| Yi-6B | 6B | 49.63 | 76.61 | 34.72 | 65.35 |
| Qwen-7B | 7B | 42.84 | 0.0 * | 39.58 | 61.00 |
| Breeze-7B-Base-v0.1 | 7B | 40.35 | 81.13 | 28.47 | 61.63 |
| Mistral-7B-v0.1 | 7B | 36.93 | 79.27 | 27.78 | 64.89 |
\* Few-shot learning cannot effectively guide the model to generate the proper answer.
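The TMMLU+, DRCD, and Table tasks above come from MediaTek Research's revised version of the harness, which is not part of the upstream lm-evaluation-harness; MMLU, however, can be approximated with the stock harness. A minimal sketch, assuming a recent (0.4+) lm-eval release; scores may differ from the table because of the revisions.

```python
import lm_eval

# Evaluate Breeze-7B-Base on MMLU with 5-shot prompting, mirroring the
# table's MMLU column. Uses the stock harness, not the revised code,
# so numbers may not match exactly.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=MediaTek-Research/Breeze-7B-Base-v0.1",
    tasks=["mmlu"],
    num_fewshot=5,
)
print(results["results"]["mmlu"])
```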
Chat Model Performance
TMMLU+, DRCD, Table, and MT-Bench-tw are sourced from MediaTek-Research/TCEval-v2, which derives from TCEval-v1 and ikala/tmmluplus. MMLU is sourced from hails/mmlu_no_train, and MT-Bench from lmsys/mt_bench_human_judgments. We use code revised from EleutherAI/lm-evaluation-harness to evaluate TMMLU+, DRCD, Table, and MMLU, and code revised from fastchat llm_judge (with GPT-4 as judge) to evaluate MT-Bench-tw and MT-Bench.
| Models | Size | ↑ MT-Bench-tw (Score)<br>TC, Chat, 0-shot | TMMLU+ (ACC)<br>TC, Knowledge, 0-shot | TMMLU+ (ACC)<br>TC, Knowledge, 5-shot | DRCD (EM)<br>TC, Reasoning, 3-shot | Table (ACC)<br>TC, Reasoning, 0-shot | MT-Bench (Score)<br>EN, Chat, 0-shot | MMLU (ACC)<br>EN, Knowledge, 0-shot | MMLU (ACC)<br>EN, Knowledge, 5-shot |
|---|---|---|---|---|---|---|---|---|---|
| gpt-3.5-turbo | - | 7.1 | 41.76 | - | - | - | 7.9 | 70.00 | - |
| Yi-34B-Chat | 34B | 6.9 | 54.87 | - | - | 36.81 | 7.6 | 71.04 | - |
| Qwen-14B-Chat | 14B | 6.4 | 48.41 | - | - | 41.67 | 7.2 | 64.91 | - |
| Breeze-7B-Instruct-v0.1 | 7B | 5.7 | 41.61 | - | - | 45.83 | 7.1 | 63.26 | - |
| Breeze-7B-Instruct-64k-v0.1 | 7B | 5.5 | 40.99 | - | - | 36.11 | 7.1 | 63.68 | - |
| Qwen-7B-Chat | 7B | 5.4 | 40.02 | - | - | 33.33 | 6.2 | 55.94 | - |
| Yi-6B-Chat | 6B | 5.0 | 44.79 | - | - | 25.69 | 6.0 | 59.45 | - |
| Taiwan-LLM-13B-v2.0-chat | 13B | 5.0 | 29.47 | - | - | 23.61 | -* | 50.50 | - |
| Taiwan-LLM-7B-v2.1-chat | 7B | 4.2 | 28.08 | - | - | 31.25 | -* | 42.72 | - |
\* Taiwan-LLM models respond to multi-turn questions (in English) in Traditional Chinese.
Category Score of MT-Bench-tw (0 shot)
| Models | STEM | Extraction | Reasoning | Math | Coding | Roleplay | Writing | Humanities | ↑ AVG |
|---|---|---|---|---|---|---|---|---|---|
| gpt-3.5-turbo | 7.8 | 6.1 | 5.1 | 6.4 | 6.2 | 8.7 | 7.4 | 9.3 | 7.1 |
| Yi-34B-Chat | 9.0 | 4.8 | 5.7 | 4.0 | 4.7 | 8.5 | 8.7 | - | - |
📄 License
This project is licensed under the Apache-2.0 license.

