# 🚀 CantoneseLLMChat-v1.0-7B

The first-generation Cantonese LLM from hon9kon9ize, excelling in Hong Kong-specific knowledge and Cantonese conversation.

CantoneseLLMChat v1.0 is the first-generation Cantonese LLM from hon9kon9ize. Building on the success of the v0.5 preview, the model excels in Hong Kong-specific knowledge and Cantonese conversation.
## ✨ Features

- Specific Knowledge: Specialized in Hong Kong-related knowledge.
- Language Proficiency: Skilled in Cantonese conversation.
## 📚 Documentation

### Model description
The base model was obtained by continuous pre-training of Qwen 2.5 7B on 600 million publicly available Hong Kong news articles and Cantonese websites. The instruction fine-tuned model was then trained on a dataset of 75,000 instruction pairs, 45,000 of which are Cantonese instructions generated by other LLMs and reviewed by humans.

The model was trained on one Nvidia H100 80GB HBM3 GPU on the Genkai Supercomputer.
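The exact fine-tuning data format is not published, so as a rough illustration only, one Cantonese instruction pair can be pictured as a user/assistant message pair in the chat-message structure that the Qwen 2.5 chat template consumes. The question and answer below are hypothetical examples, not taken from the dataset:

```python
# Hypothetical illustration of one Cantonese instruction pair in chat-message
# form; the actual dataset format is not published.
instruction_pair = [
    {"role": "user", "content": "香港一共有幾多個行政分區?"},        # "How many districts does Hong Kong have in total?"
    {"role": "assistant", "content": "香港一共有十八個行政分區。"},  # "Hong Kong has eighteen districts in total."
]
```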
### Model Information

| Property | Details |
|----------|---------|
| Base Model | Qwen 2.5 7B |
| Training Data | 600 million Hong Kong news articles and Cantonese websites for pre-training; 75,000 instruction pairs for fine-tuning (45,000 Cantonese instructions generated by other LLMs and reviewed by humans) |
| Training Hardware | 1 Nvidia H100 80GB HBM3 GPU on the Genkai Supercomputer |
## 💻 Usage Examples

### Basic Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "hon9kon9ize/CantoneseLLMChat-v1.0-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

def chat(messages, temperature=0.9, max_new_tokens=200):
    # Build the prompt with the model's chat template and move it to the model's device.
    input_ids = tokenizer.apply_chat_template(
        conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # do_sample=True is needed for `temperature` to take effect.
    output_ids = model.generate(
        input_ids, max_new_tokens=max_new_tokens, temperature=temperature, do_sample=True
    )
    # Decode only the newly generated tokens, not the prompt.
    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
    return response

prompt = "邊個係香港特首?"  # "Who is the Chief Executive of Hong Kong?"
messages = [
    {"role": "system", "content": "you are a helpful assistant."},
    {"role": "user", "content": prompt},
]

print(chat(messages))
```
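In bfloat16, the 7B weights need roughly 15 GB of GPU memory. If that does not fit, one common option (not specific to this model) is 4-bit quantization via bitsandbytes. The sketch below assumes bitsandbytes is installed and a CUDA GPU is available; quantization may cost some output quality:

```python
# Optional: load the model in 4-bit to roughly quarter its memory footprint.
# Assumes `pip install bitsandbytes` and a CUDA GPU.
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute dtype for the dequantized matmuls
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```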
## 📈 Performance

On the HK-Eval benchmark, this model is the best-in-class open-source LLM of its size at understanding Cantonese and Hong Kong culture. As the results below show, however, reasoning models perform dramatically better than their non-reasoning counterparts; we are currently working on reasoning models for v2.
| Model | HK Culture (zero-shot) | Cantonese Linguistics |
|-------|------------------------|-----------------------|
| CantoneseLLMChat v0.5 6B | 52.0% | 12.8% |
| CantoneseLLMChat v0.5 34B | 72.5% | 54.5% |
| CantoneseLLMChat v1.0 3B | 56.0% | 45.7% |
| CantoneseLLMChat v1.0 7B | 60.3% | 46.5% |
| CantoneseLLMChat v1.0 32B | 69.8% | 52.7% |
| CantoneseLLMChat v1.0 72B | 75.4% | 59.6% |
| Llama 3.1 8B Instruct | 45.6% | 35.1% |
| Llama 3.1 70B Instruct | 63.0% | 50.3% |
| Qwen2.5 7B Instruct | 51.2% | 30.3% |
| Qwen2.5 32B Instruct | 59.9% | 45.1% |
| Qwen2.5 72B Instruct | 65.9% | 45.9% |
| Claude 3.5 Sonnet | 71.7% | 63.2% |
| DeepSeek R1 | 88.8% | 77.5% |
| Gemini 2.0 Flash | 80.2% | 75.3% |
| Gemini 2.5 Pro | 92.1% | 87.3% |
| GPT-4o | 77.5% | 63.8% |
| GPT-4o mini | 55.6% | 57.3% |
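HK-Eval itself is not distributed with this model, but for readers who want to set up a comparable zero-shot evaluation, the sketch below shows one generic way to score a multiple-choice benchmark with the `chat()` helper from the usage example. The `questions` list and the answer-matching rule are placeholders, not the actual HK-Eval data or protocol:

```python
# Generic zero-shot multiple-choice scoring loop (hypothetical; not HK-Eval's
# actual protocol). `questions` is a placeholder list of dicts with a Cantonese
# "prompt" and the expected choice letter under "answer".
def zero_shot_accuracy(questions):
    correct = 0
    for q in questions:
        messages = [{"role": "user", "content": q["prompt"]}]
        # Low temperature keeps the sampled answer close to greedy decoding.
        reply = chat(messages, temperature=0.1, max_new_tokens=10)
        if q["answer"] in reply:  # crude match on the expected choice letter
            correct += 1
    return correct / len(questions)
```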
## 📄 License

The license for this model is "other".