🚀 CantoneseLLMChat-v1.0-32B
CantoneseLLMChat-v1.0-32B is the first-generation Cantonese LLM from hon9kon9ize, excelling in Hong Kong-specific knowledge and Cantonese conversation.

🚀 Quick Start
CantoneseLLMChat v1.0 is the first-generation Cantonese LLM from hon9kon9ize. Building on the success of the v0.5 preview, the model excels in Hong Kong-specific knowledge and Cantonese conversation.
✨ Features
- Enhanced Knowledge: Specialized in Hong Kong-specific knowledge.
- Fluent Conversation: Capable of smooth Cantonese conversations.
📦 Installation
The model card lists no dedicated installation steps; the usage example below only assumes a working PyTorch environment with the 🤗 Transformers library (for example, `pip install torch transformers`), plus `accelerate` for `device_map="auto"`.
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "hon9kon9ize/CantoneseLLMChat-v1.0-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

def chat(messages, temperature=0.9, max_new_tokens=200):
    # Build the prompt with the model's chat template and move it to the GPU.
    input_ids = tokenizer.apply_chat_template(conversation=messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda:0")
    # do_sample=True is needed for the temperature setting to take effect.
    output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens, temperature=temperature, do_sample=True)
    # Decode only the newly generated tokens that follow the prompt.
    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=False)
    return response

prompt = "邊個係香港特首?"  # "Who is the Chief Executive of Hong Kong?"
messages = [
    {"role": "system", "content": "you are a helpful assistant."},
    {"role": "user", "content": prompt},
]

print(chat(messages))
```
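Running the 32B model in bfloat16 needs roughly 64 GB of accelerator memory for the weights alone. If that is not available, one common option is 4-bit loading through bitsandbytes; the sketch below uses the standard 🤗 Transformers `BitsAndBytesConfig` API and assumes the `bitsandbytes` and `accelerate` packages are installed (quantized loading is not documented on the model card itself):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "hon9kon9ize/CantoneseLLMChat-v1.0-32B"

# 4-bit NF4 weight quantization; computation still runs in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
# The chat() helper above works unchanged with the quantized model,
# at some cost in output quality compared with bfloat16.
```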
📚 Documentation
Model description
The base model is obtained by continuous pre-training of Qwen 2.5 32B on 600 million publicly available Hong Kong news articles and Cantonese websites. The instruction fine-tuned model is then trained on a dataset of 75,000 instruction pairs; 45,000 of these are Cantonese instructions generated by other LLMs and reviewed by humans.
The model was trained with 16 Nvidia H100 96GB HBM2e GPUs on the Genkai Supercomputer.
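The Base Model row in the table below names a separate continued-pretrained checkpoint, hon9kon9ize/CantoneseLLM-v1.0-32B-cpt. If it is available, a minimal sketch of loading it, assuming it exposes the same Transformers causal-LM interface as the chat checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base (continued-pretrained) checkpoint; it is not instruction-tuned,
# so use plain text completion rather than the chat template shown above.
base_id = "hon9kon9ize/CantoneseLLM-v1.0-32B-cpt"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("香港係一個", return_tensors="pt").to("cuda:0")  # "Hong Kong is a ..."
outputs = base_model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```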
🔧 Technical Details
The model's performance is evaluated on the HK-Eval benchmark. Among open-source LLMs it shows excellent understanding of Cantonese and Hong Kong culture; however, reasoning models still perform better, and the development team is currently working on reasoning models for v2.
| Property | Details |
|----------|---------|
| Model Type | CantoneseLLMChat-v1.0-32B |
| Base Model | hon9kon9ize/CantoneseLLM-v1.0-32B-cpt |
| Training Data | 600 million publicly available Hong Kong news articles and Cantonese websites for pre-training; a dataset of 75,000 instruction pairs for fine-tuning |
| Training Hardware | 16 Nvidia H100 96GB HBM2e GPUs on the Genkai Supercomputer |
HK-Eval benchmark results:

| Model | HK Culture (zero-shot) | Cantonese Linguistics |
|-------|------------------------|-----------------------|
| CantoneseLLMChat v0.5 6B | 52.0% | 12.8% |
| CantoneseLLMChat v0.5 34B | 72.5% | 54.5% |
| CantoneseLLMChat v1.0 3B | 56.0% | 45.7% |
| CantoneseLLMChat v1.0 7B | 60.3% | 46.5% |
| CantoneseLLMChat v1.0 32B | 69.8% | 52.7% |
| CantoneseLLMChat v1.0 72B | 75.4% | 59.6% |
| Llama 3.1 8B Instruct | 45.6% | 35.1% |
| Llama 3.1 70B Instruct | 63.0% | 50.3% |
| Qwen2.5 7B Instruct | 51.2% | 30.3% |
| Qwen2.5 32B Instruct | 59.9% | 45.1% |
| Qwen2.5 72B Instruct | 65.9% | 45.9% |
| Claude 3.5 Sonnet | 71.7% | 63.2% |
| DeepSeek R1 | 88.8% | 77.5% |
| Gemini 2.0 Flash | 80.2% | 75.3% |
| Gemini 2.5 Pro | 92.1% | 87.3% |
| GPT4o | 77.5% | 63.8% |
| GPT4o-mini | 55.6% | 57.3% |
📄 License
The license is listed as "other".