🚀 SeaLLM-7B-v2
SeaLLM-7B-v2 is a state-of-the-art multilingual large language model tailored for Southeast Asian languages. It offers high performance across diverse tasks, including world knowledge, math reasoning, and instruction following.
✨ Features
- Impressive Math Reasoning: Achieves the 7B-SOTA on the Zero-shot CoT GSM8K task with a score of 78.2. Outperforms GPT-3.5 in many GSM8K-translated tasks in SEA languages (🇨🇳 🇻🇳 🇮🇩 🇹🇭) and MGSM (🇨🇳 🇹🇭). Also surpasses GPT-3.5 in MATH CoT for Thai 🇹🇭.
- Strong Commonsense Reasoning: Scores competitively against GPT-3.5 in many zero-shot CoT commonsense benchmarks, with scores of 82.5, 68.3, and 80.9 on Arc-C, Winogrande, and Hellaswag respectively.
- High MT-bench Score: Achieves a score of 7.54 on the 🇬🇧 MT-bench, ranking 3rd on the leaderboard for the 7B category and standing as the top-performing multilingual model in that category.
- Competitive in Vietnamese: Scores 45.74 on the VMLU benchmark for Vietnamese 🇻🇳, and is the only open-source multilingual model that can compete with monolingual models of similar sizes.
🚀 Quick Start
We introduce SeaLLM-7B-v2, the state-of-the-art multilingual LLM for Southeast Asian (SEA) languages 🇬🇧 🇨🇳 🇻🇳 🇮🇩 🇹🇭 🇲🇾 🇰🇭 🇱🇦 🇲🇲 🇵🇭. It is the most significant upgrade since SeaLLM-13B: at half the size, it outperforms its predecessor across diverse multilingual tasks such as world knowledge, math reasoning, and instruction following.
Release and DEMO
⚠️ Important Note
By using our released weights, codes, and demos, you agree to and comply with the terms and conditions specified in our SeaLLMs Terms Of Use.
💡 Usage Tip
Although the weights, codes, and demos are released in an open manner, similar to other pre-trained language models, and despite our best efforts in red teaming, safety fine-tuning, and enforcement, our models come with potential risks, including but not limited to inaccurate, misleading, or potentially harmful generation. Developers and stakeholders should perform their own red teaming and put appropriate security measures in place before deployment, and they must abide by and comply with local governance and regulations. In no event shall the authors be held liable for any claim, damages, or other liability arising from the use of the released weights, codes, or demos.
📚 Documentation
What's new since SeaLLM-13B-v1 and SeaLLM-7B-v1?
SeaLLM-7B-v2 is continually pre-trained from Mistral-7B and underwent carefully designed tuning with a focus on reasoning.
🔧 Technical Details
Evaluation
Zero-shot CoT Multilingual Math Reasoning
SeaLLM-7B-v2 achieves a score of 78.2 on GSM8K with zero-shot CoT reasoning, making it state of the art among 7B models. It also outperforms GPT-3.5 on the same GSM8K benchmark when translated into SEA languages (🇨🇳 🇻🇳 🇮🇩 🇹🇭), and surpasses GPT-3.5 on the Thai-translated MATH benchmark (22.4 vs. 18.1). A minimal sketch of how such a zero-shot item can be scored is given after the table below.

See details on English and translated GSM8K and MATH with zero-shot reasoning
| Model | GSM8K en | MATH en | GSM8K zh | MATH zh | GSM8K vi | MATH vi | GSM8K id | MATH id | GSM8K th | MATH th |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-3.5 | 80.8 | 34.1 | 48.2 | 21.5 | 55 | 26.5 | 64.3 | 26.4 | 35.8 | 18.1 |
| Qwen-14B-chat | 61.4 | 18.4 | 41.6 | 11.8 | 33.6 | 3.6 | 44.7 | 8.6 | 22 | 6 |
| Vistral-7b-chat | 48.2 | 12.5 | | | 48.7 | 3.1 | | | | |
| Qwen1.5-7B-chat | 56.8 | 15.3 | 40 | 2.7 | 37.7 | 9 | 36.9 | 7.7 | 21.9 | |
| SeaLLM-7B-v2 | 78.2 | 27.5 | 53.7 | 17.6 | 69.9 | 23.8 | 71.5 | 24.4 | 59.6 | 22.4 |
Baselines were evaluated using their respective chat-template and system prompts (Qwen1.5-7B-chat, Vistral).
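To make the setup concrete, the sketch below shows one way such a zero-shot CoT item could be scored: the question is posed as a single user turn, the model reasons freely, and the last number in the reply is taken as the predicted answer. The prompt handling and the `generate` callable are our own illustrative stand-ins, not the harness used to produce the numbers above.

```python
import re

def score_gsm8k_item(generate, question, gold_answer):
    """Illustrative zero-shot CoT scoring for one GSM8K-style item.

    `generate(messages)` is assumed to wrap the chat-template and generation
    code from the Usage Examples section below and return the reply as a string.
    """
    reply = generate([{"role": "user", "content": question}])
    # Treat the last number in the reply as the predicted final answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", reply.replace(",", ""))
    return bool(numbers) and float(numbers[-1]) == float(gold_answer)
```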
Zero-shot MGSM
SeaLLM-7B-v2 also outperforms GPT-3.5 and Qwen-14B on the multilingual MGSM for Zh and Th.
| Model | MGSM-Zh | MGSM-Th |
|---|---|---|
| ChatGPT (reported) | 61.2 | 47.2 |
| Qwen-14B-chat | 59.6 | 28 |
| SeaLLM-7B-v2 | 64.8 | 62.4 |
Zero-shot Commonsense Reasoning
We compare SeaLLM-7B-v2 with ChatGPT and Mistral-7B-Instruct on various zero-shot commonsense benchmarks (Arc-Challenge, Winogrande, and Hellaswag). We use the 2-stage technique from (Kojima et al., 2023) to obtain the answers; a minimal sketch of this two-stage prompting is given after the table below. Note that we did NOT use "Let's think step-by-step" to invoke explicit CoT.
| 0-shot reasoning | Arc-Challenge | Winogrande | Hellaswag |
|---|---|---|---|
| ChatGPT (reported) | 84.6* | 66.8* | 72.0* |
| ChatGPT (reproduced) | 84.1 | 63.1 | 79.5 |
| Mistral-7B-Instruct | 68.1 | 56.4 | 45.6 |
| Qwen1.5-7B-chat | 79.3 | 59.4 | 69.3 |
| SeaLLM-7B-v2 | 82.5 | 68.3 | 80.9 |
Baselines were evaluated using their respective chat-template and system prompts (Qwen1.5-7B-chat, Mistral).
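For reference, the two-stage extraction first lets the model produce a free-form rationale and then asks it to commit to a single option in a second pass. The sketch below illustrates that flow under assumed prompt wording; it is not the exact evaluation code used to produce the scores above.

```python
def two_stage_answer(generate, question, choices):
    """Two-stage zero-shot answer extraction in the spirit of Kojima et al.

    `generate(prompt)` is assumed to return the model's completion as a string;
    the prompt wording here is a hypothetical stand-in, not the exact prompts
    used for the reported scores.
    """
    options = "\n".join(f"({chr(65 + i)}) {c}" for i, c in enumerate(choices))
    last = chr(64 + len(choices))
    # Stage 1: let the model reason freely about the question (no explicit
    # "Let's think step-by-step" trigger, matching the setup described above).
    rationale = generate(f"{question}\n{options}\nAnswer:")
    # Stage 2: append the rationale and ask the model to commit to one option.
    final = generate(
        f"{question}\n{options}\nAnswer: {rationale}\n"
        f"Therefore, among (A) through ({last}), the answer is"
    )
    return final.strip()
```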
Multilingual World Knowledge
We evaluate models on 3 benchmarks following the recommended default setups: 5-shot MMLU for En, 3-shot M3Exam (M3e) for En, Zh, Vi, Id, Th, and zero-shot VMLU for Vi.
| Model | Langs | En MMLU | En M3e | Zh M3e | Vi M3e | Vi VMLU | Id M3e | Th M3e |
|---|---|---|---|---|---|---|---|---|
| GPT-3.5 | Multi | 68.90 | 75.46 | 60.20 | 58.64 | 46.32 | 49.27 | 37.41 |
| Vistral-7B-chat | Mono | 56.86 | 67.00 | 44.56 | 54.33 | 50.03 | 36.49 | 25.27 |
| Qwen1.5-7B-chat | Multi | 61.00 | 52.07 | 81.96 | 43.38 | 45.02 | 24.29 | 20.25 |
| SeaLLM-7B-v2 | Multi | 61.89 | 70.91 | 55.43 | 51.15 | 45.74 | 42.25 | 35.52 |
The VMLU reproduction script is available here. lm-eval was used to evaluate MMLU. 0-shot VMLU scores for baselines were obtained using their respective chat templates and system prompts (Qwen1.5-7B-chat).
MT-Bench
On the English MT-bench, SeaLLM-7B-v2 achieves a score of 7.54 (3rd place on the leaderboard for the 7B category), outperforming many 70B models, and is arguably the only model on the leaderboard that handles 10 SEA languages.
Refer to mt_bench/seallm_7b_v2.jsonl for the MT-bench predictions of SeaLLM-7B-v2, and here to reproduce it.
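The prediction file is JSON-lines; assuming one JSON record per line (as the .jsonl extension suggests), it can be loaded for inspection like this:

```python
import json

# Load the released MT-bench predictions, one JSON object per line.
with open("mt_bench/seallm_7b_v2.jsonl") as f:
    predictions = [json.loads(line) for line in f]
print(f"{len(predictions)} prediction records")
```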
| Model | Access | Langs | MT-Bench |
|---|---|---|---|
| GPT-4-turbo | closed | multi | 9.32 |
| GPT-4-0613 | closed | multi | 9.18 |
| Mixtral-8x7b (46B) | open | multi | 8.3 |
| Starling-LM-7B-alpha | open | mono (en) | 8.0 |
| OpenChat-3.5-7B | open | mono (en) | 7.81 |
| SeaLLM-7B-v2 | open | multi (10+) | 7.54 |
| Qwen-14B | open | multi | 6.96 |
| Llama-2-70B | open | mono (en) | 6.86 |
| Mistral-7B-instruct | open | mono (en) | 6.84 |
Sea-Bench
Similar to MT-Bench, Sea-bench is a set of categorized instruction test sets to measure models' ability as an assistant, specifically focused on 9 SEA languages, including non-Latin low-resource languages.
As shown, the most significant improvements come from math reasoning, which reaches GPT-3.5-level performance.

Refer to sea_bench/seallm_7b_v2.jsonl for the Sea-bench predictions of SeaLLM-7B-v2.
💻 Usage Examples
Basic Usage
prompt = """<|im_start|>system
You are a helpful assistant.</s><|im_start|>user
Hello world</s><|im_start|>assistant
Hi there, how can I help?</s>"""
print(tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt)))
'<s>', '▁<', '|', 'im', '_', 'start', '|', '>', 'system', '<0x0A>', 'You', '▁are', '▁a', '▁helpful', '▁assistant', '.', '</s>', '▁<', '|', 'im', '_', 'start', '|', '>', 'user', '<0x0A>', 'Hello', '▁world', '</s>', '▁<', '|', 'im', '_', 'start', '|', '>', 'ass', 'istant', '<0x0A>', 'Hi', '▁there', ',', '▁how', '▁can', '▁I', '▁help', '?', '</s>']
"""
Advanced Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

# Load the model in bfloat16 to reduce memory usage.
model = AutoModelForCausalLM.from_pretrained(
    "SeaLLMs/SeaLLM-7B-v2", torch_dtype=torch.bfloat16, device_map=device
)
tokenizer = AutoTokenizer.from_pretrained("SeaLLMs/SeaLLM-7B-v2")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello world"},
    {"role": "assistant", "content": "Hi there, how can I help you today?"},
    {"role": "user", "content": "Explain general relativity in details."},
]

# Build the generation prompt with the model's chat template.
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
print(tokenizer.convert_ids_to_tokens(encodeds[0]))
```
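To generate a reply from the encoded prompt, a standard `generate` call can follow the snippet above. The sketch below is a minimal continuation; the sampling settings (`max_new_tokens`, `do_sample`) are illustrative choices, not recommended defaults.

```python
# Move the encoded prompt to the same device as the model and generate.
model_inputs = encodeds.to(device)
generated_ids = model.generate(
    model_inputs,
    max_new_tokens=512,  # illustrative limit, not an official recommendation
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens.
response = tokenizer.decode(generated_ids[0, model_inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```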
📄 License
The model is released under the SeaLLMs Terms Of Use.