Llama-3-Taiwan-70B-Instruct開源AI模型 - 支持繁英雙語，實現頂尖中文NLP應用！

首頁

Llama 3 Taiwan 70B Instruct

由yentinglin開發

Llama-3-Taiwan-70B是基於Llama-3架構、使用繁體中文和英文大數據微調的700億參數模型，在多項繁體中文NLP基準測試中展現了頂尖性能。

大型語言模型

Transformers

支持多種語言#繁體中文優化 #700億參數大模型 #臺灣本地化知識

下載量 1,279

發布時間 : 5/31/2024

模型概述

專為繁體中文和英文用戶優化的大語言模型，具備卓越的語言理解與生成能力、邏輯推理能力和多輪對話能力。

模型特點

強大的語言理解與生成能力

在多項繁體中文NLP基準測試中展現了頂尖性能。

支持8K上下文長度

能夠處理更長的文本輸入和輸出。

多輪對話能力

能夠進行連貫的多輪對話。

邏輯推理能力

具備較強的邏輯推理和問題解決能力。

模型能力

文本生成

多輪對話

邏輯推理

檢索增強生成(RAG)

結構化輸出

使用案例

對話系統

多輪對話

用戶與AI助手進行多輪對話，助手提供有用、詳細且禮貌的回答。

能夠進行連貫的多輪對話。

信息檢索與生成

檢索增強生成(RAG)

結合檢索信息生成更準確的回答。

提高回答的準確性和相關性。

結構化輸出

生成結構化數據輸出。

能夠生成符合特定格式的輸出。

🚀 Llama-3-Taiwan-70B

Llama-3-Taiwan-70B是一款基於Llama-3架構，在大量繁體中文和英文數據上微調的700億參數模型。它在各種繁體中文自然語言處理基準測試中表現出色，具備強大的語言理解、生成、推理和多輪對話能力。

Taiwan LLM Logo

🚀 快速開始

免費API：在NVIDIA NIM上使用免費API
演示網站：在twllm.com上交互式試用Llama-3-Taiwan。
聊天機器人競技場：參與精彩的聊天機器人競技場，與其他聊天機器人一較高下！

✨ 主要特性

參數規模：700億參數。
支持語言：繁體中文（zh-tw）、英文（en）。
微調數據：在高質量的繁體中文和英文語料庫上微調，涵蓋一般知識以及法律、製造、醫療和電子領域的行業知識。
上下文長度：8K上下文長度（另有128k版本）。
開源許可：基於Llama-3許可證發布的開源模型。

📦 安裝指南

Hugging Face Transformers庫

import torch
from transformers import pipeline, StoppingCriteria

# Define a custom stopping criteria class
class EosListStoppingCriteria(StoppingCriteria):
    def __init__(self, eos_sequence=[128256]):
        self.eos_sequence = eos_sequence

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        last_ids = input_ids[:, -len(self.eos_sequence):].tolist()
        return self.eos_sequence in last_ids

# Initialize the model with automatic device mapping
llm = pipeline("text-generation", model="yentinglin/Llama-3-Taiwan-70B-Instruct", device_map="auto")
tokenizer = llm.tokenizer

# Define a conversation example
chat = [
    {"role": "system", "content": "You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project."},
    {"role": "user", "content": "你好，請問你可以完成什麼任務？"},
    {"role": "assistant", "content": "你好，我可以幫助您解決各種問題、提供資訊並協助完成多種任務。例如：回答技術問題、提供建議、翻譯文字、尋找資料或協助您安排行程等。請告訴我如何能幫助您。"},
    {"role": "user", "content": "太棒了！"}
]
flatten_chat_for_generation = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
"""
<|im_start|>user
You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.<|im_end|>
<|im_start|>user
你好，請問你可以完成什麼任務？<|im_end|>
<|im_start|>assistant
你好，我可以幫助您解決各種問題、提供資訊和協助您完成許多不同的任務。例如：回答技術問題、提供建議、翻譯文字、尋找資料或協助您安排行程等。請告訴我如何能幫助您。<|im_end|>
<|im_start|>user
太棒了！<|im_end|>
<|im_start|>assistant

"""

# Generate a response using the custom stopping criteria
output = llm(flatten_chat_for_generation, return_full_text=False, max_new_tokens=128, top_p=0.9, temperature=0.7, stopping_criteria=[EosListStoppingCriteria([tokenizer.eos_token_id])])
print(output[0]['generated_text'])
"謝謝！很高興能夠為您服務。如果有任何其他需要協助的地方，請隨時與我聯繫。我會盡最大努力為您提供所需的支援。"

vLLM

啟動服務器：

export NUM_GPUS=4
export PORT=8000

docker run \
  -e HF_TOKEN=$HF_TOKEN \
  --gpus '"device=0,1,2,3"' \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p "${PORT}:8000" \
  --ipc=host \
  vllm/vllm-openai:v0.4.0.post1 \
  --model "yentinglin/Llama-3-Taiwan-70B-Instruct" \
  -tp "${NUM_GPUS}"

示例客戶端代碼：

# pip install "openai>=1.0.0"
from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="yentinglin/Llama-3-Taiwan-70B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
    ]
)
print("Chat response:", chat_response)

💻 使用示例

多輪對話

System: You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.

User: 嗨，你好！

Assistant: 你好！我今天可以如何協助你？

User: 就想聊天而已....

Assistant: 當然，我很樂意和你聊天！有什麼特別想談論的嗎？

RAG（檢索增強生成）

System: You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.

User: 明天台大天氣如何？

Assistant: 根據天氣預報，明天台大天氣為陣雨。低溫 24°C。10 到 15 公里/小時 的 東 風 。降雨機率為 40%。

格式化輸出, 語言理解, 實體識別, 函數調用

System: You are an AI assistant called Twllm, created by TAME (TAiwan Mixture of Expert) project.

User: 您是電影評論分析師，負責情緒分析。
分析每個被評論的句子，評估句子中每個字和標點符號的重要性，
並且將結果使用 tuple 的結構列出。
tuple 值的範圍介於 -1 (負向) 到 1 (正向) 之間，
然後，將評論分類為正向 (1) 或負向 (-1)。
這些需要被評論的句子，包含在 <review> 標籤中間。

輸出範例:
[(<單字或標點符號>, <float importance>), ...]
<int classification>
<review> 根據外媒報導，宏都拉斯在去年3月與臺灣斷交，並轉往與中國建交之後，卻遲遲等不到中國的金援，談好要進口宏都拉斯的白蝦也拖延許久，甚至還將白蝦價格殺價至過往與臺灣交易的半價，再加上墨西哥同時也中斷了與宏國的白蝦貿易，對當地蝦農可說是雪上加霜。 <review>
  

Assistant: [('根據', 0.2), ('外媒', 0.1), ('報導', -0.1), (',', 0.05), ('宏都拉斯', -0.3), ('在去年3月與臺灣斷交， ', -0.5), ('並轉往與中國建交之後， ', -0.4), ('卻遲遲等不到中國的金援， ', -1),
('談好要進口宏都拉斯的白蝦也拖延許久， ', -1),
(',甚至還將白蝦價格殺價至過往與臺灣交易的半價，再加上墨西哥同時也中斷了與宏國的白蝦貿易，對當地蝦農可說是雪上加霜。 ',-1)]
-1

📚 詳細文檔

訓練細節

訓練框架：NVIDIA NeMo、NVIDIA NeMo Megatron
推理框架：NVIDIA TensorRT-LLM
基礎模型：Llama-3 70B
硬件：臺北-1上的NVIDIA DGX H100
上下文長度：8K tokens（另有128k版本）
批次大小：每步2M tokens

評估

查看Open TW LLM Leaderboard以獲取完整和最新的列表。

模型	TMLU	臺灣在地化測試	Legal Eval	TW MT-Bench	長文本支援	函數呼叫	TMMLU+
	學科知識		臺灣法律考題	中文多輪對答
yentinglin/Llama-3-Taiwan-70B-Instruct	74.76%	80.95%	68.42%	7.54	128k版本	✅	67.53%
yentinglin/Llama-3-Taiwan-70B-Instruct-DPO	74.60%	81.75%	70.33%	-	-	✅	-
yentinglin/Llama-3-Taiwan-70B-Instruct-128k	73.01%	80.16%	63.64%	-	-	✅	-
yentinglin/Llama-3-Taiwan-8B-Instruct	59.50%	61.11%	53.11%	7.21	128k版本	✅	52.28%
yentinglin/Llama-3-Taiwan-8B-Instruct-DPO	59.88%	59.52%	52.63%	-	-	✅	-
yentinglin/Llama-3-Taiwan-8B-Instruct-128k	-	-	-	-	-	✅	-
Claude-3-Opus	73.59% (5-shot)	69.84%	60.29%	-	200k	✅	-
GPT4-o	65.56% (0-shot), 69.88% (5-shot)	76.98%	53.59%	-	128k	✅	-
GPT4-turbo	70.42% (5-shot)	-	-	-	128k	✅	60.34%^
Gemini-Pro	61.40% (5-shot)	-	-	-	1000k	✅	49.92%^
GPT-3.5-turbo-1106	49.37% (5-shot)	-	-	7.1	128k	✅	41.76%^
Qwen1.5-110B-Chat	75.69%	66.67%	49.28%	-	32k	✅	65.81%
Yi-34B-Chat	73.59%	71.43%	55.02%	6.9	200k	✅	64.10%
Meta-Llama-3-70B-Instruct	70.95%	65.08%	52.63%	-	8k	✅	62.75%
Mixtral-8x22B-Instruct-v0.1	55.57%	52.38%	44.98%	-	64k	✅	52.16%
Breexe-8x7B-Instruct-v0_1	-	-	-	7.2	8k	❓	48.92%
c4ai-command-r-plus	62.87%	64.29%	34.45%	-	128k	✅	49.75%
Meta-Llama-3-8B-Instruct	55.81%	46.83%	35.89%	-	8k	✅	43.38%
Breeze-7B-Instruct-v1_0	55.57%	52.38%	39.23%	6.0	32k	❓	41.77%
Llama3-TAIDE-LX-8B-Chat-Alpha1	47.30%	50.79%	37.80%	-	8k	❓	39.03%
Phi-3-mini-4k-instruct	40.97%	37.30%	27.27%	-	4k	❓	33.02%

數字默認為0-shot。評估實現

^ 從原始數據集中獲取最接近的匹配數字。

《出師表》細節查找評估

「《出師表》細節查找」評估測試模型定位和回憶嵌入在大量文本中的重要信息的能力，使用諸葛亮的經典中文文本《出師表》。

要運行評估，請使用腳本。

TW MT-Bench得分

平均分：7.5375
最高分：10
最低分：1
中位數：9.0
標準差：3.0349783771882133
總得分數：160
模型回覆
GPT-4評估
從mtkresearch/TCEval分叉並修復了錯誤的代碼

TW MT-Bench得分圖

🔧 技術細節

本模型使用NVIDIA NeMo™ Framework，在基於NVIDIA DGX H100系統構建的NVIDIA臺北-1上進行訓練。訓練和推理的具體框架和參數如下：

訓練框架：NVIDIA NeMo、NVIDIA NeMo Megatron
推理框架：NVIDIA TensorRT-LLM
基礎模型：Llama-3 70B
硬件：NVIDIA DGX H100 on Taipei-1
上下文長度：8K tokens（另有128k版本）
批次大小：每步2M tokens

📄 許可證

本模型基於Llama-3許可證發布。

貢獻者

Yun-Nung (Vivian) Chen教授，在整個項目中提供指導和建議。
Wei-Lin Chen，負責預訓練數據管道。
Tzu-Han Lin，負責合成數據生成。
Chang-Sheng Kao，負責提升合成數據質量。
Kang-Chieh Chen，負責清理指令跟隨數據。
Min-Yi Chen 和 Shao-Heng Hsu，負責收集化學工程數據和基準。
Chung-Yao Ma、Jonathan Guo 和 Kai-Chun Chang，負責收集製造和電氣工程數據和基準，以及項目進度管理。

引用

@article{DBLP:journals/corr/abs-2311-17487,
  author       = {Yen{-}Ting Lin and
                  Yun{-}Nung Chen},
  title        = {Taiwan {LLM:} Bridging the Linguistic Divide with a Culturally Aligned
                  Language Model},
  journal      = {CoRR},
  volume       = {abs/2311.17487},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2311.17487},
  doi          = {10.48550/ARXIV.2311.17487},
  eprinttype    = {arXiv},
  eprint       = {2311.17487},
  timestamp    = {Tue, 05 Dec 2023 14:40:42 +0100},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2311-17487.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
@article{DBLP:journals/corr/abs-2403-20180,
  author       = {Po{-}Heng Chen and
                  Sijia Cheng and
                  Wei{-}Lin Chen and
                  Yen{-}Ting Lin and
                  Yun{-}Nung Chen},
  title        = {Measuring Taiwanese Mandarin Language Understanding},
  journal      = {CoRR},
  volume       = {abs/2403.20180},
  year         = {2024},
  url          = {https://doi.org/10.48550/arXiv.2403.20180},
  doi          = {10.48550/ARXIV.2403.20180},
  eprinttype    = {arXiv},
  eprint       = {2403.20180},
  timestamp    = {Wed, 10 Apr 2024 17:37:45 +0200},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2403-20180.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}