Phi-3-small-8k-instruct Open-Source Model: Lightweight, Efficient Reasoning for English Commercial and Research Use


Phi 3 Small 8k Instruct

Developed by Microsoft

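Phi-3-Small-8K-Instruct is a lightweight, open-source 7-billion-parameter model focused on high-quality reasoning. It supports an 8K context length and is intended for commercial and research use in English.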

Large Language Model

Transformers

Other open-source license: MIT #lightweight-reasoning #multilingual-code-generation #8K-long-context

Downloads: 22.92k

Published: 5/7/2024

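Model overview

A lightweight, state-of-the-art model trained on the Phi-3 datasets and optimized for reasoning, particularly well suited to resource-constrained and latency-sensitive scenarios.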

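Model features

Lightweight and efficient

A 7-billion-parameter design suited to resource-constrained environments and latency-sensitive scenarios

Strong reasoning

Performs well on common sense, language understanding, math, code, and logical reasoning

Safety alignment

Post-trained with supervised fine-tuning and Direct Preference Optimization (DPO) to improve instruction following and safety

Long-context support

Available in two context-length variants: 8K and 128K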

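Model capabilities

Text generation

Code generation

Mathematical reasoning

Logical reasoning

Commonsense question answering

Language understanding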

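Use cases

Commercial applications

Customer service assistant

Generates fast, accurate customer service responses

Improves response speed and service quality

Content generation

Automatically generates marketing copy, product descriptions, and similar content

Makes content creation more efficient

Research and development

AI research

Serves as a building block for language model research

Accelerates the development of AI technology

Educational tools

Assists with learning programming and math

Provides personalized learning experiences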

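🚀 Phi-3-Small-8K-Instruct

Phi-3-Small-8K-Instruct is a lightweight, state-of-the-art open model with 7 billion parameters. It was trained on the Phi-3 datasets, which include synthetic data and high-quality publicly available web data, and it performs strongly on benchmarks that test common sense, language understanding, math, code, long context, and logical reasoning.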

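🚀 Quick start

Install dependencies

Phi-3-Small-8K-Instruct has been integrated into the development version (4.40.2) of transformers. Until the official version is released through pip, make sure you do the following:

Install tiktoken (0.6.0) and triton (2.3.0).
When loading the model, pass trust_remote_code=True to the from_pretrained() function.
Update your local transformers to the development version: pip uninstall -y transformers && pip install git+https://github.com/huggingface/transformers. This command is an alternative to cloning the repository and installing from source.

You can check the currently installed transformers version with pip list | grep transformers.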
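As a minimal sketch, the versions of the three packages named above can also be checked from Python with the standard-library importlib.metadata module. The helper name installed_versions is hypothetical, and the versions in EXPECTED simply restate the ones listed above; treat them as assumed pins rather than hard requirements.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions named in the installation notes above (assumed pins, not hard requirements)
EXPECTED = {"transformers": "4.40.2", "tiktoken": "0.6.0", "triton": "2.3.0"}


def installed_versions(packages):
    """Return {package: installed version string, or None if the package is missing}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, found in installed_versions(EXPECTED).items():
        status = "missing" if found is None else found
        print(f"{name}: installed={status}, expected={EXPECTED[name]}")
```

A missing package shows up as None instead of raising, so the check can run before anything is installed.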

Running the model

Phi-3-Small-8K-Instruct is also available in Azure AI. The following sample code runs the model on a GPU:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

torch.random.manual_seed(0)  # fixed seed for reproducible runs
model_id = "microsoft/Phi-3-small-8k-instruct"

# trust_remote_code=True is needed because the model ships its own modeling code
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # use the dtype recorded in the model config
    trust_remote_code=True,
)

# The default attention kernels (Flash Attention 2, Triton block-sparse) need a GPU
assert torch.cuda.is_available(), "This model needs a GPU to run ..."
device = torch.cuda.current_device()
model = model.to(device)

# The tokenizer is also loaded from the repo's custom code, so it needs the flag too
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Multi-turn conversation: the model answers the last user message
messages = [
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together: 1. Banana and dragonfruit smoothie: Blend bananas and dragonfruits together with some milk and honey. 2. Banana and dragonfruit salad: Mix sliced bananas and dragonfruits together with some lemon juice and honey."},
    {"role": "user", "content": "What about solving a 2x + 3 = 7 equation?"},
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device=device,
)

generation_args = {
    "max_new_tokens": 500,      # upper bound on the number of generated tokens
    "return_full_text": False,  # return only the new text, not the prompt
    "temperature": 0.0,         # has no effect because do_sample=False (greedy decoding)
    "do_sample": False,
}

output = pipe(messages, **generation_args)
print(output[0]["generated_text"])

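⚠️ Important note

Some applications or frameworks might not include the BOS token (<|endoftext|>) at the start of the conversation. Make sure it is included, since this gives more reliable results.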
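A minimal sketch of that check, assuming the prompt is assembled as a plain string in the chat format described under Chat format; the function name ensure_bos is hypothetical. If a framework tokenizes for you, inspect its output token IDs instead.

```python
BOS_TOKEN = "<|endoftext|>"  # BOS marker named in the note above


def ensure_bos(prompt: str) -> str:
    """Prepend the BOS token unless the prompt already starts with it."""
    if prompt.startswith(BOS_TOKEN):
        return prompt
    return BOS_TOKEN + prompt


print(ensure_bos("<|user|>\nHello<|end|>\n<|assistant|>"))
```

Calling it on an already-prefixed prompt returns the prompt unchanged, so it is safe to apply unconditionally.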

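✨ Key features

Lightweight design: 7 billion parameters, suited to memory/compute-constrained environments and low-latency scenarios.
Strong reasoning: performs well on code, math, and logical reasoning, making it suitable for general-purpose AI systems and applications.
Multilingual support: a vocabulary of up to 100352 tokens, with 10% multilingual data in the training set.
Flexible context length: available in 8K and 128K context-length variants.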
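Choosing between the two variants largely comes down to whether the prompt plus the requested output fits in the context window. The sketch below assumes that 8K means 8,192 tokens and 128K means 131,072 tokens (neither exact figure is stated here) and takes the prompt's token count as input, for example len(tokenizer(prompt)["input_ids"]); the helper name fits_context is hypothetical.

```python
# Assumed context windows in tokens for the two variants (not stated in the card)
CONTEXT_LIMITS = {"8k": 8192, "128k": 131072}


def fits_context(prompt_tokens: int, max_new_tokens: int, variant: str = "8k") -> bool:
    """True if the prompt plus the generation budget fits the variant's window."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= CONTEXT_LIMITS[variant]


# A 7,800-token prompt with max_new_tokens=500 overflows 8K but fits 128K
print(fits_context(7800, 500, "8k"), fits_context(7800, 500, "128k"))
```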


💻 Usage examples


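Advanced usage

You can adjust the parameters in generation_args, such as max_new_tokens, temperature, and do_sample, to suit different use cases: a larger max_new_tokens allows longer answers, and do_sample=True with a temperature above 0 produces more varied output than greedy decoding.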
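For instance, two contrasting configurations might look like the sketch below. The preset names and the specific values (temperature=0.7, top_p=0.9) are illustrative assumptions, not recommendations from the model card; top_p is a standard transformers generation parameter, and make_generation_args is a hypothetical helper.

```python
def make_generation_args(mode: str = "deterministic", max_new_tokens: int = 500) -> dict:
    """Build generation_args for the text-generation pipeline (illustrative presets)."""
    if mode == "deterministic":
        # Greedy decoding: the same prompt always yields the same output
        return {"max_new_tokens": max_new_tokens, "return_full_text": False, "do_sample": False}
    if mode == "creative":
        # Sampling: more varied output, e.g. for drafting marketing copy
        return {
            "max_new_tokens": max_new_tokens,
            "return_full_text": False,
            "do_sample": True,
            "temperature": 0.7,
            "top_p": 0.9,
        }
    raise ValueError(f"unknown mode: {mode!r}")


print(make_generation_args("creative", max_new_tokens=200))
```

The result can be passed as pipe(messages, **make_generation_args("creative")) with the pipe object from the example above.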

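📚 Documentation

Model overview

Phi-3-Small-8K-Instruct is the Small model of the Phi-3 family and comes in two context-length variants, 8K and 128K. It went through a post-training process that combined supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure instruction adherence and safety measures.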

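Intended uses

Primary use cases

The model is intended for broad commercial and research use in English.
It is suited to memory/compute-constrained environments, latency-bound scenarios, and general-purpose AI systems and applications that require strong reasoning.

Use case considerations

The model is not specifically designed or evaluated for all downstream purposes. When selecting use cases, developers should consider the common limitations of language models, and they should evaluate and mitigate accuracy, safety, and fairness issues before using the model in a specific downstream use case, particularly in high-risk scenarios.
Developers should be aware of and comply with the laws and regulations that apply to their use case (including privacy and trade compliance laws).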

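Tokenizer

Phi-3-Small-8K-Instruct supports a vocabulary of up to 100352 tokens.

Chat format

Given the nature of the training data, the Phi-3-Small-8K-Instruct model is best suited to prompts that use the following chat format: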

<|endoftext|><|user|>\nQuestion <|end|>\n<|assistant|>

For example:

<|endoftext|><|user|>
How to explain Internet for a medieval knight?<|end|>
<|assistant|>

For few-shot prompts, the prompt can be formatted as follows:

<|endoftext|><|user|>
I am going to Paris, what should I see?<|end|>
<|assistant|>
Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:\n\n1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.\n2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.\n3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.\n\nThese are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world.<|end|>
<|user|>
What is so great about #1?<|end|>
<|assistant|>
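The format above can be produced programmatically. This is a minimal sketch that assumes messages are dicts with role (user or assistant) and content keys, as in the pipeline example earlier; the function name build_prompt is hypothetical. It follows the examples exactly, so no newline is added after the final <|assistant|> tag. If the tokenizer ships a chat template, tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) is the more reliable source of the exact format.

```python
BOS_TOKEN = "<|endoftext|>"


def build_prompt(messages: list[dict]) -> str:
    """Render chat messages into the Phi-3-Small prompt format shown above."""
    parts = [BOS_TOKEN]
    for msg in messages:
        role = msg["role"]
        if role not in ("user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        # Each turn: role tag, newline, content, end-of-turn tag, newline
        parts.append(f"<|{role}|>\n{msg['content']}<|end|>\n")
    # Generation prompt: the model continues after the assistant tag
    parts.append("<|assistant|>")
    return "".join(parts)


print(build_prompt([{"role": "user", "content": "How to explain Internet for a medieval knight?"}]))
```

Passing a user/assistant/user history yields the few-shot layout shown above.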

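🔧 Technical details

Model

Architecture: Phi-3-Small-8K-Instruct has 7 billion parameters and is a dense decoder-only Transformer model with alternating dense and block-sparse attention layers.
Inputs: Text. The model is best suited to prompts in the chat format.
Context length: 8K tokens.
GPUs: 1024 NVIDIA H100-80G.
Training time: 18 days.
Training data: 4.8 trillion tokens.
Outputs: Generated text in response to the input.
Dates: The model was trained between February and April 2024.
Status: This is a static model trained on an offline dataset with a cutoff date of October 2023. New versions of the tuned models may be released as the models improve.
Release date: The model weights were released on May 21, 2024.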
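To make "alternating dense and block-sparse attention" more concrete, the sketch below builds a block-level causal mask in which each query block attends to a window of recent key blocks plus every vert_stride-th earlier block, and marks every other layer as dense. This local-plus-vertical-stride pattern, the parameter values, and the helper names are assumptions for illustration; they are not taken from the model's actual configuration or Triton kernels.

```python
def block_sparse_mask(num_blocks: int, local_blocks: int = 2, vert_stride: int = 3) -> list[list[bool]]:
    """mask[q][k] is True if query block q may attend to key block k."""
    mask = []
    for q in range(num_blocks):
        row = []
        for k in range(num_blocks):
            causal = k <= q                       # never attend to future blocks
            local = q - k < local_blocks          # recent neighbouring blocks
            strided = (k + 1) % vert_stride == 0  # periodic "vertical" blocks
            row.append(causal and (local or strided))
        mask.append(row)
    return mask


def attention_kind(layer_idx: int, dense_every_n: int = 2) -> str:
    """Hypothetical alternation: every n-th layer uses dense attention."""
    return "dense" if layer_idx % dense_every_n == 0 else "block-sparse"


# '#' marks an attended block, '.' a skipped one
for row in block_sparse_mask(6):
    print("".join("#" if allowed else "." for allowed in row))
print([attention_kind(i) for i in range(4)])
```

A dense layer corresponds to a mask where every causal entry is True; the sparse layers skip most distant blocks, which is where the compute savings come from.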

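Datasets

The training data comes from a variety of sources, totals 4.8 trillion tokens (including 10% multilingual data), and is a combination of:

Publicly available documents rigorously filtered for quality, selected high-quality educational data, and code.
Newly created synthetic, "textbook-like" data for teaching math, coding, common-sense reasoning, and general knowledge of the world (science, daily activities, theory of mind, etc.).
High-quality chat-format supervised data covering a wide range of topics, reflecting human preferences on aspects such as instruction following, truthfulness, honesty, and helpfulness.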
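Combining the 4.8 trillion training tokens above with the 1024 GPUs and 18 days listed under Model gives a rough back-of-envelope estimate of training throughput. The sketch assumes all GPUs ran for the full 18 days and that the 4.8 trillion tokens is the total number processed; the card states neither, so treat the results as order-of-magnitude figures.

```python
GPUS = 1024        # NVIDIA H100-80G
DAYS = 18          # training time
TOKENS = 4.8e12    # training tokens

gpu_hours = GPUS * DAYS * 24
seconds = DAYS * 24 * 3600
cluster_tokens_per_s = TOKENS / seconds
per_gpu_tokens_per_s = cluster_tokens_per_s / GPUS

print(f"GPU-hours: {gpu_hours:,}")
print(f"Cluster throughput: {cluster_tokens_per_s:,.0f} tokens/s")
print(f"Per-GPU throughput: {per_gpu_tokens_per_s:,.0f} tokens/s")
```

Under these assumptions this works out to 442,368 GPU-hours, about 3.09 million tokens per second across the cluster, or roughly 3,000 tokens per second per GPU.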

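Benchmarks

The reasoning ability of Phi-3-Small-8K-Instruct (including both common-sense and logical reasoning) was evaluated on standard open-source benchmarks and compared with Mixtral-8x7b, Gemini-Pro, Gemma 7B, Llama-3-8B-Instruct, GPT-3.5-Turbo-1106, and GPT-4-Turbo-1106. The results are as follows: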

基準測試	Phi-3-Small-8K-Instruct 7b	Gemma 7B	Mixtral 8x7B	Llama-3-Instruct 8b	GPT-3.5-Turbo version 1106	Gemini Pro	GPT-4-Turbo version 1106 (Chat)
AGI Eval 5-shot	45.1	42.1	45.2	42.0	48.4	49.0	59.6
MMLU 5-shot	75.7	63.6	70.5	66.5	71.4	66.7	84.0
BigBench Hard 3-shot	79.1	59.6	69.7	51.5	68.3	75.6	87.7
ANLI 7-shot	58.1	48.7	55.2	57.3	58.1	64.2	71.7
HellaSwag 5-shot	77.0	49.8	70.4	71.1	78.8	76.2	88.3
ARC Challenge 10-shot	90.7	78.3	87.3	82.8	87.4	88.3	95.6
ARC Easy 10-shot	97.0	91.4	95.6	93.4	96.3	96.1	98.8
BoolQ 2-shot	84.8	66.0	76.6	80.9	79.1	86.4	91.3
CommonsenseQA 10-shot	80.0	76.2	78.1	79.0	79.6	81.8	86.7
MedQA 2-shot	65.4	49.6	62.2	60.5	63.4	58.2	83.7
OpenBookQA 10-shot	88.0	78.6	85.8	82.6	86.0	86.4	93.4
PIQA 5-shot	86.9	78.1	86.0	75.7	86.6	86.2	90.1
Social IQA 5-shot	79.2	65.5	75.9	73.9	68.3	75.4	81.7
TruthfulQA (MC2) 10-shot	70.2	52.1	60.1	63.2	67.7	72.6	85.2
WinoGrande 5-shot	81.5	55.6	62.0	65.0	68.8	72.2	86.7
TriviaQA 5-shot	58.1	72.3	82.2	67.7	85.8	80.2	73.3
GSM8K Chain of Thought 8-shot	89.6	59.8	64.7	77.4	78.1	80.4	94.2
HumanEval 0-shot	61.0	34.1	37.8	60.4	62.2	64.4	79.9
MBPP 3-shot	71.7	51.5	60.2	67.7	77.8	73.2	86.7
Average	75.7	61.8	69.8	69.4	74.3	75.4	85.2
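The Average row appears to be an unweighted mean of the 19 benchmark rows. Under that assumption, the short check below reproduces the 75.7 reported for Phi-3-Small-8K-Instruct; other columns may differ slightly because the per-benchmark scores are themselves rounded.

```python
# Phi-3-Small-8K-Instruct scores from the table above, in row order
phi3_small_scores = {
    "AGI Eval": 45.1, "MMLU": 75.7, "BigBench Hard": 79.1, "ANLI": 58.1,
    "HellaSwag": 77.0, "ARC Challenge": 90.7, "ARC Easy": 97.0, "BoolQ": 84.8,
    "CommonsenseQA": 80.0, "MedQA": 65.4, "OpenBookQA": 88.0, "PIQA": 86.9,
    "Social IQA": 79.2, "TruthfulQA (MC2)": 70.2, "WinoGrande": 81.5,
    "TriviaQA": 58.1, "GSM8K CoT": 89.6, "HumanEval": 61.0, "MBPP": 71.7,
}

# Unweighted mean over all benchmark rows
average = sum(phi3_small_scores.values()) / len(phi3_small_scores)
print(f"{len(phi3_small_scores)} benchmarks, average = {average:.1f}")
```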

Performance by category

Category	Phi-3-Small-8K-Instruct 7b	Gemma 7B	Mixtral 8x7B	Llama-3-Instruct 8b	GPT-3.5-Turbo version 1106	Gemini Pro	GPT-4-Turbo version 1106 (Chat)
Popular aggregated benchmarks	71.1	59.4	66.2	59.9	67.0	67.5	80.5
Reasoning	82.4	69.1	77.0	75.7	78.3	80.4	89.3
Language understanding	70.6	58.4	64.9	65.4	70.4	75.3	81.6
Code generation	60.7	45.6	52.7	56.4	70.4	66.7	76.1
Math	51.6	35.8	40.3	41.1	52.8	50.9	67.1
Factual knowledge	38.6	46.7	58.6	43.1	63.4	54.6	45.9
Multilingual	62.5	63.2	63.4	65.0	69.1	76.5	82.0
Robustness	72.9	38.4	51.0	64.5	69.3	69.7	84.6

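Software

Hardware

By default, the Phi-3-Small model uses Flash Attention 2 and Triton block-sparse attention, which require specific types of GPU hardware to run. It has been tested on the following GPU types: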

NVIDIA A100
NVIDIA A6000
NVIDIA H100
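All three tested GPUs are NVIDIA Ampere (A100, A6000) or Hopper (H100) parts, that is, CUDA compute capability 8.0 or higher, which is also the minimum that Flash Attention 2 supports. A quick pre-flight check might look like the sketch below. It assumes compute capability 8.0 is the relevant threshold (the card lists only tested GPUs, not a minimum), the helper names are hypothetical, and PyTorch's torch.cuda.get_device_capability() is queried only if PyTorch is installed.

```python
MIN_CAPABILITY = (8, 0)  # assumed threshold: Ampere or newer


def is_supported_capability(major: int, minor: int) -> bool:
    """True if a (major, minor) compute capability meets the assumed minimum."""
    return (major, minor) >= MIN_CAPABILITY


def check_current_gpu() -> str:
    """Describe whether the current CUDA device is likely to run the default kernels."""
    try:
        import torch
    except ImportError:
        return "PyTorch not installed; cannot query the GPU"
    if not torch.cuda.is_available():
        return "No CUDA GPU available"
    major, minor = torch.cuda.get_device_capability()
    name = torch.cuda.get_device_name()
    verdict = "likely supported" if is_supported_capability(major, minor) else "likely unsupported"
    return f"{name} (sm_{major}{minor}): {verdict}"


print(check_current_gpu())
```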

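For optimized inference on GPU, CPU, and mobile devices, use the 8K ONNX models.

Cross-platform support

The ONNX Runtime ecosystem now supports Phi-3 Small models across platforms and hardware. Optimized Phi-3 models are also published in ONNX format and run on CPU and GPU across devices, including server platforms, Windows, Linux, and Mac desktops, and mobile CPUs, with the precision best suited to each target. DirectML GPU acceleration is supported on Windows desktop GPUs (AMD, Intel, and NVIDIA). Alongside DML, ONNX Runtime provides cross-platform support for Phi-3 Small on CPU, GPU, and mobile devices. Some of the optimized configurations that have been added:

ONNX model for int4 DML: quantized to int4 via AWQ.
ONNX model for fp16 CUDA.
ONNX model for int4 CUDA: quantized to int4 via RTN.
ONNX model for int4 CPU and mobile: quantized to int4 via RTN.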
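The four configurations above map onto deployment targets roughly as in the sketch below. It is only a lookup over the list above: the target keys and the function name pick_onnx_variant are hypothetical, and no ONNX Runtime API is used.

```python
# Deployment target -> (precision, quantization method), from the list above
ONNX_VARIANTS = {
    "directml": ("int4", "AWQ"),   # Windows desktop GPUs via DirectML
    "cuda-fp16": ("fp16", None),   # NVIDIA GPUs, no quantization
    "cuda-int4": ("int4", "RTN"),
    "cpu-mobile": ("int4", "RTN"),
}


def pick_onnx_variant(target: str) -> str:
    """Describe the ONNX build listed for a deployment target."""
    try:
        precision, method = ONNX_VARIANTS[target]
    except KeyError:
        raise ValueError(f"unknown target {target!r}; choose from {sorted(ONNX_VARIANTS)}") from None
    suffix = f", quantized with {method}" if method else ""
    return f"{target}: {precision}{suffix}"


for target in ONNX_VARIANTS:
    print(pick_onnx_variant(target))
```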

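📄 License

The model is licensed under the MIT license.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.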