ARIA-70B-V2-GGUF: open-source large language model with free English and French text generation

ARIA 70B V2 GGUF

GGUF files published by TheBloke; original model by Faradaylab

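ARIA 70B V2 is a large language model built on the Llama 2 architecture. It supports French and English and focuses on text generation.

Tags: large language model, multilingual, multilingual text generation, LLM inference, education assistance

Downloads: 1,100

Released: 9/20/2023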

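Model overview

ARIA 70B V2 is a 70-billion-parameter large language model built on Meta's Llama 2 architecture. It is tuned for text generation and can be applied to a range of natural language processing tasks.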

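Model features

Multilingual support

Generates text in both French and English

Large parameter count

70 billion parameters for language understanding

Safe generation

Built-in safety measures to avoid generating harmful or inappropriate content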

Model capabilities

Text generation

Dialogue systems

Content creation

Language understanding

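Use cases

Education

Language learning assistant

Helps students learn French and English

Provides language explanations and examples

Content creation

Article writing

Helps writers draft articles

Produces fluent, coherent text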

🚀 ARIA 70B V2 - GGUF

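This project provides GGUF-format model files for Faradaylab's ARIA 70B V2, for text generation and similar tasks. Several quantized versions are available to suit different hardware and use cases.

🚀 Quick start

Download the model

GGUF files can be downloaded in any of the following ways:

Clients with automatic download: LM Studio, LoLLMS Web UI, Faraday.dev and similar clients show a list of available models to choose from and download.
text-generation-webui: under Download Model, enter the model repository TheBloke/ARIA-70B-V2-GGUF and a specific file name (for example aria-70b-v2.Q4_K_M.gguf), then click Download.
Command line: the huggingface-hub Python library is the recommended way to download.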

pip3 install huggingface-hub
huggingface-cli download TheBloke/ARIA-70B-V2-GGUF aria-70b-v2.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False

Run the model

Example llama.cpp command

./main -ngl 32 -m aria-70b-v2.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
{prompt}[/INST]"

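-ngl 32: the number of layers to offload to the GPU. Remove this option if you have no GPU acceleration.
-c 4096: the desired sequence length.

Running in text-generation-webui

See text-generation-webui/docs/llama.cpp.md for details.

Running from Python code

GGUF models can be loaded and run with either the llama-cpp-python or the ctransformers library.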

from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained("TheBloke/ARIA-70B-V2-GGUF", model_file="aria-70b-v2.Q4_K_M.gguf", model_type="llama", gpu_layers=50)

print(llm("AI is going to"))
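The text above also names llama-cpp-python as an option. The following is a minimal sketch of the equivalent call with that library, assuming the GGUF file has already been downloaded to the working directory. The `generate` helper and the `max_tokens=256` cap are illustrative choices made for this sketch; the other values mirror the llama.cpp flags shown earlier.

```python
MODEL_PATH = "aria-70b-v2.Q4_K_M.gguf"  # assumed to be present locally

# Mirrors the llama.cpp flags -c 4096 and -ngl 32.
# Set n_gpu_layers to 0 if no GPU acceleration is available.
LOAD_KWARGS = {"n_ctx": 4096, "n_gpu_layers": 32}

# Mirrors --temp 0.7 and --repeat_penalty 1.1.
# max_tokens=256 is an arbitrary cap chosen for this sketch.
SAMPLING_KWARGS = {"temperature": 0.7, "repeat_penalty": 1.1, "max_tokens": 256}


def generate(prompt: str, model_path: str = MODEL_PATH) -> str:
    """Load the model with llama-cpp-python and return the completion text."""
    # Imported here so the module can be loaded without the library installed.
    from llama_cpp import Llama  # pip3 install llama-cpp-python

    llm = Llama(model_path=model_path, **LOAD_KWARGS)
    result = llm(prompt, **SAMPLING_KWARGS)
    return result["choices"][0]["text"]
```

Calling `generate("AI is going to")` would load the 70B model, so expect it to need the RAM or VRAM appropriate to the chosen quantization.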

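✨ Key features

Multiple formats: GGUF files are provided in several quantization formats, such as Q2_K, Q3_K and Q4_K, so you can pick one that matches your hardware and requirements.
Broad compatibility: the files work with llama.cpp and with many third-party UIs and libraries.
Extended context: RoPE scaling is supported, which can extend the context length to handle longer documents.

📦 Installation guide

Install dependencies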

pip3 install huggingface-hub

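To speed up downloads, you can install hf_transfer:

pip3 install hf_transfer

Then set the environment variable in the shell that will run the download:

export HF_HUB_ENABLE_HF_TRANSFER=1

Download the model file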

huggingface-cli download TheBloke/ARIA-70B-V2-GGUF aria-70b-v2.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
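The same download can be done from Python with the huggingface-hub library installed above. This is a sketch only: `download_model` and its `use_hf_transfer` flag are hypothetical names introduced here, while `hf_hub_download` is the library's own function. The environment variable only takes effect if it is set before `huggingface_hub` is first imported in the process, which is why the import is deferred.

```python
import os

REPO_ID = "TheBloke/ARIA-70B-V2-GGUF"
FILENAME = "aria-70b-v2.Q4_K_M.gguf"


def download_model(local_dir: str = ".", use_hf_transfer: bool = False) -> str:
    """Download one GGUF file from the repository and return its local path."""
    if use_hf_transfer:
        # Requires `pip3 install hf_transfer`; must be set before the import below.
        os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

    # Deferred import so the environment variable above is seen by the library.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
```

Calling `download_model()` fetches the roughly 41 GB Q4_K_M file into the current directory, so check free disk space first.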

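📚 Detailed documentation

Model information

Model creator: Faradaylab
Original model: ARIA 70B V2

About GGUF

GGUF is a new format introduced by the llama.cpp team on August 21, 2023. It replaces GGML, which llama.cpp no longer supports. Clients and libraries named on this page as supporting GGUF include llama.cpp, text-generation-webui, LM Studio, LoLLMS Web UI, Faraday.dev, llama-cpp-python and ctransformers.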

Prompt template

[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
{prompt}[/INST]
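As an illustration, the template can be filled in programmatically. The sketch below assumes the line breaks shown in the template are literal newlines; `build_prompt` and the shortened default system prompt are names and values introduced here for the example.

```python
# Shortened for the example; the full system prompt is shown in the template above.
DEFAULT_SYSTEM_PROMPT = "You are a helpful, respectful and honest assistant."


def build_prompt(user_message: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Wrap a user message in the [INST] <<SYS>> ... <</SYS>> ... [/INST] template."""
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n{user_message}[/INST]"


print(build_prompt("Explique la différence entre 'savoir' et 'connaître'."))
```

The returned string can be passed as the prompt to whichever runtime loads the GGUF file.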

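Compatibility

These quantized GGUFv2 files are compatible with llama.cpp from August 27, 2023 onward (commit d0cee0d), and with many third-party UIs and libraries.

Explanation of quantization methods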

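The new quantization methods are:

GGML_TYPE_Q2_K: "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
GGML_TYPE_Q3_K: "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
GGML_TYPE_Q4_K: "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
GGML_TYPE_Q5_K: "type-1" 5-bit quantization with the same super-block structure as GGML_TYPE_Q4_K. This ends up using 5.5 bpw.
GGML_TYPE_Q6_K: "type-0" 6-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.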
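The bpw figures can be reproduced with a little arithmetic. The sketch below makes one assumption that the list does not state: each super-block also stores a 16-bit scale, plus a 16-bit min for "type-1" formats. The function name and parameters are illustrative. Q2_K is omitted because its quoted 2.5625 bpw matches this accounting only if a single 16-bit super-block field is counted (656 bits per 256 weights).

```python
SUPER_BLOCK_WEIGHTS = 256  # 16 blocks x 16 weights, or 8 blocks x 32 weights


def bits_per_weight(weight_bits: int, blocks: int, scale_bits: int, has_mins: bool) -> float:
    """Effective bits per weight for one super-block.

    Assumption (not stated in the list above): every super-block carries one
    16-bit scale, and "type-1" formats carry an additional 16-bit min.
    """
    fields = 2 if has_mins else 1           # scale only, or scale + min
    weights = weight_bits * SUPER_BLOCK_WEIGHTS
    per_block = scale_bits * fields * blocks
    per_super_block = 16 * fields
    return (weights + per_block + per_super_block) / SUPER_BLOCK_WEIGHTS


print(bits_per_weight(3, 16, 6, has_mins=False))  # Q3_K
print(bits_per_weight(4, 8, 6, has_mins=True))    # Q4_K
print(bits_per_weight(5, 8, 6, has_mins=True))    # Q5_K
print(bits_per_weight(6, 16, 8, has_mins=False))  # Q6_K
```

Under that assumption the four calls return 3.4375, 4.5, 5.5 and 6.5625, matching the figures quoted in the list.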

Refer to the "Provided files" table below to see which files use which methods, and how.

Provided files

Name	Quant method	Bits	Size	Max RAM required	Use case
aria-70b-v2.Q2_K.gguf	Q2_K	2	29.28 GB	31.78 GB	smallest, significant quality loss; not recommended for most purposes
aria-70b-v2.Q3_K_S.gguf	Q3_K_S	3	29.92 GB	32.42 GB	very small, high quality loss
aria-70b-v2.Q3_K_M.gguf	Q3_K_M	3	33.19 GB	35.69 GB	very small, high quality loss
aria-70b-v2.Q3_K_L.gguf	Q3_K_L	3	36.15 GB	38.65 GB	small, substantial quality loss
aria-70b-v2.Q4_0.gguf	Q4_0	4	38.87 GB	41.37 GB	legacy; small, very high quality loss; prefer Q3_K_M
aria-70b-v2.Q4_K_S.gguf	Q4_K_S	4	39.07 GB	41.57 GB	small, greater quality loss
aria-70b-v2.Q4_K_M.gguf	Q4_K_M	4	41.42 GB	43.92 GB	medium, balanced quality; recommended
aria-70b-v2.Q5_0.gguf	Q5_0	5	47.46 GB	49.96 GB	legacy; medium, balanced quality; prefer Q4_K_M
aria-70b-v2.Q5_K_S.gguf	Q5_K_S	5	47.46 GB	49.96 GB	large, low quality loss; recommended
aria-70b-v2.Q5_K_M.gguf	Q5_K_M	5	48.75 GB	51.25 GB	large, very low quality loss; recommended
aria-70b-v2.Q6_K.gguf	Q6_K	6	56.59 GB	59.09 GB	very large, extremely low quality loss
aria-70b-v2.Q8_0.gguf	Q8_0	8	73.29 GB	75.79 GB	very large, extremely low quality loss; not recommended

Note: the RAM figures above assume no GPU offloading. If layers are offloaded to the GPU, RAM usage goes down and VRAM is used instead.
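One way to use the table is to pick the largest file whose "Max RAM required" figure fits a given memory budget. The sketch below copies the RAM column from the table and assumes CPU-only inference with no GPU offloading. `largest_fitting` is a hypothetical helper, and it ignores the table's recommendations, so it can return a legacy format.

```python
# (file name, max RAM required in GB), copied from the table, smallest first.
MAX_RAM_GB = [
    ("aria-70b-v2.Q2_K.gguf", 31.78),
    ("aria-70b-v2.Q3_K_S.gguf", 32.42),
    ("aria-70b-v2.Q3_K_M.gguf", 35.69),
    ("aria-70b-v2.Q3_K_L.gguf", 38.65),
    ("aria-70b-v2.Q4_0.gguf", 41.37),
    ("aria-70b-v2.Q4_K_S.gguf", 41.57),
    ("aria-70b-v2.Q4_K_M.gguf", 43.92),
    ("aria-70b-v2.Q5_0.gguf", 49.96),
    ("aria-70b-v2.Q5_K_S.gguf", 49.96),
    ("aria-70b-v2.Q5_K_M.gguf", 51.25),
    ("aria-70b-v2.Q6_K.gguf", 59.09),
    ("aria-70b-v2.Q8_0.gguf", 75.79),
]


def largest_fitting(ram_gb: float):
    """Return the last table entry that fits in ram_gb, or None if none fits."""
    choice = None
    for name, required in MAX_RAM_GB:
        if required <= ram_gb:
            choice = name  # later rows are larger, so keep overwriting
    return choice


print(largest_fitting(48))  # a machine with 48 GB of free RAM
print(largest_fitting(30))  # too little RAM for any file
```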

Splitting and joining the Q6_K and Q8_0 files

Because HF does not support uploading files larger than 50 GB, the Q6_K and Q8_0 files have been uploaded as split files.

q6_K

Please download:

aria-70b-v2.Q6_K.gguf-split-a
aria-70b-v2.Q6_K.gguf-split-b

q8_0

Please download:

aria-70b-v2.Q8_0.gguf-split-a
aria-70b-v2.Q8_0.gguf-split-b

To join the files, do the following:

Linux and macOS:

cat aria-70b-v2.Q6_K.gguf-split-* > aria-70b-v2.Q6_K.gguf && rm aria-70b-v2.Q6_K.gguf-split-*
cat aria-70b-v2.Q8_0.gguf-split-* > aria-70b-v2.Q8_0.gguf && rm aria-70b-v2.Q8_0.gguf-split-*

Windows command line:

COPY /B aria-70b-v2.Q6_K.gguf-split-a + aria-70b-v2.Q6_K.gguf-split-b aria-70b-v2.Q6_K.gguf
del aria-70b-v2.Q6_K.gguf-split-a aria-70b-v2.Q6_K.gguf-split-b

COPY /B aria-70b-v2.Q8_0.gguf-split-a + aria-70b-v2.Q8_0.gguf-split-b aria-70b-v2.Q8_0.gguf
del aria-70b-v2.Q8_0.gguf-split-a aria-70b-v2.Q8_0.gguf-split-b
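For a platform-independent alternative, the same join can be sketched in Python. `join_splits` is a hypothetical helper written for this example. It concatenates the parts in alphabetical order (split-a, then split-b), which matches the shell commands above, and only deletes the parts when asked to.

```python
import shutil
from pathlib import Path


def join_splits(target: str, directory: str = ".", remove_parts: bool = False) -> Path:
    """Concatenate `<target>-split-*` files in `directory` into `<target>`."""
    base = Path(directory)
    parts = sorted(base.glob(target + "-split-*"))  # split-a, split-b, ...
    if not parts:
        raise FileNotFoundError(f"no split files found for {target} in {base}")

    out_path = base / target
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # streams in chunks, no full load

    if remove_parts:
        for part in parts:
            part.unlink()
    return out_path
```

For example, `join_splits("aria-70b-v2.Q6_K.gguf", remove_parts=True)` would rebuild the Q6_K file in the current directory and then delete the two parts.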

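🔧 Technical details

Model architecture: ARIA is an auto-regressive language model that uses an optimized transformer architecture. The fine-tuned versions use supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align the output with human preferences for helpfulness and safety.
Training data: trained on 50,000 tokens of publicly available French data. The pretraining data has a cutoff of September 2022, and some fine-tuning data is more recent, up to August 2023.
RoPE scaling: an experimental RoPE scaling method can extend ARIA's context length from 4,096 to more than 6,000 tokens. It is not active by default; enabling it requires adding one line of code to the parameters.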