ARIA-70B-V2-GGUF open-source large model - free, supports English and French text generation


ARIA 70B V2 GGUF

By TheBloke

ARIA 70B V2 is a large language model based on the Llama 2 architecture. It supports French and English and focuses on text generation.

Tags: Large language model, Multilingual support, #Multilingual text generation #LLM inference #Educational assistance

Downloads: 1,100

Released: 9/20/2023

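Model overview

ARIA 70B V2 is a 70-billion-parameter large language model built on Meta's Llama 2 architecture. It has been optimized to generate high-quality text and is suited to a range of natural language processing tasks.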

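Model features

Multilingual support

Generates text in both French and English

Large parameter count

70 billion parameters for strong language understanding

Safe generation

Built-in safety alignment intended to avoid harmful or inappropriate output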

Capabilities

Text generation

Dialogue systems

Content creation

Language understanding

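Use cases

Education

Language-learning assistant

Helps students learn French and English

Provides accurate language explanations and examples

Content creation

Article writing

Helps writers produce high-quality articles

Fluent, coherent text output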

🚀 ARIA 70B V2 - GGUF

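This repo contains GGUF-format model files for Faradaylab's ARIA 70B V2, for text generation and related tasks. Several quantized versions are provided to suit different hardware and use cases.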

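🚀 Quick start

Downloading the model

The GGUF files can be downloaded in several ways:

Automatic download clients: LM Studio, LoLLMS Web UI, Faraday.dev and similar clients present a list of available models for you to choose from and download.
text-generation-webui: under Download Model, enter the model repo TheBloke/ARIA-70B-V2-GGUF and a specific file name (e.g. aria-70b-v2.Q4_K_M.gguf), then click Download.
Command line: the huggingface-hub Python library is recommended.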

pip3 install huggingface-hub
huggingface-cli download TheBloke/ARIA-70B-V2-GGUF aria-70b-v2.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
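The same download can be scripted from Python. A minimal sketch, assuming the `huggingface_hub` package installed above; `hf_hub_download` is that library's single-file download function, while `files_for_quant` and `download_quant` are hypothetical helpers written for illustration. They also encode the split-file naming this repo uses for Q6_K and Q8_0.

```python
from typing import List

REPO_ID = "TheBloke/ARIA-70B-V2-GGUF"
# Quantizations that this repo ships as two parts (see the split-file section).
SPLIT_QUANTS = {"Q6_K", "Q8_0"}


def files_for_quant(quant: str) -> List[str]:
    """Return the repo file names needed for one quantization (hypothetical helper)."""
    base = f"aria-70b-v2.{quant}.gguf"
    if quant in SPLIT_QUANTS:
        return [f"{base}-split-a", f"{base}-split-b"]
    return [base]


def download_quant(quant: str = "Q4_K_M", local_dir: str = ".") -> List[str]:
    """Download every file for `quant` into `local_dir` and return the local paths."""
    # Imported lazily so the naming helper above works without the package installed.
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir)
        for name in files_for_quant(quant)
    ]
```

For example, `download_quant("Q4_K_M")` fetches the single recommended file, while `download_quant("Q6_K")` fetches both parts, which still have to be joined afterwards.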

Running the model

Example llama.cpp command

./main -ngl 32 -m aria-70b-v2.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
{prompt}[/INST]"

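-ngl 32: the number of layers to offload to the GPU. Remove this flag if you do not have GPU acceleration.
-c 4096: the desired sequence length (context size).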
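The flags above can also be assembled programmatically, which makes the CPU-only case (dropping `-ngl`) explicit. A minimal sketch; `build_main_command` is a hypothetical helper, the binary path `./main` and the flag values are copied from the command above, and in practice the prompt would be the filled-in prompt template rather than a bare string.

```python
import shlex
from typing import List


def build_main_command(model_path: str, prompt: str, n_gpu_layers: int = 32,
                       ctx_size: int = 4096, temp: float = 0.7,
                       repeat_penalty: float = 1.1) -> List[str]:
    """Assemble the argv for the llama.cpp `./main` command shown above (hypothetical helper)."""
    args = ["./main"]
    # -ngl offloads layers to the GPU; leave it out entirely on CPU-only systems.
    if n_gpu_layers > 0:
        args += ["-ngl", str(n_gpu_layers)]
    args += [
        "-m", model_path,
        "--color",
        "-c", str(ctx_size),                    # sequence length
        "--temp", str(temp),
        "--repeat_penalty", str(repeat_penalty),
        "-n", "-1",                             # same value as in the command above
        "-p", prompt,
    ]
    return args


# Print a shell-quoted version of the command for a CPU-only run.
print(shlex.join(build_main_command("aria-70b-v2.Q4_K_M.gguf", "Hello", n_gpu_layers=0)))
```

The returned list can be passed straight to `subprocess.run(cmd, check=True)`; passing a list instead of one shell string avoids quoting problems with the multi-line prompt.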

Running in text-generation-webui

For details, see text-generation-webui/docs/llama.cpp.md.

Running from Python code

GGUF models can be loaded and run with either the llama-cpp-python or the ctransformers library.

from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained("TheBloke/ARIA-70B-V2-GGUF", model_file="aria-70b-v2.Q4_K_M.gguf", model_type="llama", gpu_layers=50)

print(llm("AI is going to"))
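The card names llama-cpp-python as an alternative but only shows ctransformers. A minimal llama-cpp-python sketch; `generate` is a hypothetical helper, the default file name assumes the Q4_K_M download above, and the sampling values mirror the llama.cpp command. Constructing `Llama(model_path=..., n_ctx=..., n_gpu_layers=...)` and calling the object with a prompt is part of llama-cpp-python's high-level API.

```python
import os


def generate(prompt: str, model_path: str = "aria-70b-v2.Q4_K_M.gguf",
             n_gpu_layers: int = 32, n_ctx: int = 4096, max_tokens: int = 256) -> str:
    """Run one completion with llama-cpp-python (hypothetical helper)."""
    # Fail early with a clear error if the GGUF file has not been downloaded yet.
    if not os.path.exists(model_path):
        raise FileNotFoundError(model_path)
    from llama_cpp import Llama  # pip install llama-cpp-python

    # n_gpu_layers plays the same role as -ngl / gpu_layers; use 0 without a GPU.
    llm = Llama(model_path=model_path, n_ctx=n_ctx, n_gpu_layers=n_gpu_layers)
    out = llm(prompt, max_tokens=max_tokens, temperature=0.7, repeat_penalty=1.1)
    return out["choices"][0]["text"]
```

Loading a 70B model is slow, so a real application would create the `Llama` object once and reuse it; it is built inside the function here only to keep the sketch short.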

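✨ Key features

Multiple quantization formats: GGUF files are provided in several quantization formats (Q2_K, Q3_K, Q4_K and more), so you can pick one that suits your hardware and needs.
Broad compatibility: works with llama.cpp and many third-party UIs and libraries, so the model can be used in different environments.
Extended context: supports RoPE scaling to extend the context length for longer inputs.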

📦 Installation

Installing dependencies

pip3 install huggingface-hub

To speed up downloads, you can also install hf_transfer:

pip3 install hf_transfer

and set the environment variable:

HF_HUB_ENABLE_HF_TRANSFER=1
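The variable must be visible to the process that imports `huggingface_hub`: in a shell, prefix the command (for example `HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download ...`); from Python, set it before the import. A minimal sketch; `enable_hf_transfer` is a hypothetical helper, and the note that the flag is read at import time reflects recent `huggingface_hub` releases and may differ in others.

```python
import importlib.util
import os


def enable_hf_transfer() -> bool:
    """Turn on hf_transfer downloads only if the package is installed (hypothetical helper)."""
    if importlib.util.find_spec("hf_transfer") is None:
        # Setting the flag without the package installed can make downloads fail.
        return False
    # huggingface_hub reads this variable when it is imported, so call this first.
    os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
    return True
```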



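📚 Documentation

Model information

Model creator: Faradaylab
Original model: ARIA 70B V2

About GGUF

GGUF is a new format introduced by the llama.cpp team on August 21st, 2023. It replaces GGML, which is no longer supported. Clients and libraries mentioned in this card that support GGUF include llama.cpp, text-generation-webui, LM Studio, LoLLMS Web UI, Faraday.dev, llama-cpp-python and ctransformers.

Available repositories

Prompt template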

[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
{prompt}[/INST]
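To fill the template from code, substitute the user message for `{prompt}`. A minimal sketch that reproduces the template above character for character (including the double space after "safe."); `build_prompt` and `DEFAULT_SYSTEM_PROMPT` are hypothetical names.

```python
DEFAULT_SYSTEM_PROMPT = (
    "You are a helpful, respectful and honest assistant. Always answer as helpfully "
    "as possible, while being safe.  Your answers should not include any harmful, "
    "unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure "
    "that your responses are socially unbiased and positive in nature. If a question "
    "does not make any sense, or is not factually coherent, explain why instead of "
    "answering something not correct. If you don't know the answer to a question, "
    "please don't share false information."
)


def build_prompt(user_message: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Fill the [INST] <<SYS>> template shown above with a user message."""
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n{user_message}[/INST]"
```

The result of, say, `build_prompt("Explique la photosynthèse.")` can be passed as the `-p` argument to llama.cpp or as the prompt string to ctransformers or llama-cpp-python.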

Compatibility

These quantized GGUFv2 files are compatible with llama.cpp builds from August 27th, 2023 onwards (commit d0cee0d), as well as with many third-party UIs and libraries.
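A quick way to confirm that a downloaded (or merged) file really is GGUF, and which format version it declares, is to read its header. A minimal sketch, assuming the layout from the GGUF specification: a 4-byte magic `GGUF` followed by a little-endian `uint32` version (little-endian is the default byte order); `read_gguf_version` is a hypothetical helper.

```python
import struct


def read_gguf_version(path: str) -> int:
    """Return the GGUF format version stored in a file header, or raise ValueError."""
    with open(path, "rb") as f:
        header = f.read(8)
    # Expected layout: b"GGUF" magic, then a little-endian uint32 version number.
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} is not a GGUF file")
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

For the GGUFv2 files described above, `read_gguf_version("aria-70b-v2.Q4_K_M.gguf")` would be expected to return 2.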

Explanation of quantization methods


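The new quantization methods are:

GGML_TYPE_Q2_K: "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
GGML_TYPE_Q3_K: "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
GGML_TYPE_Q4_K: "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
GGML_TYPE_Q5_K: "type-1" 5-bit quantization with the same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
GGML_TYPE_Q6_K: "type-0" 6-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.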
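The bpw figures for Q3_K through Q6_K can be reproduced by counting the bits in one super-block and dividing by its weight count. A worked sketch; the per-super-block header (one fp16 scale for "type-0", an fp16 scale plus an fp16 min for "type-1") is an assumption based on llama.cpp's k-quant layout and is not stated in this card, and `KQuant`/`bits_per_weight` are hypothetical names.

```python
from dataclasses import dataclass

FP16_BITS = 16


@dataclass
class KQuant:
    weight_bits: int   # bits per quantized weight
    n_blocks: int      # blocks per super-block
    block_size: int    # weights per block
    scale_bits: int    # bits for each block scale (and each block min, for "type-1")
    type1: bool        # "type-1" formats also store a per-block min


def bits_per_weight(q: KQuant) -> float:
    n_weights = q.n_blocks * q.block_size
    per_block_meta = q.scale_bits * (2 if q.type1 else 1)
    # Assumed super-block header: fp16 scale, plus an fp16 min for "type-1".
    super_meta = FP16_BITS * (2 if q.type1 else 1)
    total_bits = q.weight_bits * n_weights + q.n_blocks * per_block_meta + super_meta
    return total_bits / n_weights


K_QUANTS = {
    "Q3_K": KQuant(3, 16, 16, 6, False),
    "Q4_K": KQuant(4, 8, 32, 6, True),
    "Q5_K": KQuant(5, 8, 32, 6, True),
    "Q6_K": KQuant(6, 16, 16, 8, False),
}

for name, q in K_QUANTS.items():
    print(name, bits_per_weight(q))
```

For Q4_K, for example, this gives (4 × 256 + 8 × 2 × 6 + 2 × 16) / 256 = 1152 / 256 = 4.5 bpw.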

See the "Provided files" table below for which files use which methods, and how.

Provided files

Name	Quant method	Bits	Size	Max RAM required	Use case
aria-70b-v2.Q2_K.gguf	Q2_K	2	29.28 GB	31.78 GB	smallest, significant quality loss; not recommended for most purposes
aria-70b-v2.Q3_K_S.gguf	Q3_K_S	3	29.92 GB	32.42 GB	very small, high quality loss
aria-70b-v2.Q3_K_M.gguf	Q3_K_M	3	33.19 GB	35.69 GB	very small, high quality loss
aria-70b-v2.Q3_K_L.gguf	Q3_K_L	3	36.15 GB	38.65 GB	small, substantial quality loss
aria-70b-v2.Q4_0.gguf	Q4_0	4	38.87 GB	41.37 GB	legacy; small, very high quality loss; prefer Q3_K_M
aria-70b-v2.Q4_K_S.gguf	Q4_K_S	4	39.07 GB	41.57 GB	small, greater quality loss
aria-70b-v2.Q4_K_M.gguf	Q4_K_M	4	41.42 GB	43.92 GB	medium, balanced quality; recommended
aria-70b-v2.Q5_0.gguf	Q5_0	5	47.46 GB	49.96 GB	legacy; medium, balanced quality; prefer Q4_K_M
aria-70b-v2.Q5_K_S.gguf	Q5_K_S	5	47.46 GB	49.96 GB	large, low quality loss; recommended
aria-70b-v2.Q5_K_M.gguf	Q5_K_M	5	48.75 GB	51.25 GB	large, very low quality loss; recommended
aria-70b-v2.Q6_K.gguf	Q6_K	6	56.59 GB	59.09 GB	very large, extremely low quality loss
aria-70b-v2.Q8_0.gguf	Q8_0	8	73.29 GB	75.79 GB	very large, extremely low quality loss; not recommended

Note: the RAM figures above assume no GPU offloading. Offloading layers to the GPU reduces RAM usage and uses VRAM instead.
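In the table, every "Max RAM required" value equals the file size plus 2.50 GB. A minimal sketch that uses this relation to pick the largest file fitting a RAM budget; `pick_quant` is a hypothetical helper, the sizes are copied from the table, the legacy Q4_0/Q5_0 files and the not-recommended Q8_0 file are left out, and GPU offloading (which moves part of the memory from RAM to VRAM) is ignored.

```python
from typing import Optional

# File sizes in GB from the "Provided files" table (Q4_0, Q5_0 and Q8_0 omitted).
FILE_SIZES_GB = {
    "Q2_K": 29.28,
    "Q3_K_S": 29.92,
    "Q3_K_M": 33.19,
    "Q3_K_L": 36.15,
    "Q4_K_S": 39.07,
    "Q4_K_M": 41.42,
    "Q5_K_S": 47.46,
    "Q5_K_M": 48.75,
    "Q6_K": 56.59,
}
# Every row of the table lists max RAM = file size + 2.50 GB (no GPU offload).
RAM_OVERHEAD_GB = 2.5


def pick_quant(ram_budget_gb: float) -> Optional[str]:
    """Return the largest listed quantization whose CPU-only RAM estimate fits the budget."""
    fitting = {name: size for name, size in FILE_SIZES_GB.items()
               if size + RAM_OVERHEAD_GB <= ram_budget_gb}
    if not fitting:
        return None  # even Q2_K (31.78 GB) does not fit
    return max(fitting, key=fitting.get)
```

With a 45 GB budget this selects Q4_K_M (43.92 GB); with 30 GB it returns `None`.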

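Splitting and joining the Q6_K and Q8_0 files

Hugging Face does not support uploading files larger than 50 GB, so the Q6_K and Q8_0 files have been split into multiple parts.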


Q6_K

Please download:

aria-70b-v2.Q6_K.gguf-split-a
aria-70b-v2.Q6_K.gguf-split-b

Q8_0

Please download:

aria-70b-v2.Q8_0.gguf-split-a
aria-70b-v2.Q8_0.gguf-split-b

To join the parts:

Linux and macOS:

cat aria-70b-v2.Q6_K.gguf-split-* > aria-70b-v2.Q6_K.gguf && rm aria-70b-v2.Q6_K.gguf-split-*
cat aria-70b-v2.Q8_0.gguf-split-* > aria-70b-v2.Q8_0.gguf && rm aria-70b-v2.Q8_0.gguf-split-*

Windows command line:

COPY /B aria-70b-v2.Q6_K.gguf-split-a + aria-70b-v2.Q6_K.gguf-split-b aria-70b-v2.Q6_K.gguf
del aria-70b-v2.Q6_K.gguf-split-a aria-70b-v2.Q6_K.gguf-split-b

COPY /B aria-70b-v2.Q8_0.gguf-split-a + aria-70b-v2.Q8_0.gguf-split-b aria-70b-v2.Q8_0.gguf
del aria-70b-v2.Q8_0.gguf-split-a aria-70b-v2.Q8_0.gguf-split-b
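On any platform with Python, the same byte-for-byte concatenation can be done without `cat` or `COPY /B`. A minimal sketch; `merge_splits` is a hypothetical helper, and the parts must be passed in order (`-split-a` before `-split-b`).

```python
import shutil
from pathlib import Path
from typing import Sequence


def merge_splits(parts: Sequence[str], output: str, delete_parts: bool = True) -> Path:
    """Concatenate split parts byte-for-byte, like `cat` / `COPY /B` above."""
    out_path = Path(output)
    with out_path.open("wb") as dst:
        for part in parts:  # order matters: split-a must come before split-b
            with open(part, "rb") as src:
                # Stream in 16 MiB chunks so multi-GB parts never sit in memory at once.
                shutil.copyfileobj(src, dst, length=16 * 1024 * 1024)
    if delete_parts:
        for part in parts:
            Path(part).unlink()
    return out_path
```

Usage: `merge_splits(["aria-70b-v2.Q6_K.gguf-split-a", "aria-70b-v2.Q6_K.gguf-split-b"], "aria-70b-v2.Q6_K.gguf")`.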

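🔧 Technical details

Model architecture: ARIA is an auto-regressive language model that uses an optimized transformer architecture. The fine-tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align the model's outputs with human preferences for helpfulness and safety.
Training data: trained on 50,000 tokens of public French data. The pretraining data has a cutoff of September 2022; some of the fine-tuning data is more recent, up to August 2023.
RoPE scaling: an experimental RoPE scaling method can extend ARIA's context length from 4,096 to more than 6,000 tokens. It is not active by default and must be enabled by adding a line to the model parameters.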
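The card does not say which parameter enables ARIA's RoPE scaling. As an illustration only, assuming *linear* scaling (position interpolation), the frequency scale is the ratio of the trained context to the target context. llama-cpp-python's `Llama` constructor accepts this value as `rope_freq_scale`, and llama.cpp exposes it as `--rope-freq-scale`; whether ARIA's method is linear is an assumption, and `linear_rope_freq_scale` and `llama_kwargs` are hypothetical helpers.

```python
def linear_rope_freq_scale(trained_ctx: int, target_ctx: int) -> float:
    """Frequency scale for linear RoPE scaling (position interpolation)."""
    if target_ctx <= trained_ctx:
        return 1.0  # no scaling needed within the trained context
    return trained_ctx / target_ctx


def llama_kwargs(model_path: str, target_ctx: int, trained_ctx: int = 4096) -> dict:
    """Constructor arguments for llama_cpp.Llama with linear RoPE scaling (hypothetical helper)."""
    return {
        "model_path": model_path,
        "n_ctx": target_ctx,
        "rope_freq_scale": linear_rope_freq_scale(trained_ctx, target_ctx),
    }
```

For example, `Llama(**llama_kwargs("aria-70b-v2.Q4_K_M.gguf", 8192))` would request an 8,192-token context with a scale of 0.5. Output quality beyond the roughly 6,000 tokens mentioned above is not documented in this card.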