Mistral-7B-Instruct-Aya-101-GGUF: Open-Source Conversational Model

Mistral 7B Instruct Aya 101 GGUF

Developed by MaziyarPanahi

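This is a multilingual conversational model based on Mistral-7B-Instruct-v0.2. It supports 101 languages and is distributed as quantized GGUF files.

Large Language Model · Supports Multiple Languages · Open Source · License: Apache-2.0 · #multilingual-dialogue #efficient-7B-parameters #GGUF-quantization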

Downloads 214

Release Time : 2/28/2024

Model Overview

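This model is the GGUF-quantized version of Mistral-7B-Instruct-Aya-101. It is optimized for multilingual text generation and is particularly suited to conversational applications.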

Model Features

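Multilingual support

Generates text in 101 languages, which makes it a good fit for multilingual applications.

GGUF quantization format

Offers quantization options from 2 to 8 bits, so the model can be deployed efficiently on different hardware.

Conversation tuning

Fine-tuned from Mistral-7B-Instruct, which makes it well suited to conversational interaction.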

Model Capabilities

Multilingual text generation

Conversational interaction

Instruction following

Use Cases

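Multilingual applications

Multilingual customer service bot

Build a customer service system that supports multiple languages.

It can hold natural, fluent conversations with users in different languages at the same time.

Educational applications

Language learning assistant

Help learners practice conversation and writing in multiple languages.

It provides a natural, fluent environment for language practice.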

🚀 Mistral-7B-Instruct-Aya-101-GGUF

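Mistral-7B-Instruct-Aya-101-GGUF provides the Mistral-7B-Instruct-Aya-101 model in GGUF format. It can be used for text generation, supports multiple languages and quantization levels, and runs in a range of clients and libraries.

🚀 Quick Start

Model information

Model creator: MaziyarPanahi
Original model: MaziyarPanahi/Mistral-7B-Instruct-Aya-101

Model description

MaziyarPanahi/Mistral-7B-Instruct-Aya-101-GGUF contains GGUF format model files for MaziyarPanahi/Mistral-7B-Instruct-Aya-101.

✨ Key Features

Multiple quantization levels: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit.
Multilingual support: covers many languages, including English, Chinese, French, and German.
Multiple clients and libraries: works with llama.cpp, text-generation-webui, KoboldCpp, and others.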

📦 Installation Guide

Install huggingface-hub

pip3 install huggingface-hub

Speed up downloads (optional)

pip3 install hf_transfer

Download a model file

huggingface-cli download MaziyarPanahi/Mistral-7B-Instruct-Aya-101-GGUF Mistral-7B-Instruct-Aya-101-GGUF.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False

Download multiple files (optional)

huggingface-cli download MaziyarPanahi/Mistral-7B-Instruct-Aya-101-GGUF --local-dir . --local-dir-use-symlinks False --include='*Q4_K*gguf'

Set the environment variable to speed up downloads (optional, requires hf_transfer)

HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download MaziyarPanahi/Mistral-7B-Instruct-Aya-101-GGUF Mistral-7B-Instruct-Aya-101-GGUF.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False

💻 Usage Examples

Basic usage

Example llama.cpp command

./main -ngl 35 -m Mistral-7B-Instruct-Aya-101-GGUF.Q4_K_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"

Python code example (using llama-cpp-python)

from llama_cpp import Llama

# Set n_gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = Llama(
  model_path="./Mistral-7B-Instruct-Aya-101-GGUF.Q4_K_M.gguf",  # Download the model file first
  n_ctx=32768,  # The max sequence length to use - note that longer sequence lengths require much more resources
  n_threads=8,            # The number of CPU threads to use, tailor to your system and the resulting performance
  n_gpu_layers=35         # The number of layers to offload to GPU, if you have GPU acceleration available
)

# Simple inference example
output = llm(
  # A triple-quoted string is needed because the prompt spans several lines.
  # Replace {system_message} and {prompt} with your own text before calling.
  """<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant""", # Prompt
  max_tokens=512,  # Generate up to 512 tokens
  stop=["</s>"],   # Example stop token - not necessarily correct for this specific model! Please check before using.
  echo=True        # Whether to echo the prompt
)

# Chat Completion API

llm = Llama(model_path="./Mistral-7B-Instruct-Aya-101-GGUF.Q4_K_M.gguf", chat_format="llama-2")  # Set chat_format according to the model you are using
llm.create_chat_completion(
    messages = [
        {"role": "system", "content": "You are a story writing assistant."},
        {
            "role": "user",
            "content": "Write a story about llamas."
        }
    ]
)
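
The prompt strings above contain the literal placeholders {system_message} and {prompt}, which must be replaced before the text is sent to the model. A minimal sketch of one way to fill them in follows. The helper name build_chatml_prompt is hypothetical, and this document does not confirm that this template matches how the model was fine-tuned (the base Mistral-7B-Instruct models use an [INST] style template), so treat it as an illustration of the template shown here rather than a verified prompt format.

```python
def build_chatml_prompt(system_message: str, prompt: str) -> str:
    """Fill the template used in this document's examples.

    Hypothetical helper: it only reproduces the template shown above and
    makes no claim about what the model was actually trained on.
    """
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant"
    )


text = build_chatml_prompt(
    "You are a helpful multilingual assistant.",
    "Say hello in French.",
)
print(text)
```

The resulting string can be passed as the first argument to llm(...) in place of the raw template.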

Advanced usage

Using the model in text-generation-webui

See the instructions in text-generation-webui/docs/04 ‐ Model Tab.md.

Using the model with LangChain

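📚 Detailed Documentation

About GGUF

GGUF is a format introduced by the llama.cpp team on August 21, 2023. It replaces GGML, which llama.cpp no longer supports.

The following is an incomplete list of clients and libraries known to support GGUF:

llama.cpp. The source project for GGUF. Offers a CLI and a server option.
text-generation-webui, the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration.
KoboldCpp, a fully featured web UI with GPU acceleration across all platforms and GPU architectures. Especially good for storytelling.
GPT4All, a free and open-source GUI that runs locally on Windows, Linux, and macOS, with full GPU acceleration.
LM Studio, an easy-to-use and powerful local GUI for Windows and macOS (Silicon), with GPU acceleration. As of November 27, 2023, the Linux version was in beta.
LoLLMS Web UI, a web UI with many interesting and unique features, including a full model library for easy model selection.
Faraday.dev, an attractive and easy-to-use character-based chat GUI for Windows and macOS (both Silicon and Intel), with GPU acceleration.
llama-cpp-python, a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
candle, a Rust ML framework focused on performance, including GPU support, and ease of use.
ctransformers, a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible AI server. Note that as of November 27, 2023, ctransformers had not been updated for a long time and did not support many recent models.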

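Explanation of quantization methods

The new quantization methods are:

GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.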
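
The bpw figures above can be reproduced with simple bookkeeping. The sketch below is a worked calculation, not llama.cpp code. It assumes, beyond what this document states, that each super-block also stores one 16-bit float scale ("type-0") or a 16-bit scale plus a 16-bit min ("type-1"); the function name and table are illustrative only.

```python
# Assumption (not stated in this document): each super-block carries a
# 16-bit float scale, and "type-1" formats carry a 16-bit float min as well.
SUPER_BLOCK_HEADER_BITS = {"type-0": 16, "type-1": 32}


def bits_per_weight(weight_bits, blocks, weights_per_block, scale_bits, kind):
    """Effective bits per weight for one super-block."""
    n_weights = blocks * weights_per_block
    # "type-1" stores a scale and a min per block, "type-0" only a scale.
    per_block_bits = scale_bits * (2 if kind == "type-1" else 1)
    total_bits = (
        n_weights * weight_bits
        + blocks * per_block_bits
        + SUPER_BLOCK_HEADER_BITS[kind]
    )
    return total_bits / n_weights


# (weight bits, blocks, weights per block, scale bits, kind) from the list above
layouts = {
    "Q3_K": (3, 16, 16, 6, "type-0"),
    "Q4_K": (4, 8, 32, 6, "type-1"),
    "Q5_K": (5, 8, 32, 6, "type-1"),
    "Q6_K": (6, 16, 16, 8, "type-0"),
}

bpw = {name: bits_per_weight(*spec) for name, spec in layouts.items()}
for name, value in bpw.items():
    print(name, value)
```

Under these assumptions the calculation matches the quoted values for Q3_K, Q4_K, Q5_K, and Q6_K. The same accounting applied to Q2_K gives 2.625 rather than the quoted 2.5625, so that figure presumably rests on a different accounting and is left out here.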

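Downloading GGUF files

Note for manual downloaders

You will almost never need to clone the entire repository. Several quantization formats are provided, and most users only need to pick and download a single file.

The following clients and libraries download models for you automatically and offer a list of available models to choose from:

LM Studio
LoLLMS Web UI
Faraday.dev

Downloading in text-generation-webui

Under "Download Model", enter the model repository MaziyarPanahi/Mistral-7B-Instruct-Aya-101-GGUF and, below it, the specific file name to download, for example Mistral-7B-Instruct-Aya-101-GGUF.Q4_K_M.gguf. Then click "Download".

Downloading on the command line (including multiple files at once)

The huggingface-hub Python library is the recommended way to download.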
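
For readers who prefer to download from Python rather than the CLI, here is a minimal sketch using hf_hub_download from huggingface_hub. The helper names gguf_filename and download_gguf are hypothetical, and the file name pattern is simply copied from the examples in this document, so check the repository's file list before relying on it.

```python
from pathlib import Path

REPO_ID = "MaziyarPanahi/Mistral-7B-Instruct-Aya-101-GGUF"


def gguf_filename(quant: str = "Q4_K_M") -> str:
    """Build a file name following the pattern used in this document.

    Assumption: other quantization levels follow the same naming pattern.
    """
    return f"Mistral-7B-Instruct-Aya-101-GGUF.{quant}.gguf"


def download_gguf(quant: str = "Q4_K_M", local_dir: str = ".") -> Path:
    """Download a single GGUF file and return its local path."""
    # Imported lazily so gguf_filename() works without the library installed.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id=REPO_ID,
        filename=gguf_filename(quant),
        local_dir=local_dir,
    )
    return Path(path)


# Example call (performs a multi-gigabyte network download):
# model_path = download_gguf("Q4_K_M")
```

Downloading one file this way avoids cloning the whole repository, in line with the note above.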

🔧 Technical Details

When running the llama.cpp command, make sure you are using llama.cpp from commit d0cee0d or later.

./main -ngl 35 -m Mistral-7B-Instruct-Aya-101-GGUF.Q4_K_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"

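-ngl 35: change this to the number of layers to offload to the GPU. Remove the flag if you have no GPU acceleration.
-c 32768: change this to the sequence length you want. For extended-sequence models (such as 8K, 16K, 32K), the necessary RoPE scaling parameters are read from the GGUF file and set automatically by llama.cpp. Longer sequence lengths require more resources, so you may need to reduce this value.
For a chat-style conversation, replace the -p <PROMPT> argument with -i -ins.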
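
To give a rough sense of why a large -c value needs more resources, the sketch below estimates the size of the key/value cache at different context lengths. It assumes the fine-tune keeps the published Mistral-7B architecture (32 layers, 8 key/value heads, head dimension 128) and that the cache is stored in 16-bit floats. It covers the cache only, not the model weights or compute buffers, and the function name is illustrative.

```python
def kv_cache_bytes(n_ctx, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Approximate KV cache size in bytes for a given context length.

    Defaults are assumptions based on the Mistral-7B architecture with a
    16-bit cache; the leading factor 2 accounts for keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_ctx


sizes_gib = {n_ctx: kv_cache_bytes(n_ctx) / 2**30 for n_ctx in (4096, 8192, 32768)}
for n_ctx, gib in sizes_gib.items():
    print(f"-c {n_ctx}: about {gib} GiB of KV cache")
```

Under these assumptions the cache grows linearly with the context length, from about 0.5 GiB at 4096 tokens to about 4 GiB at 32768, which is why lowering -c is the first thing to try when memory runs short.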