rank_zephyr_7b_v1_full-GGUF: Open-Source Text Ranking Model

Rank Zephyr 7b V1 Full GGUF

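Developed by MaziyarPanahi

A text ranking model based on Mistral-7B, available in several quantized versions for efficient inference.

Large language model, English, license: MIT #text-ranking-optimization #multi-bit-quantization #long-sequence-processing

Downloads: 708

Released: 2/3/2024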

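Model Overview

This model is the GGUF-format version of castorini/rank_zephyr_7b_v1_full. It is built for text ranking tasks and is offered at several quantization levels so that accuracy can be traded against resource use.

Model Features

Multiple Quantization Options

Quantized versions from 2-bit to 8-bit are provided, so accuracy and performance can be balanced as needed.

Efficient Inference

The GGUF format is designed for efficient inference and runs on a wide range of hardware.

Text Ranking Capability

The model is optimized specifically for text ranking tasks.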

Model Capabilities

Text ranking

Efficient inference

Support for multiple quantization levels

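Use Cases

Information Retrieval

Search result ranking

Reorders the results returned by a search engine to improve their relevance.

Recommender Systems

Recommended content ranking

Ranks the candidate items in a recommender system to improve the user experience.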

🚀 MaziyarPanahi/rank_zephyr_7b_v1_full-GGUF

This project provides GGUF-format files for the castorini/rank_zephyr_7b_v1_full model, so that text ranking tasks can be run efficiently.

🚀 Quick Start

Model Information

Model creator: castorini
Original model: castorini/rank_zephyr_7b_v1_full

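Model Tags

Property	Details
Model type	Quantized model; tags include 2-bit to 8-bit quantization, GGUF format, transformers architecture, and safetensors storage
Training data	Not specified
Base model	mistralai/Mistral-7B-v0.1, castorini/rank_zephyr_7b_v1_full
License	MIT, Apache-2.0
Inference status	Inference not supported
Pipeline tag	Text ranking
Quantized by	MaziyarPanahi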

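✨ Key Features

Multiple quantization levels: 2-bit to 8-bit quantization methods cover different trade-offs between performance and accuracy.
GGUF format: the newer model format that replaces GGML and is supported by many clients and libraries.
Broad compatibility: works with llama.cpp, text-generation-webui, and other tools and libraries.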

📦 Installation Guide

Installing Dependencies

To call the model from Python, install the llama-cpp-python library:

# Base llama-cpp-python with no GPU acceleration
pip install llama-cpp-python
# With NVidia CUDA acceleration
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
# Or with OpenBLAS acceleration
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
# Or with CLBLast acceleration
CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
# Or with AMD ROCm GPU acceleration (Linux only)
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
# Or with Metal GPU acceleration for macOS systems only
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python

# On Windows, set the CMAKE_ARGS variable in PowerShell using this format; e.g. for NVidia CUDA:
$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
pip install llama-cpp-python

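Downloading GGUF Files

Note on Manual Downloads

Cloning the entire repository is not recommended. Download only the quantization file or files you need.

Automatic Download Tools

Tools such as LM Studio, LoLLMS Web UI, and Faraday.dev can download models automatically.

Downloading in text-generation-webui

Under Download Model, enter the model repo [MaziyarPanahi/rank_zephyr_7b_v1_full-GGUF](https://huggingface.co/MaziyarPanahi/rank_zephyr_7b_v1_full-GGUF), specify a filename (for example rank_zephyr_7b_v1_full-GGUF.Q4_K_M.gguf), then click Download.

Command-Line Download

Use the huggingface-hub library to download a single file: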

pip3 install huggingface-hub
huggingface-cli download MaziyarPanahi/rank_zephyr_7b_v1_full-GGUF rank_zephyr_7b_v1_full-GGUF.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False

To download multiple files at once with a pattern:

huggingface-cli download MaziyarPanahi/rank_zephyr_7b_v1_full-GGUF --local-dir . --local-dir-use-symlinks False --include='*Q4_K*gguf'

To speed up downloads, install hf_transfer and enable it through an environment variable:

pip3 install hf_transfer
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download MaziyarPanahi/rank_zephyr_7b_v1_full-GGUF rank_zephyr_7b_v1_full-GGUF.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False

Windows command-line users can run set HF_HUB_ENABLE_HF_TRANSFER=1 (with no spaces around the equals sign) before the download command.
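The download commands above can also be assembled from a script, which helps when several quantization files are fetched in a batch. The following is a minimal sketch that only builds the argument list and environment; the helper names `build_download_command` and `build_env` are hypothetical, and the flags are the ones shown in the commands above.

```python
import os
import shlex

REPO_ID = "MaziyarPanahi/rank_zephyr_7b_v1_full-GGUF"


def build_download_command(filename=None, include=None, local_dir="."):
    """Return the huggingface-cli argument list for one file or a glob pattern."""
    argv = ["huggingface-cli", "download", REPO_ID]
    if filename:
        argv.append(filename)
    argv += ["--local-dir", local_dir, "--local-dir-use-symlinks", "False"]
    if include:
        argv += ["--include", include]
    return argv


def build_env(use_hf_transfer=False):
    """Copy the current environment, optionally enabling hf_transfer."""
    env = dict(os.environ)
    if use_hf_transfer:
        env["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
    return env


single = build_download_command(filename="rank_zephyr_7b_v1_full-GGUF.Q4_K_M.gguf")
pattern = build_download_command(include="*Q4_K*gguf")
print(shlex.join(single))
print(shlex.join(pattern))
```

To actually run a download, pass the list to `subprocess.run(single, env=build_env(True), check=True)`. Passing the arguments as a list avoids shell quoting differences between Linux, macOS, and Windows.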

💻 Usage Examples

Basic Usage

Example llama.cpp Command

Make sure you are using llama.cpp from commit d0cee0d or later:

./main -ngl 35 -m rank_zephyr_7b_v1_full-GGUF.Q4_K_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant"

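Parameter notes:

-ngl: the number of layers to offload to the GPU. Remove it if you have no GPU acceleration.
-c: the sequence length. Longer sequences require more resources, so reduce this value if needed.
-p: the prompt. For a chat-style conversation, replace the -p argument with -i -ins.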
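The `{system_message}` and `{prompt}` placeholders in the command above must be replaced with real text before the prompt is passed to the model. A minimal sketch of doing this in Python follows; the helper name `build_prompt` and the example strings are illustrative assumptions, and the template is copied from the command above.

```python
# Template copied from the llama.cpp command above.
CHATML_TEMPLATE = (
    "<|im_start|>system\n{system_message}<|im_end|>\n"
    "<|im_start|>user\n{prompt}<|im_end|>\n"
    "<|im_start|>assistant"
)


def build_prompt(system_message, prompt):
    """Fill the two placeholders and return the full prompt string."""
    return CHATML_TEMPLATE.format(system_message=system_message, prompt=prompt)


# Hypothetical example texts, not taken from the model card.
example = build_prompt(
    "You rank passages by relevance to a query.",
    "Query: what is GGUF?",
)
print(example)
```

Only the template is parsed by `str.format`, so braces inside the substituted text are passed through unchanged.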

Python Code Example

Load the model with the llama-cpp-python library:

from llama_cpp import Llama

# Set n_gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = Llama(
  model_path="./rank_zephyr_7b_v1_full-GGUF.Q4_K_M.gguf",  # Download the model file first
  n_ctx=32768,  # The max sequence length to use - note that longer sequence lengths require much more resources
  n_threads=8,            # The number of CPU threads to use, tailor to your system and the resulting performance
  n_gpu_layers=35         # The number of layers to offload to GPU, if you have GPU acceleration available
)

# Simple inference example
output = llm(
  # Triple-quoted so that the multi-line prompt is a valid Python string literal
  """<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant""", # Prompt
  max_tokens=512,  # Generate up to 512 tokens
  stop=["</s>"],   # Example stop token - not necessarily correct for this specific model! Please check before using.
  echo=True        # Whether to echo the prompt
)

# Chat Completion API

llm = Llama(model_path="./rank_zephyr_7b_v1_full-GGUF.Q4_K_M.gguf", chat_format="llama-2")  # Set chat_format according to the model you are using
llm.create_chat_completion(
    messages = [
        {"role": "system", "content": "You are a story writing assistant."},
        {
            "role": "user",
            "content": "Write a story about llamas."
        }
    ]
)
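For the search and recommendation use cases described earlier, the generated text has to be turned back into an ordering of candidates. This page does not document the model's output format, so the sketch below assumes a listwise ranking string such as `[2] > [1] > [3]`, where the numbers are 1-based candidate identifiers. Both that format and the helper names `parse_ranking` and `rerank` are assumptions to verify against the original castorini model card.

```python
import re


def parse_ranking(text, num_candidates):
    """Convert a string like '[2] > [1] > [3]' into zero-based indices."""
    order = []
    for match in re.findall(r"\[(\d+)\]", text):
        idx = int(match) - 1
        # Skip out-of-range identifiers and repeated identifiers.
        if 0 <= idx < num_candidates and idx not in order:
            order.append(idx)
    # Candidates the model left out keep their original relative order at the end.
    order += [i for i in range(num_candidates) if i not in order]
    return order


def rerank(candidates, generated_text):
    """Return the candidates reordered according to the generated ranking."""
    return [candidates[i] for i in parse_ranking(generated_text, len(candidates))]


print(rerank(["doc a", "doc b", "doc c"], "[2] > [3] > [1]"))
```

With llama-cpp-python, the generated text of a plain completion is available at `output["choices"][0]["text"]`. Set `echo=False` in that call so identifiers from the prompt are not parsed as part of the ranking.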

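Advanced Usage

Using the LangChain Integration

📚 Detailed Documentation

About GGUF

GGUF is a format introduced by the llama.cpp team on August 21, 2023. It replaces GGML, which llama.cpp no longer supports. The following is a partial list of clients and libraries known to support GGUF: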

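llama.cpp: the source project for GGUF, offering a CLI and a server option.
[text-generation-webui](https://github.com/oobabooga/text-generation-webui): the most widely used web UI, with many features and GPU acceleration support.
KoboldCpp: a fully featured web UI with GPU acceleration across platforms, well suited to storytelling.
GPT4All: a free and open-source GUI that runs locally on Windows, Linux, and macOS with full GPU acceleration.
LM Studio: an easy-to-use and powerful local GUI for Windows and macOS (Apple Silicon) with GPU acceleration; the Linux version is in beta.
[LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui): a web UI with many distinctive features, including a full model library for easy model selection.
Faraday.dev: an attractive, easy-to-use character-based chat GUI for Windows and macOS (both Apple Silicon and Intel) with GPU acceleration.
[llama-cpp-python](https://github.com/abetlen/llama-cpp-python): a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
candle: a Rust ML framework focused on performance, with GPU support and ease of use.
ctransformers: a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible AI server. As of November 27, 2023, it had not been updated for some time and does not support some recent models.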

Explanation of Quantization Methods

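The new quantization methods are as follows:

GGML_TYPE_Q2_K: "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block holding 16 weights. Block scales and mins are quantized with 4 bits, for an effective 2.5625 bits per weight (bpw).
GGML_TYPE_Q3_K: "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block holding 16 weights. Scales are quantized with 6 bits, for an effective 3.4375 bpw.
GGML_TYPE_Q4_K: "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block holding 32 weights. Scales and mins are quantized with 6 bits, for an effective 4.5 bpw.
GGML_TYPE_Q5_K: "type-1" 5-bit quantization with the same super-block structure as GGML_TYPE_Q4_K, for an effective 5.5 bpw.
GGML_TYPE_Q6_K: "type-0" 6-bit quantization in super-blocks containing 16 blocks, each block holding 16 weights. Scales are quantized with 8 bits, for an effective 6.5625 bpw.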
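The bits-per-weight figures follow from the super-block layout: the quantized weights, plus the per-block scales (and mins for "type-1"), plus a small per-super-block overhead, divided by the number of weights. The sketch below reproduces that arithmetic. It assumes, which the text above does not state, that each super-block also stores 16-bit floating-point fields: one for "type-0" formats and two (scale and min) for "type-1" formats. The function name `bits_per_weight` is illustrative.

```python
def bits_per_weight(blocks, weights_per_block, weight_bits,
                    scale_bits, has_min, fp16_fields):
    """Effective bits per weight for one super-block.

    fp16_fields is an assumption: the number of 16-bit fields stored
    once per super-block (not stated in the list above).
    """
    weights = blocks * weights_per_block
    # "type-1" blocks store a scale and a min; "type-0" blocks store only a scale.
    per_block_meta = scale_bits * (2 if has_min else 1)
    total_bits = weights * weight_bits + blocks * per_block_meta + 16 * fp16_fields
    return total_bits / weights


layouts = {
    "Q3_K": (16, 16, 3, 6, False, 1),
    "Q4_K": (8, 32, 4, 6, True, 2),
    "Q5_K": (8, 32, 5, 6, True, 2),
    "Q6_K": (16, 16, 6, 8, False, 1),
}
for name, layout in layouts.items():
    print(name, bits_per_weight(*layout))

# Q2_K: two 16-bit fields give 2.625; one 16-bit field gives the quoted 2.5625.
print("Q2_K", bits_per_weight(16, 16, 2, 4, True, 2), bits_per_weight(16, 16, 2, 4, True, 1))
```

Under this accounting the figures for Q3_K through Q6_K match the list exactly. For Q2_K the result is 2.625 with two 16-bit fields, and the quoted 2.5625 is reproduced only when a single 16-bit field is counted.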

Running in text-generation-webui

Further instructions are available in the [text-generation-webui documentation](https://github.com/oobabooga/text-generation-webui/blob/main/docs/04%20%E2%80%90%20Model%20Tab.md#llamacpp).