LlongOrca-7B-16K-GGUF: open-source large language model for long-text generation


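LlongOrca 7B 16K GGUF

GGUF quantizations by TheBloke

LlongOrca 7B 16K is a large language model from Open-Orca, based on the Llama architecture. It supports a 16K context length and is intended for text generation tasks.

Tags: large language model, English, long-text processing, efficient inference, multi-turn dialogue

Downloads: 1,304

Released: 9/5/2023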

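Model Overview

LlongOrca 7B 16K is a large language model based on the Llama architecture. It supports a 16K context length and is intended for text generation. It uses the ChatML prompt template and supports conversational interaction.

Model Features

16K context length

Supports a context length of up to 16K tokens, making it suitable for long-text tasks.

ChatML prompt template

Uses the ChatML prompt template for conversational interaction, which suits applications such as chatbots.

Multiple quantization options

Offers multiple quantization options (such as Q2_K, Q3_K and Q4_K) to fit different hardware requirements.

Model Capabilities

Text generation

Conversational interaction

Long-text processing

Use Cases

Chatbots

Intelligent customer service

Building intelligent customer-service systems with multi-turn dialogue and long-text processing.

Content generation

Article generation

Generating long articles or reports, making use of the 16K context length.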

🚀 LlongOrca 7B 16K - GGUF

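This repository contains GGUF-format model files for Open-Orca's LlongOrca 7B 16K, for use in text generation.

Chat & support: TheBloke's Discord server

Want to contribute? TheBloke's Patreon page

TheBloke's LLM work is generously supported by a grant from andreessen horowitz (a16z)

🚀 Quick Start

This repository provides model files in several quantization formats for different use cases and hardware. Pick the file that suits your needs, then download and use it.

✨ Main Features

Multiple quantization formats: 2-, 3-, 4-, 5-, 6- and 8-bit GGUF models for CPU+GPU inference.
Broad compatibility: works with many clients and libraries, including llama.cpp, text-generation-webui and KoboldCpp.
Improved format: GGUF offers better tokenization and support for special tokens than the older GGML format.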

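📦 Installation Guide

Downloading GGUF files

Automatic download: clients and libraries such as LM Studio, LoLLMS Web UI and Faraday.dev download models for you and offer a list of available models to choose from.
Manual download: cloning the whole repository is not recommended. It contains many different quantization formats, and most users only need to pick and download a single file.

Downloading in `text-generation-webui`

Under “Download Model”, enter the model repository TheBloke/LlongOrca-7B-16K-GGUF, then enter the specific file name to download below it, such as llongorca-7b-16k.Q4_K_M.gguf, and click “Download”.

Downloading from the command line

The huggingface-hub Python library is recommended:

pip3 install 'huggingface-hub>=0.17.1'

Then download any individual model file to the current directory, at high speed, with a command like this:

huggingface-cli download TheBloke/LlongOrca-7B-16K-GGUF llongorca-7b-16k.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False

You can also download multiple files at once with a wildcard pattern: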

huggingface-cli download TheBloke/LlongOrca-7B-16K-GGUF --local-dir . --local-dir-use-symlinks False --include='*Q4_K*gguf'
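The `--include` value is a shell-style glob matched against file names in the repository. As a rough local check of what a pattern will select, the sketch below applies Python's `fnmatch.fnmatchcase` (case-sensitive glob matching) to a few file names copied from the "Provided files" table. It only approximates the filtering that huggingface-cli performs, and it makes no network calls.

```python
from fnmatch import fnmatchcase

# A subset of the file names listed in the "Provided files" table.
FILES = [
    "llongorca-7b-16k.Q3_K_M.gguf",
    "llongorca-7b-16k.Q4_0.gguf",
    "llongorca-7b-16k.Q4_K_S.gguf",
    "llongorca-7b-16k.Q4_K_M.gguf",
    "llongorca-7b-16k.Q5_K_M.gguf",
]


def select(pattern: str, names: list[str]) -> list[str]:
    """Return the names matching a shell-style glob, comparing case-sensitively."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(select("*Q4_K*gguf", FILES))
# → ['llongorca-7b-16k.Q4_K_S.gguf', 'llongorca-7b-16k.Q4_K_M.gguf']
```

Because the comparison here is case-sensitive, a lowercase pattern such as `*q4_K*gguf` selects nothing from these names, and the legacy `Q4_0` file is not picked up by `*Q4_K*`.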

To accelerate downloads on fast connections (1Gbit/s or higher), install hf_transfer:

pip3 install hf_transfer

Then set the environment variable HF_HUB_ENABLE_HF_TRANSFER to 1:

HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download TheBloke/LlongOrca-7B-16K-GGUF llongorca-7b-16k.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False

Windows command-line users: run set HF_HUB_ENABLE_HF_TRANSFER=1 before the download command.
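The same single-file download can also be done from Python. The sketch below assumes huggingface_hub's `hf_hub_download` function and, for the fast path, that hf_transfer is installed. `gguf_filename` and `download_gguf` are illustrative helper names, not part of any library. Setting `HF_HUB_ENABLE_HF_TRANSFER` from inside a script only takes effect if it happens before huggingface_hub is first imported, which is why the import is deferred.

```python
import os

REPO_ID = "TheBloke/LlongOrca-7B-16K-GGUF"


def gguf_filename(quant: str) -> str:
    """File name for a quantization, following the naming in the "Provided files" table."""
    return f"llongorca-7b-16k.{quant}.gguf"


def download_gguf(quant: str = "Q4_K_M", local_dir: str = ".",
                  use_hf_transfer: bool = False) -> str:
    """Download one GGUF file and return its local path (hypothetical helper)."""
    if use_hf_transfer:
        # huggingface_hub reads this variable when it is first imported,
        # so it must be set before the import below.
        os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
    # Imported lazily so the environment variable above can still take effect.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant), local_dir=local_dir)
```

For example, `download_gguf("Q4_K_M", use_hf_transfer=True)` should fetch the recommended Q4_K_M file into the current directory and return its path. If huggingface_hub was already imported earlier in the same process, the flag set here has no effect.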

💻 Usage Examples

Example `llama.cpp` command

Make sure you are using llama.cpp from commit d0cee0d36d5be95a0d9088b674dbb27354107221 or later:

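./main -ngl 32 -m llongorca-7b-16k.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -e -p "<|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant"

-ngl 32: change this to the number of layers to offload to the GPU. Remove it if you don't have GPU acceleration.
-c 4096: change this to the desired sequence length. For extended-sequence models (e.g. 8K, 16K, 32K), the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.
-e: process escape sequences in the prompt, so that each \n in the template becomes a real newline.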

If you want a chat-style conversation, replace the -p <PROMPT> argument with -i -ins.

For other parameters and how to use them, refer to the llama.cpp documentation.
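When llama.cpp is launched from a script, escape handling can be sidestepped by building the argument list directly, so the prompt already contains real newlines. The sketch below mirrors the example command; `llama_cpp_argv` is a hypothetical helper, the `./main` path is taken from the example, and it assumes the assistant turn is opened with a trailing newline, as in the multi-line template in the "Prompt template" section.

```python
# ChatML template from this README, using real newline characters.
CHATML = (
    "<|im_start|>system\n{system_message}<|im_end|>\n"
    "<|im_start|>user\n{prompt}<|im_end|>\n"
    "<|im_start|>assistant\n"
)


def llama_cpp_argv(model: str, system_message: str, prompt: str,
                   n_gpu_layers: int = 32, ctx: int = 4096) -> list[str]:
    """Argument list equivalent to the example ./main command (hypothetical helper)."""
    text = CHATML.format(system_message=system_message, prompt=prompt)
    argv = ["./main", "-m", model, "--color", "-c", str(ctx),
            "--temp", "0.7", "--repeat_penalty", "1.1", "-n", "-1", "-p", text]
    if n_gpu_layers > 0:
        # Pass -ngl only when offloading; the notes above say to drop it without a GPU.
        argv[1:1] = ["-ngl", str(n_gpu_layers)]
    return argv
```

The list can then be passed to `subprocess.run(argv, check=True)`. Raising `ctx` to 16384 uses the model's full 16K window, at the cost of more memory.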

Running in `text-generation-webui`

Further instructions can be found in text-generation-webui/docs/llama.cpp.md.

Running from Python code

You can use GGUF models from Python with the llama-cpp-python or ctransformers libraries.

Loading this model from Python with `ctransformers`

First install the package:

# Base ctransformers with no GPU acceleration
pip install 'ctransformers>=0.2.24'
# Or with CUDA GPU acceleration
pip install 'ctransformers[cuda]>=0.2.24'
# Or with ROCm GPU acceleration
CT_HIPBLAS=1 pip install 'ctransformers>=0.2.24' --no-binary ctransformers
# Or with Metal GPU acceleration on macOS
CT_METAL=1 pip install 'ctransformers>=0.2.24' --no-binary ctransformers

Simple example code to load one of these GGUF models:

from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to the GPU. Set it to 0 if your system has no GPU acceleration.
llm = AutoModelForCausalLM.from_pretrained("TheBloke/LlongOrca-7B-16K-GGUF", model_file="llongorca-7b-16k.Q4_K_M.gguf", model_type="llama", gpu_layers=50)

print(llm("AI is going to"))
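The text above names llama-cpp-python as an alternative to ctransformers but shows no example for it. A minimal sketch, assuming llama-cpp-python's `Llama` class (with its `model_path`, `n_ctx` and `n_gpu_layers` arguments) and a completion call that returns an OpenAI-style dict; the helper names and the choice of 16384 for `n_ctx` are illustrative.

```python
def generation_kwargs(max_tokens: int = 256) -> dict:
    """Sampling settings mirroring the llama.cpp example, stopping at the end of a ChatML turn."""
    return {
        "max_tokens": max_tokens,
        "temperature": 0.7,
        "repeat_penalty": 1.1,
        "stop": ["<|im_end|>"],
    }


def generate(model_path: str, prompt: str, n_ctx: int = 16384, n_gpu_layers: int = 32) -> str:
    """Run one completion with llama-cpp-python (requires `pip install llama-cpp-python`).

    Set n_gpu_layers=0 on systems without GPU acceleration.
    """
    from llama_cpp import Llama  # imported lazily so this module loads without the package

    llm = Llama(model_path=model_path, n_ctx=n_ctx, n_gpu_layers=n_gpu_layers)
    result = llm(prompt, **generation_kwargs())
    return result["choices"][0]["text"]
```

Stopping on `<|im_end|>` keeps the model from running on past the end of its own turn in the ChatML format.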

Using with LangChain

Both llama-cpp-python and ctransformers can be used with LangChain; see LangChain's integration guide for each library.

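📚 Detailed Documentation

About GGUF

GGUF is a new format introduced by the llama.cpp team on August 21st, 2023. It replaces GGML, which is no longer supported by llama.cpp. GGUF offers numerous advantages over GGML, such as better tokenization and support for special tokens. It also supports metadata and is designed to be extensible.

Here is a list of clients and libraries known to support GGUF:

llama.cpp: the source project for GGUF. Offers a CLI and a server option.
text-generation-webui: the most widely used web UI, with many features and powerful extensions. Supports GPU acceleration.
KoboldCpp: a fully featured web UI with GPU acceleration across all platforms and GPU architectures. Especially good for storytelling.
LM Studio: an easy-to-use and powerful local GUI for Windows and macOS (Apple Silicon), with GPU acceleration.
LoLLMS Web UI: a great web UI with many interesting and unique features, including a full model library for easy model selection.
Faraday.dev: an attractive and easy-to-use character-based chat GUI for Windows and macOS (both Apple Silicon and Intel), with GPU acceleration.
ctransformers: a Python library with GPU acceleration, LangChain support and an OpenAI-compatible AI server.
llama-cpp-python: a Python library with GPU acceleration, LangChain support and an OpenAI-compatible API server.
candle: a Rust ML framework with a focus on performance, including GPU support, and ease of use.

Repositories available

Prompt template: ChatML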

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
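For multi-turn conversations, the user/assistant pattern repeats before the final, open assistant turn. A small sketch of rendering the template for several turns; `build_chatml` is a hypothetical name, and the newline after the final `assistant` is an assumption about the template.

```python
def build_chatml(system_message: str, turns: list[tuple[str, str]]) -> str:
    """Render a ChatML prompt from (role, content) pairs, ending with an open assistant turn."""
    parts = [f"<|im_start|>system\n{system_message}<|im_end|>\n"]
    for role, content in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # Leave the assistant turn open so the model writes the next reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml(
    "You are LlongOrca.",
    [("user", "How are you"), ("assistant", "I am doing well!"), ("user", "How are you now?")],
)
```

Each earlier assistant reply is closed with `<|im_end|>`, so only the last assistant turn is left for the model to complete.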

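Compatibility

These quantized GGUFv2 files are compatible with llama.cpp from August 27th, 2023 onwards, as of commit d0cee0d36d5be95a0d9088b674dbb27354107221.

They are also compatible with many third-party UIs and libraries; see the list in the "About GGUF" section above.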
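To check whether a file on disk is GGUFv2 (or an older GGUFv1 file), it is enough to read the fixed-size header at the start of the file. The sketch assumes the little-endian GGUF header layout: the four magic bytes `GGUF`, a uint32 version, then the tensor count and the metadata key/value count, which were uint32 in version 1 and uint64 from version 2 on. The helper names are illustrative.

```python
import struct


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header from the first 24 bytes of a file (little-endian assumed)."""
    if data[:4] != b"GGUF":
        raise ValueError("missing GGUF magic; not a GGUF file")
    (version,) = struct.unpack_from("<I", data, 4)
    # GGUF v1 stored both counts as uint32; v2 and later use uint64.
    fmt = "<II" if version == 1 else "<QQ"
    tensor_count, kv_count = struct.unpack_from(fmt, data, 8)
    return {"version": version, "tensor_count": tensor_count, "metadata_kv_count": kv_count}


def read_gguf_header_from_file(path: str) -> dict:
    """Read just enough bytes from disk to parse the header."""
    with open(path, "rb") as f:
        return read_gguf_header(f.read(24))
```

For example, `read_gguf_header_from_file("llongorca-7b-16k.Q4_K_M.gguf")["version"]` should be 2 for the files in this repository, given that they are described as GGUFv2.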

Explanation of quantization methods


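The new methods available are listed below. In "type-0" formats a weight is reconstructed from a block scale d as w = d·q; "type-1" formats also store a block minimum m, so w = d·q + m.

GGML_TYPE_Q2_K: "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. Counting the fp16 super-block scale and min, this ends up effectively using 2.625 bits per weight (bpw).
GGML_TYPE_Q3_K: "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
GGML_TYPE_Q4_K: "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
GGML_TYPE_Q5_K: "type-1" 5-bit quantization, with the same super-block structure as GGML_TYPE_Q4_K. This ends up using 5.5 bpw.
GGML_TYPE_Q6_K: "type-0" 6-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.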
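The bpw figures can be checked with a simple bit count per 256-weight super-block. This sketch assumes the layout of ggml's k-quant structures: "type-0" blocks store one quantized scale, "type-1" blocks store a quantized scale and min, and the super-block adds one fp16 factor for the scales (plus one for the mins in type-1). Under that assumption it reproduces 3.4375, 4.5, 5.5 and 6.5625 bpw for Q3_K to Q6_K, and gives 2.625 bpw for Q2_K.

```python
def k_quant_bpw(weight_bits: int, n_blocks: int, block_size: int,
                scale_bits: int, has_min: bool) -> float:
    """Effective bits per weight of one k-quant super-block (assumed layout, see above)."""
    n_weights = n_blocks * block_size            # 256 for every k-quant listed
    params_per_block = 2 if has_min else 1       # type-1: scale + min; type-0: scale only
    bits = (
        n_weights * weight_bits                      # the quantized weights themselves
        + n_blocks * params_per_block * scale_bits   # quantized block scales (and mins)
        + 16 * params_per_block                      # fp16 super-block scale (and min)
    )
    return bits / n_weights


# name: (weight bits, blocks per super-block, weights per block, scale bits, type-1?)
K_QUANTS = {
    "Q2_K": (2, 16, 16, 4, True),
    "Q3_K": (3, 16, 16, 6, False),
    "Q4_K": (4, 8, 32, 6, True),
    "Q5_K": (5, 8, 32, 6, True),
    "Q6_K": (6, 16, 16, 8, False),
}

for name, spec in K_QUANTS.items():
    print(name, k_quant_bpw(*spec))
```

For Q4_K, for instance, the count is 256 × 4 + 8 × 2 × 6 + 2 × 16 = 1152 bits, and 1152 / 256 = 4.5.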

Refer to the "Provided files" table below to see which files use which methods, and how.

Provided files

Name	Quant method	Bits	Size	Max RAM required	Use case
llongorca-7b-16k.Q2_K.gguf	Q2_K	2	2.83 GB	5.33 GB	smallest, significant quality loss; not recommended for most purposes
llongorca-7b-16k.Q3_K_S.gguf	Q3_K_S	3	2.95 GB	5.45 GB	very small, high quality loss
llongorca-7b-16k.Q3_K_M.gguf	Q3_K_M	3	3.30 GB	5.80 GB	very small, high quality loss
llongorca-7b-16k.Q3_K_L.gguf	Q3_K_L	3	3.60 GB	6.10 GB	small, substantial quality loss
llongorca-7b-16k.Q4_0.gguf	Q4_0	4	3.83 GB	6.33 GB	legacy; small, very high quality loss; prefer Q3_K_M
llongorca-7b-16k.Q4_K_S.gguf	Q4_K_S	4	3.86 GB	6.36 GB	small, greater quality loss
llongorca-7b-16k.Q4_K_M.gguf	Q4_K_M	4	4.08 GB	6.58 GB	medium, balanced quality; recommended
llongorca-7b-16k.Q5_0.gguf	Q5_0	5	4.65 GB	7.15 GB	legacy; medium, balanced quality; prefer Q4_K_M
llongorca-7b-16k.Q5_K_S.gguf	Q5_K_S	5	4.65 GB	7.15 GB	large, low quality loss; recommended
llongorca-7b-16k.Q5_K_M.gguf	Q5_K_M	5	4.78 GB	7.28 GB	large, very low quality loss; recommended
llongorca-7b-16k.Q6_K.gguf	Q6_K	6	5.53 GB	8.03 GB	very large, extremely low quality loss
llongorca-7b-16k.Q8_0.gguf	Q8_0	8	7.16 GB	9.66 GB	very large, extremely low quality loss; not recommended

Note: the RAM figures above assume no GPU offloading. If layers are offloaded to the GPU, RAM usage drops and VRAM is used instead.
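All twelve "Max RAM required" figures in the table equal the file size plus 2.50 GB, which suggests a fixed allowance on top of the weights. That is an observation about the table, not a documented formula. Under that assumption, a small helper can pick the largest file that fits a memory budget when nothing is offloaded to the GPU; the function names are illustrative.

```python
from typing import Optional

# (quant method, file size in GB), copied from the "Provided files" table.
FILES_GB = [
    ("Q2_K", 2.83), ("Q3_K_S", 2.95), ("Q3_K_M", 3.30), ("Q3_K_L", 3.60),
    ("Q4_0", 3.83), ("Q4_K_S", 3.86), ("Q4_K_M", 4.08), ("Q5_0", 4.65),
    ("Q5_K_S", 4.65), ("Q5_K_M", 4.78), ("Q6_K", 5.53), ("Q8_0", 7.16),
]

# Assumption: every "Max RAM required" entry equals file size + 2.50 GB.
OVERHEAD_GB = 2.50


def max_ram_gb(size_gb: float) -> float:
    """Estimated peak RAM with no GPU offloading, rounded like the table."""
    return round(size_gb + OVERHEAD_GB, 2)


def largest_fit(budget_gb: float) -> Optional[str]:
    """Quant method of the largest file whose estimated RAM fits the budget, or None."""
    fitting = [(size, quant) for quant, size in FILES_GB if max_ram_gb(size) <= budget_gb]
    # Equal sizes (Q5_0 vs Q5_K_S) resolve to the k-quant, since "Q5_K_S" > "Q5_0".
    return max(fitting)[1] if fitting else None


print(largest_fit(8.0))  # → Q5_K_M
```

With an 8 GB budget, Q6_K (8.03 GB) is just out of reach, so the helper settles on Q5_K_M; offloading layers to a GPU would lower the RAM needed below these estimates.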

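🔧 Technical Details

Original model information

Model creator: Open-Orca
Original model: Open-Orca/LlongOrca-7B-16k
Model type: llama
Training data: most of the filtered GPT-4-augmented data from the OpenOrca dataset, which aims to reproduce the dataset described in the Orca research paper.
Training: 8× A6000-48GB (first-generation) GPUs for 37 hours, completing 4 full fine-tuning epochs over the dataset in a single training run, at a cost of about $200. The Axolotl training configuration is in configs/oo7b.yml; the packing-attn branch of Axolotl was used during training.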
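For scale, the quoted training figures work out roughly as follows (simple arithmetic on the numbers above; the actual per-hour pricing behind the $200 estimate is not stated):

```latex
8~\text{GPUs} \times 37~\text{h} = 296~\text{GPU-hours},
\qquad
\frac{200~\text{USD}}{296~\text{GPU-hours}} \approx 0.68~\text{USD per GPU-hour}
```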

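Evaluation results

AGIEval: compared with base Llama2-7B and Llongma2-7b-16k, English writing performance improves by nearly 3x, demonstrating the benefit of stacking OpenOrca dataset training on top of existing models.
BigBench-Hard: a clear improvement over base Llama2-7B and Llongma2-7b-16k, again demonstrating the benefit of stacking OpenOrca dataset training on top of existing models.
HuggingFaceH4 Open LLM Leaderboard: expected at release to rank #4 among all 7B models and #1 among long-context 7B models, reaching over 99% of the top-ranked model's performance.

Prompt template

Uses OpenAI's Chat Markup Language (ChatML) format, with <|im_start|> and <|im_end|> tokens added to support it.

Example prompt exchange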

<|im_start|>system
You are LlongOrca, a large language model trained by Alignment Lab AI. Write out your reasoning step-by-step to be sure you get the right answers!
<|im_end|>
<|im_start|>user
How are you<|im_end|>
<|im_start|>assistant
I am doing well!<|im_end|>
<|im_start|>user
How are you now?<|im_end|>

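📄 License

This project is released under the Llama 2 license.

💬 Discord

For further support, and discussions on these models and AI in general, join TheBloke AI's Discord server.

🙏 Thanks, and how to contribute

Thanks to the chirper.ai team! Thanks to Clay from gpus.llm-utils.org!

Many people have asked whether they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine-tuning and training.

If you are able and willing to contribute, it will be most gratefully received and will help me keep providing more models and start work on new AI projects.

Donors will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.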

Patreon: https://patreon.com/TheBlokeAI
Ko-Fi: https://ko-fi.com/TheBlokeAI

Special thanks to: Aemon Algiz.

Patreon special mentions: Alicia Loh, Stephen Murray, K, Ajan Kanaga, RoA, Magnesian, Deo Leter, Olakabola, Eugene Pentland, zynix, Deep Realms, Raymond Fosdick, Elijah Stavena, Iucharbius, Erik Bjäreholt, Luis Javier Navarrete Lozano, Nicholas, theTransient, John Detwiler, alfie_i, knownsqashed, Mano Prime, Willem Michiel, Enrico Ros, LangChain4j, OG, Michael Dempsey, Pierre Kircher, Pedro Madruga, James Bentley, Thomas Belote, Luke @flexchar, Leonard Tan, Johann-Peter Hartmann, Illia Dulskyi, Fen Risland, Chadd, S_X, Jeff Scroggin, Ken Nordquist, Sean Connelly, Artur Olbinski, Swaroop Kallakuri, Jack West, Ai Maven, David Ziegler, Russ Johnson, transmissions 11, John Villwock, Alps Aficionado, Clay Pascal, Viktor Bowallius, Subspace Studios, Rainer Wilmers, Trenton Dambrowitz, vamX, Michael Levine, 준교 김, Brandon Frisco, Kalila, Trailburnt, Randy H, Talal Aujan, Nathan Dryer, Vadim, 阿明, ReadyPlayerEmma, Tiffany J. Kim, George Stoitzev, Spencer Kim, Jerry Meng, Gabriel Tamborski, Cory Kujawski, Jeffrey Morgan, Spiking Neurons AB, Edmond Seymore, Alexandros Triantafyllidis, Lone Striker, Cap'n Zoog, Nikolai Manek, danny, ya boyyy, Derek Yates, usrbinkat, Mandus, TL, Nathan LeClaire, subjectnull, Imad Khwaja, webtim, Raven Klaugh, Asp the Wyvern, Gabriel Puliatti, Caitlyn Gatomon, Joseph William Delisle, Jonathan Leane, Luke Pendergrass, SuperWojo, Sebastain Graf, Will Dee, Fred von Graf, Andrey, Dan Guido, Daniel P. Andersen, Nitin Borwankar, Elle, Vitor Caleffi, biorpg, jjj, NimbleBox.ai, Pieter, Matthew Berman, terasurfer, Michael Davis, Alex, Stanislav Ovsiannikov

Thank you to all my generous patrons and donors! And thank you again to a16z for their generous grant.

📖 Citation

@software{lian2023llongorca7b,
  title = {LlongOrca7B: Llama2-7B Model Instruct-tuned for Long Context on Filtered OpenOrcaV1 GPT-4 Dataset},
  author = {Wing Lian and Bleys Goodson and Guan Wang and Eugene Pentland and Austin Cook and Chanvichet Vong and "Teknium"},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
  howpublished = {\url{https://huggingface.co/Open-Orca/LlongOrca-7B-16k}},
}
@software{openchat,
  title = {{OpenChat: Advancing Open-source Language Models with Imperfect Data}},
  author = {Wang, Guan and Cheng, Sijie and Yu, Qiying and Liu, Changling},
  doi = {10.5281/zenodo.8105775},
  url = {https://github.com/imoneoi/openchat},
  version = {pre-release},
  year = {2023},
  month = {7},
}
@misc{mukherjee2023orca,
  title={Orca: Progressive Learning from Complex Explanation Traces of GPT-4},
  author={Subhabrata Mukherjee and Arindam Mitra and Ganesh Jawahar and Sahaj Agarwal and Hamid Palangi and Ahmed Awadallah},
  year={2023},
  eprint={2306.02707},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
@misc{longpre2023flan,
  title={The Flan Collection: Designing Data and Methods for Effective Instruction Tuning},
  author={Shayne Longpre and Le Hou and Tu Vu and Albert Webson and Hyung Won Chung and Yi Tay and Denny Zhou and Quoc V. Le and Barret Zoph and Jason Wei and Adam Roberts},
  year={2023},
  eprint={2301.13688},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}
@misc{touvron2023llama,
  title={Llama 2: Open Foundation and Fine-Tuned Chat Models},
  author={Hugo Touvron and Louis Martin and Kevin Stone and Peter Albert and Amjad Almahairi and Yasmine Babaei and Nikolay Bashlykov and Soumya Batra and Prajjwal Bhargava and Shruti Bhosale and Dan Bikel and Lukas Blecher and Cristian Canton Ferrer and Moya Chen and Guillem Cucurull and David Esiobu and Jude Fernandes and Jeremy Fu and Wenyin Fu and Brian Fuller and Cynthia Gao and Vedanuj Goswami and Naman Goyal and Anthony Hartshorn and Saghar Hosseini and Rui Hou and Hakan Inan and Marcin Kardas and Viktor Kerkez and Madian Khabsa and Isabel Kloumann and Artem Korenev and Punit Singh Koura and Marie-Anne Lachaux and Thibaut Lavril and Jenya Lee and Diana Liskovich and Yinghai Lu and Yuning Mao and Xavier Martinet and Todor Mihaylov and Pushkar Mishra and Igor Molybog and Yixin Nie and Andrew Poulton and Jeremy Reizenstein and Rashi Rungta and Kalyan Saladi and Alan Schelten and Ruan Silva and Eric Michael Smith and Ranjan Subramanian and Xiaoqing Ellen Tan and Binh Tang and Ross Taylor and Adina Williams and Jian Xiang Kuan and Puxin Xu and Zheng Yan and Iliyan Zarov and Yuchen Zhang and Angela Fan and Melanie Kambadur and Sharan Narang and Aurelien Rodriguez and Robert Stojnic and Sergey Edunov and Thomas Scialom},
  year={2023},
  eprint={2307.09288},
  archivePrefix={arXiv},
}