NVIDIA Llama-3_1-Nemotron-Ultra-253B-v1-GGUF open-source model

Nvidia Llama 3 1 Nemotron Ultra 253B V1 GGUF

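Developed by bartowski

This is a quantized version of the NVIDIA Llama-3_1-Nemotron-Ultra-253B-v1 model, produced with llama.cpp. It is offered in multiple quantization types to suit different hardware.

Large language model · English · License: Other · #Ultra-large parameter count #Multi-turn dialogue optimization #High-precision quantization

Downloads: 1,607

Released: 4/8/2025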

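Model overview

A quantized version of the NVIDIA Llama-3_1-Nemotron-Ultra-253B-v1 model, produced with llama.cpp and offered in multiple quantization options to fit different compute budgets.

Model features

Multiple quantization options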

Quantization types range from Q8_0 down to IQ1_S, covering different performance and storage needs.

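Efficient inference

Quantization sharply reduces memory and storage requirements, with file sizes ranging from 269.25GB (Q8_0) down to 53.65GB (IQ1_S). The higher-bit quantizations keep output quality high, while the smallest ones trade quality for size.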

Broad compatibility

Runs in LM Studio, llama.cpp, and projects based on llama.cpp.

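Model capabilities

Text generation

Natural language processing

Dialogue systems

Use cases

Text generation

Dialogue systems

Building intelligent conversational assistants with natural, fluent interaction.

Content creation

Assisting with articles, stories, poetry, and other creative writing.

Research and development

Model optimization research

Studying quantization techniques and performance optimization for large language models.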

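🚀 Llamacpp imatrix quantizations of Llama-3_1-Nemotron-Ultra-253B-v1

This project quantizes nvidia's Llama-3_1-Nemotron-Ultra-253B-v1 model so that it runs more efficiently across different hardware. The quantizations were produced with llama.cpp using an imatrix, yielding model files in many quantization types to suit different users' needs.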

🚀 Quick start

Runtime environment

These quantized models can be run in LM Studio, or directly with llama.cpp or any other llama.cpp-based project.
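
For the llama.cpp route, a minimal sketch of a command-line invocation follows. It assumes a llama.cpp build that provides the `llama-cli` binary with its `-m` (model path), `-p` (prompt), and `-n` (tokens to generate) options. The model path is a hypothetical placeholder, and `build_run_cmd` is an illustrative helper, not part of llama.cpp.

```python
import shlex


def build_run_cmd(model_path: str, prompt: str, n_predict: int = 128) -> list[str]:
    """Assemble a llama-cli argument list (assumes llama-cli is on PATH)."""
    return [
        "llama-cli",
        "-m", model_path,      # path to the downloaded .gguf file
        "-p", prompt,          # prompt text
        "-n", str(n_predict),  # maximum number of tokens to generate
    ]


if __name__ == "__main__":
    # Hypothetical local path; substitute the file you actually downloaded.
    cmd = build_run_cmd("./model.gguf", "Hello")
    # Print the command rather than running it, so it can be inspected first.
    print(shlex.join(cmd))
```

The helper only builds the command; pass the list to `subprocess.run` once the path points at a real file.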

Downloading the model

Downloading with huggingface-cli

First, make sure huggingface-cli is installed:

pip install -U "huggingface_hub[cli]"

Then you can target the specific file you want to download:

huggingface-cli download bartowski/nvidia_Llama-3_1-Nemotron-Ultra-253B-v1-GGUF --include "nvidia_Llama-3_1-Nemotron-Ultra-253B-v1-Q4_K_M.gguf" --local-dir ./

If a model file is larger than 50GB, it is split into multiple files. To download all of the parts to a local folder, run:

huggingface-cli download bartowski/nvidia_Llama-3_1-Nemotron-Ultra-253B-v1-GGUF --include "nvidia_Llama-3_1-Nemotron-Ultra-253B-v1-Q8_0/*" --local-dir ./

You can specify a new local directory (for example nvidia_Llama-3_1-Nemotron-Ultra-253B-v1-Q8_0) or download all of the parts into the current directory (./).
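
The two download commands above differ only in their `--include` pattern. As a minimal sketch, the Python helper below assembles either form. The function name `build_download_cmd` and its `split` flag are illustrative assumptions; only the repository ID, the `--include` and `--local-dir` options, and the file-naming pattern are taken from the commands shown above.

```python
import shlex

REPO_ID = "bartowski/nvidia_Llama-3_1-Nemotron-Ultra-253B-v1-GGUF"
FILE_STEM = "nvidia_Llama-3_1-Nemotron-Ultra-253B-v1"


def build_download_cmd(quant: str, split: bool, local_dir: str = "./") -> list[str]:
    """Return the huggingface-cli argument list for one quantization type.

    split=True targets the per-quant folder used for files over 50GB;
    split=False targets a single .gguf file.
    """
    if split:
        pattern = f"{FILE_STEM}-{quant}/*"
    else:
        pattern = f"{FILE_STEM}-{quant}.gguf"
    return [
        "huggingface-cli", "download", REPO_ID,
        "--include", pattern,
        "--local-dir", local_dir,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is downloaded by accident.
    print(shlex.join(build_download_cmd("Q8_0", split=True)))
```

Whether a given quantization needs `split=True` can be read from the "Split" column of the file table.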

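✨ Key features

Multiple quantization types: a wide range of types is provided, such as Q8_0, Q6_K, and Q5_K_M, to meet different performance and quality needs.
Special handling in some quants: some files (Q4_K_L, Q3_K_XL, and Q2_K_L in the file table) quantize the embedding and output weights to Q8_0 to improve performance.
Online repacking: weights can be repacked online, which automatically optimizes performance for the hardware in use (noted for Q4_0 and IQ4_NL in the file table).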

📦 Installation guide

Installing huggingface-cli

pip install -U "huggingface_hub[cli]"

📚 Detailed documentation

Prompt format

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
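
To make the template above concrete, here is a minimal sketch that fills in the two placeholders. The helper name `format_prompt` is illustrative. The sketch also assumes that each header is followed by a blank line (two newline characters), as in the Llama 3 chat format, since the page layout does not make the exact whitespace unambiguous.

```python
# Assumption: every <|end_header_id|> is followed by "\n\n".
PROMPT_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    "{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)


def format_prompt(system_prompt: str, prompt: str) -> str:
    """Fill the template with a system message and a single user message."""
    return PROMPT_TEMPLATE.format(system_prompt=system_prompt, prompt=prompt)


if __name__ == "__main__":
    print(format_prompt("You are a helpful assistant.", "Summarize GGUF in one sentence."))
```

Front ends such as LM Studio typically apply the chat template for you, so manual formatting like this is mainly relevant when sending raw text to the model.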

Download files

Filename	Quant type	File size	Split	Description
Llama-3_1-Nemotron-Ultra-253B-v1-Q8_0.gguf	Q8_0	269.25GB	true	Extremely high quality; generally unneeded, but the largest quant available.
Llama-3_1-Nemotron-Ultra-253B-v1-Q6_K.gguf	Q6_K	207.88GB	true	Very high quality, near perfect, recommended.
Llama-3_1-Nemotron-Ultra-253B-v1-Q5_K_M.gguf	Q5_K_M	178.55GB	true	High quality, recommended.
Llama-3_1-Nemotron-Ultra-253B-v1-Q5_K_S.gguf	Q5_K_S	174.51GB	true	High quality, recommended.
Llama-3_1-Nemotron-Ultra-253B-v1-Q4_1.gguf	Q4_1	158.80GB	true	Legacy format; performance similar to Q4_K_S, but with improved tokens per watt on Apple silicon.
Llama-3_1-Nemotron-Ultra-253B-v1-Q4_K_L.gguf	Q4_K_L	152.50GB	true	Uses Q8_0 for embedding and output weights. Good quality, recommended.
Llama-3_1-Nemotron-Ultra-253B-v1-Q4_K_M.gguf	Q4_K_M	150.94GB	true	Good quality, the default size for most use cases, recommended.
Llama-3_1-Nemotron-Ultra-253B-v1-Q4_K_S.gguf	Q4_K_S	144.38GB	true	Slightly lower quality with more space savings, recommended.
Llama-3_1-Nemotron-Ultra-253B-v1-Q4_0.gguf	Q4_0	143.74GB	true	Legacy format; offers online repacking for ARM and AVX CPU inference.
Llama-3_1-Nemotron-Ultra-253B-v1-IQ4_NL.gguf	IQ4_NL	143.23GB	true	Similar to IQ4_XS but slightly larger. Offers online repacking for ARM CPU inference.
Llama-3_1-Nemotron-Ultra-253B-v1-IQ4_XS.gguf	IQ4_XS	135.41GB	true	Decent quality, smaller than Q4_K_S with similar performance, recommended.
Llama-3_1-Nemotron-Ultra-253B-v1-Q3_K_XL.gguf	Q3_K_XL	134.54GB	true	Uses Q8_0 for embedding and output weights. Lower quality but usable, suited to low-memory setups.
Llama-3_1-Nemotron-Ultra-253B-v1-Q3_K_L.gguf	Q3_K_L	132.70GB	true	Lower quality but usable, suited to low-memory setups.
Llama-3_1-Nemotron-Ultra-253B-v1-Q3_K_M.gguf	Q3_K_M	121.88GB	true	Low quality.
Llama-3_1-Nemotron-Ultra-253B-v1-IQ3_M.gguf	IQ3_M	113.50GB	true	Medium-low quality; newer method with performance comparable to Q3_K_M.
Llama-3_1-Nemotron-Ultra-253B-v1-Q3_K_S.gguf	Q3_K_S	109.72GB	true	Low quality, not recommended.
Llama-3_1-Nemotron-Ultra-253B-v1-IQ3_XS.gguf	IQ3_XS	103.32GB	true	Lower quality; newer method with decent performance, slightly better than Q3_K_S.
Llama-3_1-Nemotron-Ultra-253B-v1-IQ3_XXS.gguf	IQ3_XXS	97.62GB	true	Lower quality; newer method with decent performance, comparable to Q3 quants.
Llama-3_1-Nemotron-Ultra-253B-v1-Q2_K_L.gguf	Q2_K_L	95.45GB	true	Uses Q8_0 for embedding and output weights. Very low quality but surprisingly usable.
Llama-3_1-Nemotron-Ultra-253B-v1-Q2_K.gguf	Q2_K	93.40GB	true	Very low quality but surprisingly usable.
Llama-3_1-Nemotron-Ultra-253B-v1-IQ2_M.gguf	IQ2_M	85.44GB	true	Relatively low quality; uses state-of-the-art techniques to be surprisingly usable.
Llama-3_1-Nemotron-Ultra-253B-v1-IQ2_S.gguf	IQ2_S	78.55GB	true	Low quality; uses state-of-the-art techniques to be usable.
Llama-3_1-Nemotron-Ultra-253B-v1-IQ2_XS.gguf	IQ2_XS	74.88GB	true	Low quality; uses state-of-the-art techniques to be usable.
Llama-3_1-Nemotron-Ultra-253B-v1-IQ2_XXS.gguf	IQ2_XXS	67.44GB	true	Very low quality; uses state-of-the-art techniques to be usable.
Llama-3_1-Nemotron-Ultra-253B-v1-IQ1_M.gguf	IQ1_M	58.82GB	true	Extremely low quality, not recommended.
Llama-3_1-Nemotron-Ultra-253B-v1-IQ1_S.gguf	IQ1_S	53.65GB	true	Extremely low quality, not recommended.
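
One way to use the table is to pick the largest file that fits the memory you have. The sketch below is a heuristic under stated assumptions, not a rule from this page: the helper `largest_fitting_quant` and the 2GB default headroom are illustrative, and the sizes are a subset copied from the table above. File size is only a lower bound on memory use, since the context also needs room at inference time.

```python
from typing import Optional

# File sizes in GB for a subset of the quantization types in the table above.
QUANT_SIZES_GB = {
    "Q8_0": 269.25,
    "Q6_K": 207.88,
    "Q5_K_M": 178.55,
    "Q4_K_M": 150.94,
    "IQ4_XS": 135.41,
    "Q3_K_M": 121.88,
    "IQ3_XXS": 97.62,
    "Q2_K": 93.40,
    "IQ2_XXS": 67.44,
    "IQ1_S": 53.65,
}


def largest_fitting_quant(budget_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest listed quant whose file fits in budget_gb minus headroom.

    headroom_gb is an assumed safety margin; returns None if nothing fits.
    """
    limit = budget_gb - headroom_gb
    fitting = {name: size for name, size in QUANT_SIZES_GB.items() if size <= limit}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (50, 100, 160, 300):
        print(budget, largest_fitting_quant(budget))
```

Larger files generally mean higher quality in the table's descriptions, which is why the sketch prefers the largest fit; adjust the headroom to match your context length and hardware.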