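Granite-3.2-2B-Instruct Open-Source AI Model - Free Deployment for Reasoning Tasks

Granite 3.2 2b Instruct GGUF

Developed by Mungert

Granite-3.2-2B-Instruct is a 2-billion-parameter, long-context AI model fine-tuned for thinking (reasoning) capabilities. It is built on Granite-3.1-2B-Instruct and trained on a mix of permissively licensed open-source datasets and internally generated synthetic data aimed at improving performance on reasoning tasks.

Large language model License: Apache-2.0 #Long-context reasoning #Multilingual instructions #Commercial AI assistant

Downloads: 754

Release date: 3/18/2025

Model Overview

The model is designed for general instruction-following tasks and can be integrated into AI assistants of many kinds, including commercial applications. Its thinking capability is controllable, so it is applied only when required.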

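Model Features

Long-context support

Handles long-context tasks such as long-document and meeting summarization and long-document question answering.

Multilingual capability

Supports 12 languages, including English, Chinese, and Japanese, and can be fine-tuned for other languages.

Reasoning optimization

Fine-tuned for thinking (reasoning), which can be switched on or off per request.

Commercial-friendly license

Released under the Apache 2.0 license, which permits integration into commercial applications.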

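Model Capabilities

Thinking (reasoning)

Summarization

Text classification

Text extraction

Question answering

Retrieval-augmented generation (RAG)

Code-related tasks

Function-calling tasks

Multilingual dialogue

Long-context processing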

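Use Cases

Business assistants

Intelligent customer service

Integrates into business customer-service systems to provide multilingual customer support.

Intended to improve support efficiency and reduce staffing costs.

Meeting minutes generation

Automatically generates meeting summaries and action items.

Intended to save note-taking time and improve productivity.

Developer tools

Code assistance

Helps developers understand, generate, and optimize code.

Intended to speed up development and reduce coding errors.

Content creation

Multilingual content generation

Generates marketing copy, product descriptions, and similar content in different languages.

Simplifies multilingual content workflows.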

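🚀 Granite-3.2-2B-Instruct GGUF Models

Granite-3.2-2B-Instruct is a fine-tuned AI model with 2 billion parameters and long-context support. It is built on Granite-3.1-2B-Instruct, trained on open-source datasets and internally generated synthetic data, offers controllable thinking, and suits a wide range of instruction-following tasks.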

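✨ Key Features

Controllable thinking: The model's thinking capability can be enabled or disabled as needed, so it is used only when necessary.
Multilingual support: Supports English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese.
Broad task coverage: Suitable for summarization, text classification, text extraction, question answering, retrieval-augmented generation (RAG), code-related tasks, function-calling tasks, multilingual dialogue, and long-context tasks.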
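As a minimal sketch of applying thinking only when needed, the helper below builds the keyword arguments that this page's usage example passes to `tokenizer.apply_chat_template`, changing only the `thinking` flag. The `needs_thinking` heuristic and its keyword list are hypothetical illustrations and are not part of the model or of the Transformers library.

```python
# Hypothetical routing heuristic: enable thinking only for prompts that look
# like multi-step reasoning. The keyword list is an assumption for illustration.
REASONING_HINTS = ("how many", "prove", "calculate", "step by step", "solve")


def needs_thinking(prompt: str) -> bool:
    """Return True when the prompt contains one of the reasoning hints."""
    text = prompt.lower()
    return any(hint in text for hint in REASONING_HINTS)


def chat_template_kwargs(prompt: str) -> dict:
    """Keyword arguments for tokenizer.apply_chat_template, as in the usage example."""
    return {
        "return_tensors": "pt",
        "return_dict": True,
        "add_generation_prompt": True,
        "thinking": needs_thinking(prompt),  # the only flag that changes
    }


print(chat_template_kwargs("How many liters must be added?")["thinking"])
print(chat_template_kwargs("Translate 'hello' into French.")["thinking"])
```

In practice the decision could equally be made by the calling application (for example, a user setting) rather than by keyword matching.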

📦 Installation

Install the following libraries:

pip install torch torchvision torchaudio
pip install accelerate
pip install transformers

💻 Usage Examples

Basic usage

from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed
import torch

# Load the upstream ibm-granite checkpoint from the Hugging Face Hub.
model_path = "ibm-granite/granite-3.2-2b-instruct"
device = "cuda"  # requires a CUDA-capable GPU
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map=device,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

conv = [{"role": "user", "content":"You have 10 liters of a 30% acid solution. How many liters of a 70% acid solution must be added to achieve a 50% acid mixture?"}]

# thinking=True is forwarded to the chat template; set it to False to skip the
# reasoning preamble. With return_dict=True the result holds input_ids and
# attention_mask.
inputs = tokenizer.apply_chat_template(conv, return_tensors="pt", thinking=True, return_dict=True, add_generation_prompt=True).to(device)

set_seed(42)  # fix the random seed so runs are repeatable
output = model.generate(
    **inputs,
    max_new_tokens=8192,
)

# Decode only the newly generated tokens, skipping the prompt.
prediction = tokenizer.decode(output[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(prediction)

Example output

thinking=True

Here is my thought process:

1. First, let's understand the composition of the initial 10 liters of 30% acid solution:
   - Acid content = 30% of 10 liters = 0.30 * 10 = 3 liters

2. We want to end up with a 50% acid solution. Let's denote the volume of the 70% acid solution to be added as 'x' liters.

3. The amount of acid in the final mixture (50% solution) should equal the initial acid content plus the acid from the added 70% solution.
   - Total acid in final mixture = 0.50 * (10 + x)

4. The equation representing the total acid content is:
   3 (from initial solution) + 0.70 * x (from added solution) = 0.50 * (10 + x)

5. Now, we solve for 'x':
   3 + 0.70x = 5 + 0.50x
   0.70x - 0.50x = 5 - 3
   0.20x = 2
   x = 2 / 0.20
   x = 10 liters

6. Therefore, 10 liters of a 70% acid solution must be added to the initial 10 liters of 30% acid solution to achieve a 50% acid mixture.

Here is my response:

To solve this problem, we need to determine how much of a 70% acid solution must be added to a 10-liter solution that is already 30% acid to create a new mixture that is 50% acid. Here's how to calculate it:

1. **Initial Acid Content**:
   - The initial 10-liter solution contains 3 liters of acid (since 30% of 10 liters).

2. **Desired Final Acid Content**:
   - We want the final mixture to be 50% acid.

3. **Equation Setup**:
   - Let 'x' be the volume (in liters) of the 70% acid solution to be added.
   - The equation representing the total acid content in the final mixture (which is 10 liters from the initial solution plus 'x' liters of the added solution) should equal 50% of the total volume (10 + x).
   - This gives us: 3 (acid from initial solution) + 0.70x (acid from added solution) = 0.50(10 + x)

4. **Solving the Equation**:
   - First, simplify the right side of the equation: 0.50(10 + x) = 5 + 0.50x
   - Now, the equation is: 3 + 0.70x = 5 + 0.50x
   - Subtract 0.50x from both sides: 0.20x = 2
   - Divide both sides by 0.20: x = 2 / 0.20
   - x = 10 liters

5. **Conclusion**:
   - To achieve a 50% acid mixture from the initial 30% solution, you need to add 10 liters of a 70% acid solution.

So, the final answer is that 10 liters of a 70% acid solution must be added to the initial 10 liters of 30% acid solution.
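The thinking=True output above separates the reasoning from the answer with two marker phrases. A small sketch of splitting the two parts follows; it assumes the markers appear exactly as in this sample, which is not guaranteed for every generation, and the function name is hypothetical.

```python
# Marker phrases copied from the sample output above (an assumption about
# the format, not a documented contract).
THOUGHT_MARKER = "Here is my thought process:"
RESPONSE_MARKER = "Here is my response:"


def split_thinking_output(text: str) -> tuple[str, str]:
    """Return (thought, response); thought is empty when the markers are missing."""
    if THOUGHT_MARKER in text and RESPONSE_MARKER in text:
        before, _, response = text.partition(RESPONSE_MARKER)
        thought = before.partition(THOUGHT_MARKER)[2]
        return thought.strip(), response.strip()
    # No markers (for example thinking=False): treat everything as the response.
    return "", text.strip()


sample = "Here is my thought process:\n\n0.20x = 2\n\nHere is my response:\n\nx = 10 liters"
thought, response = split_thinking_output(sample)
print(repr(thought))
print(repr(response))
```

An application that only wants to show the final answer can display `response` and log or discard `thought`.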

thinking=False

To solve this problem, we'll use the concept of volume and concentration. Let's denote the amount of 70% acid solution we need to add as "x" liters.

First, let's find out how much acid is in the initial 10-liter solution:

Acid in initial solution = 30% of 10 liters = 0.30 * 10 = 3 liters

Now, we want to end up with a 50% acid solution in a total volume of (10 + x) liters. Let's denote the final volume as V.

Final acid concentration = 50%
Final acid amount = 50% of V = 0.50 * V

We know the initial acid amount and the final acid amount, so we can set up an equation:

Initial acid amount + Acid from added solution = Final acid amount
3 liters + (70% of x) = 0.50 * (10 + x)

Now, let's solve for x:

0.70x + 3 = 0.50 * 10 + 0.50x
0.70x - 0.50x = 0.50 * 10 - 3
0.20x = 5 - 3
0.20x = 2
x = 2 / 0.20
x = 10 liters

So, you need to add 10 liters of a 70% acid solution to the initial 10-liter 30% acid solution to achieve a 50% acid mixture.
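Both sample outputs reach 10 liters. The result can be checked independently: the acid balance c0*v0 + c_add*x = c_target*(v0 + x) rearranges to x = v0*(c_target - c0)/(c_add - c_target). The function name below is hypothetical and the sketch is only a check on the arithmetic.

```python
def liters_to_add(v0: float, c0: float, c_add: float, c_target: float) -> float:
    """Volume of the added solution needed to reach the target concentration.

    Derived from c0*v0 + c_add*x = c_target*(v0 + x).
    """
    if c_add == c_target or not (min(c0, c_add) <= c_target <= max(c0, c_add)):
        raise ValueError("target must lie between the two concentrations")
    return v0 * (c_target - c0) / (c_add - c_target)


x = liters_to_add(v0=10, c0=0.30, c_add=0.70, c_target=0.50)
print(round(x, 6))  # rounded to hide floating-point noise

# Verify by substituting back into the acid balance.
acid = 0.30 * 10 + 0.70 * x
print(round(acid / (10 + x), 6))
```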

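📚 Detailed Documentation

Choosing the Right Model Format

The right model format depends on your hardware capabilities and memory constraints.

BF16 (Brain Float 16) – Use if BF16 acceleration is available

A 16-bit floating-point format designed for faster computation while retaining good precision.
Offers a dynamic range similar to FP32 with lower memory usage.
Recommended if your hardware supports BF16 acceleration (check your device's specifications).
Suited to high-performance inference with a smaller memory footprint than FP32.

📌 Use BF16 if: ✔ Your hardware has native BF16 support (e.g., newer GPUs, TPUs). ✔ You want higher precision than the quantized formats while using less memory than FP32. ✔ You plan to requantize the model into another format.

📌 Avoid BF16 if: ❌ Your hardware does not support BF16 (it may fall back to FP32 and run slower). ❌ You need compatibility with older devices that lack BF16 optimization.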
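To illustrate why BF16 keeps an FP32-like range at lower precision, the sketch below emulates BF16 by keeping only the top 16 bits of a float32 (sign, 8 exponent bits, 7 mantissa bits). It truncates rather than rounding to nearest, which is a simplification of what hardware typically does, and `to_bf16` is a hypothetical helper name.

```python
import struct


def to_bf16(x: float) -> float:
    """Round-trip x through an emulated BF16 by truncating a float32 to 16 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Keep sign + 8 exponent bits + 7 mantissa bits; zero the low 16 bits.
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


print(to_bf16(1.001))   # precision: only 7 mantissa bits survive
print(to_bf16(3.0e38))  # range: very large values stay finite
```

The first value collapses to 1.0 because BF16 cannot represent steps smaller than 2^-7 near 1, while the second stays finite because the 8-bit exponent matches FP32.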

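F16 (Float 16) – More widely supported than BF16

A 16-bit floating-point format with high precision but a narrower range of values than BF16.
Works on most devices with FP16 acceleration (including many GPUs and some CPUs).
Has more mantissa bits than BF16 (10 versus 7), so its precision is finer, but its smaller exponent range makes overflow more likely; it is generally sufficient for inference.

📌 Use F16 if: ✔ Your hardware supports FP16 but not BF16. ✔ You need a balance of speed, memory usage, and accuracy. ✔ You are running on a GPU or another device optimized for FP16 computation.

📌 Avoid F16 if: ❌ Your device lacks native FP16 support (it may run slower than expected). ❌ You have memory limitations.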
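The narrower range of F16 can be seen with Python's standard `struct` module, whose `e` format is IEEE 754 half precision. This is only an illustration of the number format, not of how any inference runtime stores weights; the helper names are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fits_fp16(x: float) -> bool:
    """Return False when x is too large to be packed as half precision."""
    try:
        to_fp16(x)
        return True
    except (OverflowError, struct.error):
        return False


print(to_fp16(1.001))      # 10 mantissa bits: steps of 2**-10 near 1.0
print(fits_fp16(65504.0))  # largest finite half-precision value
print(fits_fp16(70000.0))  # beyond the FP16 range
```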

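Quantized Models (Q4_K, Q6_K, Q8, etc.) – For CPU and low-VRAM inference

Quantization reduces model size and memory usage while preserving as much accuracy as possible.

Lower-bit models (Q4_K) → Lowest memory usage, but possibly lower precision.
Higher-bit models (Q6_K, Q8_0) → Better accuracy, but more memory required.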
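The idea behind block quantization can be sketched in a few lines: each block of weights shares one scale, and every weight is stored as a small integer code. The sketch is in the spirit of Q8_0 (8-bit codes, one scale per block) but is simplified: it uses a 4-value block and a Python float scale for readability, and the K-quant and IQ formats use more elaborate schemes.

```python
def quantize_block_q8(values):
    """Quantize one block of floats to int8-range codes plus a shared scale."""
    amax = max(abs(v) for v in values)
    scale = amax / 127 if amax else 1.0  # map the largest magnitude to +/-127
    codes = [round(v / scale) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate floats from the codes and the shared scale."""
    return [scale * c for c in codes]


block = [0.5, -1.27, 0.0, 1.27]
scale, codes = quantize_block_q8(block)
restored = dequantize_block(scale, codes)
print(codes)
print(max(abs(a - b) for a, b in zip(block, restored)))  # worst-case error
```

Fewer bits per code mean coarser steps and a larger reconstruction error, which is the accuracy-for-memory trade described above.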

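📌 Use quantized models if: ✔ You are running inference on a CPU and need an optimized model. ✔ Your device has too little VRAM to load a full-precision model. ✔ You want a smaller memory footprint while keeping reasonable accuracy.

📌 Avoid quantized models if: ❌ You need the highest accuracy (full-precision models are a better fit). ❌ Your hardware has enough VRAM for a higher-precision format (BF16/F16).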
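A rough way to compare formats is to multiply the parameter count by the bits stored per weight. The bits-per-weight figures below are commonly cited approximations for llama.cpp formats and are assumptions here; the estimate uses the nominal 2 billion parameters and ignores the KV cache, runtime overhead, and the fact that some files keep the output and embedding tensors at higher precision.

```python
# Approximate bits per weight, including per-block scale overhead.
# Assumed values for illustration; actual files will differ somewhat.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "F16": 16.0,
    "Q8_0": 8.5,
    "Q6_K": 6.5625,
    "Q4_K": 4.5,
    "Q4_0": 4.5,
}


def estimated_size_gb(n_params: float, fmt: str) -> float:
    """Estimated weight storage in gigabytes (10**9 bytes)."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


for fmt in BITS_PER_WEIGHT:
    print(f"{fmt:5s} ~{estimated_size_gb(2e9, fmt):.2f} GB")
```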

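Very Low-Bit Quantization (IQ3_XS, IQ3_S, IQ3_M, Q4_K, Q4_0)

These models are optimized for extreme memory efficiency, which makes them a good fit for low-power devices or large-scale deployments where memory is the critical constraint.

IQ3_XS: Ultra-low-bit quantization (3-bit) with extreme memory efficiency.
- Use case: Best for ultra-low-memory devices where even Q4_K is too large.
- Trade-off: Lower accuracy than higher-bit quantizations.
IQ3_S: Small block size for maximum memory efficiency.
- Use case: Best for low-memory devices where IQ3_XS is too aggressive.
IQ3_M: Medium block size for better accuracy than IQ3_S.
- Use case: Suitable for low-memory devices where IQ3_S is too limiting.
Q4_K: 4-bit quantization with block-wise optimization for better accuracy.
- Use case: Best for low-memory devices where Q6_K is too large.
Q4_0: Pure 4-bit quantization, optimized for ARM devices.
- Use case: Best for ARM-based devices or low-memory environments.

Model Format Selection Summary

Model format	Precision	Memory usage	Device requirements	Best use case
BF16	Highest	High	BF16-capable GPU/CPU	High-speed inference with reduced memory
F16	High	High	FP16-capable devices	GPU inference when BF16 is unavailable
Q4_K	Medium-low	Low	CPU or low-VRAM devices	Best choice for memory-constrained environments
Q6_K	Medium	Moderate	CPU with more memory	Better accuracy among quantized models
Q8_0	High	Moderate	CPU or GPU with enough VRAM	Highest accuracy among quantized models
IQ3_XS	Very low	Very low	Ultra-low-memory devices	Extreme memory efficiency at low accuracy
Q4_0	Low	Low	ARM or low-memory devices	llama.cpp can optimize for ARM devices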
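The summary table can be read as a simple decision procedure. The sketch below encodes one possible reading; the function name and the memory thresholds are hypothetical values chosen for a model of this size, not recommendations from the model author.

```python
def choose_format(has_bf16: bool, has_fp16: bool, memory_gb: float,
                  is_arm: bool = False) -> str:
    """Pick a format name from the summary table using assumed thresholds."""
    if memory_gb >= 6:  # assumed headroom for 16-bit weights
        if has_bf16:
            return "BF16"
        if has_fp16:
            return "F16"
    if memory_gb >= 3:
        return "Q8_0"   # most accurate quantized option in the table
    if memory_gb >= 2.5:
        return "Q6_K"
    if memory_gb >= 1.5:
        return "Q4_0" if is_arm else "Q4_K"
    return "IQ3_XS"     # last resort for very small memory budgets


print(choose_format(has_bf16=True, has_fp16=True, memory_gb=8))
print(choose_format(has_bf16=False, has_fp16=False, memory_gb=2, is_arm=True))
```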

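Included Files and Details

`granite-3.2-2b-instruct-bf16.gguf`

Model weights stored in BF16.
Use this file if you want to requantize the model into a different format.
Best choice if your device supports BF16 acceleration.

`granite-3.2-2b-instruct-f16.gguf`

Model weights stored in F16.
Use this file if your device supports FP16, especially when BF16 is unavailable.

`granite-3.2-2b-instruct-bf16-q8_0.gguf`

Output and embedding tensors remain in BF16.
All other layers are quantized to Q8_0.
Use this file if your device supports BF16 and you want a quantized version.

`granite-3.2-2b-instruct-f16-q8_0.gguf`

Output and embedding tensors remain in F16.
All other layers are quantized to Q8_0.

`granite-3.2-2b-instruct-q4_k.gguf`

Output and embedding tensors are quantized to Q8_0.
All other layers are quantized to Q4_K.
Suitable for CPU inference with limited memory.

`granite-3.2-2b-instruct-q4_k_s.gguf`

The smallest Q4_K variant, which uses less memory at the cost of accuracy.
Best for very low-memory setups.

`granite-3.2-2b-instruct-q6_k.gguf`

Output and embedding tensors are quantized to Q8_0.
All other layers are quantized to Q6_K.

`granite-3.2-2b-instruct-q8_0.gguf`

Fully Q8-quantized model for better accuracy.
Requires more memory but offers higher precision.

`granite-3.2-2b-instruct-iq3_xs.gguf`

IQ3_XS quantization, optimized for extreme memory efficiency.
Best for ultra-low-memory devices.

`granite-3.2-2b-instruct-iq3_m.gguf`

IQ3_M quantization, with a medium block size for better accuracy.
Suitable for low-memory devices.

`granite-3.2-2b-instruct-q4_0.gguf`

Pure Q4_0 quantization, optimized for ARM devices.
Best for low-memory environments.
Prefer IQ4_NL if you need better accuracy.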
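The file names above follow one pattern, so a script can derive the name from the variant. The helper below is a hypothetical convenience that only covers the variants listed in this section; it does not download anything.

```python
# Variant suffixes copied from the file list above.
AVAILABLE_VARIANTS = (
    "bf16", "f16", "bf16-q8_0", "f16-q8_0", "q4_k", "q4_k_s",
    "q6_k", "q8_0", "iq3_xs", "iq3_m", "q4_0",
)


def gguf_filename(variant: str) -> str:
    """Return the GGUF file name for a listed variant (case-insensitive)."""
    key = variant.lower()
    if key not in AVAILABLE_VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"granite-3.2-2b-instruct-{key}.gguf"


print(gguf_filename("Q4_K"))
```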

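Testing the Models

If you find these models useful, please help me test my AI-powered Network Monitor Assistant with quantum-ready security checks: 👉 Free Network Monitor

💬 How to test:

Click the chat icon (bottom right of any page).
Choose an AI assistant type:
- TurboLLM (GPT-4-mini)
- FreeLLM (Open-source)
- TestLLM (Experimental, CPU-only)

What I'm Testing

I'm exploring the limits of small open-source models for AI network monitoring, specifically:

Function calling against live network services.
How small a model can be while still handling:
- Automated Nmap scans.
- Quantum-readiness checks.
- Metasploit integration.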
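For readers unfamiliar with function calling, the sketch below shows the general shape: the model emits a structured call, and the application runs only tools it has explicitly registered. The JSON layout, the tool name, and the stub implementation are all hypothetical; they do not describe the author's network monitor.

```python
import json


def check_ssl_certificate(host: str) -> dict:
    """Stub tool: a real version would open a TLS connection and inspect the certificate."""
    return {"host": host, "status": "not checked (stub)"}


# Allowlist of callable tools; anything else the model asks for is rejected.
TOOLS = {"check_ssl_certificate": check_ssl_certificate}


def dispatch_tool_call(raw: str) -> dict:
    """Parse a JSON tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(raw)
    name = call.get("name")
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](**call.get("arguments", {}))


model_output = '{"name": "check_ssl_certificate", "arguments": {"host": "example.com"}}'
print(dispatch_tool_call(model_output))
```

Keeping an explicit allowlist matters most for tools such as scanners, where an unexpected call could have real side effects.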

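🟡 TestLLM – Current experimental model (llama.cpp on 6 CPU threads):

✅ Zero-configuration setup
⏳ 30-second load time (slow inference, but no API costs)
🔧 Help wanted! If you are interested in edge-device AI, let's collaborate!

Other Assistants

🟢 TurboLLM – Uses gpt-4-mini for:

Real-time network diagnostics
Automated penetration testing (Nmap/Metasploit)
🔑 Get more tokens by downloading our free network monitor agent.

🔵 HugLLM – Open-source models (≈8B parameters):

2x more tokens than TurboLLM
AI-powered log analysis
🌐 Runs on the Hugging Face Inference API.

Example AI Commands to Test

"Give me info on my website's SSL certificate"
"Check if my server is using quantum safe encryption for communication"
"Run a quick Nmap vulnerability test"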

Evaluation Results

Model	ArenaHard	Alpaca-Eval-2	MMLU	PopQA	TruthfulQA	BigBenchHard	DROP	GSM8K	HumanEval	HumanEval+	IFEval	AttaQ
Llama-3.1-8B-Instruct	36.43	27.22	69.15	28.79	52.79	72.66	61.48	83.24	85.32	80.15	79.10	83.43
DeepSeek-R1-Distill-Llama-8B	17.17	21.85	45.80	13.25	47.43	65.71	44.46	72.18	67.54	62.91	66.50	42.87
Qwen-2.5-7B-Instruct	25.44	30.34	74.30	18.12	63.06	70.40	54.71	84.46	93.35	89.91	74.90	81.90
DeepSeek-R1-Distill-Qwen-7B	10.36	15.35	50.72	9.94	47.14	65.04	42.76	78.47	79.89	78.43	59.10	42.45
Granite-3.1-8B-Instruct	37.58	30.34	66.77	28.7	65.84	68.55	50.78	79.15	89.63	85.79	73.20	85.73
Granite-3.1-2B-Instruct	23.3	27.17	57.11	20.55	59.79	54.46	18.68	67.55	79.45	75.26	63.59	84.7
Granite-3.2-8B-Instruct	55.25	61.19	66.79	28.04	66.92	64.77	50.95	81.65	89.35	85.72	74.31	85.42
Granite-3.2-2B-Instruct	24.86	34.51	57.18	20.56	59.8	52.27	21.12	67.02	80.13	73.39	61.55	83.23
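To see where the 2B model moved relative to its predecessor, the two 2B rows of the table can be subtracted benchmark by benchmark. The scores below are copied from the table; the helper name is hypothetical, and small differences may be within evaluation noise.

```python
BENCHMARKS = ["ArenaHard", "Alpaca-Eval-2", "MMLU", "PopQA", "TruthfulQA",
              "BigBenchHard", "DROP", "GSM8K", "HumanEval", "HumanEval+",
              "IFEval", "AttaQ"]
# Rows copied from the evaluation table above.
GRANITE_31_2B = [23.3, 27.17, 57.11, 20.55, 59.79, 54.46, 18.68, 67.55,
                 79.45, 75.26, 63.59, 84.7]
GRANITE_32_2B = [24.86, 34.51, 57.18, 20.56, 59.8, 52.27, 21.12, 67.02,
                 80.13, 73.39, 61.55, 83.23]


def score_deltas(old, new):
    """Per-benchmark change (new minus old), rounded to two decimals."""
    return {name: round(n - o, 2) for name, o, n in zip(BENCHMARKS, old, new)}


deltas = score_deltas(GRANITE_31_2B, GRANITE_32_2B)
for name, delta in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:14s} {delta:+.2f}")
```

On these numbers the largest gain is on Alpaca-Eval-2, while BigBenchHard and IFEval show the largest drops.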