Qwen3-30B-A3B-abliterated-fp4 open-source model: suited to text generation, a small footprint with a large effect

Qwen3 30B A3B Abliterated Fp4

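Developed by huihui-ai

A 4-bit quantized version of Qwen3-30B-A3B-abliterated for text generation. The author describes the quantized result as equivalent to 8B parameters.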

Large language model

Transformers

License: Apache-2.0 #4-bit quantization #LLM compression #uncensored chat

Downloads: 103

Released: 6/3/2025

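Model overview

A 4-bit quantized version of Qwen3-30B-A3B-abliterated, intended mainly for text generation and usable in chat applications.

Model features

4-bit quantization

Uses bitsandbytes FP4 quantization, which substantially reduces model size and memory usage

Efficient inference

The smaller memory footprint makes inference practical in resource-constrained environments

Chat-oriented

Tuned for chat interaction, with support for streaming output

Content freedom

Safety filtering is weak, so generated content is less restricted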

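Model capabilities

Text generation

Conversational interaction

Content creation

Use cases

Chat applications

Conversational assistant

Used to build chatbots

Can generate fluent, natural dialogue responses

Content creation

Text generation

Used to assist writing and creative content generation

Can generate varied text content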

🚀 huihui-ai/Qwen3-30B-A3B-abliterated-fp4

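This is a 4-bit quantized model ("bnb_4bit_quant_type": "fp4") of huihui-ai/Qwen3-30B-A3B-abliterated. The quantization parameters are stored in config.json, so no extra quantization arguments are needed when loading it. According to the author, the 4-bit result is equivalent to 8B parameters.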
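
One way to read the "equivalent to 8B" remark is in terms of weight storage. The sketch below is a back-of-the-envelope estimate, not a measurement: it assumes a nominal 30 billion parameters (taken from the model name), treats every weight as 4-bit, and ignores quantization constants, unquantized layers, activations, and the KV cache. `weight_gb` is a hypothetical helper introduced here for illustration.

```python
def weight_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Nominal sizes taken from the model names; real parameter counts differ slightly.
fp4_30b = weight_gb(30e9, 4)    # 30B parameters stored at 4 bits each
bf16_8b = weight_gb(8e9, 16)    # 8B parameters stored in bfloat16
bf16_30b = weight_gb(30e9, 16)  # the unquantized 30B model in bfloat16

print(f"30B @ 4-bit : {fp4_30b:.1f} GB")
print(f" 8B @ 16-bit: {bf16_8b:.1f} GB")
print(f"30B @ 16-bit: {bf16_30b:.1f} GB")
```

Under these assumptions the 4-bit weights occupy about as much space as an 8B model in 16-bit precision, and roughly a quarter of the unquantized model.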

🚀 Quick start

You can load this model with the Hugging Face transformers library and use it in your own project.

💻 Usage example

Basic usage

# bitsandbytes must be installed; the quantization settings are read from config.json.
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch
import os
import signal

# Use half of the available CPU cores (at least one) for CPU-side math.
cpu_count = os.cpu_count() or 1
print(f"Number of CPU cores in the system: {cpu_count}")
half_cpu_count = max(1, cpu_count // 2)
os.environ["MKL_NUM_THREADS"] = str(half_cpu_count)
os.environ["OMP_NUM_THREADS"] = str(half_cpu_count)
torch.set_num_threads(half_cpu_count)

print(f"PyTorch threads: {torch.get_num_threads()}")
print(f"MKL threads: {os.getenv('MKL_NUM_THREADS')}")
print(f"OMP threads: {os.getenv('OMP_NUM_THREADS')}")

# Load the model and tokenizer
NEW_MODEL_ID = "huihui-ai/Qwen3-30B-A3B-abliterated-fp4"
print(f"Load Model {NEW_MODEL_ID} ... ")
model = AutoModelForCausalLM.from_pretrained(
    NEW_MODEL_ID,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(NEW_MODEL_ID, trust_remote_code=True)
# Fall back to the EOS token for padding only when no pad token is defined.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

messages = []
enable_thinking = True
skip_prompt=True
skip_special_tokens=True

class CustomTextStreamer(TextStreamer):
    """Streams decoded text to stdout, keeps a copy, and can be asked to stop."""

    def __init__(self, tokenizer, skip_prompt=True, skip_special_tokens=True):
        super().__init__(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
        self.generated_text = ""
        self.stop_flag = False

    def on_finalized_text(self, text: str, stream_end: bool = False):
        self.generated_text += text
        print(text, end="", flush=True)
        if self.stop_flag:
            raise StopIteration

    def stop_generation(self):
        self.stop_flag = True

def generate_stream(model, tokenizer, messages, enable_thinking, skip_prompt, skip_special_tokens, max_new_tokens):
    # Render the chat history with the model's chat template and tokenize it.
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        enable_thinking=enable_thinking,
        add_generation_prompt=True,
        return_tensors="pt"
    )
    # A single unpadded sequence, so every position is attended to.
    attention_mask = torch.ones_like(input_ids, dtype=torch.long)
    tokens = input_ids.to(model.device)
    attention_mask = attention_mask.to(model.device)

    streamer = CustomTextStreamer(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)

    def signal_handler(sig, frame):
        streamer.stop_generation()
        print("\n[Generation stopped by user with Ctrl+C]")

    signal.signal(signal.SIGINT, signal_handler)
    
    print("Response: ", end="", flush=True)
    try:
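        generated_ids = model.generate(
            tokens,
            attention_mask=attention_mask,
            # With use_cache=False every step recomputes attention over the full
            # sequence: slower, but no key/value cache is kept in memory.
            use_cache=False,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            pad_token_id=tokenizer.pad_token_id,
            streamer=streamer
        )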
        del generated_ids
    except StopIteration:
        print("\n[Stopped by user]")

    del input_ids, attention_mask
    torch.cuda.empty_cache()
    signal.signal(signal.SIGINT, signal.SIG_DFL)

    return streamer.generated_text, streamer.stop_flag

while True:
    user_input = input("User: ").strip()
    if user_input.lower() == "/exit":
        print("Exiting chat.")
        break
    if user_input.lower() == "/clear":
        messages = []
        print("Chat history cleared. Starting a new conversation.")
        continue
    if user_input.lower() == "/no_think":
        if enable_thinking:
            enable_thinking = False
            print("Thinking = False.")
        else:
            enable_thinking = True
            print("Thinking = True.")        
        continue
    if user_input.lower() == "/skip_prompt":
        if skip_prompt:
            skip_prompt = False
            print("skip_prompt = False.")
        else:
            skip_prompt = True
            print("skip_prompt = True.")        
        continue
    if user_input.lower() == "/skip_special_tokens":
        if skip_special_tokens:
            skip_special_tokens = False
            print("skip_special_tokens = False.")
        else:
            skip_special_tokens = True
            print("skip_special_tokens = True.")        
        continue
    if not user_input:
        print("Input cannot be empty. Please enter something.")
        continue
    messages.append({"role": "user", "content": user_input})
    response, stop_flag = generate_stream(model, tokenizer, messages, enable_thinking, skip_prompt, skip_special_tokens, 8096)
    print("", flush=True)
    if stop_flag:
        continue
    messages.append({"role": "assistant", "content": response})
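
With thinking enabled, Qwen3 models typically wrap their reasoning in `<think>...</think>` before the final answer, so the `response` string collected by the streamer can contain both. The helper below is a minimal sketch for separating the two, for example to display or log only the final answer. `split_thinking` is a hypothetical name, and the tag format is an assumption to verify against the model's actual output.

```python
import re

# Assumed tag format: reasoning enclosed in <think> ... </think>.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(response: str) -> tuple[str, str]:
    """Return (thinking, answer) extracted from a raw response string."""
    match = _THINK_RE.search(response)
    if match is None:
        # No reasoning block found: the whole response is the answer.
        return "", response.strip()
    thinking = match.group(1).strip()
    answer = (response[:match.start()] + response[match.end():]).strip()
    return thinking, answer

raw = "<think>\nThe user greets me.\n</think>\n\nHello! How can I help?"
thinking, answer = split_thinking(raw)
print(answer)
```

In the chat loop, such a helper could be applied to `response` before printing a summary or writing a log, while the unmodified text is still passed to the chat template.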

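Donations

If you like this project, please give it a "like" and follow us for more updates. You can follow x.com/support_huihui for the latest model news from huihui.ai.

Your donation helps us continue development and improvement; even the price of a cup of coffee makes a difference.

Bitcoin (BTC):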

  bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge

📄 License

This project is released under the Apache-2.0 license.

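⚠️ Important notes

Risk of sensitive or controversial output: The model's safety filtering has been significantly weakened, so it may generate sensitive, controversial, or inappropriate content. Users should exercise caution and carefully review generated output.
Not suitable for all audiences: Because content filtering is limited, the model's output may be unsuitable for public settings, minors, or applications with high safety requirements.
Legal and ethical responsibility: Users must ensure that their use complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users bear full responsibility for any consequences.
Research and experimental use: The model is recommended for research, testing, or controlled environments; avoid using it directly in production or in public-facing commercial applications.
Monitoring and review: Users are strongly advised to monitor model output in real time and to review it manually when necessary, to prevent inappropriate content from spreading.
No default safety guarantees: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai accepts no responsibility for any consequences arising from its use.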
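
The monitoring and review advice above could be wired into an application along these lines. This is only a sketch: `log_for_review`, the JSONL record layout, and the `REVIEW_TERMS` list are hypothetical placeholders, and simple keyword matching is not a substitute for real moderation or human review.

```python
import json
import time
from pathlib import Path

# Hypothetical placeholder terms; replace with a policy suited to your application.
REVIEW_TERMS = ("password", "weapon")

def log_for_review(prompt: str, output: str, log_path: Path) -> bool:
    """Append one record to a JSONL log and return True if it needs manual review."""
    lowered = output.lower()
    needs_review = any(term in lowered for term in REVIEW_TERMS)
    record = {
        "timestamp": time.time(),
        "prompt": prompt,
        "output": output,
        "needs_review": needs_review,
    }
    # One JSON object per line keeps the log easy to append to and to scan later.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return needs_review
```

In the chat loop it could be called right after `generate_stream` returns, for example `log_for_review(user_input, response, Path("review_log.jsonl"))`, with flagged records routed to a human reviewer.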

Property	Details
Model type	Text generation
Base model	huihui-ai/Qwen3-30B-A3B-abliterated
Tags	chat, abliterated, uncensored
