Qwen3 30B A3B Abliterated Fp4
This is a 4-bit quantized model of Qwen3-30B-A3B-abliterated, with a parameter scale equivalent to 8B, suitable for text generation tasks.
Downloads 103
Release Time : 6/3/2025
Model Overview
Based on the 4-bit quantized version of Qwen3-30B-A3B-abliterated, mainly used for text generation tasks and supports chat applications.
Model Features
4-bit quantization
Adopts FP4 quantization technology to significantly reduce model size and memory usage
Efficient inference
The quantized model has higher inference efficiency, suitable for environments with limited resources
Chat optimization
Specifically optimizes the chat interaction experience and supports streaming output
Content freedom
The security filtering mechanism is weak, with fewer restrictions on generated content
Model Capabilities
Text generation
Dialogue interaction
Content creation
Use Cases
Chat applications
Intelligent dialogue
Used to build chatbots
Can generate smooth and natural dialogue responses
Content creation
Text generation
Used for auxiliary writing and creative content generation
Can generate diverse text content
🚀 huihui-ai/Qwen3-30B-A3B-abliterated-fp4
This is a 4-bit quantized model for text generation, offering efficient performance with reduced safety filtering.
🚀 Quick Start
This is the 4-bit quantized ("bnb_4bit_quant_type": "fp4") model of huihui-ai/Qwen3-30B-A3B-abliterated. The config.json contains quantization parameters, so there is no need to add quantization parameters when loading it.
The result after 4-bit quantization is equivalent to 8B parameters.
✨ Features
- Text Generation: Ideal for generating text based on user input.
- 4-bit Quantization: Offers efficient performance with reduced parameter size.
📦 Installation
No specific installation steps are provided in the original document.
💻 Usage Examples
Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer
import torch
import os
import signal
cpu_count = os.cpu_count()
print(f"Number of CPU cores in the system: {cpu_count}")
half_cpu_count = cpu_count // 2
os.environ["MKL_NUM_THREADS"] = str(half_cpu_count)
os.environ["OMP_NUM_THREADS"] = str(half_cpu_count)
torch.set_num_threads(half_cpu_count)
print(f"PyTorch threads: {torch.get_num_threads()}")
print(f"MKL threads: {os.getenv('MKL_NUM_THREADS')}")
print(f"OMP threads: {os.getenv('OMP_NUM_THREADS')}")
# Load the model and tokenizer
NEW_MODEL_ID = "huihui-ai/Qwen3-30B-A3B-abliterated-fp4"
print(f"Load Model {NEW_MODEL_ID} ... ")
model = AutoModelForCausalLM.from_pretrained(
NEW_MODEL_ID,
device_map="auto",
trust_remote_code=True,
torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(NEW_MODEL_ID, trust_remote_code=True)
if tokenizer.pad_token is None:
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id
messages = []
enable_thinking = True
skip_prompt=True
skip_special_tokens=True
class CustomTextStreamer(TextStreamer):
def __init__(self, tokenizer, skip_prompt=True, skip_special_tokens=True):
super().__init__(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
self.generated_text = ""
self.stop_flag = False
def on_finalized_text(self, text: str, stream_end: bool = False):
self.generated_text += text
print(text, end="", flush=True)
if self.stop_flag:
raise StopIteration
def stop_generation(self):
self.stop_flag = True
def generate_stream(model, tokenizer, messages, enable_thinking, skip_prompt, skip_special_tokens, max_new_tokens):
input_ids = tokenizer.apply_chat_template(
messages,
tokenize=True,
enable_thinking = enable_thinking,
add_generation_prompt=True,
return_tensors="pt"
)
attention_mask = torch.ones_like(input_ids, dtype=torch.long)
tokens = input_ids.to(model.device)
attention_mask = attention_mask.to(model.device)
streamer = CustomTextStreamer(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
def signal_handler(sig, frame):
streamer.stop_generation()
print("\n[Generation stopped by user with Ctrl+C]")
signal.signal(signal.SIGINT, signal_handler)
print("Response: ", end="", flush=True)
try:
generated_ids = model.generate(
tokens,
attention_mask=attention_mask,
use_cache=False,
max_new_tokens=max_new_tokens,
do_sample=True,
pad_token_id=tokenizer.pad_token_id,
streamer=streamer
)
del generated_ids
except StopIteration:
print("\n[Stopped by user]")
del input_ids, attention_mask
torch.cuda.empty_cache()
signal.signal(signal.SIGINT, signal.SIG_DFL)
return streamer.generated_text, streamer.stop_flag
while True:
user_input = input("User: ").strip()
if user_input.lower() == "/exit":
print("Exiting chat.")
break
if user_input.lower() == "/clear":
messages = []
print("Chat history cleared. Starting a new conversation.")
continue
if user_input.lower() == "/no_think":
if enable_thinking:
enable_thinking = False
print("Thinking = False.")
else:
enable_thinking = True
print("Thinking = True.")
continue
if user_input.lower() == "/skip_prompt":
if skip_prompt:
skip_prompt = False
print("skip_prompt = False.")
else:
skip_prompt = True
print("skip_prompt = True.")
continue
if user_input.lower() == "/skip_special_tokens":
if skip_special_tokens:
skip_special_tokens = False
print("skip_special_tokens = False.")
else:
skip_special_tokens = True
print("skip_special_tokens = True.")
continue
if not user_input:
print("Input cannot be empty. Please enter something.")
continue
messages.append({"role": "user", "content": user_input})
response, stop_flag = generate_stream(model, tokenizer, messages, enable_thinking, skip_prompt, skip_special_tokens, 8096)
print("", flush=True)
if stop_flag:
continue
messages.append({"role": "assistant", "content": response})
Advanced Usage
No advanced usage code is provided in the original document.
📚 Documentation
Usage Warnings
- Risk of Sensitive or Controversial Outputs: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.
- Not Suitable for All Audiences: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.
- Legal and Ethical Responsibilities: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.
- Research and Experimental Use: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.
- Monitoring and Review Recommendations: Users are strongly advised to monitor model outputs in real-time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.
- No Default Safety Guarantees: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.
Donation
If you like it, please click 'like' and follow us for more updates.
You can follow x.com/support_huihui to get the latest model information from huihui.ai.
Your donation helps us continue our further development and improvement, a cup of coffee can do it.
- bitcoin(BTC):
bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
📄 License
This project is licensed under the Apache-2.0 license.
Phi 2 GGUF
Other
Phi-2 is a small yet powerful language model developed by Microsoft, featuring 2.7 billion parameters, focusing on efficient inference and high-quality text generation.
Large Language Model Supports Multiple Languages
P
TheBloke
41.5M
205
Roberta Large
MIT
A large English language model pre-trained with masked language modeling objectives, using improved BERT training methods
Large Language Model English
R
FacebookAI
19.4M
212
Distilbert Base Uncased
Apache-2.0
DistilBERT is a distilled version of the BERT base model, maintaining similar performance while being more lightweight and efficient, suitable for natural language processing tasks such as sequence classification and token classification.
Large Language Model English
D
distilbert
11.1M
669
Llama 3.1 8B Instruct GGUF
Meta Llama 3.1 8B Instruct is a multilingual large language model optimized for multilingual dialogue use cases, excelling in common industry benchmarks.
Large Language Model English
L
modularai
9.7M
4
Xlm Roberta Base
MIT
XLM-RoBERTa is a multilingual model pretrained on 2.5TB of filtered CommonCrawl data across 100 languages, using masked language modeling as the training objective.
Large Language Model Supports Multiple Languages
X
FacebookAI
9.6M
664
Roberta Base
MIT
An English pre-trained model based on Transformer architecture, trained on massive text through masked language modeling objectives, supporting text feature extraction and downstream task fine-tuning
Large Language Model English
R
FacebookAI
9.3M
488
Opt 125m
Other
OPT is an open pre-trained Transformer language model suite released by Meta AI, with parameter sizes ranging from 125 million to 175 billion, designed to match the performance of the GPT-3 series while promoting open research in large-scale language models.
Large Language Model English
O
facebook
6.3M
198
1
A pretrained model based on the transformers library, suitable for various NLP tasks
Large Language Model
Transformers

1
unslothai
6.2M
1
Llama 3.1 8B Instruct
Llama 3.1 is Meta's multilingual large language model series, featuring 8B, 70B, and 405B parameter scales, supporting 8 languages and code generation, with optimized multilingual dialogue scenarios.
Large Language Model
Transformers Supports Multiple Languages

L
meta-llama
5.7M
3,898
T5 Base
Apache-2.0
The T5 Base Version is a text-to-text Transformer model developed by Google with 220 million parameters, supporting multilingual NLP tasks.
Large Language Model Supports Multiple Languages
T
google-t5
5.4M
702
Featured Recommended AI Models