Phi-3-medium-4k-instruct-abliterated-v3
This is a text-generation model based on the orthogonalized bfloat16 safetensor weights of microsoft/Phi-3-medium-4k-instruct, with the goal of inhibiting the model's ability to express refusal.
Quick Start
You can use the model through the widget on the Hugging Face page. Here is an example of input in the widget:
    {
      "messages": [
        {
          "role": "user",
          "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"
        }
      ]
    }
Features
- Orthogonalized Weights: This model uses orthogonalized bfloat16 safetensor weights, which are generated based on a refined methodology described in the paper 'Refusal in LLMs is mediated by a single direction'.
- Inhibited Refusal: By manipulating certain weights, the model's ability to express refusal is "inhibited". However, it is not guaranteed that the model will not refuse or lecture about ethics/safety.
- Less Data Requirement: Compared with fine-tuning, the ablation method used in this model requires much less data and keeps most of the original model's knowledge and training intact.
Documentation
Summary
This is microsoft/Phi-3-medium-4k-instruct with orthogonalized bfloat16 safetensor weights, generated with a refined methodology based on the preview paper/blog post 'Refusal in LLMs is mediated by a single direction'. Reading the paper is recommended for a fuller understanding of the technique.
Hang on, "abliterated"? Orthogonalization? Ablation? What is this?
- Explanation of "Abliterated": It is a play on words combining "ablate" and "obliterated", used to differentiate this model from "uncensored" fine-tunes.
- Orthogonalization/Ablation: Here, these two terms refer to the same thing: the refusal feature is "ablated" from the model via orthogonalization. Certain weights have been manipulated to "inhibit" the model's ability to express refusal. In every other respect it behaves the same as the original instruct model, just with the strongest refusal directions orthogonalized out.
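The orthogonalization step itself can be sketched in a few lines. Below is a minimal, illustrative NumPy example; the random weight matrix and single-layer setup are assumptions for demonstration, whereas the real method ablates a refusal direction estimated from model activations across many layers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in weight matrix (d_out x d_in) and a unit "refusal direction"
# in the output space. In the real method this direction is estimated
# from model activations, not drawn at random.
W = rng.standard_normal((8, 16))
r = rng.standard_normal(8)
r /= np.linalg.norm(r)

# Orthogonalize: subtract the component of each column of W that lies
# along r, so the layer can no longer write into that direction.
W_ablated = W - np.outer(r, r @ W)

# Any output W_ablated @ x is now orthogonal to r.
x = rng.standard_normal(16)
print(abs(r @ (W_ablated @ x)))  # ~0
```

Because only a rank-one component is removed, the rest of the layer's behaviour is left intact, which is why ablation preserves most of the original model's knowledge.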
A little more on the methodology, and why this is interesting
- Advantages of Ablation: Ablation is good for inducing/removing very specific features. You can apply your system prompt in the ablation script against a blank system prompt on the same dataset and orthogonalize for the desired behaviour in the final model weights. It requires much less data than fine-tuning and keeps most of the original model's knowledge and training intact.
- Comparison with Fine-Tuning: Fine-tuning is still useful for broad behaviour changes. However, you may be able to get close to your desired behaviour with very few samples using the ablation/augmentation techniques. You can also combine orthogonalization and fine-tuning, e.g. orthogonalize -> fine-tune or vice versa.
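One common way such a behaviour direction is found (a sketch based on the cited paper's difference-of-means idea, not the author's exact script, with fabricated activations) is to collect activations on prompts that elicit the behaviour and on matched prompts that don't, then normalize the difference of their means:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16  # hypothetical hidden dimension

# Fabricated activation samples at one layer: we plant a known offset
# along a hidden direction to stand in for the "refusal" behaviour.
true_dir = np.zeros(d)
true_dir[0] = 1.0
refusal_acts = rng.standard_normal((100, d)) + 5.0 * true_dir
harmless_acts = rng.standard_normal((100, d))

# Difference-of-means direction, normalized to unit length.
r = refusal_acts.mean(axis=0) - harmless_acts.mean(axis=0)
r /= np.linalg.norm(r)

# The recovered direction should align closely with the planted one.
print(float(abs(r @ true_dir)))  # close to 1
```

With enough contrasting samples the noise averages out, which is why this approach needs far fewer examples than fine-tuning.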
Okay, fine, but why V3? There's no V2?
The author previously released a V2 of an abliterated Meta-Llama-3-8B model. For larger models, the author wanted to refine the methodology before spending further compute cycles. The latest methodology appears to induce fewer hallucinations, so the author jumped to V3 to mark the advancement.
Quirkiness awareness notice
This model may have some quirks due to the new methodology. You are encouraged to play with the model and post any quirks you notice in the Community tab. If you develop further improvements, please share them. You can also reach the author on the Cognitive Computations Discord or through the Community tab.
License
This project is licensed under the MIT License.