🚀 NEO CLASS Ultra Quants for: L3-8B-Stheno-v3.3-32K
The NEO Class technology is the result of countless investigations and over 120 lab experiments, supported by real-world testing and qualitative results.
✨ Features
- Enhanced Performance: The NEO Class offers better overall function, improved instruction following, higher output quality, and stronger connections to ideas, concepts, and the world.
- Quant Upgrade: Quants operate above their "grade". For example, Q4 / IQ4 operate at Q5KM/Q6 levels, and Q3/IQ3 operate at Q4KM/Q5 levels.
- Perplexity Drop: The Neo Class Imatrix quant of IQ4XS scores 6829 points lower in perplexity than the regular quant of IQ4XS (lower perplexity is better). The drop is this large because the original model's perplexity is high.
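For readers unfamiliar with the metric, here is a minimal sketch of how perplexity is commonly computed from per-token log-probabilities. The function name and the sample values are hypothetical and for illustration only; they are not the measurements reported above, and the tool used for those measurements is not stated on this page.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Hypothetical natural-log probabilities for the same four tokens under two quants.
regular_quant = [-2.0, -1.5, -3.0, -2.5]
imatrix_quant = [-1.0, -0.5, -2.0, -1.5]

print("regular:", perplexity(regular_quant))
print("imatrix:", perplexity(imatrix_quant))
```

A quant that assigns higher probability to the reference text produces a lower value, which is why a drop counts as an improvement.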
📚 Documentation
Template Issue
Although this model uses a "Llama3" template, we found that Command-R's template works better, especially for creative purposes. This applies to both normal quants and Neo quants. Here is Command-R's template:
{
  "name": "Cohere Command R",
  "inference_params": {
    "input_prefix": "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>",
    "input_suffix": "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>",
    "antiprompt": [
      "<|START_OF_TURN_TOKEN|>",
      "<|END_OF_TURN_TOKEN|>"
    ],
    "pre_prompt_prefix": "<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>",
    "pre_prompt_suffix": ""
  }
}
This "interesting" issue has been confirmed by multiple users.
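If your front end does not load a template file like this, the same fields can be applied by hand. The sketch below assumes the fields are concatenated in the order their names suggest (system block, user turn, then the assistant cue) and that "antiprompt" entries act as stop strings. The helper names `build_prompt` and `truncate_at_stop` are hypothetical, not part of any particular tool.

```python
# Field values copied from the Command-R template above.
TEMPLATE = {
    "pre_prompt_prefix": "<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>",
    "pre_prompt_suffix": "",
    "input_prefix": "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>",
    "input_suffix": "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>",
    "antiprompt": ["<|START_OF_TURN_TOKEN|>", "<|END_OF_TURN_TOKEN|>"],
}

def build_prompt(system_prompt, user_message, template=TEMPLATE):
    # Assumed assembly order: system block, one user turn, then the assistant cue.
    return (
        template["pre_prompt_prefix"] + system_prompt + template["pre_prompt_suffix"]
        + template["input_prefix"] + user_message + template["input_suffix"]
    )

def truncate_at_stop(text, template=TEMPLATE):
    # Cut generated text at the earliest antiprompt (stop string), if one appears.
    cut = len(text)
    for stop in template["antiprompt"]:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(build_prompt("You are a creative writing assistant.", "Start a scene."))
```

The resulting string would be sent as a raw completion prompt, with the front end's own chat templating turned off so the tokens are not wrapped twice.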
Model Notes
- Context Limit: The maximum context is 32k. Please refer to the original model maker's page for details and usage information: [https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K)
- Special Thanks: Thanks to the model creators at Sao10K for making such a fantastic model.
Settings for CHAT / ROLEPLAY and SMOOTHER Operation
In "KoboldCpp", "oobabooga/text-generation-webui", or "Silly Tavern":
- Smoothing Factor: Set the "Smoothing_factor" to 1.5 to 2.5.
  - In KoboldCpp: Settings -> Samplers -> Advanced -> "Smooth_F".
  - In text-generation-webui: parameters -> lower right.
  - In Silly Tavern: This is called "Smoothing".
Note: For "text-generation-webui", if using GGUFs, you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model). Source versions (and config files) of the models are here: [https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be](https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be)
Other Options:
- Rep Penalty: Increase rep pen to 1.1 to 1.15 (not necessary if using "smoothing_factor").
- Quadratic Sampling: If the interface or program you use to run models supports "Quadratic Sampling" ("smoothing"), set the smoothing factor as noted above.
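To give a sense of what the smoothing and repetition-penalty settings above do to token scores, here is a minimal sketch. It assumes one commonly described formulation of quadratic sampling (each logit is replaced by the top logit minus the smoothing factor times its squared distance from the top) and the widely used divide-or-multiply form of repetition penalty. The exact implementations in KoboldCpp, text-generation-webui, and Silly Tavern may differ, and the logits below are hypothetical.

```python
import math

def quadratic_smooth(logits, smoothing_factor):
    """Assumed form: new = top - factor * (logit - top)^2."""
    top = max(logits)
    return [top - smoothing_factor * (x - top) ** 2 for x in logits]

def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Assumed form: shrink positive logits, push negative ones lower, for seen tokens."""
    out = list(logits)
    for i in set(seen_token_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [5.0, 4.5, 3.0, 0.0]  # hypothetical raw scores for four candidate tokens

# Compare the ends of the recommended smoothing range (1.5 to 2.5).
for factor in (1.5, 2.5):
    probs = softmax(quadratic_smooth(logits, factor))
    print("smoothing", factor, [round(p, 4) for p in probs])

# Rep pen of 1.1 applied to token 0, as if it had already been generated.
print("rep pen 1.1", apply_repetition_penalty(logits, [0], 1.1))
```

Under this formulation, tokens scored close to the top choice stay competitive while distant ones are suppressed sharply, which may be why a separate repetition penalty is described as unnecessary when smoothing is active.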
Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers
This is a "Class 1" model. For all settings used for this model (including specifics for its "class"), example generation(s), and an advanced settings guide (which often addresses any model issue(s)), as well as methods to improve model performance for all use cases, including chat, role-play, etc., please see: [https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters](https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters)
💻 Usage Examples
Prompt and Comparison
The prompt was tested with "temp = 0" to ensure compliance, a 2048-token context (the model supports 32768 context / 32k), and the "chat" template for LLAMA3. Additional parameters were kept to a minimum.
Prompt: "Start a 1000 word scene with: The sky scraper swayed, as she watched the window in front of her on the 21 floor explode..."
Original model IQ4XS - unaltered:
The original model's output is a long and detailed story about a skyscraper with a series of strange and terrifying events, including gunfire, a sense of doom, and an unseen malevolent force.
New NEO Class IQ4XS Imatrix:
The new model's output is a story about a woman in a high-rise apartment where a window explodes, and she is faced with the threat of intruders. She tries to figure out how to escape while avoiding drawing attention to herself.
📄 License
This model is licensed under the Apache-2.0 license.