L3-8B-Stheno-v3.3-32K-Ultra-NEO-V1-IMATRIX-GGUF Open-source Large Language Model - Strong Instruction Following with Ultra-long Context

L3 8B Stheno V3.3 32K Ultra NEO V1 IMATRIX GGUF

Developed by DavidAU

An 8B parameter large language model based on NEO CLASS technology, with a 32k context window and enhanced instruction-following ability

Large Language Model, English

Open Source License: Apache-2.0

#Highly relevant reasoning #Ultra-long context processing #Low perplexity quantization

Downloads 1,086

Release Time: 6/25/2024

Model Overview

This model is optimized using NEO CLASS technology and performs excellently in instruction following, output quality, and concept relevance. It is suitable for various text generation tasks such as creative writing and role-playing

Model Features

Optimized by NEO CLASS technology

Verified through over 120 laboratory experiments, significantly improving model performance and output quality

Improved quantization level

Q4/IQ4 can reach the Q5KM/Q6 level, and Q3/IQ3 can reach the Q4KM/Q5 level

Low perplexity

The IQ4XS quantized by Neo Class Imatrix has a 6829-point lower perplexity compared to the regular IQ4XS

Long context support

Supports a 32k ultra-long context window

Model Capabilities

Creative text generation

Role-playing dialogue

Coherent long text writing

Instruction following

Use Cases

Creative writing

Scene generation

Generate a coherent scene description of over 1000 words based on the starting sentence

The example shows detailed scene descriptions and plot development

Role-playing

Character dialogue

Use the Command-R template for more creative character interactions

🚀 NEO CLASS Ultra Quants for: L3-8B-Stheno-v3.3-32K

The NEO Class technology is the result of countless investigations and over 120 lab experiments, supported by real-world testing and qualitative results.

✨ Features

Enhanced Performance: The NEO Class offers better overall function, improved instruction following, higher output quality, and stronger connections to ideas, concepts, and the world.
Quant Upgrade: Quants operate above their "grade". For example, Q4/IQ4 operate at Q5KM/Q6 levels, and Q3/IQ3 operate at Q4KM/Q5 levels.
Perplexity Drop: There is a perplexity drop of 6829 points for Neo Class Imatrix quant of IQ4XS compared to the regular quant of IQ4XS. This significant drop is due to the high perplexity of the original model (lower perplexity is better).
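
For readers unfamiliar with the metric, the sketch below shows the standard definition of perplexity: the exponential of the mean negative log-likelihood per token. The function name and the probability values are illustrative assumptions, not measurements from this model, and the sketch does not reproduce how the 6829-point figure above was obtained.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Illustrative values only (hypothetical, not taken from this model):
# a model that assigns probability 0.5 to every correct token ...
confident = [math.log(0.5)] * 4
# ... versus one that assigns only 0.1 to every correct token.
uncertain = [math.log(0.1)] * 4

# The more confident model has the lower perplexity.
print(perplexity(confident) < perplexity(uncertain))
```

A lower value means the model was, on average, less "surprised" by the reference text, which is why lower perplexity is better.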

📚 Documentation

Template Issue

Although this model uses a "Llama3" template, we found that Command-R's template works better, especially for creative purposes. This applies to both normal quants and Neo quants. Here is Command-R's template:

{
  "name": "Cohere Command R",
  "inference_params": {
    "input_prefix": "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>",
    "input_suffix": "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>",
    "antiprompt": [
      "<|START_OF_TURN_TOKEN|>",
      "<|END_OF_TURN_TOKEN|>"
    ],
    "pre_prompt_prefix": "<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>",
    "pre_prompt_suffix": ""
  }
}

This "interesting" issue has been confirmed by multiple users.
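
As a rough illustration of how the template fields above fit together, the following sketch assembles a single-turn prompt. It assumes the front end concatenates the fields in the order pre-prompt, then user input, and that generation is cut at the first "antiprompt" string; the helper names `build_prompt` and `trim_at_stop` are hypothetical and not part of any particular application.

```python
# Field values copied from the Command-R template above.
TEMPLATE = {
    "input_prefix": "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>",
    "input_suffix": "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>",
    "antiprompt": ["<|START_OF_TURN_TOKEN|>", "<|END_OF_TURN_TOKEN|>"],
    "pre_prompt_prefix": "<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>",
    "pre_prompt_suffix": "",
}

def build_prompt(system, user, template=TEMPLATE):
    """Assumed ordering: system pre-prompt first, then the wrapped user turn."""
    return (
        template["pre_prompt_prefix"] + system + template["pre_prompt_suffix"]
        + template["input_prefix"] + user + template["input_suffix"]
    )

def trim_at_stop(text, stops):
    """Cut generated text at the earliest stop ("antiprompt") string, if any."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

prompt = build_prompt("You are a narrator.", "Describe the storm.")
print(prompt)
```

Most front ends apply the template for you once it is selected, so a helper like this is only needed when sending raw prompts to a completion endpoint.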

Model Notes

Context Limit: The maximum context is 32k. Please refer to the original model maker's page for details and usage information: [https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K](https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K)
Special Thanks: Special thanks to the model creators at SAO10K for making such a fantastic model.

Settings for CHAT / ROLEPLAY and SMOOTHER Operation

In "KoboldCpp", "oobabooga/text-generation-webui", or "Silly Tavern":

Smoothing Factor: Set the "Smoothing_factor" to 1.5 to 2.5.
- In KoboldCpp: Settings -> Samplers -> Advanced -> "Smooth_F".
- In text-generation-webui: parameters -> lower right.
- In Silly Tavern: This is called "Smoothing".

Note: For "text-generation-webui", if using GGUFs, you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model). Source versions (and config files) of the models are here: [https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be](https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be)

Other Options:

Rep Penalty: Increase rep pen to 1.1 to 1.15 (not necessary if using "smoothing_factor").
Quadratic Sampling: If the interface/program you are using to run AI MODELS supports "Quadratic Sampling" ("smoothing"), make the adjustment as noted.
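
To give a sense of what the smoothing factor does, here is a minimal sketch of one common formulation of quadratic sampling: each logit is pulled away from the top logit by the square of its distance, scaled by the smoothing factor. This is an assumption about how the setting typically works; individual front ends may implement it differently, and the function names below are hypothetical.

```python
import math

def quadratic_smooth(logits, smoothing_factor):
    """Assumed formulation: new = max - factor * (logit - max)^2."""
    max_logit = max(logits)
    return [max_logit - smoothing_factor * (x - max_logit) ** 2 for x in logits]

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]               # illustrative values only
smoothed = quadratic_smooth(logits, smoothing_factor=2.0)

print(softmax(logits))                 # distribution before smoothing
print(softmax(smoothed))               # distribution after smoothing
```

Under this formulation the top token is left unchanged while tokens far from it are penalized much more heavily than tokens close to it, which is one reason a separate repetition penalty may be unnecessary when smoothing is enabled.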

Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers

This is a "Class 1" model. For all settings used for this model (including specifics for its "class"), example generation(s), and an advanced settings guide (which often addresses any model issue(s)), as well as methods to improve model performance for all use cases, including chat, role-play, etc., please see: [https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters](https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters)

💻 Usage Examples

Prompt and Comparison

The prompt was tested with "temp = 0" to ensure compliance, a 2048-token context (the model supports 32768 context / 32k), and the "chat" template for LLAMA3. All other parameters were kept to a minimum.

Prompt: "Start a 1000 word scene with: The sky scraper swayed, as she watched the window in front of her on the 21 floor explode..."

Original model IQ4XS - unaltered: The original model's output is a long and detailed story about a skyscraper with a series of strange and terrifying events, including gunfire, a sense of doom, and an unseen malevolent force.

New NEO Class IQ4XS Imatrix: The new model's output is a story about a woman in a high-rise apartment where a window explodes, and she is faced with the threat of intruders. She tries to figure out how to escape while avoiding drawing attention to herself.
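
The test setup described above (temperature 0, 2048 context, Llama 3 chat template) can be sketched roughly as follows. The prompt layout follows the published Llama 3 instruct format; the `TEST_SETTINGS` key names and the `llama3_prompt` helper are hypothetical and would need to be mapped onto whatever runtime is actually used.

```python
# Hypothetical key names; map these onto your runtime's own parameters.
TEST_SETTINGS = {
    "temperature": 0.0,      # "temp = 0" as stated in the test description
    "context_length": 2048,  # test context; the model itself supports 32k
}

def llama3_prompt(user, system=None):
    """Build a single-turn prompt in the Llama 3 instruct chat format."""
    parts = ["<|begin_of_text|>"]
    if system:
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        )
    parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>")
    # Leave the assistant header open so the model writes the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

# The test prompt, reproduced verbatim from the comparison above.
scene_request = (
    "Start a 1000 word scene with: The sky scraper swayed, as she watched "
    "the window in front of her on the 21 floor explode..."
)
print(llama3_prompt(scene_request))
```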

📄 License

This model is licensed under the Apache-2.0 license.
