QwQ-32B-ArliAI-RpR-v4
The best role-playing and creative writing model from ArliAI, featuring reduced repetitions, increased training sequence length, and high-quality reasoning capabilities.
Quick Start
You can access the model at https://arliai.com, and we also have a model ranking page at [https://www.arliai.com/models-ranking](https://www.arliai.com/models-ranking).
Ask questions in our new Discord Server https://discord.com/invite/t75KbPgwhk or on our subreddit https://www.reddit.com/r/ArliAI/.
Features
RpR v4 Changes
- Reduced repetitions and impersonation: Compared to RpR v3, a more advanced filtering method was employed to eliminate instances where the LLM repeated similar phrases or spoke on behalf of the user. Any remaining repetition or impersonation stems from the training of the base QwQ model, not the RpR dataset.
- Increased training sequence length: The training sequence length was extended to 16K to improve awareness and memory, even during longer chats.
RpR Series Overview
RpR (RolePlay with Reasoning) is a new model series from ArliAI, building directly on the successful dataset curation and training methods of the RPMax series. These models use a curated, deduplicated RP and creative writing dataset, emphasizing variety to ensure high creativity and minimize cross-context repetition.
Documentation
RpR Series: Building on RPMax with Reasoning
The RpR series uses the same curated dataset as RPMax, focusing on variety. The release of QwQ, an easily trainable high-performing open-source reasoning model, revealed limitations in existing instruct and creative writing reasoning datasets. To address this, Arli AI created a reasoning RP dataset by re-processing the RPMax dataset. The training process was carefully designed to mimic the model's inference usage, resulting in coherent and engaging outputs in long multi-turn RP chats.
Model Description
QwQ-32B-ArliAI-RpR-v4 is the fourth release in the RpR series. It is a 32-billion-parameter model fine-tuned using the RpR dataset, which combines the RPMax dataset with techniques to maintain reasoning abilities in long multi-turn chats.
Recommended Samplers
- RpR models do not work well with repetition penalty samplers, even advanced ones like XTC or DRY.
- They perform best with simple sampler settings and a high max tokens value to allow room for reasoning.
- You can also download the ST master export uploaded in the files section of this repo.
Recommended starting parameters (a minimal usage sketch follows this list):
- Temperature: 1.0
- MinP: 0.02
- TopK: 40
- Response Tokens: 2048+
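To illustrate how these settings might be applied, here is a minimal sketch using an OpenAI-compatible chat endpoint. The base_url, model name, and the extra_body fields for min_p/top_k are assumptions (not taken from this card) and depend on your provider or backend:

```python
# Minimal sketch, not an official ArliAI client example. The base_url, model name,
# and extra_body sampler fields are assumptions; adjust them to your provider/backend.
from openai import OpenAI

client = OpenAI(base_url="https://api.arliai.com/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="QwQ-32B-ArliAI-RpR-v4",
    messages=[{"role": "user", "content": "Start a scene in a rain-soaked neon city."}],
    temperature=1.0,                            # recommended starting value
    max_tokens=2048,                            # leave headroom for the reasoning block
    extra_body={"min_p": 0.02, "top_k": 40},    # backend-specific sampler fields
)
print(response.choices[0].message.content)
```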
Specs
| Property | Details |
|---|---|
| Base Model | QwQ-32B |
| Max Context Length | 128K with YaRN (natively 32K, like the base QwQ) |
| Parameters | 32B |
| Reasoning Model | Yes |
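The 128K figure assumes YaRN rope scaling is enabled, as with the base QwQ model. Below is a rough sketch of one way to do this with transformers; the rope_scaling values follow the base QwQ/Qwen documentation rather than this card, and a recent transformers version with YaRN support for Qwen2-family models is assumed:

```python
# Sketch only: enabling YaRN rope scaling for long contexts. Values follow the
# base QwQ documentation (factor 4.0 over the native 32K), not this card.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "ArliAI/QwQ-32B-ArliAI-RpR-v4"
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,                               # 32K native * 4 = 128K
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```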
Training Details
- Sequence Length: 16384
- Epochs: 1 epoch training (Inherited from RPMax methods)
- Fine-tuning Method: RS-QLORA+ (Rank-Stabilized LoRA + LoRA Plus 8x); see the sketch after this list
- Rank/Alpha: 128-rank, 128-alpha
- Learning Rate: 0.00001
- Scheduler: Rex
- Gradient accumulation: 32
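As a rough sketch, the adapter settings above might map onto a peft LoraConfig as shown below. This is an approximation, not ArliAI's actual training code: use_rslora covers the rank-stabilized part, while the LoRA+ 8x learning-rate ratio and QLoRA 4-bit quantization are handled by the training framework and only noted in comments. The dropout and target_modules values are assumptions.

```python
# Approximate sketch of the adapter settings, not the actual RpR training script.
# LoRA+ (8x LR ratio) and 4-bit QLoRA quantization are applied by the training
# framework (e.g. via a loraplus_lr_ratio option) and are not shown here.
from peft import LoraConfig

lora_config = LoraConfig(
    r=128,                          # rank 128
    lora_alpha=128,                 # alpha 128
    lora_dropout=0.0,               # assumption; not stated on the card
    use_rslora=True,                # rank-stabilized LoRA
    target_modules="all-linear",    # assumption; common for QLoRA-style fine-tunes
    task_type="CAUSAL_LM",
)
```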
Quantization
- BF16: [https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v4](https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v4)
- GGUF: [https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v4-GGUF](https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v4-GGUF)
How to use reasoning models correctly in ST
For reasoning models in ST:
- Set the prefix to ONLY <think> and the suffix to ONLY </think>, with no extra spaces or newlines.
- Ensure the reply starts with <think>.
- Uncheck "Always add character names".
- Set "Include names" to "never".
- The chat template should conform to the model being used.
Note: Reasoning models work correctly only when "Include names" is set to "never". Otherwise, the model may be confused about whether to respond or reason first.
If you don't see the reasoning wrapped in the thinking block, check your settings or update your ST version. If the whole response ends up inside the reasoning block, check for extra spaces or newlines in the <think> and </think> tokens.
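Outside of ST, the same requirement applies: the assistant turn should begin inside the thinking block. A small sketch using the tokenizer's chat template, assuming the model ships a QwQ-style template that opens the thinking block in the generation prompt:

```python
# Sketch assuming the model ships a QwQ-style chat template that opens the
# thinking block at the start of the assistant turn.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("ArliAI/QwQ-32B-ArliAI-RpR-v4")
messages = [{"role": "user", "content": "Describe the tavern as my character walks in."}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# If the template behaves like base QwQ's, the prompt ends with "<think>\n",
# so the model's reply begins inside the reasoning block.
print(prompt.endswith("<think>\n"))
```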
The RPMax Foundation (Dataset & Training Philosophy)
The Goal: Reduced Repetition and Higher Creativity
The dataset curation for RPMax and RpR aims to reduce repetition and enhance the model's creative writing ability across different situations.
What is repetition and creativity?
- Creativity: Refers to the variety of outputs the model can generate, not just pleasant prose.
- Repetition:
- In-context repetition: Repeating phrases in a single conversation. RPMax and RpR do not yet focus on eliminating this type of repetition.
- Cross-context repetition: Repeating phrases or tropes in different situations. This is always bad and is the main target of the dataset curation.
Dataset Curation
The dataset for RPMax and RpR is curated from open-source creative writing and RP datasets on Hugging Face. Synthetic datasets are removed, and Llama 3.1 8B is used to de-dupe the dataset.
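The card does not describe the dedup pipeline in detail. As a generic illustration only (not ArliAI's actual Llama 3.1 8B setup), near-duplicate filtering can be sketched as embedding-similarity screening; the embedding model and threshold below are placeholders:

```python
# Generic near-duplicate filtering sketch; the model name and threshold are
# placeholders and do not reflect ArliAI's actual dedup pipeline.
import numpy as np
from sentence_transformers import SentenceTransformer

def near_dedupe(samples, threshold=0.9):
    """Keep a sample only if it is not too similar to any sample already kept."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
    kept, kept_vecs = [], []
    for text in samples:
        vec = model.encode(text, normalize_embeddings=True)
        if all(float(np.dot(vec, v)) < threshold for v in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept
```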
The Golden Rule of Fine-Tuning
For fine - tuning, quality is more important than quantity. The curated dataset is smaller but results in a more unique model.
Training Parameters and Unconventional Approach
The RPMax and RpR methodology uses one epoch, low gradient accumulation, and a higher - than - normal learning rate. The loss curve is unstable but decreasing over time, allowing the model to learn from each example without over - fitting.
License
This project is licensed under the Apache 2.0 license.

Image generated using Arli AI Image Generation [https://www.arliai.com/image-generation](https://www.arliai.com/image-generation)



