QwQ-32B-ArliAI-RpR-v4
QwQ-32B-ArliAI-RpR-v4 is a fine-tuned model from ArliAI. It focuses on role-playing and creative writing, with features like reduced repetition, an increased training sequence length, and better performance in long multi-turn chats.
Image generated using Arli AI Image Generation https://www.arliai.com/image-generation
Features
RpR v4 Changes
- Reduced repetitions and impersonation: To enhance the creativity and out-of-the-box thinking of RpR v3, a more advanced filtering method was employed to eliminate examples where the LLM repeated similar phrases or impersonated the user (a toy illustration of the idea follows this list). Any remaining repetition or impersonation is due to the training of the base QwQ model, not the RpR dataset.
- Increased training sequence length: The training sequence length was extended to 16K to improve awareness and memory, even in longer chats.
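The exact filtering method used for v4 is not published. Purely as a toy illustration of the idea, a heuristic pass over conversation examples could flag impersonation and repeated phrasing along these lines (the function names and the 6-gram threshold are assumptions, not the actual RpR tooling):

```python
# Toy illustration only; the real RpR v4 filtering pipeline is not published.
# Flags assistant turns that appear to speak for the user ("impersonation")
# or that reuse long word n-grams from earlier assistant turns ("repetition").

def ngrams(text: str, n: int = 6) -> set[tuple[str, ...]]:
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(max(len(tokens) - n + 1, 0))}

def should_drop(assistant_text: str, user_name: str, earlier_assistant_texts: list[str]) -> bool:
    # Impersonation heuristic: the assistant writes lines as the user.
    if f"{user_name}:" in assistant_text:
        return True
    # Repetition heuristic: a long n-gram was already used by the assistant earlier.
    seen: set[tuple[str, ...]] = set()
    for prev in earlier_assistant_texts:
        seen |= ngrams(prev)
    return bool(ngrams(assistant_text) & seen)
```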
RpR Series Overview: Building on RPMax with Reasoning
RpR (RolePlay with Reasoning) is a new model series from ArliAI, building directly on the successful dataset curation and training methods of the RPMax series. These models use the same curated, deduplicated RP and creative writing dataset as RPMax, emphasizing variety to ensure high creativity and minimize cross-context repetition.
With the release of QwQ, the first high-performing open-source reasoning model that can be easily trained, it was found that existing instruct and creative writing reasoning datasets had only one response per example. This single-response data led to degraded output quality in long multi-turn chats. So, Arli AI created a real RP model capable of long multi-turn chat with reasoning.
To create RpR, the existing RPMax dataset was re-processed into a reasoning dataset. The base QwQ Instruct model was used to create the reasoning process for each turn in the RPMax dataset conversation examples, which were then refined. The training run was completed using axolotl with a manual template-free segments dataset to ensure the model was never trained to see the reasoning block in the context, just like during inference.
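For illustration only (the actual RpR training files are not published here), a template-free "segments"-style example of this kind might look roughly like the following, with loss masked on everything except the current turn's reasoning and reply; the ChatML markers and `<think>` delimiters follow QwQ's chat format:

```python
# Hypothetical shape of one template-free "segments" training example.
# Earlier turns are stored without their <think> blocks and are loss-masked
# (label=False); only the current turn's reasoning + reply is trained on.
example = {
    "segments": [
        {"label": False, "text": "<|im_start|>system\nYou are Mira, a wandering bard.<|im_end|>\n"},
        {"label": False, "text": "<|im_start|>user\nThe innkeeper waves you over.<|im_end|>\n"},
        {"label": False, "text": "<|im_start|>assistant\nMira picks up her lute and crosses the room.<|im_end|>\n"},
        {"label": False, "text": "<|im_start|>user\nHe asks for a song about the northern war.<|im_end|>\n<|im_start|>assistant\n"},
        {"label": True,  "text": "<think>\n...reasoning generated for this turn...\n</think>\n\nMira's fingers hesitate over the strings...<|im_end|>\n"},
    ]
}
```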
The result is consistently coherent and interesting outputs, even in long multi-turn RP chats. As far as we know, this is the first correctly trained reasoning model for RP and creative writing.
You can access the model at https://arliai.com, and there is also a models ranking page at https://www.arliai.com/models-ranking. You can ask questions on the Discord server https://discord.com/invite/t75KbPgwhk or the subreddit https://www.reddit.com/r/ArliAI/.
Installation
No installation steps were provided in the original README.
Usage Examples
No code examples were provided in the original README.
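As a minimal sketch (assuming the standard Hugging Face transformers stack, which the card does not specify), loading and querying the model could look like this:

```python
# Minimal sketch, assuming transformers + torch (pip install transformers accelerate torch).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ArliAI/QwQ-32B-ArliAI-RpR-v4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a creative roleplay partner."},
    {"role": "user", "content": "The tavern door creaks open..."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Use a high max_new_tokens so the <think> reasoning block has room to finish.
output = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=1.0)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```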
Documentation
Model Description
QwQ-32B-ArliAI-RpR-v4 is the fourth release in the RpR series. It is a 32-billion-parameter model fine-tuned using the RpR dataset, based on the curated RPMax dataset, combined with techniques to maintain reasoning abilities in long multi-turn chats.
Recommended Samplers
- RpR models do not work well with repetition-penalty-style samplers, even more advanced ones such as XTC or DRY.
- They work best with simple sampler settings and a high max tokens value to allow for longer reasoning.
- You can download the ST master export uploaded in the files section of this repo.
It is recommended to start with the following values (a sketch mapping them to generation parameters follows this list):
- Temperature: 1.0
- MinP: 0.02
- TopK: 40
- Response Tokens: 2048+
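As a sketch, these starting values map onto Hugging Face transformers generation arguments roughly as follows (parameter names are transformers'; backends such as SillyTavern or llama.cpp expose equivalents under similar names, and `min_p` requires a recent transformers version):

```python
# Sketch: the recommended starting samplers expressed as generation arguments.
from transformers import GenerationConfig

gen_config = GenerationConfig(
    do_sample=True,
    temperature=1.0,         # Temperature
    min_p=0.02,              # MinP
    top_k=40,                # TopK
    max_new_tokens=2048,     # Response Tokens: 2048+ so reasoning can finish
    repetition_penalty=1.0,  # leave repetition-penalty-style samplers off
)
# output = model.generate(inputs, generation_config=gen_config)
```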
Specs
| Property | Details |
|----------|---------|
| Model Type | QwQ-32B-ArliAI-RpR-v4 |
| Base Model | QwQ-32B |
| Max Context Length | Max 128K with YaRN (natively 32K like base QwQ; see the sketch below) |
| Parameters | 32B |
| Reasoning Model | Yes |
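The card does not show how to enable the extended context. A sketch following the YaRN `rope_scaling` convention documented for QwQ/Qwen2.5-family models (the factor of 4.0 over the native 32K is the commonly cited value and an assumption here, not taken from this README):

```python
# Sketch: extending context beyond the native 32K with YaRN.
# Verify the exact values against the base QwQ-32B documentation.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "ArliAI/QwQ-32B-ArliAI-RpR-v4"

config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,                              # 32K * 4 = 128K
    "original_max_position_embeddings": 32768,  # native context of the base model
}
config.max_position_embeddings = 131072

model = AutoModelForCausalLM.from_pretrained(
    model_id, config=config, torch_dtype="auto", device_map="auto"
)
```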
Training Details
| Property | Details |
|----------|---------|
| Sequence Length | 16384 |
| Epochs | 1 epoch training (inherited from RPMax methods) |
| Fine-tuning Method | RS-QLORA+ (Rank-Stabilized LoRA + LoRA Plus 8x; see the sketch below) |
| Rank/Alpha | 128-rank, 128-alpha |
| Learning Rate | 0.00001 |
| Scheduler | Rex |
| Gradient Accumulation | 32 |
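The actual run used axolotl; purely to show how the listed hyperparameters map onto common flags, a peft `LoraConfig` sketch might look like the following (dropout and target modules are assumptions not stated in the card, and the "LoRA Plus 8x" ratio is applied through a separate LoRA+ optimizer rather than this config):

```python
# Illustrative mapping of the listed hyperparameters onto peft flags;
# not the actual axolotl configuration used for RpR.
from peft import LoraConfig

lora_config = LoraConfig(
    r=128,                        # Rank
    lora_alpha=128,               # Alpha
    use_rslora=True,              # Rank-Stabilized LoRA
    lora_dropout=0.0,             # assumption; not stated in the card
    target_modules="all-linear",  # assumption; not stated in the card
    task_type="CAUSAL_LM",
)
# Training schedule per the table: 1 epoch, learning rate 1e-5,
# Rex scheduler, gradient accumulation 32.
```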
Very Nice Training graphs :)
Quantization
| Property | Details |
|----------|---------|
| BF16 | https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v4 |
| GGUF | https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v4-GGUF |
How to use reasoning models correctly in ST (SillyTavern)
For reasoning models:
- Set the prefix to ONLY `<think>` and the suffix to ONLY `</think>`, without spaces or newlines.
- Ensure the reply starts with `<think>`.
- Uncheck "Always add character names".
- Set "Include names" to never.
- The chat template should conform to the model being used.
Note: Reasoning models work properly only when "include names" is set to never. If enabled, it appends the character name at the end, confusing the model.
The rest of the sampler parameters can be set as desired.
If the reasoning is not wrapped inside the thinking block, the settings may be incorrect or the ST version may be too old. If the whole response is in the reasoning block, there may be an extra space or newline in the `<think>` and `</think>` tokens.
If everything is set up correctly, the reasoning appears inside ST's collapsible thinking block and the visible reply follows below it.
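Outside of ST, a client calling the model directly has to separate the reasoning block from the visible reply itself. A minimal sketch, assuming the standard `<think>...</think>` delimiters:

```python
# Sketch: split a raw completion into (reasoning, visible reply).
import re

def split_reasoning(text: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()          # no (or unterminated) thinking block
    reasoning = match.group(1).strip()
    reply = text[match.end():].strip()   # everything after </think>
    return reasoning, reply
```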
The RPMax Foundation (Dataset & Training Philosophy)
The following sections detail the core philosophy behind the dataset and training methodology originally developed for RPMax, which serves as the foundation for the RpR series.
The Goal: Reduced Repetition and Higher Creativity
The goal of the dataset curation for RPMax and RpR is to reduce repetitions and increase the models' ability to write creatively in different situations. This means the models will output responses differently across various scenarios.
What is repetition and creativity?
Creativity refers to the variety in the model's output, not just pleasing prose. Repetition and creativity are intertwined. There are two types of repetition:
- In-context repetition: the model repeats the same phrases within a single conversation. While it can make the writing seem boring, in some cases it can be intentional. RPMax and RpR do not yet focus on eliminating this type of repetition.
- Cross-context repetition: the model repeats the same phrases or tropes across different situations. It is always bad, as it indicates overfitting. The primary goal of the dataset curation is to reduce cross-context repetition by ensuring the dataset has no repetitions of the same situations or characters.
Dataset Curation
The success of models trained on this dataset is due to the training method and the unique dataset. It includes many open-source creative writing and RP datasets from Hugging Face, with synthetic generations removed. Llama 3.1 8B is used to de-dupe the datasets.
The Golden Rule of Fine - Tuning
For fine-tuning, quality is more important than quantity. The dataset used is smaller but results in a unique model.
Training Parameters and Unconventional Approach
The RPMax and RpR methodology uses a single epoch, low gradient accumulation, and a higher-than-normal learning rate. The loss curve is unstable during training but decreases over time. This approach allows the models to learn from each example without reinforcing single tropes.
Technical Details
The technical details are covered in the above sections, including the model's architecture, training methods, and dataset curation.
License
The model is licensed under the Apache-2.0 license.