
ARWKV R1 1B5

Developed by RWKV-Red-Team
ARWKV-R1-1B5 is an early preview of a 1.5-billion-parameter RNN-based model, trained through three-stage knowledge distillation from DeepSeek-R1-Distill-Qwen-1.5B, with a context length of 2k.
Downloads: 164
Release Time: 2/7/2025

Model Overview

ARWKV-R1-1B5 is a hybrid-architecture model that pairs RWKV-7 time mixing with a Transformer MLP, showcasing RWKV-7's efficient recurrent mechanism and the advantage of dispensing with self-attention.
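To make the hybrid layout concrete, below is a minimal, illustrative PyTorch sketch of one decoder block: a greatly simplified RWKV-7-style recurrent token mixer (diagonal-decay state update only; the real RWKV-7 kernel has additional terms) standing in for self-attention, followed by a standard Transformer MLP. All class names, dimensions, and the exact update rule are simplifications for illustration, not the model's actual implementation.

```python
import torch
import torch.nn as nn


class SimpleTimeMixing(nn.Module):
    """Greatly simplified RWKV-7-style recurrent token mixer (illustration only).
    It keeps just a diagonal-decay state update to show how tokens can be mixed
    without self-attention."""

    def __init__(self, d_model: int):
        super().__init__()
        self.receptance = nn.Linear(d_model, d_model, bias=False)  # r: how much to read out
        self.key = nn.Linear(d_model, d_model, bias=False)         # k: where to write
        self.value = nn.Linear(d_model, d_model, bias=False)       # v: what to write
        self.decay = nn.Parameter(torch.zeros(d_model))            # per-channel forgetting rate
        self.output = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor, state: torch.Tensor):
        # x: (batch, seq, d_model); state: (batch, d_model, d_model) fixed-size memory
        w = torch.sigmoid(self.decay)                               # decay in (0, 1)
        ys = []
        for t in range(x.shape[1]):                                 # one update per token -> O(n)
            xt = x[:, t]
            r, k, v = self.receptance(xt), self.key(xt), self.value(xt)
            # forget a little along each channel, then write the new value/key outer product
            state = state * w.view(1, 1, -1) + v.unsqueeze(-1) * k.unsqueeze(-2)
            ys.append((state @ r.unsqueeze(-1)).squeeze(-1))        # read out with receptance
        return self.output(torch.stack(ys, dim=1)), state


class HybridBlock(nn.Module):
    """One decoder block: RWKV-7-style time mixing in place of self-attention,
    followed by a standard Transformer MLP, each with a residual connection."""

    def __init__(self, d_model: int, d_ffn: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.time_mixing = SimpleTimeMixing(d_model)
        self.mlp = nn.Sequential(                                   # standard Transformer feed-forward
            nn.Linear(d_model, d_ffn),
            nn.SiLU(),
            nn.Linear(d_ffn, d_model),
        )

    def forward(self, x: torch.Tensor, state: torch.Tensor):
        h, state = self.time_mixing(self.ln1(x), state)
        x = x + h                                                   # residual around the mixer
        x = x + self.mlp(self.ln2(x))                               # residual around the MLP
        return x, state


# quick smoke test with illustrative sizes
block = HybridBlock(d_model=64, d_ffn=256)
x = torch.randn(2, 5, 64)                                           # (batch=2, seq=5, d_model=64)
y, state = block(x, torch.zeros(2, 64, 64))
print(y.shape, state.shape)                                         # (2, 5, 64) and (2, 64, 64)
```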

Model Features

Efficient Recurrent Mechanism
Built on RWKV-7's efficient recurrent mechanism, replacing self-attention entirely and scaling linearly (O(n)) with sequence length.
Constant Memory Usage
Inference memory stays constant regardless of sequence length, making the model suitable for single-GPU training and inference (see the toy sketch after this list).
Hybrid Architecture Design
Combines RWKV-7 time mixing with a Transformer MLP, balancing model quality and inference efficiency.
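The constant-memory property follows directly from the recurrence: the state has a fixed shape, so the footprint does not grow with the number of tokens processed. The toy loop below illustrates this with made-up sizes and stand-in tensors, not the model's real state layout; a Transformer with self-attention would instead grow its KV cache with every token.

```python
import torch

d_model = 8
state = torch.zeros(d_model, d_model)           # fixed-size memory, independent of length
decay = torch.full((d_model,), 0.9)             # illustrative per-channel decay

for t in range(10_000):                         # O(n) total work, O(1) memory
    k = torch.randn(d_model)                    # stand-ins for the per-token key/value
    v = torch.randn(d_model)
    state = state * decay + torch.outer(v, k)   # overwrite in place; no history is kept

print(state.shape)                              # torch.Size([8, 8]) -- unchanged after 10k steps
```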

Model Capabilities

Text Generation
Multilingual Support
Efficient Inference
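A minimal text-generation example with Hugging Face transformers is sketched below. The repository id, the chat-template usage, and the need for trust_remote_code are assumptions; verify them against the official model card before running.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; check the official model card for the exact name.
repo_id = "RWKV-Red-Team/ARWKV-R1-1B5"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

messages = [{"role": "user", "content": "In which year did the French Revolution begin?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```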

Use Cases

General Q&A
Trivia Q&A
Acts as a world-class trivia AI, providing accurate and concise answers.
Translation
Multilingual Translation
Supports translation tasks between Chinese and English.
Chemical Equations
Chemical Equation Generation
Generates chemical equations.