
ARWKV R1 7B

Developed by RWKV-Red-Team
A pure RNN-based 7B parameter model trained via knowledge distillation, showcasing RWKV-7's efficient recurrent mechanism and attention-free architecture.
Downloads: 113
Release Time: 2/7/2025

Model Overview

ARWKV-R1-7B is a hybrid-architecture model that combines RWKV-7 time mixing with a Transformer MLP, targeting text generation tasks with an efficient recurrent mechanism and constant VRAM usage during inference.
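
For reference, a minimal loading sketch using Hugging Face `transformers` is shown below. The repository id `RWKV-Red-Team/ARWKV-R1-7B` and the need for `trust_remote_code=True` are assumptions not confirmed by this page; adjust them to the model's actual hosting details.

```python
# Minimal sketch: loading ARWKV-R1-7B for text generation.
# Assumptions: the model is published on the Hugging Face Hub under
# "RWKV-Red-Team/ARWKV-R1-7B" (hypothetical repo id) and ships custom
# modeling code, so trust_remote_code=True is required.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RWKV-Red-Team/ARWKV-R1-7B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain why an RNN needs only constant memory at inference time."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```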

Model Features

Efficient Recurrent Mechanism
Builds on RWKV-7's recurrent time-mixing mechanism: fully attention-free, with O(n) complexity in sequence length (see the sketch after this list).
Constant VRAM Usage
Keeps VRAM usage constant during inference regardless of context length, making it suitable for single-GPU training and inference.
Knowledge Distillation Training
Trained via a three-stage knowledge distillation process from DeepSeek-R1-Distill-Qwen-1.5B (a generic distillation sketch follows below).
Hybrid Architecture
Combines RWKV-7 time mixing with a Transformer MLP to improve model quality.
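
To illustrate why an attention-free recurrent block gives O(n) time and constant memory, the toy sketch below processes tokens one at a time with a fixed-size state and then applies a Transformer-style MLP. This is a simplification for intuition only, not the actual RWKV-7 time-mixing equations or the ARWKV block implementation.

```python
# Toy sketch of an attention-free hybrid block: a recurrent "time mixing"
# step with a fixed-size state, followed by a Transformer-style MLP.
# Illustrative only; NOT the RWKV-7 recurrence used by ARWKV-R1-7B.
import torch
import torch.nn as nn

class ToyRecurrentBlock(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.decay = nn.Parameter(torch.full((dim,), 0.9))  # per-channel decay
        self.in_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        # Transformer-style MLP (the part ARWKV keeps from the Transformer).
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, dim). One pass over the sequence with a constant-size state.
        state = torch.zeros(x.shape[-1])
        outputs = []
        for t in range(x.shape[0]):                          # O(n) in sequence length
            state = self.decay * state + self.in_proj(x[t])  # fixed-size state update
            outputs.append(self.out_proj(state))
        h = torch.stack(outputs)
        return h + self.mlp(h)                               # residual MLP, as in a Transformer block

block = ToyRecurrentBlock(dim=64, hidden=256)
print(block(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Because the state has a fixed size, memory does not grow with context length, which is what makes constant-VRAM inference possible.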
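
The three-stage distillation pipeline itself is not detailed on this page. As a rough illustration of the general idea, a knowledge-distillation step typically trains the student to match the teacher's softened token distribution via a KL-divergence loss. The snippet below is a generic sketch, not the RWKV-Red-Team training code.

```python
# Generic knowledge-distillation step (illustrative only): the student is
# trained to match the teacher's temperature-softened token distribution.
# Not the actual ARWKV three-stage pipeline or its hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    # Soften both distributions with a temperature, then compute KL(teacher || student).
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature ** 2

# Example with random logits over a vocabulary of 100 tokens at 4 positions.
student = torch.randn(4, 100, requires_grad=True)
teacher = torch.randn(4, 100)
loss = distillation_loss(student, teacher)
loss.backward()
print(float(loss))
```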

Model Capabilities

Text Generation
Question Answering
Knowledge Distillation

Use Cases

Question Answering
World-Class QA AI
Provides accurate and concise answers suitable for various QA scenarios.
Achieved 67.25 on the MMLU benchmark.
Mathematical Reasoning
Math Problem Solving
Capable of solving basic math problems, making it suitable for educational scenarios.
Achieved 56.06 on the GSM8K benchmark.