
ARWKV R1 7B

Developed by RWKV-Red-Team
A pure RNN-based 7B parameter model trained via knowledge distillation, showcasing RWKV-7's efficient recurrent mechanism and attention-free architecture.
Downloads: 113
Release Time: 2/7/2025

Model Overview

ARWKV-R1-7B is a hybrid-architecture model that combines RWKV-7 time mixing with a Transformer MLP, targeting text generation tasks with an efficient recurrent mechanism and constant VRAM usage during inference.
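
For reference, a minimal loading sketch using Hugging Face `transformers` is shown below. The repository id `RWKV-Red-Team/ARWKV-R1-7B` and the need for `trust_remote_code=True` are assumptions not confirmed by this page; adjust them to the model's actual hosting details.

```python
# Minimal sketch: loading ARWKV-R1-7B for text generation.
# Assumptions: the model is published on the Hugging Face Hub under
# "RWKV-Red-Team/ARWKV-R1-7B" (hypothetical repo id) and ships custom
# modeling code, so trust_remote_code=True is required.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RWKV-Red-Team/ARWKV-R1-7B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Explain why an RNN needs only constant memory at inference time."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```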

Model Features

Efficient Recurrent Mechanism
Builds on RWKV-7's recurrent time-mixing mechanism: fully attention-free, with O(n) complexity in sequence length (see the sketch after this list).
Constant VRAM Usage
Keeps VRAM usage constant during inference regardless of context length, making it suitable for single-GPU training and inference.
Knowledge Distillation Training
Trained via a three-stage knowledge distillation process from DeepSeek-R1-Distill-Qwen-1.5B (a generic distillation sketch follows below).
Hybrid Architecture
Combines RWKV-7 time mixing with a Transformer MLP to improve model quality.
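
To illustrate why an attention-free recurrent block gives O(n) time and constant memory, the toy sketch below processes tokens one at a time with a fixed-size state and then applies a Transformer-style MLP. This is a simplification for intuition only, not the actual RWKV-7 time-mixing equations or the ARWKV block implementation.

```python
# Toy sketch of an attention-free hybrid block: a recurrent "time mixing"
# step with a fixed-size state, followed by a Transformer-style MLP.
# Illustrative only; NOT the RWKV-7 recurrence used by ARWKV-R1-7B.
import torch
import torch.nn as nn

class ToyRecurrentBlock(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.decay = nn.Parameter(torch.full((dim,), 0.9))  # per-channel decay
        self.in_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        # Transformer-style MLP (the part ARWKV keeps from the Transformer).
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, dim). One pass over the sequence with a constant-size state.
        state = torch.zeros(x.shape[-1])
        outputs = []
        for t in range(x.shape[0]):                          # O(n) in sequence length
            state = self.decay * state + self.in_proj(x[t])  # fixed-size state update
            outputs.append(self.out_proj(state))
        h = torch.stack(outputs)
        return h + self.mlp(h)                               # residual MLP, as in a Transformer block

block = ToyRecurrentBlock(dim=64, hidden=256)
print(block(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Because the state has a fixed size, memory does not grow with context length, which is what makes constant-VRAM inference possible.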
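
The three-stage distillation pipeline itself is not detailed on this page. As a rough illustration of the general idea, a knowledge-distillation step typically trains the student to match the teacher's softened token distribution via a KL-divergence loss. The snippet below is a generic sketch, not the RWKV-Red-Team training code.

```python
# Generic knowledge-distillation step (illustrative only): the student is
# trained to match the teacher's temperature-softened token distribution.
# Not the actual ARWKV three-stage pipeline or its hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    # Soften both distributions with a temperature, then compute KL(teacher || student).
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature ** 2

# Example with random logits over a vocabulary of 100 tokens at 4 positions.
student = torch.randn(4, 100, requires_grad=True)
teacher = torch.randn(4, 100)
loss = distillation_loss(student, teacher)
loss.backward()
print(float(loss))
```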

Model Capabilities

Text Generation
Question Answering
Knowledge Distillation

Use Cases

Question Answering
World-Class QA AI
Provides accurate and concise answers suitable for various QA scenarios.
Achieved 67.25 on the MMLU benchmark.
Mathematical Reasoning
Math Problem Solving
Capable of solving basic math problems, making it suitable for educational scenarios.
Achieved 56.06 on the GSM8K benchmark.