ARWKV-R1-1B5 is an early preview of a 1.5-billion-parameter RNN-based model, trained through three-stage knowledge distillation from DeepSeek-R1-Distill-Qwen-1.5B, with a context length of 2k.
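
As a minimal sketch of how the model might be loaded and queried through the Transformers library (the repository id `RWKV-Red-Team/ARWKV-R1-1B5`, the chat-template usage, and the need for `trust_remote_code` are assumptions, not confirmed by this card):

```python
# Sketch: load the distilled model with Hugging Face Transformers and run one prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RWKV-Red-Team/ARWKV-R1-1B5"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

messages = [{"role": "user", "content": "Explain what an RNN is in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Keep the prompt plus generated tokens well inside the 2k context window noted above.
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```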