# 🚀 Model card for RWKV-4 | 7B parameters chat version (Raven)
RWKV is a project led by Bo Peng. It offers a unique approach to language models, combining the strengths of RNN and transformer architectures. You can learn more about the model architecture through blogposts by Johan Wind here and here. Join the RWKV Discord server to get more insights into the project.

## 📚 Table of contents
- TL;DR
- Model Details
- Usage
- Citation
## ⚡ TL;DR
The following description is from the original repository:
> RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). It combines the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
## 📖 Model Details
You can find the detailed architecture information in the aforementioned blogposts and the Hugging Face blogpost about the integration.
## 💻 Usage

### Convert the raw weights to the HF format
You can use the `convert_rwkv_checkpoint_to_hf.py` script: specify the `--repo_id` of the original weights, the checkpoint filename, and the output directory. Optionally, you can push the converted model directly to the Hub by passing the `--push_to_hub` flag and the `--model_name` argument to indicate where to push the converted weights.
```bash
python convert_rwkv_checkpoint_to_hf.py --repo_id RAW_HUB_REPO --checkpoint_file RAW_FILE --output_dir OUTPUT_DIR --push_to_hub --model_name dummy_user/converted-rwkv
```
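Once the conversion finishes, the weights load like any other transformers checkpoint. A minimal sketch, assuming the converted files were written to the (placeholder) directory `OUTPUT_DIR` used above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "OUTPUT_DIR" is the directory passed to --output_dir above (placeholder path)
model = AutoModelForCausalLM.from_pretrained("OUTPUT_DIR")
tokenizer = AutoTokenizer.from_pretrained("OUTPUT_DIR")
```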
### Generate text
You can use the `AutoModelForCausalLM` and `AutoTokenizer` classes to generate text from the model. Expand the sections below to see how to run the model in different scenarios. The "Raven" models require specific prompting; you can learn more about it in the integration blogpost, and a rough illustrative sketch follows below.
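The exact template is documented in the integration blogpost; as a hedged illustration, Raven-style checkpoints are commonly queried with an Alpaca-style instruction template along these lines (the wording below is an assumption for illustration, not the canonical format):

```python
# Hypothetical Alpaca-style template; check the integration blogpost
# for the exact format recommended for the Raven checkpoints.
def build_prompt(instruction: str) -> str:
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"# Instruction:\n{instruction}\n\n# Response:\n"
    )

prompt = build_prompt("Tell me about ravens.")
```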
#### 🖥️ Running the model on a CPU

<details>
<summary> Click to expand </summary>
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-raven-7b")
tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-raven-7b")
prompt = "\nIn a shocking finding, scientist discovered a herd of dragons living in a remote, previously unexplored valley, in Tibet. Even more surprising to the researchers was the fact that the dragons spoke perfect Chinese."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(inputs["input_ids"], max_new_tokens=40)
print(tokenizer.decode(output[0].tolist(), skip_special_tokens=True))
```

</details>
#### 🚀 Running the model on a single GPU
<details>
<summary> Click to expand </summary>
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-raven-7b").to(0)
tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-raven-7b")
prompt = "\nIn a shocking finding, scientist discovered a herd of dragons living in a remote, previously unexplored valley, in Tibet. Even more surprising to the researchers was the fact that the dragons spoke perfect Chinese."
inputs = tokenizer(prompt, return_tensors="pt").to(0)
output = model.generate(inputs["input_ids"], max_new_tokens=40)
print(tokenizer.decode(output[0].tolist(), skip_special_tokens=True))
```

</details>

#### 🎯 Running the model in half-precision, on GPU

<details>
<summary> Click to expand </summary>
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-raven-7b", torch_dtype=torch.float16).to(0)
tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-raven-7b")
prompt = "\nIn a shocking finding, scientist discovered a herd of dragons living in a remote, previously unexplored valley, in Tibet. Even more surprising to the researchers was the fact that the dragons spoke perfect Chinese."
inputs = tokenizer(prompt, return_tensors="pt").to(0)
output = model.generate(inputs["input_ids"], max_new_tokens=40)
print(tokenizer.decode(output[0].tolist(), skip_special_tokens=True))
```

</details>
#### 🤖 Running the model on multiple GPUs
<details>
<summary> Click to expand </summary>
```python
# pip install accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer
# device_map="auto" shards the layers across all visible GPUs (requires accelerate)
model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-raven-7b", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-raven-7b")
prompt = "\nIn a shocking finding, scientist discovered a herd of dragons living in a remote, previously unexplored valley, in Tibet. Even more surprising to the researchers was the fact that the dragons spoke perfect Chinese."
# Inputs go to device 0, where the first layers are typically placed
inputs = tokenizer(prompt, return_tensors="pt").to(0)
output = model.generate(inputs["input_ids"], max_new_tokens=40)
print(tokenizer.decode(output[0].tolist(), skip_special_tokens=True))
```

</details>
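Since RWKV is an RNN, the transformers implementation also exposes the recurrent state, which makes it possible to feed long inputs chunk by chunk instead of all at once. A minimal sketch, assuming the `state` keyword of the model's forward pass (the chunk texts are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-raven-7b")
tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-raven-7b")

chunks = ["The first half of a long document...", " ...and its continuation."]
state = None
with torch.no_grad():
    for chunk in chunks:
        inputs = tokenizer(chunk, return_tensors="pt")
        # Feed the previous recurrent state back in so the model keeps context
        outputs = model(inputs["input_ids"], state=state)
        state = outputs.state
```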
## 📄 Citation

If you use this model, please consider citing the original work from the original repo here.