# 🚀 Model card for RWKV-4 | 7B parameters chat version (Raven)
RWKV is a project led by Bo Peng. It offers a unique approach to language models, combining the strengths of RNN and transformer architectures. You can learn more about the model architecture through blogposts by Johan Wind here and here. Join the RWKV Discord server to get more insights into the project.

## 📚 Table of contents
- TL;DR
- Model Details
- Usage
- Citation
## ⚡ TL;DR
The following description is from the original repository:
> RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). It combines the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
## 📖 Model Details
You can find the detailed architecture information in the aforementioned blogposts and the Hugging Face blogpost about the integration.
## 💻 Usage

### Convert the raw weights to the HF format
You can use the `convert_rwkv_checkpoint_to_hf.py` script: specify the `--repo_id` of the original weights, the checkpoint filename, and the output directory. Optionally, you can push the converted model directly to the Hub by passing the `--push_to_hub` flag and the `--model_name` argument to indicate where to push the converted weights.
```bash
python convert_rwkv_checkpoint_to_hf.py --repo_id RAW_HUB_REPO --checkpoint_file RAW_FILE --output_dir OUTPUT_DIR --push_to_hub --model_name dummy_user/converted-rwkv
```
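Once the conversion finishes, the weights load like any other transformers checkpoint. A minimal sketch, assuming the converted files were written to the (placeholder) directory `OUTPUT_DIR` used above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "OUTPUT_DIR" is the directory passed to --output_dir above (placeholder path)
model = AutoModelForCausalLM.from_pretrained("OUTPUT_DIR")
tokenizer = AutoTokenizer.from_pretrained("OUTPUT_DIR")
```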
### Generate text
You can use the `AutoModelForCausalLM` and `AutoTokenizer` classes to generate text from the model. Expand the sections below to see how to run the model in different scenarios. The "Raven" models require specific prompting; you can learn more about it in the integration blogpost, and a rough illustrative sketch follows below.
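The exact template is documented in the integration blogpost; as a hedged illustration, Raven-style checkpoints are commonly queried with an Alpaca-style instruction template along these lines (the wording below is an assumption for illustration, not the canonical format):

```python
# Hypothetical Alpaca-style template; check the integration blogpost
# for the exact format recommended for the Raven checkpoints.
def build_prompt(instruction: str) -> str:
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"# Instruction:\n{instruction}\n\n# Response:\n"
    )

prompt = build_prompt("Tell me about ravens.")
```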
#### 🖥️ Running the model on a CPU

<details>
<summary> Click to expand </summary>
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-raven-7b")
tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-raven-7b")
prompt = "\nIn a shocking finding, scientist discovered a herd of dragons living in a remote, previously unexplored valley, in Tibet. Even more surprising to the researchers was the fact that the dragons spoke perfect Chinese."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(inputs["input_ids"], max_new_tokens=40)
print(tokenizer.decode(output[0].tolist(), skip_special_tokens=True))
```

</details>
#### 🚀 Running the model on a single GPU
<details>
<summary> Click to expand </summary>
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-raven-7b").to(0)
tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-raven-7b")
prompt = "\nIn a shocking finding, scientist discovered a herd of dragons living in a remote, previously unexplored valley, in Tibet. Even more surprising to the researchers was the fact that the dragons spoke perfect Chinese."
inputs = tokenizer(prompt, return_tensors="pt").to(0)
output = model.generate(inputs["input_ids"], max_new_tokens=40)
print(tokenizer.decode(output[0].tolist(), skip_special_tokens=True))
```

</details>

#### 🎯 Running the model in half-precision, on GPU

<details>
<summary> Click to expand </summary>
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-raven-7b", torch_dtype=torch.float16).to(0)
tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-raven-7b")
prompt = "\nIn a shocking finding, scientist discovered a herd of dragons living in a remote, previously unexplored valley, in Tibet. Even more surprising to the researchers was the fact that the dragons spoke perfect Chinese."
inputs = tokenizer(prompt, return_tensors="pt").to(0)
output = model.generate(inputs["input_ids"], max_new_tokens=40)
print(tokenizer.decode(output[0].tolist(), skip_special_tokens=True))
```

</details>
#### 🤖 Running the model on multiple GPUs
<details>
<summary> Click to expand </summary>
```python
# pip install accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer
# device_map="auto" shards the layers across all visible GPUs (requires accelerate)
model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-raven-7b", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-raven-7b")
prompt = "\nIn a shocking finding, scientist discovered a herd of dragons living in a remote, previously unexplored valley, in Tibet. Even more surprising to the researchers was the fact that the dragons spoke perfect Chinese."
# Inputs go to device 0, where the first layers are typically placed
inputs = tokenizer(prompt, return_tensors="pt").to(0)
output = model.generate(inputs["input_ids"], max_new_tokens=40)
print(tokenizer.decode(output[0].tolist(), skip_special_tokens=True))
```

</details>
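Since RWKV is an RNN, the transformers implementation also exposes the recurrent state, which makes it possible to feed long inputs chunk by chunk instead of all at once. A minimal sketch, assuming the `state` keyword of the model's forward pass (the chunk texts are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-raven-7b")
tokenizer = AutoTokenizer.from_pretrained("RWKV/rwkv-raven-7b")

chunks = ["The first half of a long document...", " ...and its continuation."]
state = None
with torch.no_grad():
    for chunk in chunks:
        inputs = tokenizer(chunk, return_tensors="pt")
        # Feed the previous recurrent state back in so the model keeps context
        outputs = model(inputs["input_ids"], state=state)
        state = outputs.state
```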
## 📄 Citation

If you use this model, please consider citing the original work from the original repo here.