# rwkv7-2.9B-world
This is an RWKV-7 model in the flash-linear-attention format, designed for text generation tasks.
## Quick Start

Before using this model, install flash-linear-attention (version 0.1.2 or earlier) and transformers 4.48.0 or later:

```bash
pip install --no-use-pep517 flash-linear-attention==0.1.2
pip install 'transformers>=4.48.0'
```
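To confirm that compatible versions are installed, a quick check like the following can be used (a minimal sketch; the version bounds simply mirror the pip commands above):

```python
from importlib.metadata import version

# transformers should report >= 4.48.0 and flash-linear-attention 0.1.2,
# matching the pip commands above.
print("flash-linear-attention:", version("flash-linear-attention"))
print("transformers:", version("transformers"))
```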
## Features
- Multilingual Support: Supports multiple languages including English, Chinese, Japanese, Korean, French, Arabic, Spanish, and Portuguese.
- High Metrics: Achieves high accuracy in relevant tasks.
- Based on Strong Base Model: Built upon the BlinkDL/rwkv-7-world base model.
## Usage Examples

### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is required to load the custom RWKV-7 model and tokenizer code
model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-2.9B-world', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-2.9B-world', trust_remote_code=True)
model = model.cuda()

prompt = "What is a large language model?"
messages = [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "I am a GPT-3 based model."},
    {"role": "user", "content": prompt}
]

# Render the chat history into a single prompt string using the model's chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
)
# Drop the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=False)[0]
print(response)
```
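For interactive use, the same inputs can be streamed token by token with transformers' `TextStreamer`. This is not part of the original example, just a common pattern that should work with the snippet above:

```python
from transformers import TextStreamer

# Reuses `model`, `tokenizer`, and `model_inputs` from the example above.
# TextStreamer prints decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    **model_inputs,
    max_new_tokens=1024,
    streamer=streamer,
)
```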
## Documentation

### Model Details

#### Model Description
- Developed by: Bo Peng, Yu Zhang, Songlin Yang, Ruichong Zhang
- Funded by: RWKV Project (Under LF AI & Data Foundation)
- Model type: RWKV7
- Language(s) (NLP): Multilingual (English, Chinese, Japanese, Korean, French, Arabic, Spanish, Portuguese)
- License: Apache-2.0
- Parameter count: 2.9B
- Tokenizer: RWKV World tokenizer
- Vocabulary size: 65,536
#### Model Sources
### Uses

#### Direct Use

You can use this model just like any other Hugging Face model, as shown in the usage example above.
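For example, the standard text-generation pipeline should also work. This is a minimal sketch rather than an example from the original card; `device=0` assumes a CUDA GPU and can be dropped to run on CPU:

```python
from transformers import pipeline

# Loads the model and tokenizer in one call; trust_remote_code is needed here too.
pipe = pipeline(
    "text-generation",
    model="fla-hub/rwkv7-2.9B-world",
    trust_remote_code=True,
    device=0,  # first CUDA device; omit to run on CPU
)
print(pipe("What is a large language model?", max_new_tokens=128)[0]["generated_text"])
```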
#### Training Data

This model was trained on the World v3 dataset, for a total of 3.119 trillion tokens.
#### Training Hyperparameters

- Training regime: bfloat16; learning rate decayed from 4e-4 to 1e-5 with a "delayed" cosine schedule (sketched below); weight decay 0.1; batch size increased during the middle of training
- Final Loss: 1.8745
- Token Count: 3.119 trillion
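The learning-rate schedule can be pictured with a small sketch like the one below. The "delay" fraction and step counts here are illustrative assumptions, not the values used in training:

```python
import math

def delayed_cosine_lr(step, total_steps, lr_max=4e-4, lr_min=1e-5, delay_frac=0.1):
    """Hypothetical 'delayed' cosine decay: hold lr_max for the first
    delay_frac of training, then cosine-decay down to lr_min."""
    delay_steps = int(total_steps * delay_frac)
    if step < delay_steps:
        return lr_max
    progress = (step - delay_steps) / max(1, total_steps - delay_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))
```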
### FAQ

**Important Note:** If the safetensors metadata is none, upgrade transformers to 4.48.0 or later with `pip install 'transformers>=4.48.0'`.
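To inspect the metadata of a downloaded checkpoint, something like the following can be used. This is a sketch assuming the `safetensors` package and a locally downloaded `model.safetensors` file; the filename is hypothetical and may differ for sharded checkpoints:

```python
from safetensors import safe_open

# Path to a locally downloaded checkpoint file (hypothetical filename).
with safe_open("model.safetensors", framework="pt") as f:
    print(f.metadata())  # if this prints None, follow the note above and upgrade transformers
```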
## License
This model is released under the Apache-2.0 license.
| Property | Details |
|----------|---------|
| Model Type | RWKV7 |
| Training Data | Trained on the World v3 dataset with a total of 3.119 trillion tokens. |