# rwkv7-168M-pile
This is an RWKV-7 model in the flash-linear-attention format, designed for text generation tasks.
## Quick Start
Before using this model, you need to install `flash-linear-attention` (<= 0.1.2) and the latest version of `transformers`:

```bash
pip install --no-use-pep517 flash-linear-attention==0.1.2
pip install 'transformers>=4.48.0'
```
## Features
This is an RWKV-7 model in the flash-linear-attention format, which can be used for text generation tasks.
## Installation
Install `flash-linear-attention` (<= 0.1.2) and the latest version of `transformers` before using this model:

```bash
pip install --no-use-pep517 flash-linear-attention==0.1.2
pip install 'transformers>=4.48.0'
```
## Usage Examples
### Basic Usage
You can use this model just like any other Hugging Face model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)
```
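A minimal text-generation sketch building on the snippet above; the prompt, device placement, and sampling settings (`max_new_tokens`, `top_p`, `temperature`) are illustrative choices, not values from the original card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)

# Move to GPU if one is available; the model also runs on CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device).eval()

# Encode an arbitrary prompt and sample a continuation.
inputs = tokenizer("The Pile is a large, diverse dataset", return_tensors='pt').to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```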
## Documentation
### Model Details
#### Model Description
- Developed by: Bo Peng, Yu Zhang, Songlin Yang, Ruichong Zhang
- Funded by: RWKV Project (Under LF AI & Data Foundation)
- Model type: RWKV7
- Language(s) (NLP): English
- License: Apache-2.0
- Parameter count: 168M
- Tokenizer: GPT-NeoX 20B tokenizer
#### Model Sources
- Repository: https://github.com/fla-org/flash-linear-attention ; https://github.com/BlinkDL/RWKV-LM
- Paper: RWKV-7 "Goose" with Expressive Dynamic State Evolution
- Weights: Converted from https://modelscope.cn/models/RWKV/rwkv-7-pile/file/view/master?fileName=RWKV-x070-Pile-168M-20241120-ctx4096.pth
### Uses
This model can be used for text generation tasks. You can use it just like any other Hugging Face model.
### Training Details
#### Training Data
This model is trained on the Pile with a total of 332 billion tokens.
#### Training Hyperparameters
- Training regime: bfloat16, lr 8e-4 to 3e-5 cosine decay, wd 0.1, bsz 8x30x4096
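A quick back-of-the-envelope check of what that batch-size notation implies, assuming `bsz 8x30x4096` means 8 data-parallel ranks x 30 sequences per rank x a 4096-token context (this reading is an assumption, not stated in the card):

```python
# Assumed reading of "bsz 8x30x4096": ranks x sequences-per-rank x context length.
ranks, seqs_per_rank, ctx_len = 8, 30, 4096

tokens_per_step = ranks * seqs_per_rank * ctx_len   # 983,040 tokens per optimizer step
total_tokens = 332e9                                 # 332 billion training tokens (from the card)
steps = total_tokens / tokens_per_step               # roughly 338k optimizer steps

print(f"{tokens_per_step:,} tokens/step, ~{steps:,.0f} steps")
```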
### Evaluation
#### Metrics
- `lambada_openai`: ppl 14.2, acc 45.6%
- `piqa`: acc 65.5%
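These task names match benchmarks in EleutherAI's lm-evaluation-harness; a sketch of how one might reproduce the numbers with that harness (the harness itself and the `HFLM`/`simple_evaluate` usage are assumptions, not stated in the card):

```python
# Sketch only: assumes `pip install lm-eval` (EleutherAI lm-evaluation-harness).
import lm_eval
from lm_eval.models.huggingface import HFLM

# Wrap the Hugging Face model so the harness can drive it.
lm = HFLM(pretrained='fla-hub/rwkv7-168M-pile', trust_remote_code=True, batch_size=8)
results = lm_eval.simple_evaluate(model=lm, tasks=['lambada_openai', 'piqa'])

# Print the per-task metrics (perplexity and accuracy).
for task, metrics in results['results'].items():
    print(task, metrics)
```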
### FAQ
Q: The safetensors metadata is none.
A: Upgrade `transformers` to >= 4.48.0: `pip install 'transformers>=4.48.0'`
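A small sketch for checking whether a downloaded checkpoint carries safetensors metadata; the `model.safetensors` filename is an assumption about how the repository names its weight file:

```python
# Sketch: inspect the safetensors header metadata of the downloaded checkpoint.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

path = hf_hub_download(repo_id='fla-hub/rwkv7-168M-pile', filename='model.safetensors')  # assumed filename
with safe_open(path, framework='pt') as f:
    print(f.metadata())  # should not be None once saved with transformers >= 4.48.0
```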
## Technical Details
This model is an RWKV-7 model in the flash-linear-attention format. It was trained on the Pile for a total of 332 billion tokens. The training regime was bfloat16, lr 8e-4 to 3e-5 with cosine decay, wd 0.1, bsz 8x30x4096.
## License
This model is licensed under the Apache-2.0 license.
| Property | Details |
|----------|---------|
| Model Type | RWKV7 |
| Training Data | The model is trained on the Pile with a total of 332 billion tokens. |
| License | Apache-2.0 |