# rwkv7-168M-pile
This is an RWKV-7 model in the flash-linear-attention format, designed for text generation tasks.
## Quick Start
Before using this model, you need to install `flash-linear-attention` (<= 0.1.2) and the latest version of `transformers`:

```bash
pip install --no-use-pep517 flash-linear-attention==0.1.2
pip install 'transformers>=4.48.0'
```
## Features
This is an RWKV-7 model in the flash-linear-attention format, which can be used for text generation tasks.
## Installation
Install `flash-linear-attention` (<= 0.1.2) and the latest version of `transformers` before using this model:

```bash
pip install --no-use-pep517 flash-linear-attention==0.1.2
pip install 'transformers>=4.48.0'
```
## Usage Examples
### Basic Usage
You can use this model just like any other Hugging Face model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)
```
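A minimal text-generation sketch building on the snippet above; the prompt, device placement, and sampling settings (`max_new_tokens`, `top_p`, `temperature`) are illustrative choices, not values from the original card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-168M-pile', trust_remote_code=True)

# Move to GPU if one is available; the model also runs on CPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device).eval()

# Encode an arbitrary prompt and sample a continuation.
inputs = tokenizer("The Pile is a large, diverse dataset", return_tensors='pt').to(device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```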
## Documentation
### Model Details
#### Model Description
- Developed by: Bo Peng, Yu Zhang, Songlin Yang, Ruichong Zhang
- Funded by: RWKV Project (Under LF AI & Data Foundation)
- Model type: RWKV7
- Language(s) (NLP): English
- License: Apache-2.0
- Parameter count: 168M
- Tokenizer: GPT-NeoX 20B tokenizer
#### Model Sources
- Repository: https://github.com/fla-org/flash-linear-attention ; https://github.com/BlinkDL/RWKV-LM
- Paper: RWKV-7 "Goose" with Expressive Dynamic State Evolution
- Weights: Converted from https://modelscope.cn/models/RWKV/rwkv-7-pile/file/view/master?fileName=RWKV-x070-Pile-168M-20241120-ctx4096.pth
### Uses
This model can be used for text generation tasks. You can use it just like any other Hugging Face model.
### Training Details
#### Training Data
This model is trained on the Pile with a total of 332 billion tokens.
#### Training Hyperparameters
- Training regime: bfloat16, lr 8e-4 to 3e-5 cosine decay, wd 0.1, bsz 8x30x4096
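A quick back-of-the-envelope check of what that batch-size notation implies, assuming `bsz 8x30x4096` means 8 data-parallel ranks x 30 sequences per rank x a 4096-token context (this reading is an assumption, not stated in the card):

```python
# Assumed reading of "bsz 8x30x4096": ranks x sequences-per-rank x context length.
ranks, seqs_per_rank, ctx_len = 8, 30, 4096

tokens_per_step = ranks * seqs_per_rank * ctx_len   # 983,040 tokens per optimizer step
total_tokens = 332e9                                 # 332 billion training tokens (from the card)
steps = total_tokens / tokens_per_step               # roughly 338k optimizer steps

print(f"{tokens_per_step:,} tokens/step, ~{steps:,.0f} steps")
```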
### Evaluation
#### Metrics
- `lambada_openai`: ppl 14.2, acc 45.6%
- `piqa`: acc 65.5%
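These task names match benchmarks in EleutherAI's lm-evaluation-harness; a sketch of how one might reproduce the numbers with that harness (the harness itself and the `HFLM`/`simple_evaluate` usage are assumptions, not stated in the card):

```python
# Sketch only: assumes `pip install lm-eval` (EleutherAI lm-evaluation-harness).
import lm_eval
from lm_eval.models.huggingface import HFLM

# Wrap the Hugging Face model so the harness can drive it.
lm = HFLM(pretrained='fla-hub/rwkv7-168M-pile', trust_remote_code=True, batch_size=8)
results = lm_eval.simple_evaluate(model=lm, tasks=['lambada_openai', 'piqa'])

# Print the per-task metrics (perplexity and accuracy).
for task, metrics in results['results'].items():
    print(task, metrics)
```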
### FAQ
Q: The safetensors metadata is none.
A: Upgrade `transformers` to >= 4.48.0: `pip install 'transformers>=4.48.0'`
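A small sketch for checking whether a downloaded checkpoint carries safetensors metadata; the `model.safetensors` filename is an assumption about how the repository names its weight file:

```python
# Sketch: inspect the safetensors header metadata of the downloaded checkpoint.
from huggingface_hub import hf_hub_download
from safetensors import safe_open

path = hf_hub_download(repo_id='fla-hub/rwkv7-168M-pile', filename='model.safetensors')  # assumed filename
with safe_open(path, framework='pt') as f:
    print(f.metadata())  # should not be None once saved with transformers >= 4.48.0
```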
## Technical Details
This model is an RWKV-7 model in the flash-linear-attention format. It was trained on the Pile for a total of 332 billion tokens. The training regime was bfloat16, lr 8e-4 to 3e-5 with cosine decay, wd 0.1, bsz 8x30x4096.
## License
This model is licensed under the Apache-2.0 license.
| Property | Details |
|----------|---------|
| Model Type | RWKV7 |
| Training Data | The model is trained on the Pile with a total of 332 billion tokens. |
| License | Apache-2.0 |