RWKV7-0.1B-g1 Open-Source Model: Multilingual Processing with Deep Thinking Ability

Rwkv7 0.1B G1

Developed by fla-hub

An RWKV-7 g1 model built on the flash-linear-attention mechanism. It supports multilingual processing and has deep thinking ability.

Large Language Model

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

#Multilingual deep thinking #Linear attention mechanism #Long text generation

Downloads 377

Release Time: 3/10/2025

Model Overview

This is a multilingual large language model with 191 million parameters, built on the RWKV7 architecture. It supports multiple languages, including English and Chinese, has deep thinking ability, and is suited to tasks such as text generation.

Model Features

Multilingual support

Supports the processing of multiple languages such as English, Chinese, Japanese, Korean, French, Arabic, Spanish, and Portuguese

Deep thinking ability

The g1 model series incorporates deep thinking ability and can generate higher-quality text

Efficient attention mechanism

Adopts the Flash linear attention mechanism to improve model efficiency

Model Capabilities

Multilingual text generation

Dialogue system

Content creation

Use Cases

Dialogue system

Intelligent assistant: Used to build a multilingual intelligent dialogue assistant. Can generate coherent and logical dialogue responses.

Content creation

Multilingual content generation: Generates content such as news and stories in various languages.

🚀 rwkv7-0.1B-g1

This is an RWKV-7 g1 model in flash-linear-attention format. The g1 model series incorporates significantly more data and deep thinking capabilities, offering enhanced performance in text generation tasks.

🚀 Quick Start

Before using this model, install flash-linear-attention and transformers 4.48.0 or later:

pip install git+https://github.com/fla-org/flash-linear-attention
pip install 'transformers>=4.48.0'

✨ Features

Multilingual Support: Supports multiple languages including English, Chinese, Japanese, Korean, French, Arabic, Spanish, and Portuguese.
Deep Thinking Abilities: The g1 model series has been enhanced with deep thinking capabilities.
Large-Scale Training: Trained on World v3.5 with over 5 trillion tokens.

💻 Usage Examples

Basic Usage

You can use this model just like any other Hugging Face model:

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained('fla-hub/rwkv7-0.1B-g1', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('fla-hub/rwkv7-0.1B-g1', trust_remote_code=True)
model = model.cuda()  # Runs on NVIDIA/AMD/Intel devices; on Intel, use model.xpu() instead
prompt = "What is a large language model?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Default is True, set to False to disable thinking
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=1.0,
    top_p=0.3,
    repetition_penalty=1.2
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=False)[0]
print(response)
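
With thinking enabled, the decoded response mixes the model's reasoning with its final answer. The following is a minimal sketch for separating the two. It assumes (this is not confirmed by the model card) that the reasoning ends with a `</think>` tag; `split_thinking` is a hypothetical helper and not part of transformers or flash-linear-attention.

```python
def split_thinking(response: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split a decoded response into (reasoning, answer).

    Assumption: reasoning ends at the first `close_tag`. If the tag never
    appears, the whole response is returned as the answer.
    """
    reasoning, found, answer = response.partition(close_tag)
    if not found:
        return "", response.strip()
    # Drop an opening tag if the decoded text happens to include one.
    reasoning = reasoning.strip().removeprefix("<think>")
    return reasoning.strip(), answer.strip()


# Illustrative input only, not real model output.
reasoning, answer = split_thinking("<think>2 + 2 = 4</think>The answer is 4.")
print(answer)
```

Because the example above decodes with skip_special_tokens=False, the answer part may still contain special tokens; strip them separately if needed.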

📚 Documentation

Model Details

Property	Details
Model Type	RWKV7
Developed by	Bo Peng, Yu Zhang, Songlin Yang, Ruichong Zhang
Funded by	RWKV Project (Under LF AI & Data Foundation)
Language(s) (NLP)	Multilingual
License	Apache-2.0
Parameter count	191M
Tokenizer	RWKV World tokenizer
Vocabulary size	65,536
Repository	https://github.com/fla-org/flash-linear-attention ; https://github.com/BlinkDL/RWKV-LM
Paper	https://arxiv.org/abs/2503.14456

Training Data

This model is trained on the World v3.5 dataset, which totals more than 5 trillion tokens.

FAQ

Q: Loading fails with the message "safetensors metadata is none".

A: Upgrade transformers to version 4.48.0 or later: pip install 'transformers>=4.48.0'
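
To check ahead of time whether an environment meets the 4.48.0 minimum from the answer above, a small version comparison can help. This is a sketch with hypothetical helper names (`version_tuple`, `meets_minimum`); it compares only the leading numeric components, so a pre-release such as `4.48.0.dev0` is treated as `4.48.0`, which is a simplification.

```python
def version_tuple(version: str) -> tuple[int, ...]:
    """Parse the leading numeric components of a version string."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. "dev0"
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.48.0") -> bool:
    """Return True if `installed` is at least `minimum`."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so "4.48" and "4.48.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("4.47.1"))
print(meets_minimum("4.48.0"))
```

In a real environment, the installed version string can be read with `importlib.metadata.version("transformers")` from the standard library and passed to `meets_minimum`.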

Thinking Prompt

<|rwkv_tokenizer_end_of_text|>User: <Your Question Here>

Assistant: <think

Do not close the bracket: the prompt must end with <think, without the trailing >.
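
The template above can be assembled as a plain string when the tokenizer's chat template is not used. A minimal sketch follows; `build_thinking_prompt` is a hypothetical helper name, and the layout (a blank line between the user turn and the assistant turn) is copied from the template shown above.

```python
EOT = "<|rwkv_tokenizer_end_of_text|>"


def build_thinking_prompt(question: str) -> str:
    """Build a raw thinking prompt that ends with the unclosed `<think`."""
    # The prompt starts with the end-of-text token and deliberately
    # leaves `<think` without its closing bracket.
    return f"{EOT}User: {question.strip()}\n\nAssistant: <think"


prompt = build_thinking_prompt("What is a large language model?")
print(prompt)
```

After tokenizing such a string, it is worth confirming that the first input ID is 0, since this sketch assumes the tokenizer maps the literal end-of-text string to that token.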

Additional Caveats for Prompting

⚠️ Important Note

Always add <|rwkv_tokenizer_end_of_text|> (token ID 0) before your prompt. Because of state initialization issues, the model cannot attend to the first token it receives.

Bad prompt example:

Mathews lifted a dark brow. "Are you sure about that? I mean, wouldn't it be better to wait until Dale is home safe and sound?"

"The longer I wait to tell her, the worse it will be for both of us."

"Good luck. You're going to need it," said

The model is unable to recall "Mathews" because the name is the very first thing in the input.

Good prompt example:

<|rwkv_tokenizer_end_of_text|>Mathews lifted a dark brow. "Are you sure about that? I mean, wouldn't it be better to wait until Dale is home safe and sound?"

"The longer I wait to tell her, the worse it will be for both of us."

"Good luck. You're going to need it," said

The model will output Mathews as expected.
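
For raw-text prompts like the one above, the leading token can be added defensively before tokenization. A minimal sketch, with `ensure_leading_eot` as a hypothetical helper name:

```python
EOT = "<|rwkv_tokenizer_end_of_text|>"


def ensure_leading_eot(prompt: str) -> str:
    """Prepend the end-of-text token unless the prompt already starts with it."""
    return prompt if prompt.startswith(EOT) else EOT + prompt


bad_prompt = 'Mathews lifted a dark brow. "Are you sure about that?"'
good_prompt = ensure_leading_eot(bad_prompt)
print(good_prompt)
```

The check makes the helper safe to call more than once, so the token is never duplicated.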

Without this token: lambada_openai ppl=13.84 acc=48.13%

With this token added: lambada_openai ppl=12.36 acc=49.12%

Note: This phenomenon is very rare in Transformers but significant in RNNs. We speculate that the model uses the first token to pin its state so that it can better acquire information from later tokens.
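
To put the lambada_openai figures above in relative terms, the short calculation below uses only the four numbers reported in this section; the variable names are illustrative.

```python
# Reported lambada_openai results, without and with the leading token.
ppl_without, ppl_with = 13.84, 12.36
acc_without, acc_with = 48.13, 49.12

# Relative perplexity reduction (percent) and absolute accuracy gain (points).
ppl_reduction_pct = (ppl_without - ppl_with) / ppl_without * 100
acc_gain_points = acc_with - acc_without

print(f"Perplexity reduction: {ppl_reduction_pct:.1f}%")
print(f"Accuracy gain: {acc_gain_points:.2f} percentage points")
```

That is roughly a 10.7% relative drop in perplexity and a gain of 0.99 percentage points in accuracy from adding a single token.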

📄 License

This model is licensed under the Apache-2.0 license.
