ARWKV-R1-1B5 Open-Source Model: 2k-Context-Length Applications Based on Distillation Training

ARWKV R1 1B5

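Developed by RWKV-Red-Team

ARWKV-R1-1B5 is an early preview of an RNN-based model with 1.5 billion parameters. It was created through three-stage knowledge distillation from DeepSeek-R1-Distill-Qwen-1.5B and has a context length of 2k.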

Large language model

Transformers

Multilingual
Open-source license: Apache-2.0
#recurrent-neural-network #knowledge-distillation #efficient-inference

Downloads: 164

Released: 2/7/2025

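Model overview

ARWKV-R1-1B5 is a hybrid design that combines RWKV-7 time mixing with a Transformer MLP. It demonstrates the advantages of RWKV-7's efficient recurrence mechanism and of dropping self-attention.

Model features

Efficient recurrence mechanism

An efficient recurrence mechanism based on RWKV-7, with no self-attention and fully O(n) complexity.

Constant memory usage

The model keeps memory usage constant during inference and is suited to training and inference on a single GPU.

Hybrid architecture design

Combines RWKV-7 time mixing with the Transformer MLP architecture, aiming to balance model performance and efficiency.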
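The constant-memory claim above can be illustrated with a toy recurrence. This is a minimal sketch, not RWKV-7 itself: the function name `run_recurrence`, the decay value, and the four-channel state are all assumptions chosen for illustration. It only shows that a recurrent model carries a fixed-size state forward, so the work is one step per token and the state does not grow with sequence length.

```python
def run_recurrence(inputs, decay=0.9, channels=4):
    """Toy linear recurrence: state <- decay * state + x_t (hypothetical, not RWKV-7)."""
    state = [0.0] * channels
    for x_t in inputs:  # one O(1) update per token, so O(n) overall
        state = [decay * s + x for s, x in zip(state, x_t)]
    return state


# The state has the same size after 10 tokens and after 10,000 tokens.
short_seq = [[1.0] * 4 for _ in range(10)]
long_seq = [[1.0] * 4 for _ in range(10_000)]
state_short = run_recurrence(short_seq)
state_long = run_recurrence(long_seq)
print(len(state_short), len(state_long))
```

A self-attention layer, by contrast, would have to keep keys and values for every previous token, which is why its memory grows with context length.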

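Model capabilities

Text generation

Multilingual support

Efficient inference

Use cases

General question answering

Trivia quiz

Acts as a "world class trivia AI" that gives accurate, succinct answers.

Translation

Multilingual translation

Supports translation tasks between Chinese and English.

Chemical formulas

Chemical formula generation

Generates chemical formulas.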

🚀 ARWKV🪿

This is a text generation model with RWKV-7 time mixing and a Transformer MLP. It uses RWKV-7's efficient recurrence mechanism, keeps VRAM usage constant, and can be trained on a single GPU.

🚀 Quick start

Before using the model, install the required libraries.

pip3 install --upgrade rwkv-fla transformers

Before starting training, set the following environment variable.

export WKV_MODE=chunk
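When working from a notebook or a script rather than a shell, the same variable can be set from Python. This is a minimal sketch under one assumption that the source does not confirm: that the backend reads `WKV_MODE` when it is imported, which is why the assignment is placed before any model-related import.

```python
import os

# Assumption: the RWKV backend reads WKV_MODE at import time,
# so set it before importing transformers or rwkv-fla.
os.environ["WKV_MODE"] = "chunk"

print(os.environ["WKV_MODE"])
```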

The following is a basic Python example of using the model.

import threading

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

# Load the model and tokenizer; trust_remote_code loads the model's custom code from the repository.
model = AutoModelForCausalLM.from_pretrained(
    "RWKV-Red-Team/ARWKV-R1-1B5",
    device_map="auto",
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "RWKV-Red-Team/ARWKV-R1-1B5"
)

system_prompt = "You are a world class trivia AI - provide accurate, succinct responses. "
# Example question (the one used in the sample output that follows).
prompt = "The world's largest rainforest, home to approximately three million species of plants and animals, is named after which river?"
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt},
]

# Build the chat prompt and open the reasoning block.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
text = text + "<think>"
print(text)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

streamer = TextIteratorStreamer(tokenizer, skip_prompt=False, skip_special_tokens=False)

# Run generate() in a background thread so the main thread can consume the stream.
generation_kwargs = dict(
    model_inputs,
    streamer=streamer,
    max_new_tokens=8192,
    do_sample=True,
    tokenizer=tokenizer,
    stop_strings=["<｜end▁of▁sentence｜>"],
)
thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

print("Streaming output:")
for new_text in streamer:
    print(new_text, end="", flush=True)

thread.join()

Example output:

<｜begin▁of▁sentence｜>You are a world class trivia AI - provide accurate, succinct responses. <｜User｜>The world's largest rainforest, home to approximately three million species of plants and animals, is named after which river?<｜Assistant｜><think>
Okay, so I'm trying to solve this question about the world's largest rainforest and which river it's named after. Hmm, first, I think rainforest names often have links related to the region it's in. The most famous rainforest in the world is the Amazon. I remember hearing a lot about it being called that because rainforests are connected to specific river systems. 

Now, I'm trying to recall which river is named after the Amazon. I think it's the Amazon River. But I want to be sure. Let me see... the Amazon is a major rainforest located in South America. The Amazon River flows through it, which is why it's named after it. That makes sense because it's a very important river. I recall reading somewhere that all the rainforests are named after rivers related to their regions. So if the Amazon is named after its River, then the name would naturally be related to its source.

I wonder if it's the Amazon itself that's named after it, or another river named after it. But the official name for the Amazon is the Amazon Rainforest. The most significant rainforest in the world is the Amazon, and its name probably started with river-sounding names.
</think>

The largest rainforest located in South America is the Amazon. It is named after the river named after it, which is the Amazon River. Therefore, the Amazon River is the name given to the Amazon Rain Forest.
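Because the output above interleaves the model's reasoning with its final answer, downstream code often wants the two separated. The helper below is a minimal sketch of one way to do that; `split_reasoning` is a hypothetical name, not part of the model or of transformers. It assumes the text follows the pattern shown above: a `<think>` prefix, a closing `</think>` tag, and an optional `<｜end▁of▁sentence｜>` marker.

```python
def split_reasoning(generated, eos="<｜end▁of▁sentence｜>"):
    """Split generated text into (reasoning, answer). Hypothetical helper."""
    # Drop everything up to and including the opening tag, if it is present.
    text = generated.split("<think>", 1)[-1]
    reasoning, closing_tag, answer = text.partition("</think>")
    if not closing_tag:
        # Generation stopped before the reasoning block was closed.
        return reasoning.strip(), ""
    return reasoning.strip(), answer.replace(eos, "").strip()


sample = (
    "<｜Assistant｜><think>\n"
    "The Amazon River flows through it.\n"
    "</think>\n\n"
    "The Amazon River.<｜end▁of▁sentence｜>"
)
reasoning, answer = split_reasoning(sample)
print(answer)
```

Returning an empty answer when `</think>` is missing makes it easy to detect a generation that was cut off mid-reasoning.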

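✨ Key features

✅ RWKV-7's efficient recurrence mechanism
✅ No self-attention, fully O(n)
✅ Constant VRAM usage
✅ Trainable on a single GPU

An enhanced version with the following features is also planned for open-source release in the near future.

🚀 16k+ context support
🧮 Math-specific improvements
📚 RL-enhanced reasoning model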

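📚 Documentation

🔑 Key specifications

Property	Details
Architecture	RWKV-7 TimeMix + SwiGLU (hybrid design)
Context window	2048 training CTX (preview limitation)
Training tokens	40M (distillation-focused)
Precision	FP16 inference recommended (16 GB VRAM required; 15% improvement over BF16)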
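The 2048-token training context in the table can be turned into a simple budget check. This is a minimal sketch with hypothetical helper names (`fits_training_context`, `clamp_new_tokens`). It assumes that output quality is only established within the trained length; the source states the training context but does not say what happens beyond it, and a recurrent model can technically keep generating past that point.

```python
TRAINING_CTX = 2048  # training context length from the specification table


def fits_training_context(prompt_tokens, max_new_tokens, ctx=TRAINING_CTX):
    """True if the prompt plus the generated tokens stay within the trained length."""
    return prompt_tokens + max_new_tokens <= ctx


def clamp_new_tokens(prompt_tokens, requested, ctx=TRAINING_CTX):
    """Largest generation budget that stays inside the trained length."""
    return max(0, min(requested, ctx - prompt_tokens))


# A 60-token prompt with the 8192-token limit used in the quick-start code.
print(fits_training_context(60, 8192))
print(clamp_new_tokens(60, 8192))
```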

🏗️ Architecture highlights

Core modification flow

Transformer Decoder Layer:
- Multi-head Latent Attention(MLA)
+ RWKV-7 Time Mixing (Eq.3)
- RoPE Positional Encoding
+ State Recurrence
= Hybrid Layer Output
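The flow above can be sketched as a single layer in plain Python. This is an illustrative toy under several assumptions: `toy_time_mix` is a placeholder decayed running sum and NOT RWKV-7's Eq. 3, normalization layers are omitted, and all function names and weights are hypothetical. It shows only the structure: a recurrent time-mixing step with a carried state stands where attention and positional encoding would be, and the SwiGLU MLP is kept.

```python
import math


def silu(v):
    return v / (1.0 + math.exp(-v))


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


def toy_time_mix(x, state, decay=0.5):
    """Placeholder for time mixing (not Eq. 3): per-channel decayed running sum."""
    new_state = [decay * s + xi for s, xi in zip(state, x)]
    return new_state, new_state


def hybrid_layer(x, state, time_mix, mlp_weights):
    """One token through the layer: recurrent mixing, then MLP, each with a residual."""
    mixed, state = time_mix(x, state)
    h = [a + b for a, b in zip(x, mixed)]
    out = swiglu_mlp(h, *mlp_weights)
    return [a + b for a, b in zip(h, out)], state


identity = [[1.0, 0.0], [0.0, 1.0]]
y, new_state = hybrid_layer([1.0, 0.0], [0.0, 0.0], toy_time_mix,
                            (identity, identity, identity))
print(len(y), len(new_state))
```

Position information comes from the order in which tokens update the state, which is why the sketch has no positional-encoding step.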