BrtGPT-124m-Base: a free English base model that makes open-source models easier to use


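Developed by Bertug1911

BrtGPT-124M-Base is a base model pretrained on an English corpus of about 5 million tokens. It is free to use and is meant to address two common pain points of open-source models: cumbersome setup and high compute requirements.

Large language model

Transformers

#English text generation #Lightweight model #Base pretraining

Downloads: 2,128

Release date: May 23, 2025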

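Model overview

This is a base language model used mainly for generating English text. It does not provide question answering.

Model features

Free and easy to use

The model is free to download and use, and is meant to avoid the cumbersome setup and high compute requirements of many open-source models.

Pretrained base model

Pretrained on about 5 million tokens of English text and focused on text generation.

Lightweight

With 124M parameters, the model runs on ordinary hardware.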
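
As a rough check on the "ordinary hardware" claim, the sketch below estimates how much memory the weights alone would occupy at a few common precisions. It assumes exactly 124,000,000 parameters (the nominal "124M" figure) and ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Rough weight-memory estimate for a 124M-parameter model.
# Assumption: weights only; activations, KV cache and framework overhead are excluded.
PARAMS = 124_000_000  # nominal count taken from the "124M" in the model name

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_mib(params: int, dtype: str) -> float:
    """Return the size of the weights in MiB for the given dtype."""
    return params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_mib(PARAMS, dtype):.0f} MiB")
```

In float32 this comes to a little under 0.5 GiB, which is consistent with running on a typical laptop CPU.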

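Model capabilities

English text generation

Use cases

Text generation

Sentence completion

Generates a complete sentence from a given prompt.

For example, given the input "Math is so important because", the model generates a continuation.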

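🚀 BrtGPT-124M-Base

BrtGPT-124M-Base is a base model pretrained on an English corpus of about 5 million tokens. It is free to use and is meant to address two problems with open-source models: poor usability and heavy compute requirements.

🚀 Quick start

The model generates English text and is free to use. The following example loads the model and generates a completion.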

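from transformers import AutoTokenizer, AutoModelForCausalLM
import torch  # PyTorch backend, needed for return_tensors="pt"

# Load the model and tokenizer
model_name = "Bertug1911/BrtGPT-124m-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Prompt
prompt = "Math is so important because"

# Tokenize the prompt into PyTorch tensors
inputs = tokenizer(prompt, return_tensors="pt")

# Generate up to 50 new tokens.
# top_k=1 restricts sampling to the single most likely token,
# so the output is effectively deterministic.
output = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.01,
    top_k=1,
    do_sample=True
)

# Decode the output. The decoded string keeps "Ġ" word-boundary markers,
# so drop the separator spaces first, then turn each "Ġ" into a real space.
generated_text = tokenizer.decode(output[0], skip_special_tokens=False)
generated_text = generated_text.replace(" ", "")
generated_text = generated_text.replace("Ġ", " ")
print(generated_text)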
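
The two `replace` calls at the end of the snippet above are easy to misread, so here is a minimal standalone sketch of that post-processing step. The helper name `clean_decoded` and the sample string are made up for illustration; the only thing taken from the source is the order of the two replacements. In byte-level BPE vocabularies such as GPT-2's, "Ġ" marks a token that begins with a space.

```python
def clean_decoded(text: str) -> str:
    """Mirror the two replace() calls from the quick-start snippet."""
    # Step 1: drop the literal spaces that separate the decoded tokens.
    text = text.replace(" ", "")
    # Step 2: turn each "Ġ" word-boundary marker into a real space.
    return text.replace("Ġ", " ")


# Hypothetical decoder output, made up for illustration.
raw = "Math Ġis Ġso Ġimportant Ġbecause"
print(clean_decoded(raw))  # Math is so important because
```

The order matters: replacing "Ġ" first would create real spaces that the second step then strips out again, gluing all the words together.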

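Direct use

Open-source models are often hard to use and need a lot of compute; this model is meant to solve both problems. It can be used and downloaded for free from the links below.

A web user interface (Gradio, Spaces) is available and can be used for free on Hugging Face Spaces: "https://huggingface.co/spaces/Bertug1911/BrtGPT-Web-UI"

Out-of-scope use

The model only generates (completes) English text and uses almost exclusively English tokens (a few Japanese and Chinese tokens are also included). Avoid using it for other languages.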

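✨ Key features

The model was trained on about 5 million tokens of English sentences.
Unlike ChatGPT or Llama, it is not a question-answering model; it is a base model pretrained on a text corpus.
It is free and easy to use and download.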

📦 Installation

Note: the model was retrained on 5 million tokens on June 14, 2025. If you downloaded the weights before that date, download them again.
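
One way to pick up retrained weights is to bypass the local Hugging Face cache. The sketch below uses the `force_download` argument of `from_pretrained`; the helper name `load_fresh` is hypothetical, and the import is done lazily only so the function can be defined without `transformers` installed.

```python
MODEL_NAME = "Bertug1911/BrtGPT-124m-Base"


def load_fresh(model_name: str = MODEL_NAME, force_download: bool = True):
    """Load tokenizer and model, re-downloading files when force_download is True."""
    # Imported lazily so this function can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, force_download=force_download)
    model = AutoModelForCausalLM.from_pretrained(model_name, force_download=force_download)
    return tokenizer, model


# Usage (downloads the weights again, so it needs network access):
# tokenizer, model = load_fresh()
```

After one forced download, later calls can pass `force_download=False` to reuse the refreshed cache.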

💻 Usage examples

Example outputs for different inputs

Input	Max new tokens	Temperature	Output
"Today"	50	0.1	"Today, a complex and multifaceted system, is often viewed as a myriad of the intricacies of the human mind and the intricacies of the human condition. It is believed to be a powerful"
"To stop world hunger, we should"	50	0.1	"To stop world hunger, we should be able to find a more stable and healthy relationship with the body. By doing so, we can make a mealtime and easier to start and maintain a healthy and balance."
"Math is so important because"	50	0.1	"MMath is so important because it's essential to carefully consider and address any potential health concerns that may arise from the condition, as it can lead to a range of health issues. By including the bleeding and potentially causing sympt..."
"To be rich, you should,"	50	0.4	"To be rich you should be on the same time, it's essential to consider the various factors that contribute to your unique needs. For instance, it's crucial to consider that you should be taking a black room,..."

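The Temperature column in the table above controls how sharply the model's next-token distribution is peaked. The sketch below shows the standard temperature-scaled softmax with optional top-k filtering on toy logits. It is a generic illustration written in plain Python, not the sampling code used inside `transformers`, and the logit values are invented.

```python
import math


def next_token_probs(logits, temperature=1.0, top_k=None):
    """Temperature-scaled softmax over the top_k highest logits (all if top_k is None)."""
    if top_k is not None:
        # Keep the top_k largest logits (ties at the cutoff are kept); mask the rest.
        cutoff = sorted(logits, reverse=True)[top_k - 1]
        logits = [x if x >= cutoff else float("-inf") for x in logits]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # toy values, not taken from the model
for t in (1.0, 0.4, 0.1):
    print(t, [round(p, 3) for p in next_token_probs(logits, temperature=t)])

# With top_k=1 only one candidate survives, whatever the temperature.
print(next_token_probs(logits, temperature=0.01, top_k=1))  # [1.0, 0.0, 0.0]
```

Lower temperatures push probability mass toward the most likely token, which is why outputs at 0.1 tend to be repetitive. With `top_k=1`, sampling reduces to always picking the most likely token.
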
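📚 Documentation

Model details

Model description

For very important updates, check the community section.

Developed by: Bertug Gunel (Bertuğ Günel)
Funded by: none
Shared by: none
Model type: decoder-only Transformer
Language (NLP): English
License: CC-BY-NC-4.0
Fine-tuned from model: none (not fine-tuned)

Model sources

Repository: coming soon
Paper: "Attention Is All You Need", arXiv:1706.03762
Demo: this model is itself a demo model.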

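Training details

Training data

Note: the model was retrained on 5 million tokens on June 13, 2025. If you downloaded the weights before that date, download them again.

The model was trained on Train.csv (5 million tokens, more than 15,000 rows).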

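Data type	Training type	Total tokens	Status
Raw data (sentences)	Pretraining	about 5 million (5,000K)	Complete
Raw data (sentences)	Fine-tuning (to improve model performance in testing and use)	about 100K	Completed on June 17
Instructions (coming soon)	Instruction fine-tuning (IFT)	Coming soon	Coming soon (around July 5-15)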
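
To put the figures in the table in proportion, the short calculation below derives two ratios from the stated numbers. All inputs are the approximate values given above; the row count is a lower bound ("more than 15,000 rows"), so the per-row average is an upper bound.

```python
PRETRAIN_TOKENS = 5_000_000  # "about 5 million" from the table
FINETUNE_TOKENS = 100_000    # "about 100K" from the table
ROWS = 15_000                # lower bound: Train.csv has more than 15,000 rows

# Average row length; an upper bound because ROWS is a lower bound.
tokens_per_row = PRETRAIN_TOKENS / ROWS
# Size of the fine-tuning set relative to the pretraining set.
finetune_share = FINETUNE_TOKENS / PRETRAIN_TOKENS

print(f"average tokens per row: at most about {tokens_per_row:.0f}")
print(f"fine-tuning data relative to pretraining: {finetune_share:.0%}")
```

The rows average at most roughly 333 tokens each, and the fine-tuning set is about 2% of the size of the pretraining set.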

Fine-tuning details: the fine-tuned model is available at "https://huggingface.co/Bertug1911/BrtGPT-124m-FineTuned". Note: the fine-tuned model ships a "model.safetensors" file containing its weights, which were fine-tuned from this model.