🚀 Japanese GPT-1B
This repository provides a 1.3-billion-parameter Japanese GPT model trained by rinna Co., Ltd. It is intended for Japanese text generation tasks in natural language processing.
🚀 Quick Start
The following is a basic guide on how to use the model:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt-1b", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt-1b")

# Move the model to GPU if one is available
if torch.cuda.is_available():
    model = model.to("cuda")

text = "西田幾多郎は、"
token_ids = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt")

# Generate 100 tokens with top-k / top-p sampling
with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        max_length=100,
        min_length=100,
        do_sample=True,
        top_k=500,
        top_p=0.95,
        pad_token_id=tokenizer.pad_token_id,
        bos_token_id=tokenizer.bos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        bad_words_ids=[[tokenizer.unk_token_id]]
    )

output = tokenizer.decode(output_ids.tolist()[0])
print(output)
```
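If GPU memory is limited, the model can also be loaded in half precision via the standard `torch_dtype` argument of `from_pretrained`. This is a minimal sketch of that variation; the exact memory savings and any effect on output quality are not documented for this model, so treat it as an optional assumption rather than the recommended setup:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumption: loading in float16 roughly halves GPU memory use; output quality
# in half precision is not documented for this model.
tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt-1b", use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    "rinna/japanese-gpt-1b",
    torch_dtype=torch.float16,  # half precision to reduce memory footprint
)
if torch.cuda.is_available():
    model = model.to("cuda")
```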
✨ Features
- Powerful Text Generation: Capable of generating high-quality Japanese text.
- Transformer-Based: Built on a 24-layer, 2048-hidden-size transformer architecture.
📦 Installation
The code examples in the quick start section assume you have installed the necessary libraries. You can install them using the following command:
```bash
pip install torch transformers
```
📚 Documentation
Model Architecture
The model is a 24-layer, 2048-hidden-size transformer-based language model.
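The layer count and hidden size can be checked directly from the model configuration. This is a minimal sketch; the exact attribute names (`n_layer`/`n_embd` vs. `num_hidden_layers`/`hidden_size`) depend on which config class the Hub resolves for this checkpoint, so the fallbacks below are an assumption:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("rinna/japanese-gpt-1b")

# GPT-2-style configs expose n_layer / n_embd; other configs use
# num_hidden_layers / hidden_size, hence the fallbacks.
num_layers = getattr(config, "n_layer", getattr(config, "num_hidden_layers", None))
hidden_size = getattr(config, "n_embd", getattr(config, "hidden_size", None))
print(f"layers={num_layers}, hidden_size={hidden_size}")  # expected: 24, 2048
```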
Training
The model was trained on Japanese C4, [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz), and Japanese Wikipedia to optimize a traditional language modelling objective. It reaches around 14 perplexity on a validation set sampled from the same data.
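For reference, perplexity on any text can be computed from the model's language-modelling loss. The following is a minimal sketch of that calculation; the sentence below is an arbitrary example, and the result will not match the reported validation figure, which was measured on rinna's own held-out data:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt-1b", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt-1b")
model.eval()

text = "西田幾多郎は、日本の哲学者である。"  # arbitrary example sentence
input_ids = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the average cross-entropy loss
    loss = model(input_ids, labels=input_ids).loss

print(f"perplexity = {torch.exp(loss).item():.2f}")
```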
Tokenization
The model uses a sentencepiece-based tokenizer. The vocabulary was first trained on a selected subset from the training data using the official sentencepiece training script, and then augmented with emojis and symbols.
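A quick way to inspect the SentencePiece segmentation is to tokenize a sample string and look at the resulting pieces. This is a small illustration only; the exact subword pieces produced for any given input depend on the trained vocabulary and are not guaranteed:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("rinna/japanese-gpt-1b", use_fast=False)

text = "西田幾多郎は、"
pieces = tokenizer.tokenize(text)            # SentencePiece subword pieces
ids = tokenizer.convert_tokens_to_ids(pieces)

print(pieces)                                # subword segmentation of the input
print(tokenizer.decode(ids))                 # round-trips back to the original text
```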
Release Date
January 26, 2022
How to Cite
```bibtex
@misc{rinna-japanese-gpt-1b,
    title = {rinna/japanese-gpt-1b},
    author = {Zhao, Tianyu and Sawada, Kei},
    url = {https://huggingface.co/rinna/japanese-gpt-1b}
}

@inproceedings{sawada2024release,
    title = {Release of Pre-Trained Models for the {J}apanese Language},
    author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    month = {5},
    year = {2024},
    pages = {13898--13905},
    url = {https://aclanthology.org/2024.lrec-main.1213},
    note = {\url{https://arxiv.org/abs/2404.01657}}
}
```
📄 License
This project is licensed under the MIT License.
Additional Information
| Property | Details |
|---|---|
| Model Type | Japanese GPT |
| Training Data | Japanese CC-100, Wikipedia, C4 |
| License | MIT |
| Tags | GPT, Text Generation, LM, NLP |