🚀 Fugaku-LLM
Fugaku-LLM is a Japanese domestic model pre-trained from scratch on the supercomputer Fugaku. Because it is trained from scratch on our own data, it offers high transparency and safety. The training data consists mainly of Japanese text, so the model delivers excellent performance in Japanese.
This model was developed by the Fugaku-LLM team. Links to the other models can be found in the Fugaku-LLM Model Index below.
🚀 Quick Start
The Fugaku-LLM model offers a reliable and efficient solution for language-related tasks, especially in Japanese. To get started, see the Usage Examples section below.
✨ Features
- High Transparency and Safety: Trained from scratch with proprietary data, ensuring high levels of transparency and safety.
- Excellent Japanese Performance: With training data primarily in Japanese, the model shows outstanding performance in Japanese language tasks.
📦 Installation
The README does not provide specific installation steps. For dependencies, refer to the libraries listed in the Model Details section, such as DeepSpeedFugaku and the llm-jp-tokenizer.
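The usage examples below rely on the standard Hugging Face stack, so a minimal environment sketch is shown here. This is an assumption on our part rather than an official installation procedure; note that `device_map="auto"` additionally requires the `accelerate` package.

```bash
# Assumed setup for the usage examples (not from the original README).
pip install torch transformers accelerate
```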
💻 Usage Examples
Basic Usage
Use the instruction-tuned model
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Fugaku-LLM/Fugaku-LLM-13B-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()

# System message: "Below is an instruction that describes a task.
# Write a response that appropriately completes the request."
system_example = "以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。"
# Instruction: "Tell me the origin of the name of the supercomputer 'Fugaku'."
instruction_example = "スーパーコンピュータ「富岳」の名前の由来を教えてください。"
# Fixed prompt template expected by the instruction-tuned model.
prompt = f"{system_example}\n\n### 指示:\n{instruction_example}\n\n### 応答:\n"

input_ids = tokenizer.encode(prompt,
                             add_special_tokens=False,
                             return_tensors="pt")
tokens = model.generate(
    input_ids.to(device=model.device),
    max_new_tokens=128,
    do_sample=True,
    temperature=0.1,
    top_p=1.0,
    repetition_penalty=1.0,
    top_k=0,
)
out = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(out)
```
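The prompt template shown above (a system message, the instruction after `### 指示:`, and an empty response slot after `### 応答:`) is the format the instruction-tuned model expects; build your own prompts the same way.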
Use the base model
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Fugaku-LLM/Fugaku-LLM-13B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map="auto")
model.eval()

# Plain completion prompt: "The name of the supercomputer 'Fugaku' ..."
prompt = "スーパーコンピュータ「富岳」という名称は"

input_ids = tokenizer.encode(prompt,
                             add_special_tokens=False,
                             return_tensors="pt")
tokens = model.generate(
    input_ids.to(device=model.device),
    max_new_tokens=128,
    do_sample=True,
    temperature=0.1,
    top_p=1.0,
    repetition_penalty=1.0,
    top_k=0,
)
out = tokenizer.decode(tokens[0], skip_special_tokens=True)
print(out)
```
📚 Documentation
Fugaku-LLM Model Index
| Model | Fugaku-LLM | Fugaku-LLM-instruct |
|-------|------------|---------------------|
| 13B   | Link       | Link                |
Model Details
| Property | Details |
|----------|---------|
| Developed by | Fugaku-LLM |
| Model Type | GPT-2 |
| Language(s) | Japanese, English |
| Library | DeepSpeedFugaku |
| Tokenizer | llm-jp-tokenizer, code10k_en20k_ja30k of v2.2 |
| License | Fugaku-LLM Terms of Use |
Model Performance
Instruction-tuned model
We evaluated our model using the Japanese MT benchmark in the same way as the Nejumi LLM Leaderboard Neo. We modified only the following parts of the FastChat code (a sketch of both changes follows this list):
- Added `add_special_tokens=False` when calling the tokenizer for the input prompt.
- Limited the number of generated tokens to less than 2048.
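A minimal sketch of both changes in plain Transformers terms (illustrative only; this is not the actual FastChat diff, and the prompt placeholder is hypothetical):

```python
# Illustrative sketch of the two evaluation-time changes (not the actual FastChat code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Fugaku-LLM/Fugaku-LLM-13B-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "..."  # placeholder: a Japanese MT benchmark question, formatted as in the usage example above

# Change 1: pass add_special_tokens=False when tokenizing the input prompt.
input_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

# Change 2: keep the number of generated tokens below 2048.
tokens = model.generate(input_ids.to(model.device), max_new_tokens=2047)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```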
| Model Name | average | coding | extraction | humanities | math | reasoning | roleplay | stem | writing |
|------------|---------|--------|------------|------------|------|-----------|----------|------|---------|
| Fugaku-LLM-13B-instruct | 5.47 | 2.10 | 4.10 | 9.18 | 2.30 | 3.40 | 8.20 | 7.25 | 7.25 |
Training Datasets
Instruction Tuning
📄 License
The Fugaku-LLM Terms of Use are available in the LICENSE and LICENSE_ja files.
🔧 Technical Details
The README does not provide in-depth technical details about the model, such as its architecture or training algorithms.
⚠️ Important Note
Results produced with Fugaku-LLM may contain falsehoods, biases, content that infringes the rights of others, or content that does not meet the level of effectiveness or usefulness users expect.
💡 Usage Tip
When using Fugaku-LLM, verify the accuracy, legality, and ethical validity of its outputs yourself.
Acknowledgements
This achievement is based on the Government-Initiated Projects of Supercomputer Fugaku "Development of Distributed...".