# mt5_summarize_japanese
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) for Japanese text summarization, offering an effective way to quickly obtain summaries of Japanese content.
## Quick Start
The model was fine-tuned for Japanese summarization on BBC news articles (the XL-Sum Japanese dataset), where the first sentence (the headline sentence) of each article is used as the summary and the rest as the article body.
Therefore, please provide a news story (including, for example, the event, background, results, and comments) as the source text in the inference widget. Other kinds of text, such as conversations, business documents, academic papers, or short stories, do not appear in the training set.
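As a rough, hypothetical sketch of the headline-as-summary preprocessing described above, the snippet below loads the XL-Sum Japanese data and splits each article into a first-sentence summary and the remaining body. The dataset id (`csebuetnlp/xlsum`) and the sentence-splitting rule are assumptions, not taken from this card; the fine-tuning notebook linked under Training procedure below is the authoritative source.

```python
from datasets import load_dataset

# Assumed dataset id and config; the actual preprocessing lives in the
# fine-tuning notebook linked under "Training procedure" below.
ds = load_dataset("csebuetnlp/xlsum", "japanese", split="train")

def to_pairs(example):
    # Treat the first sentence as the headline-style summary and the rest as the article.
    sentences = [s for s in example["text"].split("。") if s]
    return {
        "summary": sentences[0] + "。",
        "article": "。".join(sentences[1:]) + "。",
    }

pairs = ds.map(to_pairs)
print(pairs[0]["summary"])
```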
It achieves the following results on the evaluation set:
| Property | Details |
|---|---|
| Loss | 1.8952 |
| Rouge1 | 0.4625 |
| Rouge2 | 0.2866 |
| Rougel | 0.3656 |
| Rougelsum | 0.3868 |
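The card does not state exactly how these ROUGE scores were computed. As a hedged sketch, scores of this kind can be obtained with the `evaluate` library; a character-level tokenizer is assumed here because ROUGE's default tokenization discards non-Latin text, and the strings below are illustrative, not taken from the evaluation set.

```python
import evaluate

rouge = evaluate.load("rouge")

# Illustrative prediction/reference pair (not from the actual evaluation set).
predictions = ["日本がドイツに2対1で逆転勝ちしました。"]
references = ["日本はワールドカップ初戦でドイツに2対1で逆転勝ちしました。"]

scores = rouge.compute(
    predictions=predictions,
    references=references,
    tokenizer=lambda text: list(text),  # character-level tokens so Japanese is scored
)
print(scores)  # {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```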
## Features
- Language Focus: Specialized for Japanese text summarization.
- Fine-Tuned: Based on the [google/mt5-small](https://huggingface.co/google/mt5-small) model, fine-tuned on a Japanese news dataset (XL-Sum Japanese).
## Usage Examples
### Basic Usage
```python
from transformers import pipeline

# Load the summarization pipeline with this model.
seq2seq = pipeline("summarization", model="tsmatz/mt5_summarize_japanese")

# A news-style article: Japan's 2-1 comeback win over Germany at the Qatar World Cup.
sample_text = "サッカーのワールドカップカタール大会、世界ランキング24位でグループEに属する日本は、23日の1次リーグ初戦において、世界11位で過去4回の優勝を誇るドイツと対戦しました。試合は前半、ドイツの一方的なペースではじまりましたが、後半、日本の森保監督は攻撃的な選手を積極的に動員して流れを変えました。結局、日本は前半に1点を奪われましたが、途中出場の堂安律選手と浅野拓磨選手が後半にゴールを決め、2対1で逆転勝ちしました。ゲームの流れをつかんだ森保采配が功を奏しました。"

result = seq2seq(sample_text)
print(result)
```
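For finer control over generation (for example, summary length or beam search), the tokenizer and model can also be used directly, as sketched below. This reuses `sample_text` from the snippet above; the `max_length` and `num_beams` values are illustrative, not settings recommended by the author.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "tsmatz/mt5_summarize_japanese"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Encode the same news article used in the pipeline example above.
inputs = tokenizer(sample_text, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(
    **inputs,
    max_length=60,  # illustrative cap on summary length
    num_beams=4,    # illustrative beam-search width
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```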
## Documentation
### Training procedure
You can download the source code for fine-tuning from [here](https://github.com/tsmatz/huggingface-finetune-japanese/blob/master/02-summarize.ipynb).
### Training hyperparameters
The following hyperparameters were used during training:

| Property | Details |
|---|---|
| learning_rate | 0.0005 |
| train_batch_size | 2 |
| eval_batch_size | 1 |
| seed | 42 |
| gradient_accumulation_steps | 16 |
| total_train_batch_size | 32 |
| optimizer | Adam with betas=(0.9,0.999) and epsilon=1e-08 |
| lr_scheduler_type | linear |
| lr_scheduler_warmup_steps | 90 |
| num_epochs | 10 |
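As a rough sketch of how the values above map onto `Seq2SeqTrainingArguments`, see below. The output path and the `predict_with_generate` flag are placeholders; the actual configuration is in the fine-tuning notebook linked above.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: mirrors the hyperparameter table above; anything not listed in
# the table is a placeholder, not taken from the original notebook.
training_args = Seq2SeqTrainingArguments(
    output_dir="./mt5_summarize_japanese",  # placeholder output directory
    learning_rate=5e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,  # 2 x 16 = effective train batch size of 32
    warmup_steps=90,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    seed=42,
    predict_with_generate=True,  # placeholder, typical for summarization evaluation
)
```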
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
|---|---|---|---|---|---|---|---|
| 4.2501 | 0.36 | 100 | 3.3685 | 0.3114 | 0.1654 | 0.2627 | 0.2694 |
| 3.6436 | 0.72 | 200 | 3.0095 | 0.3023 | 0.1634 | 0.2684 | 0.2764 |
| 3.3044 | 1.08 | 300 | 2.8025 | 0.3414 | 0.1789 | 0.2912 | 0.2984 |
| 3.2693 | 1.44 | 400 | 2.6284 | 0.3616 | 0.1935 | 0.2979 | 0.3132 |
| 3.2025 | 1.8 | 500 | 2.5271 | 0.3790 | 0.2042 | 0.3046 | 0.3192 |
| 2.9772 | 2.17 | 600 | 2.4203 | 0.4083 | 0.2374 | 0.3422 | 0.3542 |
| 2.9133 | 2.53 | 700 | 2.3863 | 0.3847 | 0.2096 | 0.3316 | 0.3406 |
| 2.9383 | 2.89 | 800 | 2.3573 | 0.4016 | 0.2297 | 0.3361 | 0.3500 |
| 2.7608 | 3.25 | 900 | 2.3223 | 0.3999 | 0.2249 | 0.3461 | 0.3566 |
| 2.7864 | 3.61 | 1000 | 2.2293 | 0.3932 | 0.2219 | 0.3297 | 0.3445 |
| 2.7846 | 3.97 | 1100 | 2.2097 | 0.4386 | 0.2617 | 0.3766 | 0.3826 |
| 2.7495 | 4.33 | 1200 | 2.1879 | 0.4100 | 0.2449 | 0.3481 | 0.3551 |
| 2.6092 | 4.69 | 1300 | 2.1515 | 0.4398 | 0.2714 | 0.3787 | 0.3842 |
| 2.5598 | 5.05 | 1400 | 2.1195 | 0.4366 | 0.2545 | 0.3621 | 0.3736 |
| 2.5283 | 5.41 | 1500 | 2.0637 | 0.4274 | 0.2551 | 0.3649 | 0.3753 |
| 2.5947 | 5.77 | 1600 | 2.0588 | 0.4454 | 0.2800 | 0.3828 | 0.3921 |
| 2.5354 | 6.14 | 1700 | 2.0357 | 0.4253 | 0.2582 | 0.3546 | 0.3687 |
| 2.5203 | 6.5 | 1800 | 2.0263 | 0.4444 | 0.2686 | 0.3648 | 0.3764 |
| 2.5303 | 6.86 | 1900 | 1.9926 | 0.4455 | 0.2771 | 0.3795 | 0.3948 |
| 2.4953 | 7.22 | 2000 | 1.9576 | 0.4523 | 0.2873 | 0.3869 | 0.4053 |
| 2.4271 | 7.58 | 2100 | 1.9384 | 0.4455 | 0.2811 | 0.3713 | 0.3862 |
| 2.4462 | 7.94 | 2200 | 1.9230 | 0.4530 | 0.2846 | 0.3754 | 0.3947 |
| 2.3303 | 8.3 | 2300 | 1.9311 | 0.4519 | 0.2814 | 0.3755 | 0.3887 |
| 2.3916 | 8.66 | 2400 | 1.9213 | 0.4598 | 0.2897 | 0.3688 | 0.3889 |
| 2.5995 | 9.03 | 2500 | 1.9060 | 0.4526 | 0.2820 | 0.3733 | 0.3946 |
| 2.3348 | 9.39 | 2600 | 1.9021 | 0.4595 | 0.2856 | 0.3762 | 0.3988 |
| 2.4035 | 9.74 | 2700 | 1.8952 | 0.4625 | 0.2866 | 0.3656 | 0.3868 |
### Framework versions

| Property | Details |
|---|---|
| Transformers | 4.23.1 |
| Pytorch | 1.12.1+cu102 |
| Datasets | 2.6.1 |
| Tokenizers | 0.13.1 |
## License
This model is licensed under the Apache-2.0 license.