Mt5 Chinese Small
Developed by yihsuan
An abstractive summarization model fine-tuned from mT5-small, supporting Chinese text summarization tasks
Downloads 36
Release Time: 4/26/2022
Model Overview
This model is a fine-tuned version of google/mt5-small on Chinese datasets, specifically designed for generating Chinese text summaries.
Model Features
Chinese Abstractive Summarization
Summarization capability optimized specifically for Chinese text
Based on mT5 Architecture
Built on the multilingual T5 architecture, which offers strong cross-lingual transfer capabilities
Lightweight Model
Small model size, suitable for deployment in resource-constrained environments
Model Capabilities
Chinese text summarization
Multi-sentence compression
Key information extraction
Use Cases
News Summarization
News Article Summarization
Automatically generates key summaries of news articles
ROUGE-L score: 18.6
Scientific Literature
Research Paper Summarization
Generates brief overviews of research papers
best_model_test_0423_small
This model is a fine-tuned text summarization model based on google/mt5-small. It can effectively summarize text and has achieved good results on the evaluation set, providing a reliable solution for text summarization tasks.
Quick Start
This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 2.6341
- Rouge1: 18.7681
- Rouge2: 6.3762
- Rougel: 18.6081
- Rougelsum: 18.6173
- Gen Len: 22.1086
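The card itself ships no usage code. Below is a minimal inference sketch; the repo id `yihsuan/mt5_chinese_small` is an assumption (substitute the actual Hub id), and the generation settings are illustrative defaults rather than values from the card.

```python
# Minimal inference sketch (not from the original card).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "yihsuan/mt5_chinese_small"  # hypothetical repo id -- replace as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

article = "台積電今日宣布將在高雄設立新廠,預計創造數千個工作機會,並帶動南部半導體產業發展。"
# If the model was trained with a task prefix (e.g. "summarize: "), prepend it
# here; the card does not document one.
inputs = tokenizer(article, return_tensors="pt", max_length=512, truncation=True)
summary_ids = model.generate(
    **inputs,
    max_length=64,          # reported Gen Len averages ~22 tokens
    num_beams=4,
    no_repeat_ngram_size=2,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```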
Features
- Summarization: It can perform text summarization tasks.
- Metrics: Evaluated using ROUGE metrics.
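The card does not document how these ROUGE numbers were computed. A plausible reconstruction with the `evaluate` library (an assumption; `evaluate` is not in the pinned framework list below) looks like this; note the character-level tokenizer, since the default whitespace splitting in `rouge_score` does not segment Chinese:

```python
# Assumed ROUGE-scoring sketch (pip install evaluate rouge_score); the original
# evaluation pipeline is undocumented.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["台北今日天晴"]       # model-generated summaries
references = ["台北今日天氣晴朗"]    # gold summaries
scores = rouge.compute(
    predictions=predictions,
    references=references,
    tokenizer=lambda text: list(text),  # character-level split for Chinese
)
print(scores)  # rouge1 / rouge2 / rougeL / rougeLsum
```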
Documentation
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
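Since the training data is unspecified, any reproduction is approximate. A minimal sketch of the setup implied by these hyperparameters, with a toy in-memory dataset standing in for the real corpus:

```python
# Sketch of the fine-tuning setup implied by the hyperparameters above; the
# Adam betas/epsilon listed are the optimizer defaults, so no extra flags needed.
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Placeholder pairs -- replace with the actual (text, summary) corpus.
raw = Dataset.from_dict({
    "text": ["台北今日天氣晴朗,氣溫約攝氏二十五度,適合戶外活動。"],
    "summary": ["台北今日天晴。"],
})

def preprocess(batch):
    model_inputs = tokenizer(batch["text"], max_length=512, truncation=True)
    with tokenizer.as_target_tokenizer():  # label-tokenization API in Transformers 4.18
        labels = tokenizer(batch["summary"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="best_model_test_0423_small",
    learning_rate=1e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    eval_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()
```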
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
---|---|---|---|---|---|---|---|---|
5.8165 | 0.05 | 1000 | 3.6541 | 11.6734 | 3.9865 | 11.5734 | 11.5375 | 18.0056 |
4.306 | 0.1 | 2000 | 3.4291 | 12.0417 | 3.8419 | 11.9231 | 11.9223 | 16.8948 |
4.1091 | 0.16 | 3000 | 3.3643 | 13.661 | 4.5171 | 13.5123 | 13.5076 | 19.4016 |
3.9637 | 0.21 | 4000 | 3.2574 | 13.8443 | 4.1761 | 13.689 | 13.6927 | 18.4288 |
3.8205 | 0.26 | 5000 | 3.2434 | 13.5371 | 4.3639 | 13.3551 | 13.3552 | 21.5776 |
3.7262 | 0.31 | 6000 | 3.1690 | 14.3668 | 4.8048 | 14.2191 | 14.1906 | 21.5548 |
3.6887 | 0.36 | 7000 | 3.0657 | 14.3265 | 4.436 | 14.212 | 14.205 | 20.89 |
3.6337 | 0.42 | 8000 | 3.0318 | 14.6809 | 4.8345 | 14.5378 | 14.5331 | 20.3651 |
3.5443 | 0.47 | 9000 | 3.0554 | 15.3372 | 4.9163 | 15.1794 | 15.1781 | 21.7742 |
3.5203 | 0.52 | 10000 | 2.9793 | 14.9278 | 4.9656 | 14.7491 | 14.743 | 20.8113 |
3.4936 | 0.57 | 11000 | 3.0079 | 15.7705 | 5.1453 | 15.5582 | 15.5756 | 23.4274 |
3.4592 | 0.62 | 12000 | 2.9721 | 15.0201 | 5.1612 | 14.8508 | 14.8198 | 22.7007 |
3.377 | 0.67 | 13000 | 3.0112 | 15.9595 | 5.1133 | 15.78 | 15.7774 | 23.4427 |
3.4158 | 0.73 | 14000 | 2.9239 | 14.7984 | 5.051 | 14.6943 | 14.6581 | 21.6009 |
3.378 | 0.78 | 15000 | 2.8897 | 16.5128 | 5.1923 | 16.3523 | 16.3265 | 22.0828 |
3.3231 | 0.83 | 16000 | 2.9347 | 16.9997 | 5.5524 | 16.8534 | 16.8737 | 22.5807 |
3.3268 | 0.88 | 17000 | 2.9116 | 16.0261 | 5.4226 | 15.9234 | 15.914 | 23.6988 |
3.3127 | 0.93 | 18000 | 2.8610 | 16.6255 | 5.3554 | 16.4729 | 16.4569 | 22.9481 |
3.2664 | 0.99 | 19000 | 2.8606 | 17.7703 | 5.9475 | 17.6229 | 17.6259 | 23.4423 |
3.1718 | 1.04 | 20000 | 2.8764 | 17.301 | 5.6262 | 17.122 | 17.1104 | 23.0093 |
3.0987 | 1.09 | 21000 | 2.8282 | 16.4718 | 5.2077 | 16.3394 | 16.3401 | 20.9697 |
3.1486 | 1.14 | 22000 | 2.8235 | 18.5594 | 5.9469 | 18.3882 | 18.3799 | 22.7291 |
3.1435 | 1.19 | 23000 | 2.8261 | 18.111 | 6.0309 | 17.9593 | 17.9613 | 22.9612 |
3.1049 | 1.25 | 24000 | 2.8068 | 17.124 | 5.5675 | 16.9714 | 16.9876 | 22.5558 |
3.1357 | 1.3 | 25000 | 2.8014 | 17.3916 | 5.8671 | 17.2148 | 17.2502 | 23.0075 |
3.0904 | 1.35 | 26000 | 2.7790 | 17.419 | 5.6689 | 17.3125 | 17.3058 | 22.1492 |
3.0877 | 1.4 | 27000 | 2.7462 | 17.0605 | 5.4735 | 16.9414 | 16.9378 | 21.7522 |
3.0694 | 1.45 | 28000 | 2.7563 | 17.752 | 5.8889 | 17.5967 | 17.619 | 23.2005 |
3.0498 | 1.51 | 29000 | 2.7521 | 17.9056 | 5.7754 | 17.7624 | 17.7836 | 21.9369 |
3.0566 | 1.56 | 30000 | 2.7468 | 18.6531 | 6.0538 | 18.5397 | 18.5038 | 22.2358 |
3.0489 | 1.61 | 31000 | 2.7450 | 18.4869 | 5.9297 | 18.3139 | 18.3169 | 22.0108 |
3.0247 | 1.66 | 32000 | 2.7449 | 18.5192 | 5.9966 | 18.3721 | 18.3569 | 22.2071 |
2.9877 | 1.71 | 33000 | 2.7160 | 18.1655 | 5.9294 | 18.0304 | 18.0836 | 21.4595 |
3.0383 | 1.76 | 34000 | 2.7202 | 18.4959 | 6.2413 | 18.3363 | 18.3431 | 22.9732 |
3.041 | 1.82 | 35000 | 2.6948 | 17.5306 | 5.8119 | 17.4011 | 17.4149 | 21.9435 |
2.9285 | 1.87 | 36000 | 2.6957 | 18.6418 | 6.1394 | 18.514 | 18.4823 | 22.5174 |
3.0556 | 1.92 | 37000 | 2.7000 | 18.7387 | 6.0585 | 18.5761 | 18.574 | 22.9315 |
3.0033 | 1.97 | 38000 | 2.6974 | 17.9387 | 6.1387 | 17.8271 | 17.8111 | 22.4726 |
2.9207 | 2.02 | 39000 | 2.6998 | 18.6073 | 6.1906 | 18.3891 | 18.4103 | 23.0274 |
2.8922 | 2.08 | 40000 | 2.6798 | 18.4017 | 6.2244 | 18.2321 | 18.2296 | 22.0697 |
2.8938 | 2.13 | 41000 | 2.6666 | 18.8016 | 6.2066 | 18.6411 | 18.6353 | 21.7017 |
2.9124 | 2.18 | 42000 | 2.6606 | 18.7544 | 6.3533 | 18.5923 | 18.5739 | 21.4303 |
2.8597 | 2.23 | 43000 | 2.6947 | 18.8672 | 6.4526 | 18.7416 | 18.7482 | 22.3352 |
2.8435 | 2.28 | 44000 | 2.6738 | 18.9405 | 6.356 | 18.7791 | 18.7729 | 21.9081 |
2.8672 | 2.34 | 45000 | 2.6734 | 18.7509 | 6.3991 | 18.6175 | 18.5828 | 21.8869 |
2.899 | 2.39 | 46000 | 2.6575 | 18.5529 | 6.3489 | 18.4139 | 18.401 | 21.7694 |
2.8616 | 2.44 | 47000 | 2.6485 | 18.7563 | 6.268 | 18.6368 | 18.6253 | 21.5685 |
2.8937 | 2.49 | 48000 | 2.6486 | 18.6525 | 6.3426 | 18.5184 | 18.5129 | 22.3337 |
2.8446 | 2.54 | 49000 | 2.6572 | 18.6529 | 6.2655 | 18.4915 | 18.4764 | 22.3331 |
2.8676 | 2.59 | 50000 | 2.6608 | 19.0913 | 6.494 | 18.929 | 18.9233 | 22.132 |
2.8794 | 2.65 | 51000 | 2.6583 | 18.7648 | 6.459 | 18.6276 | 18.6125 | 22.2414 |
2.8836 | 2.7 | 52000 | 2.6512 | 18.7243 | 6.3865 | 18.5848 | 18.5763 | 22.2551 |
2.8174 | 2.75 | 53000 | 2.6409 | 18.9393 | 6.3914 | 18.7733 | 18.7715 | 22.1243 |
2.8494 | 2.8 | 54000 | 2.6396 | 18.6126 | 6.4389 | 18.4673 | 18.4516 | 21.7638 |
2.9025 | 2.85 | 55000 | 2.6341 | 18.7681 | 6.3762 | 18.6081 | 18.6173 | 22.1086 |
2.8754 | 2.91 | 56000 | 2.6388 | 19.0828 | 6.5203 | 18.9334 | 18.9285 | 22.3497 |
2.8489 | 2.96 | 57000 | 2.6375 | 18.9219 | 6.4922 | 18.763 | 18.7437 | 21.9321 |
Framework versions
- Transformers 4.18.0
- Pytorch 1.10.1+cu113
- Datasets 2.0.0
- Tokenizers 0.11.6
License
The model is licensed under the Apache-2.0 license.
Featured Recommended AI Models
Bart Large Cnn
MIT
BART model pre-trained on English corpus, specifically fine-tuned for the CNN/Daily Mail dataset, suitable for text summarization tasks
Text Generation English
facebook
3.8M
1,364
Parrot Paraphraser On T5
Parrot is a T5-based paraphrasing framework designed to accelerate the training of Natural Language Understanding (NLU) models through high-quality paraphrase generation for data augmentation.
Text Generation
Transformers
prithivida
910.07k
152
Distilbart Cnn 12 6
Apache-2.0
DistilBART is a distilled version of the BART model, specifically optimized for text summarization tasks, significantly improving inference speed while maintaining high performance.
Text Generation English
sshleifer
783.96k
278
T5 Base Summarization Claim Extractor
A T5-based model specialized in extracting atomic claims from summary texts, serving as a key component in summary factuality assessment pipelines.
Text Generation
Transformers English
Babelscape
666.36k
9
Unieval Sum
UniEval is a unified multidimensional evaluator for automatic evaluation of natural language generation tasks, supporting assessment across multiple interpretable dimensions.
Text Generation
Transformers
MingZhong
318.08k
3
Pegasus Paraphrase
Apache-2.0
A text paraphrasing model fine-tuned based on the PEGASUS architecture, capable of generating sentences with the same meaning but different expressions.
Text Generation
Transformers English
tuner007
209.03k
185
T5 Base Korean Summarization
This is a Korean text summarization model based on the T5 architecture, specifically designed for Korean text summarization tasks. It is trained on multiple Korean datasets by fine-tuning the paust/pko-t5-base model.
Text Generation
Transformers Korean
eenzeenee
148.32k
25
Pegasus Xsum
PEGASUS is a Transformer-based pretrained model specifically designed for abstractive text summarization tasks.
Text Generation English
google
144.72k
198
Bart Large Cnn Samsum
MIT
A dialogue summarization model based on the BART-large architecture, fine-tuned specifically for the SAMSum corpus, suitable for generating dialogue summaries.
Text Generation
Transformers English
philschmid
141.28k
258
Kobart Summarization
MIT
A Korean text summarization model based on the KoBART architecture, capable of generating concise summaries of Korean news articles.
Text Generation
Transformers Korean
gogamza
119.18k
12