Amazon Multilingual Review Summarization with google/mt5-small
This model is designed for multilingual product review summarization, fine-tuned from google/mt5-small on a multilingual Amazon reviews dataset. It offers efficient summarization for English and German reviews, with promising evaluation results.
Quick Start
This model is a fine-tuned version of google/mt5-small on a multilingual Amazon reviews dataset.
It achieves the following results on the evaluation set:
- Loss: 2.9368
- Model Preparation Time: 0.0038
- Rouge1: 16.1955
- Rouge2: 8.1292
- RougeL: 15.9218
- RougeLsum: 15.9516
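A minimal inference sketch is shown below. The repository ID is a placeholder (the exact Hub ID of this checkpoint is not stated here), and the example review and generation settings are illustrative.

```python
# Minimal inference sketch. The repo ID below is a placeholder --
# replace it with the actual Hub ID of this fine-tuned checkpoint.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "your-username/mt5-small-amazon-review-summarization"  # hypothetical ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

review = (
    "I bought this laptop a month ago. The battery lasts all day, "
    "the screen is sharp, and it handles everyday tasks without any lag."
)

inputs = tokenizer(review, return_tensors="pt", truncation=True, max_length=512)
summary_ids = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```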
Features
- Multilingual Support: Capable of summarizing English and German product reviews.
- Fine-Tuned: Based on the google/mt5-small model, fine-tuned for better performance on Amazon reviews.
- Evaluation Metrics: Reports ROUGE scores and validation loss on a held-out test set.
Documentation
Model description
A fine-tuned version of google/mt5-small for multilingual product review summarization.
Intended uses & limitations
Multilingual product review summarization. Supported languages: English and German.
Training and evaluation data
The original multilingual Amazon product reviews dataset on the Hugging Face Hub is defunct, so we use the version available on Kaggle.
The original dataset covers 6 languages: English, German, French, Spanish, Japanese, and Chinese.
Each language has 20,000 training samples, 5,000 validation samples, and 5,000 testing samples.
We upload this dataset to the Hugging Face Hub at [srvmishra832/multilingual-amazon-reviews-6-languages](https://huggingface.co/datasets/srvmishra832/multilingual-amazon-reviews-6-languages).
Here, we only select the English and German reviews from the `pc` and `electronics` product categories.
We use the review titles as summaries, and to prevent the model from generating very short summaries, we filter out examples with extremely short review titles.
Finally, we downsample the resulting dataset so that training is feasible on a Google Colab T4 GPU in a reasonable amount of time.
The final downsampled and concatenated dataset contains 8,000 training samples, 452 validation samples, and 422 test samples.
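A minimal preprocessing sketch under these assumptions is shown below; the column names (`language`, `product_category`, `review_title`) follow the original amazon_reviews_multi schema and may differ in the Kaggle/Hub copy, and the title-length threshold is illustrative.

```python
# Illustrative preprocessing sketch, not the exact training script.
# Column names and the title-length threshold are assumptions.
from datasets import load_dataset

ds = load_dataset("srvmishra832/multilingual-amazon-reviews-6-languages")

def keep(example):
    # Keep English/German reviews from the pc and electronics categories,
    # dropping examples whose review title (the summary target) is very short.
    return (
        example["language"] in {"en", "de"}
        and example["product_category"] in {"pc", "electronics"}
        and len(example["review_title"].split()) >= 4  # assumed threshold
    )

filtered = ds.filter(keep)

# Downsample the training split so fine-tuning fits on a Colab T4 GPU.
train = filtered["train"].shuffle(seed=42).select(range(8_000))
```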
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5.6e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
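For reference, here is a hedged sketch of how these hyperparameters map onto `Seq2SeqTrainingArguments`; the `output_dir` and the evaluation/generation settings are assumptions, not taken from the original training script.

```python
# Sketch mapping the hyperparameters above onto Seq2SeqTrainingArguments.
# output_dir and the evaluation/generation settings are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-amazon-review-summarization",  # placeholder
    learning_rate=5.6e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    num_train_epochs=10,
    eval_strategy="epoch",        # metrics below are reported once per epoch
    predict_with_generate=True,   # generate summaries during evaluation for ROUGE
)
```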
Training results
| Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Rouge1 | Rouge2 | RougeL | RougeLsum |
|:-------------:|:-----:|:----:|:---------------:|:----------------------:|:-------:|:------:|:-------:|:---------:|
| 9.0889        | 1.0   | 500  | 3.4117          | 0.0038                 | 12.541  | 5.1023 | 11.9039 | 11.8749   |
| 4.3977        | 2.0   | 1000 | 3.1900          | 0.0038                 | 15.342  | 6.747  | 14.9223 | 14.8598   |
| 3.9595        | 3.0   | 1500 | 3.0817          | 0.0038                 | 15.3976 | 6.2063 | 15.0635 | 15.069    |
| 3.7525        | 4.0   | 2000 | 3.0560          | 0.0038                 | 15.7991 | 6.8536 | 15.4657 | 15.5263   |
| 3.6191        | 5.0   | 2500 | 3.0048          | 0.0038                 | 16.3791 | 7.3671 | 16.0817 | 16.059    |
| 3.5155        | 6.0   | 3000 | 2.9779          | 0.0038                 | 16.2311 | 7.5629 | 15.7492 | 15.758    |
| 3.4497        | 7.0   | 3500 | 2.9663          | 0.0038                 | 16.2554 | 8.1464 | 15.9499 | 15.9152   |
| 3.3889        | 8.0   | 4000 | 2.9438          | 0.0038                 | 16.5764 | 8.3698 | 16.3225 | 16.2848   |
| 3.3656        | 9.0   | 4500 | 2.9365          | 0.0038                 | 16.1416 | 8.0266 | 15.8921 | 15.8913   |
| 3.3562        | 10.0  | 5000 | 2.9368          | 0.0038                 | 16.1955 | 8.1292 | 15.9218 | 15.9516   |
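The ROUGE scores above can be reproduced with the `evaluate` library; the snippet below is a sketch with placeholder predictions and references rather than the actual evaluation script.

```python
# Hedged ROUGE sketch using the `evaluate` library.
# `predictions` and `references` are placeholders; in practice they come from
# model.generate() on the test split and the held-out review titles.
import evaluate

rouge = evaluate.load("rouge")

predictions = ["great laptop with long battery life"]
references = ["excellent laptop, the battery lasts all day"]

scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)

# The table above reports ROUGE on a 0-100 scale, so scale the fractions by 100.
print({k: round(v * 100, 4) for k, v in scores.items()})
```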
Framework versions
- Transformers 4.50.0
- Pytorch 2.6.0+cu124
- Datasets 3.4.1
- Tokenizers 0.21.1
License
This project is licensed under the Apache-2.0 license.