🚀 mT5_ru_XLSum
This is a modification of the mT5_multilingual_XLSum model. The tokenizer has been shrunk to 32K tokens (Russian + English) following David Dale's guide. It is designed for tasks such as summarization, dialogue summarization, and text2text generation.
🚀 Quick Start
Prerequisites
Make sure you have the transformers library installed. If not, you can install it using the following command:
pip install transformers
Using this model in transformers
import re
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
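# Collapse newlines and runs of whitespace into single spaces (raw strings avoid invalid-escape warnings)
WHITESPACE_HANDLER = lambda k: re.sub(r'\s+', ' ', re.sub(r'\n+', ' ', k.strip()))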
article_text = """
С точки зрения банальной эрудиции, в аспекте призматической парадоксальности,
цинизм ваших слов в данной конспекции ассоциируется мистификацией парадоксальных иллюзий.
Разложим его семилинейным функционалом в матpицy пpямоyгольнyю тpоеpанговyю обводимyю
собственной неодноpодностью непpеpывно интегpиpyемой в pазpыв попеpек интеpвала
pасходимости кpиволинейным экстpемyмом чеpез область целостности!
"""
model_name = "Nehc/mT5_ru_XLSum"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
input_ids = tokenizer(
    [WHITESPACE_HANDLER(article_text)],
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=512
)["input_ids"]
output_ids = model.generate(
    input_ids=input_ids,
    max_length=84,
    no_repeat_ngram_size=2,
    num_beams=3
)[0]
summary = tokenizer.decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)
print(summary)
✨ Features
- Bilingual Support: Handles Russian and English text; the tokenizer vocabulary is restricted to these two languages.
- Text Summarization: Capable of summarizing various types of texts, including jokes, diploma introductions, and biological information.
📦 Installation
To use this model, you need to install the transformers library. You can install it using pip:
pip install transformers
💻 Usage Examples
Basic Usage
The Quick Start example above demonstrates how to use the model for text summarization. Replace article_text with your own text to get a summary.
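To reuse the Quick Start steps on your own texts, they can be wrapped in a small helper. The following is a minimal sketch, not part of the model's API: the names normalize_whitespace, summarize, and max_input_length are illustrative assumptions, and the default generation settings simply mirror the Quick Start values. It expects a model and tokenizer that were already loaded with from_pretrained.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse newlines and runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text.strip())


def summarize(text, model, tokenizer, max_input_length=512, **generate_kwargs):
    """Summarize one text with an already loaded model and tokenizer.

    `summarize` and `max_input_length` are hypothetical names used for
    illustration. Defaults mirror the Quick Start settings and can be
    overridden through keyword arguments.
    """
    settings = {"max_length": 84, "no_repeat_ngram_size": 2, "num_beams": 3}
    settings.update(generate_kwargs)

    # Tokenize a single text; no padding is needed for a batch of one.
    encoded = tokenizer(
        [normalize_whitespace(text)],
        return_tensors="pt",
        truncation=True,
        max_length=max_input_length,
    )
    # Pass the attention mask explicitly alongside the input ids.
    output_ids = model.generate(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        **settings,
    )[0]
    return tokenizer.decode(
        output_ids,
        skip_special_tokens=True,
        clean_up_tokenization_spaces=False,
    )
```

With model and tokenizer loaded as in the Quick Start, a call such as summarize(my_text, model, tokenizer) returns the summary as a string, where my_text stands for your own input.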
Advanced Usage
You can adjust the parameters of the generate method, such as max_length, no_repeat_ngram_size, and num_beams, to control the length and quality of the summary.
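As a rough guide, max_length caps the number of generated tokens, no_repeat_ngram_size prevents any n-gram of that size from appearing twice in the output, and num_beams sets the beam search width. The sketch below shows one way to keep a few named settings and override them per call. The preset names and values other than the Quick Start ones are hypothetical starting points, not tuned recommendations, and generation_kwargs is an illustrative helper rather than a transformers function.

```python
# Hypothetical presets: only "quick_start" comes from the example above;
# the other values are illustrative and not tuned for this model.
GENERATION_PRESETS = {
    "quick_start": {"max_length": 84, "no_repeat_ngram_size": 2, "num_beams": 3},
    "shorter": {"max_length": 48, "no_repeat_ngram_size": 2, "num_beams": 3},
    "wider_search": {"max_length": 84, "no_repeat_ngram_size": 2, "num_beams": 6},
}


def generation_kwargs(preset="quick_start", **overrides):
    """Return a copy of a preset's settings with optional per-call overrides."""
    if preset not in GENERATION_PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    kwargs = dict(GENERATION_PRESETS[preset])  # copy so presets stay unchanged
    kwargs.update(overrides)
    return kwargs


# Intended use with objects loaded elsewhere (left as a comment, since it
# needs the model and encoded inputs):
# output_ids = model.generate(input_ids=input_ids, **generation_kwargs("shorter"))[0]
```

A wider beam usually costs more compute per summary, so it is worth comparing outputs on a few of your own texts before settling on a setting.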
📚 Documentation
Model Information
| Property | Details |
| --- | --- |
| Model Type | mT5-based text summarization model |
| Training Data | Not specified |
Widget Examples
The model comes with several widget examples to showcase its summarization capabilities:
- Russian Joke: A humorous text in Russian.
- Diploma Introduction: An introduction to a diploma topic related to the problem of text information overload and automatic summarization.
- Biological Info: A text about the discovery of a new species of millipede with over a thousand legs in Australian caves.
📄 License
This model is licensed under the CC BY-NC-SA 4.0 license.