mT5_ru_XLSum Open-source Text Generation Model - Efficiently Generate Bilingual Summaries in Russian and English for Free

Mt5 Ru XLSum

Developed by Nehc

Modified version based on mT5_multilingual_XLSum, optimized for Russian-English bilingual summarization tasks

Tags: Supports multiple languages, Russian-English bilingual summarization, Multilingual text compression, Academic literature summarization

Downloads 43

Release Time: 2/28/2024

Model Overview

This is a multilingual T5 model optimized for Russian and English text summarization tasks, with enhanced performance on Russian tasks through a compact tokenizer.

Model Features

Bilingual optimization

Specifically optimized for Russian and English, with tokenizer vocabulary reduced to 32K

Summarization

Capable of generating high-quality summaries from Russian or English texts

Multi-domain applicability

Can handle summarization of various text types from academic papers to news articles

Model Capabilities

Text summarization

Dialogue summarization

Text compression

Multilingual processing

Use Cases

Academic research

Paper summarization

Automatically generate concise summaries of academic papers

Helps researchers quickly grasp core content

News media

News briefing generation

Extract key information from lengthy news reports to generate briefings

Improves news reading efficiency

Business analysis

Report summarization

Automatically generate executive summaries of business reports

Helps decision-makers quickly grasp key points

🚀 mT5_ru_XLSum

This is a modification of the mT5_multilingual_XLSum model. The tokenizer vocabulary has been shrunk to 32K tokens (Russian + English) following David Dale's guide. The model is intended for summarization, dialogue summarization, and general text2text generation.
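As a rough illustration of what shrinking a vocabulary involves, the sketch below keeps only selected token ids, builds a smaller embedding table, and records a mapping from old ids to new ids. It is a toy, pure-Python example: the function name `prune_vocab`, the six-token vocabulary, and the embedding values are all hypothetical, and this is not the exact procedure from the guide or the one used to build this model.

```python
def prune_vocab(embeddings, keep_ids):
    """Return (new_embeddings, old_to_new), keeping only the rows in keep_ids.

    embeddings: list of rows, one per token id in the original vocabulary.
    keep_ids:   iterable of original token ids to retain.
    """
    kept = sorted(set(keep_ids))
    # New ids are assigned in ascending order of the old ids.
    old_to_new = {old: new for new, old in enumerate(kept)}
    new_embeddings = [embeddings[old] for old in kept]
    return new_embeddings, old_to_new


# Toy vocabulary: 6 tokens with 2-dimensional embeddings (hypothetical values).
vocab = ["<pad>", "</s>", "the", "и", "der", "не"]
embeddings = [[float(i), float(i) * 10] for i in range(len(vocab))]

# Suppose only the special tokens plus the Russian and English tokens are kept.
keep_ids = [0, 1, 2, 3, 5]
new_embeddings, old_to_new = prune_vocab(embeddings, keep_ids)

print(len(new_embeddings))  # 5 rows instead of 6
print(old_to_new[5])        # the token "не" moves from id 5 to id 4
```

In a real model the same remapping would have to be applied consistently to the tokenizer and to the input and output embedding matrices, which is the part the guide covers in detail.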

🚀 Quick Start

Prerequisites

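Make sure the transformers library is installed, together with PyTorch, since the example below requests PyTorch tensors via return_tensors="pt". Depending on your environment, the tokenizer may also need the sentencepiece package. You can install the two required packages with the following command:

pip install transformers torch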

Using this model in `transformers`

import re
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

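# Collapse newlines and runs of whitespace into single spaces (raw strings avoid invalid-escape warnings).
WHITESPACE_HANDLER = lambda k: re.sub(r'\s+', ' ', re.sub(r'\n+', ' ', k.strip()))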

article_text = """
С точки зрения банальной эрудиции, в аспекте призматической парадоксальности,
цинизм ваших слов в данной конспекции ассоциируется мистификацией парадоксальных иллюзий.
Разложим его семилинейным функционалом в матpицy пpямоyгольнyю тpоеpанговyю обводимyю
собственной неодноpодностью непpеpывно интегpиpyемой в pазpыв попеpек интеpвала
pасходимости кpиволинейным экстpемyмом чеpез область целостности!
"""

model_name = "Nehc/mT5_ru_XLSum"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

input_ids = tokenizer(
    [WHITESPACE_HANDLER(article_text)],
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=512
)["input_ids"]

output_ids = model.generate(
    input_ids=input_ids,
    max_length=84,
    no_repeat_ngram_size=2,
    num_beams=3
)[0]

summary = tokenizer.decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(summary)

✨ Features

Bilingual Support: Handles both Russian and English text.
Text Summarization: Summarizes various kinds of text; the bundled examples cover a joke, a thesis introduction, and a popular-science text on biology.

📦 Installation

To use this model, you need the transformers library and PyTorch. You can install both using pip:

pip install transformers torch

💻 Usage Examples

Basic Usage

The code example above demonstrates how to use the model for text summarization. You can replace the article_text with your own text to get a summary.
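When substituting your own text, the whitespace normalization step is what collapses newlines and repeated spaces before tokenization. The standalone sketch below shows that step in isolation; the name `normalize_whitespace` and the sample string are hypothetical, and the function is intended to behave the same way as the `WHITESPACE_HANDLER` lambda in the Quick Start.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse newlines and runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", re.sub(r"\n+", " ", text.strip()))


sample = "  First line.\n\nSecond   line.\t"
print(normalize_whitespace(sample))  # First line. Second line.
```

Note that the Quick Start tokenizes with truncation=True and max_length=512, so any input longer than 512 tokens is cut off before the model sees it.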

Advanced Usage

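You can adjust the parameters passed to the generate method to control the length and quality of the summary: max_length caps the number of generated tokens, no_repeat_ngram_size prevents n-grams of that size from being repeated, and num_beams sets the beam-search width (wider beams cost more compute).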
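One way to experiment with these settings is to wrap the generation call in a small helper. The sketch below is a minimal example, not part of the model's own code: the names `generation_settings` and `summarize` are hypothetical, the defaults simply mirror the Quick Start values, and `model` and `tokenizer` are assumed to be the objects loaded there.

```python
def generation_settings(max_length=84, no_repeat_ngram_size=2, num_beams=3):
    """Collect keyword arguments for model.generate(); defaults mirror the Quick Start."""
    return {
        "max_length": max_length,
        "no_repeat_ngram_size": no_repeat_ngram_size,
        "num_beams": num_beams,
    }


def summarize(text, model, tokenizer, **overrides):
    """Summarize one text, overriding any of the generation settings by keyword."""
    settings = generation_settings(**overrides)
    # Tokenize a batch of one text, truncating to the 512-token input limit.
    inputs = tokenizer([text], return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, **settings)[0]
    return tokenizer.decode(output_ids, skip_special_tokens=True)


# Usage, assuming `model`, `tokenizer`, and `article_text` exist as in the Quick Start:
# shorter = summarize(article_text, model, tokenizer, max_length=48)
# wider_search = summarize(article_text, model, tokenizer, num_beams=6)
```

Keeping the settings in one dictionary makes it easy to compare summaries produced under different configurations for the same input.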

📚 Documentation

Model Information

Property	Details
Model Type	mT5-based text summarization model
Training Data	Not specified

Widget Examples

The model comes with several widget examples to showcase its summarization capabilities:

Russian Joke: A humorous text in Russian.
Diploma Introduction: The introduction to a diploma thesis on text information overload and automatic summarization.
Biological Info: A text about the discovery of a new species of millipede with over a thousand legs in Australian caves.

📄 License

This model is licensed under the CC BY-NC-SA 4.0 license.
