🚀 FRED-T5 1.7B (Full-scale Russian Enhanced Denoisers T5)
FRED-T5 1.7B is a powerful language model for Russian based on the T5 architecture, offering high-performance language processing trained on a large-scale Russian corpus.
The model architecture design, pretraining, and evaluation are documented in our preprint: A Family of Pretrained Transformer Language Models for Russian.
The model was trained by SberDevices.
📋 Model Information
| Property | Details |
|---|---|
| Model Type | Based on the T5 architecture |
| Layers | 24 |
| Hidden Size | 1536 |
| Training Data | 300GB Russian-language corpus, the same as used for the ruT5 models |
| Tokenizer | BBPE tokenizer, 50,257 tokens + 107 special tokens. Prefix tokens: `<LM>`, `<SC1>`, …, `<SC6>` |
| Training Process | For the first half of training, the model was trained on 1% (3GB) of the dataset without task prefixes. For RSG, training followed the T5 paper: first multitask training over all tasks, then the best checkpoint for each task was fine-tuned further. RSG submission: https://russiansuperglue.com/login/submit_info/1936 |
| Total Training Time | About 45 days on 112 A100 GPUs |
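The prefix tokens in the table select which pretraining task the model runs at inference time: `<LM>` for left-to-right continuation, `<SC1>` through `<SC6>` for the span-corruption denoisers. A minimal sketch of assembling a prompt (the `with_prefix` helper and its validation are illustrative, not part of the model's API):

```python
# Prefix tokens listed in the table above; each selects a pretraining task.
PREFIXES = ['<LM>'] + [f'<SC{i}>' for i in range(1, 7)]

def with_prefix(prefix: str, text: str) -> str:
    """Prepend a task prefix to raw text (illustrative helper, not part of the model API)."""
    if prefix not in PREFIXES:
        raise ValueError(f'unknown prefix {prefix!r}; expected one of {PREFIXES}')
    return prefix + text

prompt = with_prefix('<LM>', 'Принялся Кутузов рассказывать свою историю.')
print(prompt)  # <LM>Принялся Кутузов рассказывать свою историю.
```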
🚀 Quick Start
The model is based on the T5 architecture; its exact configuration can be found in the config.json file.
💻 Usage Examples
Basic Usage
```python
import torch
from transformers import GPT2Tokenizer, T5ForConditionalGeneration

# FRED-T5 uses a GPT2-style BBPE tokenizer with an explicit </s> EOS token
tokenizer = GPT2Tokenizer.from_pretrained('ai-forever/FRED-T5-1.7B', eos_token='</s>')
model = T5ForConditionalGeneration.from_pretrained('ai-forever/FRED-T5-1.7B')
device = 'cuda'
model.to(device)

# Language modeling: the <LM> prefix asks the model to continue the text
lm_text = '<LM>Принялся Кутузов рассказывать свою историю как он сюда попал. Началось'
input_ids = torch.tensor([tokenizer.encode(lm_text)]).to(device)
outputs = model.generate(input_ids, eos_token_id=tokenizer.eos_token_id, early_stopping=True)
print(tokenizer.decode(outputs[0][1:]))

# Span corruption: <SC1> plus an <extra_id_0> sentinel asks the model to fill in the masked span
lm_text = '<SC1>Принялся Кутузов рассказывать свою историю <extra_id_0>. Началось с того, что он был в армии, служил в артиллерии.'
input_ids = torch.tensor([tokenizer.encode(lm_text)]).to(device)
outputs = model.generate(input_ids, eos_token_id=tokenizer.eos_token_id, early_stopping=True)
print(tokenizer.decode(outputs[0][1:]))

# <SC5> selects a different span-corruption denoiser
lm_text = '<SC5>Принялся Кутузов рассказывать свою историю <extra_id_0>. Началось с того, что он был в армии, служил в артиллерии.'
input_ids = torch.tensor([tokenizer.encode(lm_text)]).to(device)
outputs = model.generate(input_ids, eos_token_id=tokenizer.eos_token_id, early_stopping=True)
print(tokenizer.decode(outputs[0][1:]))
```
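The `<SC…>` denoisers return their answer in T5's sentinel format: the output repeats `<extra_id_0>` followed by the predicted span. A hedged sketch of splicing such an output back into the corrupted input (the `fill_span` helper is illustrative; real model outputs may contain an EOS token and extra whitespace):

```python
def fill_span(corrupted: str, model_output: str, sentinel: str = '<extra_id_0>') -> str:
    """Replace the sentinel in the corrupted input with the span predicted by the model.

    Illustrative helper: assumes a single sentinel and strips a trailing </s> if present.
    """
    span = model_output
    if span.startswith(sentinel):
        span = span[len(sentinel):]
    span = span.replace('</s>', '').strip()
    return corrupted.replace(sentinel, span)

corrupted = 'Принялся Кутузов рассказывать свою историю <extra_id_0>. Началось с того, что он был в армии.'
output = '<extra_id_0> о том, как он сюда попал</s>'
print(fill_span(corrupted, output))
```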
📚 Documentation
The model was trained on a mixture of 7 denoisers, similar to UL2 (https://arxiv.org/abs/2205.05131) but with several differences.
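Each denoiser corrupts text differently (span length and corruption rate vary across the mixture, analogous to UL2). A simplified sketch of how one span-corruption training pair could be built with T5-style sentinels (the helper and its parameters are illustrative, not the actual pretraining settings):

```python
def make_span_corruption_pair(tokens, start, length, prefix='<SC1>'):
    """Mask tokens[start:start+length] with a sentinel, returning (input, target).

    Simplified single-span version; real pretraining masks multiple spans
    with <extra_id_0>, <extra_id_1>, ... sentinels.
    """
    sentinel = '<extra_id_0>'
    inp = tokens[:start] + [sentinel] + tokens[start + length:]
    tgt = [sentinel] + tokens[start:start + length]
    return prefix + ' '.join(inp), ' '.join(tgt)

src = 'Принялся Кутузов рассказывать свою историю'.split()
inp, tgt = make_span_corruption_pair(src, start=2, length=2)
print(inp)  # <SC1>Принялся Кутузов <extra_id_0> историю
print(tgt)  # <extra_id_0> рассказывать свою
```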
👨‍💻 Authors
- NLP Core Team RnD Telegram channel:
- Dmitry Zmitrovich
- Andrei Kalmykov
- Vitaly Kadulin
- Mikhail Novikov
- Alexey Khoroshilov
Salute AI Community.
📄 License
This project is licensed under the Apache-2.0 license.
📖 Cite us
@misc{zmitrovich2023family,
title={A Family of Pretrained Transformer Language Models for Russian},
author={Dmitry Zmitrovich and Alexander Abramov and Andrey Kalmykov and Maria Tikhonova and Ekaterina Taktasheva and Danil Astafurov and Mark Baushenko and Artem Snegirev and Tatiana Shavrina and Sergey Markov and Vladislav Mikhailov and Alena Fenogenova},
year={2023},
eprint={2309.10931},
archivePrefix={arXiv},
primaryClass={cs.CL}
}