
MrT5 Large

Developed by stanfordnlp
MrT5 is an efficient byte-level language model that improves on ByT5, reducing input sequence length by roughly 50% through dynamic token merging.
Downloads 33
Release Time: 3/23/2025

Model Overview

MrT5 is an efficient variant of ByT5 that dynamically shortens input sequences by integrating a token deletion mechanism into the encoder, offering a more efficient approach for byte-level models.

Model Features

Dynamic Token Merging
A learnable deletion gate decides, per token, whether to keep or delete it, substantially shortening the sequence the rest of the model must process.
Efficient Byte Processing
Operates directly on UTF-8 byte streams with no tokenizer, which makes multilingual text straightforward to handle.
Soft Deletion Training
Uses a softmax1 attention mechanism and a PI controller to keep the deletion rate stable during training.
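The three features above can be illustrated together. The following is a minimal sketch, not MrT5's actual implementation: the gate weights, threshold, and PI gains are all hypothetical stand-ins, and only the general shapes of the mechanisms (a sigmoid deletion gate, the softmax1 variant that can sum to less than one, and a PI update on a deletion-loss weight) follow the description.

```python
import math
import random

def softmax1(scores):
    # softmax1 adds 1 to the denominator, so the weights can sum to less
    # than one ("attend to nothing"); this lets soft-deleted positions
    # receive near-zero attention. Computed with a max-shift for stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    denom = math.exp(-m) + sum(exps)  # exp(-m) is the shifted "+1" term
    return [e / denom for e in exps]

def delete_gate(hidden, w, b=0.0, threshold=0.5):
    # Hypothetical deletion gate: a sigmoid over a linear score of each
    # token's hidden state gives a keep-probability; tokens below the
    # threshold are dropped, shortening the sequence.
    kept = []
    for h in hidden:
        score = sum(hi * wi for hi, wi in zip(h, w)) + b
        p_keep = 1.0 / (1.0 + math.exp(-score))
        if p_keep >= threshold:
            kept.append(h)
    return kept

def pi_update(alpha, target_rate, observed_rate, integral, kp=0.5, ki=0.1):
    # Hypothetical PI controller: nudges the deletion-loss weight alpha so
    # the observed deletion rate tracks the target rate during training.
    error = target_rate - observed_rate
    integral += error
    return max(0.0, alpha + kp * error + ki * integral), integral

random.seed(0)
hidden = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
w = [random.gauss(0, 1) for _ in range(4)]
kept = delete_gate(hidden, w)
print(f"{len(kept)} of {len(hidden)} tokens kept")
```

In the real model the gate is trained jointly with the rest of the network, and deletion is "soft" during training (via attention down-weighting) before becoming hard at inference.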

Model Capabilities

Multilingual Text Generation
Sequence-to-Sequence Transformation
Efficient Byte-Level Processing
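Byte-level processing means the "vocabulary" is just the 256 possible byte values, so any language encodable as UTF-8 is supported without a learned tokenizer. A minimal sketch of ByT5-style byte tokenization (the offset of 3 for reserved special tokens follows ByT5's convention; treat the details as illustrative):

```python
def text_to_byte_ids(text, offset=3):
    # Encode text to UTF-8 and shift each byte by a small offset reserved
    # for special tokens (e.g. pad/eos/unk), ByT5-style.
    return [b + offset for b in text.encode("utf-8")]

ids = text_to_byte_ids("héllo")
# 'é' encodes to two UTF-8 bytes, so the sequence is longer than 5 characters
print(len(ids))  # → 6
```

The cost of this simplicity is long sequences, which is exactly what MrT5's token merging is designed to offset.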

Use Cases

Academic Research
Language Model Efficiency Research
Used to study the impact of dynamic token merging on model efficiency
Sequence length reduced by an average of 50%
Natural Language Processing
Multilingual Text Generation
Supports text generation tasks in 15 languages