UMT5-XL

Developed by Google
A multilingual text generation model pretrained on the mC4 multilingual corpus, supporting 107 languages
Downloads 1,049
Release Time: 7/2/2023

Model Overview

UMT5 is a multilingual variant of the T5 model developed by Google. It uses UniMax sampling to balance the language distribution of its pretraining data. The model is suitable for cross-lingual text generation and understanding tasks, but it is pretrained only and must be fine-tuned before it can be used on a downstream task.

Model Features

UniMax Sampling Technique
Achieves a fairer language distribution by capping how many times each language's corpus is repeated, which improves performance on tail (low-resource) languages
Large-scale Multilingual Support
Covers 107 languages, including low-resource languages such as Hmong and Hawaiian
Enhanced mC4 Corpus
Trained on 29 trillion characters of cleaned multilingual data
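
The UniMax sampling idea listed above can be illustrated with a short sketch. It assumes the usual formulation: a total character budget is split as uniformly as possible across languages, but no language's corpus may be repeated more than a fixed number of epochs. The function name, the toy character counts, the budget, and the epoch cap below are all hypothetical values chosen for illustration; they are not the settings used to train UMT5.

```python
def unimax_distribution(char_counts, budget, max_epochs):
    """Sketch of a UniMax-style sampling distribution (illustrative only).

    char_counts: dict mapping language code -> characters available.
    budget:      total number of characters to sample during training.
    max_epochs:  maximum number of passes allowed over any one language.
    """
    remaining = float(budget)
    allocation = {}
    # Visit languages from smallest to largest corpus so that budget a
    # small language cannot use is passed on to the larger ones.
    ordered = sorted(char_counts, key=char_counts.get)
    for i, lang in enumerate(ordered):
        # Uniform share of whatever budget is still unallocated.
        share = remaining / (len(ordered) - i)
        # Repetition cap: at most max_epochs passes over this corpus.
        cap = char_counts[lang] * max_epochs
        allocation[lang] = min(share, cap)
        remaining -= allocation[lang]
    total = sum(allocation.values())
    return {lang: chars / total for lang, chars in allocation.items()}


# Hypothetical toy corpus sizes (in characters), not real mC4 statistics.
toy_counts = {"en": 1000, "haw": 10, "hmn": 20}
probs = unimax_distribution(toy_counts, budget=300, max_epochs=4)
print(probs)
```

In this toy setting the two small languages hit their repetition cap, so they receive less than a uniform one-third share, and the budget they leave unused flows to the largest language, which ends up with a sampling probability of 0.6.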

Model Capabilities

Multilingual text generation
Cross-lingual transfer learning
Text understanding
Foundation model for machine translation

Use Cases

Natural Language Processing
Multilingual Text Summarization
Can be fine-tuned to summarize text in more than a hundred languages
Low-resource Language Processing
Provides a foundation for processing low-resource languages, such as those spoken in Africa and Southeast Asia
Educational Technology
Language Learning Tools
Can serve as the underlying engine for multilingual learning applications