UMT5 Small
A unified multilingual T5 model pre-trained on the mC4 multilingual corpus, covering 107 languages
Downloads 17.35k
Release Time: 7/2/2023
Model Overview
UMT5 is a multilingual text generation model developed by Google. Its pre-training data is balanced across languages with the UniMax sampling strategy, which makes it suitable for cross-lingual natural language processing tasks. The released checkpoint is pre-trained only and must be fine-tuned on a downstream task before use.
Model Features
UniMax Sampling Strategy
Samples languages more uniformly while capping how many times any one language's corpus is repeated, balancing coverage of head (high-resource) and tail (low-resource) languages
Multilingual Support
Covers 107 languages, including low-resource languages
Large-scale Pre-training
Pre-trained on the mC4 multilingual corpus, which contains 29 trillion characters
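The UniMax idea described above can be illustrated with a minimal sketch. It assumes the commonly described form of the procedure: languages are visited from smallest to largest corpus, each receives an equal share of the remaining character budget, and that share is capped at a fixed number of passes over the language's data. The function name, parameter names, and the toy character counts are hypothetical and chosen for illustration only; this is not the code used to train UMT5.

```python
def unimax_distribution(char_counts, budget, max_epochs):
    """Return a sampling probability per language (illustrative sketch).

    char_counts: characters available per language (toy values here).
    budget:      total characters to consume during training.
    max_epochs:  cap on how many times one language's corpus may repeat.
    """
    # Visit languages from the smallest corpus to the largest.
    langs = sorted(char_counts, key=char_counts.get)
    remaining = float(budget)
    allocation = {}
    for i, lang in enumerate(langs):
        # Equal share of whatever budget is still unallocated.
        share = remaining / (len(langs) - i)
        # Never repeat a language's corpus more than max_epochs times.
        cap = char_counts[lang] * max_epochs
        allocation[lang] = min(share, cap)
        remaining -= allocation[lang]
    total = sum(allocation.values())
    return {lang: chars / total for lang, chars in allocation.items()}


# Toy corpus sizes (hypothetical): one head language and two tail languages.
counts = {"en": 1000, "is": 50, "sw": 10}
probs = unimax_distribution(counts, budget=600, max_epochs=4)
```

In this toy setting the two small languages hit the repetition cap, so the budget they cannot absorb flows to the largest language instead of forcing many more passes over scarce data.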
Model Capabilities
Multilingual Text Generation
Cross-lingual Transfer Learning
Zero-shot Cross-lingual Transfer (after fine-tuning on a task)
Use Cases
Natural Language Processing
Machine Translation
Translates text between languages once fine-tuned for the task
Multilingual Q&A Systems
Builds intelligent Q&A applications supporting multiple languages
Content Generation
Multilingual Content Creation
Generates marketing copy/news summaries in different languages