Xlm Roberta Capu

Developed by dragonSwing
An XLM-RoBERTa model fine-tuned for Vietnamese punctuation restoration. It predicts punctuation and capitalization from lowercase text.
Downloads 1,722
Release Time: 5/11/2022

Model Overview

This model restores punctuation marks and capitalization in Vietnamese text. It is suited to speech recognition output and other text processing scenarios where punctuation is missing. It restores four common punctuation marks (. , : ?) and the capitalization of words, including mixed-case ones.
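
To make the idea concrete, the sketch below shows how per-word predictions could be turned back into readable text. The label scheme here (a case flag plus a punctuation mark per word) and the function name apply_labels are hypothetical, chosen only for illustration; the model's actual output format is not described in this listing. Mixed-case forms such as YouTube would need a richer label than the simple flag used here.

```python
from typing import List, Tuple

# Hypothetical label scheme (an assumption, not the model's documented output):
#   case flag: "L" keeps the word lowercase, "U" capitalizes its first letter
#   punctuation: "" or one of ".", ",", ":", "?" appended after the word
Label = Tuple[str, str]

ALLOWED_PUNCT = {"", ".", ",", ":", "?"}


def apply_labels(words: List[str], labels: List[Label]) -> str:
    """Rebuild text from lowercase words and one (case, punctuation) label per word."""
    if len(words) != len(labels):
        raise ValueError("words and labels must have the same length")
    restored = []
    for word, (case, punct) in zip(words, labels):
        if punct not in ALLOWED_PUNCT:
            raise ValueError(f"unsupported punctuation label: {punct!r}")
        if case == "U":
            word = word[:1].upper() + word[1:]
        restored.append(word + punct)
    return " ".join(restored)


if __name__ == "__main__":
    words = ["xin", "chào", "việt", "nam"]
    labels = [("U", ""), ("L", ","), ("U", ""), ("U", ".")]
    print(apply_labels(words, labels))
```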

Model Features

Multi-punctuation restoration
Supports restoration of four common punctuation marks: period, comma, colon, and question mark
Intelligent capitalization
Accurately restores the capitalization of mixed-case proper nouns such as YouTube and MobiFone
Long text processing
Handles Vietnamese text of any length through a built-in chunking mechanism
High accuracy
Achieves an F1 score of 0.89 on the test set, with proper noun recognition accuracy reaching 0.93
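
The chunk processing mentioned under long text processing can be pictured with a minimal sketch such as the one below. It is an assumption about how chunking of this kind is commonly done, not the model's actual implementation: the names chunk_words and merge_chunks, the window size, and the overlap are all hypothetical. The text is split into overlapping word windows so that each window fits a fixed input limit, and the overlap is dropped again when the windows are stitched back together.

```python
from typing import List


def chunk_words(words: List[str], chunk_size: int = 128, overlap: int = 16) -> List[List[str]]:
    """Split a word list into windows of at most chunk_size words,
    where consecutive windows share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def merge_chunks(chunks: List[List[str]], overlap: int = 16) -> List[str]:
    """Stitch windows back together, dropping the overlapping words
    at the start of every window after the first."""
    merged: List[str] = []
    for index, chunk in enumerate(chunks):
        merged.extend(chunk if index == 0 else chunk[overlap:])
    return merged


if __name__ == "__main__":
    words = [f"w{i}" for i in range(10)]
    windows = chunk_words(words, chunk_size=4, overlap=1)
    print(windows)
    print(merge_chunks(windows, overlap=1) == words)
```

In a real pipeline the words in the overlap region would carry predictions from two windows, and one design choice is which of the two to keep; this sketch simply keeps the earlier window's copy.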

Model Capabilities

Text punctuation restoration
Case conversion
Vietnamese text processing
Speech recognition post-processing

Use Cases

Speech recognition post-processing
ASR output text normalization
Converts lowercase, unpunctuated text from speech recognition systems into a standardized format
Improves the readability and professionalism of ASR output
Text preprocessing
Social media text normalization
Processes non-standardized Vietnamese text from social media
Converts informal text so that it complies with formal writing standards
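
Since the model is described as working from lowercase text without punctuation, text from other sources may need to be brought into that shape first. The sketch below is one possible way to do so, offered as an assumption rather than the author's preprocessing; the function name to_asr_style is hypothetical. It normalizes to NFC so that Vietnamese diacritics stay attached to their base letters before punctuation is stripped.

```python
import re
import unicodedata


def to_asr_style(text: str) -> str:
    """Lowercase the text and strip punctuation, mimicking raw ASR output."""
    # Compose diacritics so accented Vietnamese letters count as single word characters.
    text = unicodedata.normalize("NFC", text)
    # Replace anything that is not a word character or whitespace with a space.
    # Note: \w also keeps digits and underscores.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse runs of whitespace left behind by the removal.
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


if __name__ == "__main__":
    print(to_asr_style("Xin chào, Việt Nam!"))
```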