
RoBERTa Hindi Guj San

Developed by surajp
A multilingual RoBERTa-style model trained on Hindi, Sanskrit, and Gujarati Wikipedia articles, supporting processing for three Indo-Aryan languages.
Downloads: 51
Release Time: 3/2/2022

Model Overview

This model employs a phased training strategy: it is first pretrained on Hindi and then fine-tuned on mixed Sanskrit and Gujarati text, leveraging the linguistic similarities among the three languages to improve multilingual performance.

Model Features

Multilingual joint training
Achieves joint modeling of three Indo-Aryan languages through a shared vocabulary and a phased training strategy
Transfer learning optimization
Pretrained on Hindi first, then fine-tuned on other languages to enhance performance using linguistic similarities
Efficient tokenizer
Unified tokenizer trained on merged texts, supporting mixed-language processing for all three languages
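The shared-vocabulary idea behind the unified tokenizer can be sketched with a toy character-level example. This is only an illustration of merging the three corpora into one symbol table; the actual model uses a subword (BPE-style) tokenizer, and the sample sentences below are invented, not taken from the training data.

```python
from collections import Counter

def build_shared_vocab(corpora, min_freq=1,
                       specials=("<s>", "<pad>", "</s>", "<unk>", "<mask>")):
    """Toy character-level sketch of a shared vocabulary: count symbols
    over the merged corpora and assign one id per symbol. The real
    tokenizer operates on subwords, not single characters."""
    counts = Counter()
    for text in corpora:
        counts.update(text)
    symbols = [ch for ch, n in counts.most_common()
               if n >= min_freq and not ch.isspace()]
    return {tok: i for i, tok in enumerate(list(specials) + symbols)}

# One short line per language (illustrative text only).
hindi = "मुझे हिंदी पसंद है"
sanskrit = "अहं संस्कृतं पठामि"
gujarati = "મને ગુજરાતી ગમે છે"

vocab = build_shared_vocab([hindi, sanskrit, gujarati])
# Devanagari letters shared by Hindi and Sanskrit (e.g. "ह") map to a
# single id, while Gujarati-script letters get their own entries in the
# same table, so all three languages share one embedding space.
```

Because Hindi and Sanskrit share the Devanagari script, a large fraction of their symbols collapse into common vocabulary entries, which is the similarity the phased training strategy exploits.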

Model Capabilities

Text infilling
Language modeling
Multilingual text understanding
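The text-infilling capability can be exercised with the Hugging Face fill-mask pipeline. This is a minimal sketch: the hub id "surajp/RoBERTa-hindi-guj-san" is assumed from the card's author and title (verify it on the hub before use), the Hindi sentence is illustrative, and the first call downloads the model weights.

```python
from transformers import pipeline

# Hub id assumed from this card ("surajp" + model name); verify before use.
fill_mask = pipeline("fill-mask", model="surajp/RoBERTa-hindi-guj-san")

# Build a Hindi sentence with the tokenizer's own mask token, so the
# example works regardless of the exact mask string the model expects.
sentence = f"मुझे उनसे बात करना {fill_mask.tokenizer.mask_token} लगा"

# Each prediction is a dict with the filled token and its score.
results = fill_mask(sentence)
for pred in results:
    print(pred["token_str"], round(pred["score"], 3))
```

The same pipeline accepts Sanskrit and Gujarati input, since all three languages share the model's vocabulary.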

Use Cases

Education
Gujarati grammar checking
Automatically detects and corrects syntactic errors in Gujarati sentences
Examples show accurate prediction of missing sentence components
Cultural preservation
Sanskrit ancient text digitization
Assists in machine processing and understanding of ancient Sanskrit literature