
IndicBERTv2 MLM-only

Developed by ai4bharat
IndicBERT is a multilingual language model that supports 23 Indian languages and English, with 278 million parameters. It is trained on IndicCorp v2 and evaluated on the IndicXTREME benchmark.
Release date: 11/13/2022

Model Overview

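IndicBERT is a multilingual BERT-style model focused on Indian language processing. The IndicBERT v2 family explores several training objectives and data sources; this checkpoint is trained with the masked language modeling (MLM) objective only, which is why it supports the fill-mask task.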
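Because this checkpoint is trained with masked language modeling alone, it helps to see what MLM input corruption looks like. The sketch below assumes the masking recipe from the original BERT paper (select about 15% of non-special tokens, then replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged). The card does not state IndicBERT v2's exact masking rates, so the rates and the function name `mask_tokens` are illustrative assumptions.

```python
import random
from typing import AbstractSet, List, Sequence, Tuple

IGNORE_INDEX = -100  # label value that cross-entropy losses commonly skip


def mask_tokens(
    token_ids: Sequence[int],
    mask_id: int,
    vocab_size: int,
    special_ids: AbstractSet[int] = frozenset(),
    mask_prob: float = 0.15,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Hypothetical BERT-style MLM corruption (assumed 15% / 80-10-10 scheme).

    Returns (inputs, labels): labels hold the original token at selected
    positions and IGNORE_INDEX everywhere else.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens; otherwise select with probability mask_prob.
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```

The loss is computed only where `labels` differs from `IGNORE_INDEX`, so the model is trained to predict the hidden tokens rather than copy its input.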

Model Features

Multilingual support
Supports 23 Indian languages and English, covering multiple language families.
Multiple training objectives
Variants in the IndicBERT v2 family are trained with different objectives and data, such as MLM, translation language modeling (TLM), and back-translated data, to improve model performance.
Improved vocabulary sharing
The IndicBERT-SS variant converts text in Indic scripts to a common script (Devanagari) before training, which encourages better vocabulary sharing between languages.
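The script conversion behind the shared-script idea can be approximated with a simple code-point offset: the Unicode blocks for the major Brahmic scripts follow a parallel, ISCII-derived layout, each 0x80 code points wide, so a character at a given position in the Bengali or Tamil block usually corresponds to the character at the same position in the Devanagari block. The sketch below assumes this offset mapping; it is a common simplification, not necessarily the exact conversion used for IndicBERT-SS, and `to_devanagari` is a hypothetical helper.

```python
# Start of each Unicode block; every block spans 0x80 code points.
SCRIPT_BLOCK_START = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}


def to_devanagari(text: str, source_script: str) -> str:
    """Map characters of source_script onto Devanagari by code-point offset.

    Characters outside the source block (Latin letters, digits, spaces,
    punctuation) are passed through unchanged.
    """
    start = SCRIPT_BLOCK_START[source_script]
    offset = start - SCRIPT_BLOCK_START["devanagari"]
    out = []
    for ch in text:
        cp = ord(ch)
        if start <= cp < start + 0x80:
            out.append(chr(cp - offset))  # same slot in the Devanagari block
        else:
            out.append(ch)
    return "".join(out)
```

For example, `to_devanagari("\u09ac\u09be\u0982\u09b2\u09be", "bengali")` (Bengali "বাংলা") yields "\u092c\u093e\u0902\u0932\u093e" ("बांला"). Because the mapping is positional, characters that exist in only one script may map to rarely used Devanagari code points.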

Model Capabilities

Multilingual text understanding
Fill-mask prediction
Cross-lingual transfer learning
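Fill-mask prediction can be tried through the Hugging Face `transformers` fill-mask pipeline. The sketch below assumes the checkpoint is published on the Hugging Face Hub as `ai4bharat/IndicBERTv2-MLM-only` and that `transformers` is installed. It reads the mask token from the loaded tokenizer instead of hard-coding it, and uses a hypothetical `<blank>` placeholder so callers do not need to know the model's mask string.

```python
PLACEHOLDER = "<blank>"  # hypothetical marker used only in this sketch


def with_mask(text: str, mask_token: str, placeholder: str = PLACEHOLDER) -> str:
    """Replace the single placeholder in text with the model's mask token."""
    if text.count(placeholder) != 1:
        raise ValueError("expected exactly one placeholder")
    return text.replace(placeholder, mask_token)


def fill_mask(text: str, model_id: str = "ai4bharat/IndicBERTv2-MLM-only", top_k: int = 5):
    """Return (token, score) candidates for the placeholder position.

    Assumes the model ID above resolves on the Hugging Face Hub.
    """
    # Imported lazily so with_mask() works without transformers installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model=model_id)
    masked = with_mask(text, unmasker.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in unmasker(masked, top_k=top_k)]
```

A call such as `fill_mask("भारत की राजधानी <blank> है।")` downloads the model on first use and returns the `top_k` most likely fillers with their probabilities.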

Use Cases

Natural language understanding
Named entity recognition
Identify named entities in multiple Indian languages.
Sentiment analysis
Classify the sentiment of Indian-language texts.
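The base checkpoint has no task-specific head, so named entity recognition typically means fine-tuning a token-classification layer on labeled data. One recurring step in that recipe is aligning word-level tags with sub-word tokens. The sketch below assumes the `word_ids` format returned by Hugging Face fast tokenizers (`None` for special tokens, otherwise the index of the originating word) and labels only the first sub-word of each word. `align_word_labels` is a hypothetical helper; -100 is the index that PyTorch's cross-entropy loss ignores by default.

```python
from typing import List, Optional, Sequence

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_word_labels(
    word_ids: Sequence[Optional[int]], word_labels: Sequence[int]
) -> List[int]:
    """Spread word-level NER labels onto sub-word tokens.

    Special tokens and non-initial sub-words get IGNORE_INDEX, so the loss
    is computed once per word, on its first sub-word.
    """
    aligned: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)  # e.g. [CLS] or [SEP]
        elif wid != previous:
            aligned.append(word_labels[wid])  # first sub-word of a new word
        else:
            aligned.append(IGNORE_INDEX)  # continuation sub-word
        previous = wid
    return aligned
```

Sentence-level tasks such as sentiment analysis skip this step: a single label per input is predicted from a pooled representation instead.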
Machine translation assistance
Use of parallel corpora
Improve the performance of machine translation models through TLM training on parallel sentence pairs.
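Translation language modeling (TLM), introduced with XLM, trains on a sentence and its translation concatenated into one sequence, so a token masked in one language can be predicted from context in the other. It applies to IndicBERT v2 variants trained with parallel data, not to this MLM-only checkpoint. The sketch below builds one TLM input following the XLM recipe, with position indices restarting at the translation and a per-token language ID; whether IndicBERT v2 uses exactly this layout is an assumption, and `build_tlm_pair` and its parameters are hypothetical. Ordinary MLM masking would then be applied across both halves.

```python
from typing import List, Sequence, Tuple


def build_tlm_pair(
    src_ids: Sequence[int],
    tgt_ids: Sequence[int],
    cls_id: int,
    sep_id: int,
    src_lang_id: int,
    tgt_lang_id: int,
) -> Tuple[List[int], List[int], List[int]]:
    """Concatenate a parallel pair as [CLS] src [SEP] tgt [SEP].

    Returns token IDs, position IDs (restarting at the target, as in XLM)
    and language IDs, all the same length.
    """
    input_ids = [cls_id, *src_ids, sep_id, *tgt_ids, sep_id]
    n_src = len(src_ids) + 2  # [CLS] + source tokens + [SEP]
    n_tgt = len(tgt_ids) + 1  # target tokens + final [SEP]
    position_ids = list(range(n_src)) + list(range(n_tgt))
    lang_ids = [src_lang_id] * n_src + [tgt_lang_id] * n_tgt
    return input_ids, position_ids, lang_ids
```

Restarting positions keeps the two sentences aligned in position space, which makes it easier for attention to link a masked word to its counterpart in the translation.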