IndicNER: Open-Source Named Entity Recognition Model for Sentences in 11 Indian Languages

IndicNER

Developed by ai4bharat

IndicNER is a model trained to recognize named entities in sentences in 11 Indian languages. It is fine-tuned from the bert-base-multilingual-uncased model.

Sequence Labeling

Transformers

Open Source License: MIT

#Indian Language NER #Multilingual Entity Recognition #Low-resource Language Processing

Downloads 45.85k

Release Time: 5/23/2022

Model Overview

This model is used to identify named entities in Indian language sentences, supporting 11 Indian languages including Assamese, Bengali, and others.

Model Features

Multilingual Support

Supports named entity recognition in 11 Indian languages.

Trained on Large-scale Corpus

Trained using datasets from the Samanantar corpus.

Publicly Available

Model and code are released under the MIT license.

Model Capabilities

Indian Language Named Entity Recognition

Multilingual Text Processing

Use Cases

Natural Language Processing

Indian Language Text Analysis

Used for processing and analyzing named entities in Indian language texts.

🚀 IndicNER

IndicNER is a model designed to identify named entities in Indian languages. It is fine-tuned on 11 Indian languages with millions of sentences and benchmarked on multiple datasets.

🚀 Quick Start

IndicNER is a model trained to identify named entities in sentences in Indian languages. Our model is fine-tuned on millions of sentences across the 11 Indian languages listed below. It is then benchmarked on a human-annotated test set and several other publicly available Indian NER datasets. The 11 languages covered by IndicNER are: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu.

✨ Features

Multilingual Support: Covers 11 Indian languages, including Assamese, Bengali, Gujarati, etc.
Trained on Large Datasets: Fine-tuned over millions of sentences and benchmarked on multiple datasets.

📦 Installation

No dedicated installation steps are provided. The model is tagged for use with the Transformers library.

📚 Documentation

Training Corpus

Our model was trained on a dataset that we mined from the existing Samanantar Corpus. We used the bert-base-multilingual-uncased model as the starting point and fine-tuned it on this NER dataset.
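Fine-tuning a subword model such as bert-base-multilingual-uncased for NER usually requires mapping word-level tags onto subword tokens. The source does not describe how this was done for IndicNER, so the following is only a minimal sketch of one common convention: the first subword of each word keeps the word's label, and every other position gets an ignore value. The helper name `align_labels` and the integer label ids are hypothetical. The `word_ids` list is assumed to have the shape returned by `word_ids()` on a Hugging Face fast-tokenizer encoding.

```python
from typing import List, Optional

# -100 is the default ignore_index of PyTorch's cross-entropy loss,
# so positions carrying it do not contribute to the training loss.
IGNORE_INDEX = -100


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Map word-level label ids onto subword positions (hypothetical helper).

    word_ids holds, for each subword token, the index of the word it came
    from, or None for special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special token: never scored.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First subword of a word keeps the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Continuation subword: ignored in the loss.
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned
```

With this convention, a three-word sentence split into six subwords plus two special tokens yields eight label positions, only three of which carry a real label.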

Downloads

Download the model from its Hugging Face repo.

Update 20 Dec 2022: We released a new paper documenting IndicNER and Naamapadam. The model reported in the paper differs from the one in this repo. We will update the repo with that model soon.

Usage

You can use [this Colab notebook](https://colab.research.google.com/drive/1sYa-PDdZQ_c9SzUgnhyb3Fl7j96QBCS8?usp=sharing) for samples on using IndicNER or for fine-tuning a pre-trained model on the Naamapadam dataset to build your own NER models.
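For a quick local experiment, the model can probably be loaded through the Transformers `pipeline` API. The sketch below rests on several assumptions: the repo ID `ai4bharat/IndicNER` is inferred from the developer and model names and should be checked against the model page, the tag names `PER` and `LOC` are illustrative, and both `load_ner_pipeline` and `group_bio` are hypothetical helpers rather than part of any released code. The second helper shows how word-level BIO tags are typically merged into entity spans.

```python
from typing import List, Tuple

# Assumption: Hugging Face repo ID; verify it on the model page before use.
MODEL_ID = "ai4bharat/IndicNER"


def load_ner_pipeline(model_id: str = MODEL_ID):
    """Build a token-classification pipeline.

    Needs the `transformers` package and a backend such as PyTorch, and
    downloads the checkpoint on first use.
    """
    from transformers import pipeline  # lazy import keeps group_bio usable without it
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def group_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge word-level BIO tags into (entity_text, entity_type) pairs."""
    entities = []
    current_words, current_type = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # A new entity starts; close any entity that is still open.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [token], label
        elif prefix == "I":
            # Continuation of the open entity.
            current_words.append(token)
        else:
            # "O" closes any open entity.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities
```

Calling `load_ner_pipeline()("...")` on a sentence should return a list of dictionaries with fields such as `entity_group`, `word`, and `score`. `group_bio` is only needed when working with raw per-word tags, for example when running the model without the pipeline's aggregation.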

Citing

If you are using IndicNER, please cite the following article:

@misc{mhaske2022naamapadam,
  doi = {10.48550/ARXIV.2212.10168},
  url = {https://arxiv.org/abs/2212.10168},
  author = {Mhaske, Arnav and Kedia, Harshit and Doddapaneni, Sumanth and Khapra, Mitesh M. and Kumar, Pratyush and Murthy, Rudra and Kunchukuttan, Anoop},
  title = {Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

We would like to hear from you if:

You are using our resources. Please let us know how you are putting these resources to use.
You have any feedback on these resources.

License

The IndicNER code (and models) are released under the MIT License.

Contributors

Arnav Mhaske (AI4Bharat, IITM)
Harshit Kedia (AI4Bharat, IITM)
Sumanth Doddapaneni (AI4Bharat, IITM)
Mitesh M. Khapra (AI4Bharat, IITM)
Pratyush Kumar (AI4Bharat, [Microsoft](https://www.microsoft.com/en-in/), IITM)
Rudra Murthy (AI4Bharat, IBM)
Anoop Kunchukuttan (AI4Bharat, [Microsoft](https://www.microsoft.com/en-in/), IITM)

This work is the outcome of a volunteer effort as part of the AI4Bharat initiative.

Contact

Anoop Kunchukuttan (anoop.kunchukuttan@gmail.com)
Rudra Murthy V (rmurthyv@in.ibm.com)
