🚀 HindSBERT
HindSBERT is a model based on the HindBERT architecture (l3cube-pune/hindi-bert-v2), trained on the NLI dataset. It is part of the MahaNLP project: https://github.com/l3cube-pune/MarathiNLP. A multilingual version of this model, supporting major Indic languages and cross-lingual capabilities, is available as indic-sentence-bert-nli. A better-performing sentence similarity model (a fine-tuned version of this model) is shared at https://huggingface.co/l3cube-pune/hindi-sentence-similarity-sbert.
Model Information
| Property | Details |
|---|---|
| Pipeline Tag | sentence-similarity |
| Tags | sentence-transformers, feature-extraction, sentence-similarity, transformers |
| License | cc-by-4.0 |
| Language | hi |
Widget Examples
Example 1
- Source Sentence: "एक आदमी एक रस्सी पर चढ़ रहा है" (A man is climbing a rope)
- Comparison Sentences:
  - "एक आदमी एक रस्सी पर चढ़ता है" (A man climbs a rope)
  - "एक आदमी एक दीवार पर चढ़ रहा है" (A man is climbing a wall)
  - "एक आदमी बांसुरी बजाता है" (A man plays a flute)
Example 2
- Source Sentence: "कुछ लोग गा रहे हैं" (Some people are singing)
- Comparison Sentences:
  - "लोगों का एक समूह गाता है" (A group of people sings)
  - "बिल्ली दूध पी रही है" (The cat is drinking milk)
  - "दो आदमी लड़ रहे हैं" (Two men are fighting)
Example 3
- Source Sentence: "फेडरर ने 7वां विंबलडन खिताब जीत लिया है" (Federer has won his 7th Wimbledon title)
- Comparison Sentences:
  - "फेडरर अपने करियर में कुल 20 ग्रैंडस्लैम खिताब जीत चुके है" (Federer has won a total of 20 Grand Slam titles in his career)
  - "फेडरर ने सितंबर में अपने निवृत्ति की घोषणा की" (Federer announced his retirement in September)
  - "एक आदमी कुछ खाना पकाने का तेल एक बर्तन में डालता है" (A man pours some cooking oil into a pot)
Related Papers
- More details on the dataset, models, and baseline results can be found in our paper:
```bibtex
@article{joshi2022l3cubemahasbert,
  title={L3Cube-MahaSBERT and HindSBERT: Sentence BERT Models and Benchmarking BERT Sentence Representations for Hindi and Marathi},
  author={Joshi, Ananya and Kajale, Aditi and Gadre, Janhavi and Deode, Samruddhi and Joshi, Raviraj},
  journal={arXiv preprint arXiv:2211.11187},
  year={2022}
}
```
Other Related Models
Monolingual Indic Sentence BERT Models
- [Marathi SBERT](https://huggingface.co/l3cube-pune/marathi-sentence-bert-nli)
- [Hindi SBERT](https://huggingface.co/l3cube-pune/hindi-sentence-bert-nli)
- [Kannada SBERT](https://huggingface.co/l3cube-pune/kannada-sentence-bert-nli)
- [Telugu SBERT](https://huggingface.co/l3cube-pune/telugu-sentence-bert-nli)
- [Malayalam SBERT](https://huggingface.co/l3cube-pune/malayalam-sentence-bert-nli)
- [Tamil SBERT](https://huggingface.co/l3cube-pune/tamil-sentence-bert-nli)
- [Gujarati SBERT](https://huggingface.co/l3cube-pune/gujarati-sentence-bert-nli)
- [Oriya SBERT](https://huggingface.co/l3cube-pune/odia-sentence-bert-nli)
- [Bengali SBERT](https://huggingface.co/l3cube-pune/bengali-sentence-bert-nli)
- [Punjabi SBERT](https://huggingface.co/l3cube-pune/punjabi-sentence-bert-nli)
- [Indic SBERT (multilingual)](https://huggingface.co/l3cube-pune/indic-sentence-bert-nli)
Monolingual Similarity Models
- [Marathi Similarity](https://huggingface.co/l3cube-pune/marathi-sentence-similarity-sbert)
- [Hindi Similarity](https://huggingface.co/l3cube-pune/hindi-sentence-similarity-sbert)
- [Kannada Similarity](https://huggingface.co/l3cube-pune/kannada-sentence-similarity-sbert)
- [Telugu Similarity](https://huggingface.co/l3cube-pune/telugu-sentence-similarity-sbert)
- [Malayalam Similarity](https://huggingface.co/l3cube-pune/malayalam-sentence-similarity-sbert)
- [Tamil Similarity](https://huggingface.co/l3cube-pune/tamil-sentence-similarity-sbert)
- [Gujarati Similarity](https://huggingface.co/l3cube-pune/gujarati-sentence-similarity-sbert)
- [Oriya Similarity](https://huggingface.co/l3cube-pune/odia-sentence-similarity-sbert)
- [Bengali Similarity](https://huggingface.co/l3cube-pune/bengali-sentence-similarity-sbert)
- [Punjabi Similarity](https://huggingface.co/l3cube-pune/punjabi-sentence-similarity-sbert)
- [Indic Similarity (multilingual)](https://huggingface.co/l3cube-pune/indic-sentence-similarity-sbert)
🚀 Quick Start
This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks such as clustering or semantic search.
📦 Installation
If you have sentence-transformers installed, using this model is straightforward:

```shell
pip install -U sentence-transformers
```
💻 Usage Examples
Basic Usage
Using Sentence-Transformers

```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('{MODEL_NAME}')
embeddings = model.encode(sentences)
print(embeddings)
Using HuggingFace Transformers
Without sentence-transformers, you can use the model as follows: first pass your input through the transformer model, then apply the appropriate pooling operation on top of the contextualized word embeddings.
```python
from transformers import AutoTokenizer, AutoModel
import torch

# CLS pooling: take the embedding of the first ([CLS]) token
def cls_pooling(model_output, attention_mask):
    return model_output[0][:, 0]

# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, CLS pooling.
sentence_embeddings = cls_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
```
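For intuition: CLS pooling simply selects the first token's vector for each sentence, which is why the `attention_mask` argument goes unused above. Mean pooling, another common SBERT strategy shown here only for comparison (this card specifies CLS pooling for this model), instead averages the token vectors at positions where the attention mask is 1. A stand-alone sketch on small dummy values:

```python
# Dummy "token embeddings" for one sentence: 4 token positions, 3 dims.
# The last position is padding (attention mask 0).
token_embeddings = [
    [1.0, 2.0, 3.0],   # [CLS] token
    [4.0, 0.0, 2.0],
    [2.0, 4.0, 1.0],
    [9.0, 9.0, 9.0],   # padding - mean pooling must ignore this
]
attention_mask = [1, 1, 1, 0]

# CLS pooling: just the first token's vector (what cls_pooling above does).
cls_embedding = token_embeddings[0]

# Mean pooling: average only the positions with attention_mask == 1.
n = sum(attention_mask)
mean_embedding = [
    sum(tok[d] for tok, m in zip(token_embeddings, attention_mask) if m) / n
    for d in range(3)
]
print("CLS pooled: ", cls_embedding)
print("mean pooled:", mean_embedding)
```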
📄 License
This model is released under the cc-by-4.0 license.