hindi-sentence-similarity-sbert: an open-source model for computing the semantic similarity of Hindi sentences

Hindi Sentence Similarity Sbert

Developed by l3cube-pune

This is a Hindi sentence similarity model fine-tuned on the STS dataset, supporting semantic similarity calculation between Hindi sentences.

Text Embedding

Transformers

Other: #Hindi Sentence Similarity #Multilingual Support #Semantic Search Optimization

Release Time: 11/5/2022

Model Overview

This model is fine-tuned from the HindSBERT model for sentence similarity tasks. It maps Hindi sentences into a 768-dimensional vector space in which the semantic similarity between sentences can be computed.

Model Features

Hindi Optimization

Optimized specifically for Hindi text, so that it better captures the semantics of Hindi sentences.

Sentence Similarity Calculation

Computes semantic similarity between Hindi sentences, which makes it suitable for applications such as information retrieval and question-answering systems.
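Similarity between two sentence embeddings is usually scored with cosine similarity. The card does not name a metric, so treat the sketch below as one common choice rather than the model's prescribed method. It uses plain Python lists as stand-ins for the 768-dimensional vectors, and the function name is illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: an all-zero vector matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)
```

A score near 1 means the two vectors point in almost the same direction, and a score near 0 means they are unrelated. The vector length plays no role, only the direction.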

768-dimensional Vector Representation

Converts sentences into 768-dimensional dense vector representations, facilitating subsequent similarity calculations and clustering analysis.

Model Capabilities

Sentence Embedding

Semantic Similarity Calculation

Text Feature Extraction

Use Cases

Information Retrieval
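For retrieval, a typical pattern is to embed the corpus once, normalize the vectors, and rank documents by dot product against the normalized query. The following is a minimal sketch under that assumption. The names `build_index` and `search` are hypothetical helpers, not part of any library, and short toy vectors stand in for real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_index(corpus_vecs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Normalize every corpus embedding once, ahead of query time."""
    return [l2_normalize(vec) for vec in corpus_vecs]


def search(index: List[List[float]], query_vec: Sequence[float],
           top_k: int = 3) -> List[Tuple[int, float]]:
    """Return (corpus position, score) pairs for the top_k closest entries."""
    query = l2_normalize(query_vec)
    scored = [
        (position, sum(a * b for a, b in zip(query, doc)))
        for position, doc in enumerate(index)
    ]
    # On unit vectors the dot product equals cosine similarity.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors standing in for sentence embeddings.
index = build_index([[0.0, 2.0, 0.0], [3.0, 0.3, 0.0], [1.0, 1.0, 0.0]])
hits = search(index, [5.0, 0.0, 0.0], top_k=2)
```

Normalizing at index time keeps the per-query work to one dot product per document. For a large corpus, a vector index library would replace the linear scan.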

🚀 HindSBERT-STS

This is a model designed for sentence similarity tasks. It is a version of the HindSBERT model (l3cube-pune/hindi-sentence-bert-nli) fine-tuned on the STS dataset. It was released as part of the MahaNLP project (https://github.com/l3cube-pune/MarathiNLP). A multilingual version of this model, which supports major Indic languages and cross-lingual sentence similarity, is available as indic-sentence-similarity-sbert.

🚀 Quick Start

Prerequisites

This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks such as clustering or semantic search.

Installation

Using this model is easy once you have sentence-transformers installed:

pip install -U sentence-transformers

Basic Usage

Then you can use the model like this:

from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

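# Hub id assumed from the developer and model name given on this page
model = SentenceTransformer('l3cube-pune/hindi-sentence-similarity-sbert')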
embeddings = model.encode(sentences)
print(embeddings)

Advanced Usage

Without sentence-transformers, you can use the model like this: first, pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.

from transformers import AutoTokenizer, AutoModel
import torch


#Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
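# Hub id assumed from the developer and model name given on this page
model_id = 'l3cube-pune/hindi-sentence-similarity-sbert'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)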

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
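To see why the attention mask matters in mean_pooling, it can help to run the same arithmetic on a tiny hand-made example. The sketch below is a pure-Python illustration rather than the model's code: three "token" vectors of dimension 2, where the last one plays the role of a padding position and is given a deliberately exaggerated value.

```python
from typing import List, Sequence


def masked_mean(token_vectors: Sequence[Sequence[float]],
                mask: Sequence[int]) -> List[float]:
    """Average only the token vectors whose mask entry is 1."""
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    for vec, keep in zip(token_vectors, mask):
        if keep:
            for j in range(dim):
                totals[j] += vec[j]
    # Same role as torch.clamp(..., min=1e-9): avoid dividing by zero.
    count = max(sum(mask), 1e-9)
    return [total / count for total in totals]


tokens = [[1.0, 3.0], [3.0, 5.0], [100.0, 100.0]]  # last row: padding stand-in
mask = [1, 1, 0]

pooled = masked_mean(tokens, mask)  # averages the two real tokens only
naive = [sum(column) / len(tokens) for column in zip(*tokens)]  # includes padding
```

Here `pooled` is `[2.0, 4.0]`, while the unmasked average is pulled far away by the padding row. With batches of differently sized sentences, skipping the mask would make each embedding depend on how much padding its batch happened to need.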

📚 Documentation

Model Details

More details on the dataset, models, and baseline results can be found in our paper:

@article{joshi2022l3cubemahasbert,
  title={L3Cube-MahaSBERT and HindSBERT: Sentence BERT Models and Benchmarking BERT Sentence Representations for Hindi and Marathi},
  author={Joshi, Ananya and Kajale, Aditi and Gadre, Janhavi and Deode, Samruddhi and Joshi, Raviraj},
  journal={arXiv preprint arXiv:2211.11187},
  year={2022}
}

Related Models

Monolingual Similarity Models

Monolingual Indic Sentence BERT Models

📄 License

This model is released under the CC-BY-4.0 license.

🔍 Widget Examples

Example 1

Source Sentence: "एक आदमी एक रस्सी पर चढ़ रहा है"
Comparison Sentences:
- "एक आदमी एक रस्सी पर चढ़ता है"
- "एक आदमी एक दीवार पर चढ़ रहा है"
- "एक आदमी बांसुरी बजाता है"

Example 2

Source Sentence: "कुछ लोग गा रहे हैं"
Comparison Sentences:
- "लोगों का एक समूह गाता है"
- "बिल्ली दूध पी रही है"
- "दो आदमी लड़ रहे हैं"

Example 3

Source Sentence: "फेडरर ने 7वां विंबलडन खिताब जीत लिया है"
Comparison Sentences:
- "फेडरर अपने करियर में कुल 20 ग्रैंडस्लैम खिताब जीत चुके है "
- "फेडरर ने सितंबर में अपने निवृत्ति की घोषणा की"
- "एक आदमी कुछ खाना पकाने का तेल एक बर्तन में डालता है"
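Each widget example compares one source sentence against several comparison sentences. A minimal sketch of that comparison follows, with some assumptions: the Hub id is inferred from the developer and model name on this page, `load_encoder` and `score_comparisons` are hypothetical helper names, and the encoder is passed in as a function so the scoring logic can be tried without downloading the model.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed Hub id, inferred from the developer and model name on this page.
MODEL_ID = "l3cube-pune/hindi-sentence-similarity-sbert"

Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def load_encoder(model_id: str = MODEL_ID) -> Encoder:
    """Return a function that encodes sentences into unit-length vectors."""
    # Imported lazily so the scoring code below works without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return lambda sentences: model.encode(sentences, normalize_embeddings=True)


def score_comparisons(encode: Encoder, source: str,
                      comparisons: List[str]) -> List[Tuple[str, float]]:
    """Score each comparison sentence against the source sentence.

    Assumes `encode` returns unit-length vectors, so the dot product
    equals the cosine similarity.
    """
    vectors = encode([source] + comparisons)
    source_vec = vectors[0]
    return [
        (text, float(sum(a * b for a, b in zip(source_vec, vec))))
        for text, vec in zip(comparisons, vectors[1:])
    ]


# Usage with the real model (downloads weights on first call):
# encode = load_encoder()
# score_comparisons(encode, "कुछ लोग गा रहे हैं",
#                   ["लोगों का एक समूह गाता है", "बिल्ली दूध पी रही है"])
```

If the model behaves as intended, the paraphrase should score higher than the unrelated sentence, though the exact values depend on the model version.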
