🚀 HindSBERT
HindSBERT is a model based on the HindBERT architecture (l3cube-pune/hindi-bert-v2), trained on the NLI dataset. It is part of the MahaNLP project: https://github.com/l3cube-pune/MarathiNLP. A multilingual version of this model, supporting major Indic languages and cross-lingual capabilities, is available as indic-sentence-bert-nli. A better-performing sentence similarity model (a fine-tuned version of this model) is shared at https://huggingface.co/l3cube-pune/hindi-sentence-similarity-sbert.
Model Information
| Property | Details |
|---|---|
| Pipeline Tag | sentence-similarity |
| Tags | sentence-transformers, feature-extraction, sentence-similarity, transformers |
| License | cc-by-4.0 |
| Language | hi |
Widget Examples
Example 1
- Source Sentence: "एक आदमी एक रस्सी पर चढ़ रहा है" (A man is climbing a rope)
- Comparison Sentences:
  - "एक आदमी एक रस्सी पर चढ़ता है" (A man climbs a rope)
  - "एक आदमी एक दीवार पर चढ़ रहा है" (A man is climbing a wall)
  - "एक आदमी बांसुरी बजाता है" (A man plays a flute)
Example 2
- Source Sentence: "कुछ लोग गा रहे हैं" (Some people are singing)
- Comparison Sentences:
  - "लोगों का एक समूह गाता है" (A group of people sings)
  - "बिल्ली दूध पी रही है" (The cat is drinking milk)
  - "दो आदमी लड़ रहे हैं" (Two men are fighting)
Example 3
- Source Sentence: "फेडरर ने 7वां विंबलडन खिताब जीत लिया है" (Federer has won his 7th Wimbledon title)
- Comparison Sentences:
  - "फेडरर अपने करियर में कुल 20 ग्रैंडस्लैम खिताब जीत चुके है" (Federer has won a total of 20 Grand Slam titles in his career)
  - "फेडरर ने सितंबर में अपने निवृत्ति की घोषणा की" (Federer announced his retirement in September)
  - "एक आदमी कुछ खाना पकाने का तेल एक बर्तन में डालता है" (A man pours some cooking oil into a pot)
Related Papers
- More details on the dataset, models, and baseline results can be found in our paper:
```bibtex
@article{joshi2022l3cubemahasbert,
  title={L3Cube-MahaSBERT and HindSBERT: Sentence BERT Models and Benchmarking BERT Sentence Representations for Hindi and Marathi},
  author={Joshi, Ananya and Kajale, Aditi and Gadre, Janhavi and Deode, Samruddhi and Joshi, Raviraj},
  journal={arXiv preprint arXiv:2211.11187},
  year={2022}
}
```
Other Related Models
Monolingual Indic Sentence BERT Models
- [Marathi SBERT](https://huggingface.co/l3cube-pune/marathi-sentence-bert-nli)
- [Hindi SBERT](https://huggingface.co/l3cube-pune/hindi-sentence-bert-nli)
- [Kannada SBERT](https://huggingface.co/l3cube-pune/kannada-sentence-bert-nli)
- [Telugu SBERT](https://huggingface.co/l3cube-pune/telugu-sentence-bert-nli)
- [Malayalam SBERT](https://huggingface.co/l3cube-pune/malayalam-sentence-bert-nli)
- [Tamil SBERT](https://huggingface.co/l3cube-pune/tamil-sentence-bert-nli)
- [Gujarati SBERT](https://huggingface.co/l3cube-pune/gujarati-sentence-bert-nli)
- [Oriya SBERT](https://huggingface.co/l3cube-pune/odia-sentence-bert-nli)
- [Bengali SBERT](https://huggingface.co/l3cube-pune/bengali-sentence-bert-nli)
- [Punjabi SBERT](https://huggingface.co/l3cube-pune/punjabi-sentence-bert-nli)
- [Indic SBERT (multilingual)](https://huggingface.co/l3cube-pune/indic-sentence-bert-nli)
Monolingual Similarity Models
- [Marathi Similarity](https://huggingface.co/l3cube-pune/marathi-sentence-similarity-sbert)
- [Hindi Similarity](https://huggingface.co/l3cube-pune/hindi-sentence-similarity-sbert)
- [Kannada Similarity](https://huggingface.co/l3cube-pune/kannada-sentence-similarity-sbert)
- [Telugu Similarity](https://huggingface.co/l3cube-pune/telugu-sentence-similarity-sbert)
- [Malayalam Similarity](https://huggingface.co/l3cube-pune/malayalam-sentence-similarity-sbert)
- [Tamil Similarity](https://huggingface.co/l3cube-pune/tamil-sentence-similarity-sbert)
- [Gujarati Similarity](https://huggingface.co/l3cube-pune/gujarati-sentence-similarity-sbert)
- [Oriya Similarity](https://huggingface.co/l3cube-pune/odia-sentence-similarity-sbert)
- [Bengali Similarity](https://huggingface.co/l3cube-pune/bengali-sentence-similarity-sbert)
- [Punjabi Similarity](https://huggingface.co/l3cube-pune/punjabi-sentence-similarity-sbert)
- [Indic Similarity (multilingual)](https://huggingface.co/l3cube-pune/indic-sentence-similarity-sbert)
🚀 Quick Start
This is a sentence-transformers model. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks such as clustering or semantic search.
📦 Installation
If you have sentence-transformers installed, using this model is straightforward:

```shell
pip install -U sentence-transformers
```
💻 Usage Examples
Basic Usage
Using Sentence-Transformers

```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('{MODEL_NAME}')
embeddings = model.encode(sentences)
print(embeddings)
Using HuggingFace Transformers
Without sentence-transformers, you can use the model as follows: first pass your input through the transformer model, then apply the appropriate pooling operation on top of the contextualized word embeddings.
```python
from transformers import AutoTokenizer, AutoModel
import torch

# CLS pooling: take the embedding of the first ([CLS]) token
def cls_pooling(model_output, attention_mask):
    return model_output[0][:, 0]

# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, CLS pooling.
sentence_embeddings = cls_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
```
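For intuition: CLS pooling simply selects the first token's vector for each sentence, which is why the `attention_mask` argument goes unused above. Mean pooling, another common SBERT strategy shown here only for comparison (this card specifies CLS pooling for this model), instead averages the token vectors at positions where the attention mask is 1. A stand-alone sketch on small dummy values:

```python
# Dummy "token embeddings" for one sentence: 4 token positions, 3 dims.
# The last position is padding (attention mask 0).
token_embeddings = [
    [1.0, 2.0, 3.0],   # [CLS] token
    [4.0, 0.0, 2.0],
    [2.0, 4.0, 1.0],
    [9.0, 9.0, 9.0],   # padding - mean pooling must ignore this
]
attention_mask = [1, 1, 1, 0]

# CLS pooling: just the first token's vector (what cls_pooling above does).
cls_embedding = token_embeddings[0]

# Mean pooling: average only the positions with attention_mask == 1.
n = sum(attention_mask)
mean_embedding = [
    sum(tok[d] for tok, m in zip(token_embeddings, attention_mask) if m) / n
    for d in range(3)
]
print("CLS pooled: ", cls_embedding)
print("mean pooled:", mean_embedding)
```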
📄 License
This model is released under the cc-by-4.0 license.