🚀 OdiaSBERT-STS
This is an OdiaSBERT model (l3cube-pune/odia-sentence-bert-nli) fine-tuned on the STS dataset. It is released as part of project MahaNLP: https://github.com/l3cube-pune/MarathiNLP. A multilingual version of this model, supporting major Indic languages and cross-lingual sentence similarity, is shared here.
More details on the dataset, models, and baseline results can be found in our paper.
🚀 Quick Start
⨠Features
This model is an OdiaSBERT model fine-tuned on the STS dataset. It can be used for sentence similarity tasks. A multilingual version supporting major Indic languages and cross-lingual sentence similarity is also available.
📦 Installation
Using this model becomes easy when you have sentence-transformers installed:
```
pip install -U sentence-transformers
```
💻 Usage Examples
Basic Usage
Using Sentence-Transformers
```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('{MODEL_NAME}')
embeddings = model.encode(sentences)
print(embeddings)
```
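Since the model is tuned for semantic textual similarity, the embeddings are typically compared with cosine similarity. Below is a minimal sketch using the cos_sim utility from sentence-transformers; the '{MODEL_NAME}' placeholder is carried over from the snippet above, and the candidate sentences are illustrative additions:

```python
from sentence_transformers import SentenceTransformer, util

# '{MODEL_NAME}' is the same placeholder used in the snippet above
model = SentenceTransformer('{MODEL_NAME}')

source = "This is an example sentence"
candidates = ["Each sentence is converted", "An entirely unrelated sentence"]

# Encode, then rank the candidates by cosine similarity to the source sentence
source_emb = model.encode(source, convert_to_tensor=True)
candidate_embs = model.encode(candidates, convert_to_tensor=True)

scores = util.cos_sim(source_emb, candidate_embs)  # shape: (1, len(candidates))
for sentence, score in zip(candidates, scores[0]):
    print(f"{score.item():.4f}\t{sentence}")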
Advanced Usage
Using HuggingFace Transformers
```python
from transformers import AutoTokenizer, AutoModel
import torch

# Mean pooling: average the token embeddings, taking the attention mask into
# account so that padding tokens do not contribute to the sentence embedding.
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # first element holds all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

sentences = ['This is an example sentence', 'Each sentence is converted']

tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}')

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    model_output = model(**encoded_input)

sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("Sentence embeddings:")
print(sentence_embeddings)
```
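To score the sentence pair from these raw transformers outputs, cosine similarity can be computed directly on the pooled embeddings. A short follow-up sketch continuing from the variables above; L2-normalizing first makes the dot product equal to cosine similarity:

```python
import torch.nn.functional as F

# Continuing from the snippet above: sentence_embeddings has shape (2, hidden_size)
normalized = F.normalize(sentence_embeddings, p=2, dim=1)
similarity = normalized @ normalized.T
print(similarity[0, 1].item())  # cosine similarity of the two example sentences
```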
📚 Documentation
The model is based on the OdiaSBERT architecture and is fine-tuned on the STS dataset. It can be used to calculate sentence similarity. You can use it with the sentence-transformers library or directly with the HuggingFace transformers library.
🔧 Technical Details
The model is an OdiaSBERT model fine-tuned on the STS dataset. When using the transformers library directly, you pass the input through the transformer model and then apply a mean-pooling operation over the contextualized word embeddings to obtain a sentence embedding.
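To make the role of the attention mask in that pooling step concrete, here is a toy illustration with made-up numbers (shapes and values are invented for the example; only the masking logic mirrors the mean_pooling function above):

```python
import torch

# Batch of 2 "sentences", 3 token positions each, hidden size 2
token_embeddings = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                                 [[1.0, 1.0], [2.0, 2.0], [9.0, 9.0]]])
# The second sentence's last position is padding (mask 0) and must not count
attention_mask = torch.tensor([[1, 1, 1],
                               [1, 1, 0]])

mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
sentence_embeddings = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
print(sentence_embeddings)  # tensor([[3.0, 4.0], [1.5, 1.5]]) -- padding excluded
```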
📄 License
This model is released under the CC-BY-4.0 license.
Additional Information
Widget Examples
The following are some examples of using the model for sentence similarity:
- Example 1
  - Source sentence: "The man cut down a tree with an axe."
  - Comparison sentences:
    - "A man is chopping at the base of a tree with an axe."
    - "A man is playing a guitar."
    - "A woman is riding a horse."
- Example 2
  - Source sentence: "A pink bicycle is standing in front of a building."
  - Comparison sentences:
    - "A bicycle in front of some ruins."
    - "Two little girls in pink are dancing."
    - "A horse is grazing in a field in front of a line of trees."
- Example 3
  - Source sentence: "The speed of light being finite is one of the fundamental facts of our universe."
  - Comparison sentences:
    - "What is the speed of light?"
    - "The speed of light is finite."
    - "Light is the fastest thing in the universe."
Other Related Models
- Monolingual Similarity Models:
- Monolingual Indic Sentence BERT Models:
Citations
```bibtex
@article{deode2023l3cube,
  title={L3Cube-IndicSBERT: A simple approach for learning cross-lingual sentence representations using multilingual BERT},
  author={Deode, Samruddhi and Gadre, Janhavi and Kajale, Aditi and Joshi, Ananya and Joshi, Raviraj},
  journal={arXiv preprint arXiv:2304.11434},
  year={2023}
}

@article{joshi2022l3cubemahasbert,
  title={L3Cube-MahaSBERT and HindSBERT: Sentence BERT Models and Benchmarking BERT Sentence Representations for Hindi and Marathi},
  author={Joshi, Ananya and Kajale, Aditi and Gadre, Janhavi and Deode, Samruddhi and Joshi, Raviraj},
  journal={arXiv preprint arXiv:2211.11187},
  year={2022}
}
```
| Property | Details |
|----------|---------|
| Pipeline Tag | sentence-similarity |
| Tags | sentence-transformers, feature-extraction, sentence-similarity, transformers |
| Model Type | OdiaSBERT fine-tuned on the STS dataset |
| License | CC-BY-4.0 |
| Language | or (Odia) |