Kannada-Sentence-Similarity-SBERT: Open-Source Model for Calculating Kannada Sentence Similarity

Kannada Sentence Similarity SBERT

Developed by l3cube-pune

This is a Kannada SBERT model fine-tuned on the STS dataset for calculating sentence similarity.

Text Embedding

Transformers

Other #Kannada sentence similarity #Multilingual support #Semantic feature extraction

Downloads 15

Release Time: 2/25/2023

Model Overview

This is a Kannada Sentence-BERT model: the NLI-trained KannadaSBERT base (l3cube-pune/kannada-sentence-bert-nli) further fine-tuned on the STS dataset, designed for calculating similarity between Kannada sentences. It is released as part of the MahaNLP project.

Model Features

Kannada Optimization

Tuned specifically for Kannada sentence similarity

STS Fine-tuning

Fine-tuned on the Semantic Textual Similarity (STS) dataset to improve similarity scoring

Multilingual Support

A companion multilingual model from the same project supports major Indic languages and cross-lingual sentence similarity

Model Capabilities

Sentence feature extraction

Sentence similarity calculation

Semantic text similarity evaluation

Use Cases

Information Retrieval

🚀 KannadaSBERT-STS

This is a KannadaSBERT model (l3cube-pune/kannada-sentence-bert-nli) fine-tuned on the STS dataset. It is released as part of the MahaNLP project: https://github.com/l3cube-pune/MarathiNLP. A multilingual version of this model, supporting major Indic languages and cross-lingual sentence similarity, is also available.

More details on the dataset, models, and baseline results can be found in our paper (see the BibTeX references below).

✨ Features

This is a KannadaSBERT model fine-tuned on the STS dataset.
It is part of the MahaNLP project.
A multilingual version supports major Indic languages and cross-lingual sentence similarity.

📦 Installation

Using this model is easy once sentence-transformers is installed:

pip install -U sentence-transformers

💻 Usage Examples

Basic Usage (Sentence-Transformers)

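from sentence_transformers import SentenceTransformer

# Sentences to embed (Kannada text is passed in the same way)
sentences = ["This is an example sentence", "Each sentence is converted"]

# Model ID inferred from this page's title and developer; adjust if it differs
model = SentenceTransformer('l3cube-pune/kannada-sentence-similarity-sbert')
embeddings = model.encode(sentences)  # one embedding vector per sentence
print(embeddings)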
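
The embeddings returned by `encode` are typically compared with cosine similarity. The following is a minimal sketch rather than part of the original model card: `cosine_similarity` and `demo` are illustrative names, and the model ID `l3cube-pune/kannada-sentence-similarity-sbert` is an assumption inferred from this page's title and developer.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def demo():
    """Encode two Kannada sentences and print their similarity (downloads the model)."""
    from sentence_transformers import SentenceTransformer

    # Assumed model ID, inferred from the page title; adjust if it differs.
    model = SentenceTransformer("l3cube-pune/kannada-sentence-similarity-sbert")
    source, other = model.encode(["ಕೆಲವರು ಹಾಡುತ್ತಿದ್ದಾರೆ", "ಜನರ ಗುಂಪು ಹಾಡುತ್ತಿದೆ"])
    print(cosine_similarity(source, other))
```

Calling `demo()` requires network access to fetch the model. The helper is written in plain Python so it works on lists as well as on the arrays returned by `encode`; for large batches, the `util.cos_sim` function shipped with sentence-transformers is likely the more practical choice.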

Advanced Usage (HuggingFace Transformers)

Without sentence-transformers, you can use the model as follows: first pass your input through the transformer model, then apply the appropriate pooling operation on top of the contextualized word embeddings.

from transformers import AutoTokenizer, AutoModel
import torch


# Mean pooling: take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

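# Load model from HuggingFace Hub
# (model ID inferred from this page's title and developer; adjust if it differs)
tokenizer = AutoTokenizer.from_pretrained('l3cube-pune/kannada-sentence-similarity-sbert')
model = AutoModel.from_pretrained('l3cube-pune/kannada-sentence-similarity-sbert')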

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
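
To see why the attention mask matters in the pooling step, the same masked averaging can be checked on a toy example. This is a sketch in plain Python under stated assumptions: `masked_mean_pool` is a hypothetical helper that mirrors the logic of `mean_pooling` on nested lists, and the numbers are made up for illustration.

```python
def masked_mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, skipping positions where the mask is 0.

    token_embeddings: list of sentences, each a list of token vectors.
    attention_mask:   list of sentences, each a list of 0/1 flags (0 = padding).
    """
    pooled = []
    for tokens, mask in zip(token_embeddings, attention_mask):
        dim = len(tokens[0])
        total = [0.0] * dim
        for vector, keep in zip(tokens, mask):
            for i in range(dim):
                total[i] += vector[i] * keep
        # Same guard as torch.clamp(..., min=1e-9): avoids division by zero.
        count = max(float(sum(mask)), 1e-9)
        pooled.append([value / count for value in total])
    return pooled


# Toy batch: one sentence with two real tokens and one padding token.
toy_embeddings = [[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]]
toy_mask = [[1, 1, 0]]
pooled = masked_mean_pool(toy_embeddings, toy_mask)
print(pooled)  # [[2.0, 3.0]]
```

The padding vector `[100.0, 100.0]` does not affect the result because its mask value is 0; an unmasked average would instead be pulled toward the padding values.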

📚 Documentation

Model Details: This is a KannadaSBERT model fine-tuned on the STS dataset.
Project Link: Released as part of the MahaNLP project: https://github.com/l3cube-pune/MarathiNLP.
Multilingual Version: A multilingual version supporting major Indic languages and cross-lingual sentence similarity is also available.
Paper Reference: More details can be found in our paper (see the BibTeX references below).

📄 License

This model is released under the CC BY 4.0 (cc-by-4.0) license.

BibTeX References

@article{deode2023l3cube,
  title={L3Cube-IndicSBERT: A simple approach for learning cross-lingual sentence representations using multilingual BERT},
  author={Deode, Samruddhi and Gadre, Janhavi and Kajale, Aditi and Joshi, Ananya and Joshi, Raviraj},
  journal={arXiv preprint arXiv:2304.11434},
  year={2023}
}

@article{joshi2022l3cubemahasbert,
  title={L3Cube-MahaSBERT and HindSBERT: Sentence BERT Models and Benchmarking BERT Sentence Representations for Hindi and Marathi},
  author={Joshi, Ananya and Kajale, Aditi and Gadre, Janhavi and Deode, Samruddhi and Joshi, Raviraj},
  journal={arXiv preprint arXiv:2211.11187},
  year={2022}
}

Other Related Models

Monolingual Similarity Models:
Monolingual Indic Sentence BERT Models:

Widget Examples

Example 1:
- Source Sentence: "ನಮ್ಮ ಪರಿಸರದ ಬಗ್ಗೆ ನಾವು ಕಾಳಜಿ ವಹಿಸಬೇಕು"
- Comparison Sentences:
  - "ನಮ್ಮ ಪರಿಸರವನ್ನು ಸ್ವಚ್ಛವಾಗಿಟ್ಟುಕೊಳ್ಳೋಣ"
  - "ಜಾಗತಿಕ ತಾಪಮಾನವು ಗಂಭೀರ ಸಮಸ್ಯೆಯಾಗಿದೆ"
  - "ಹೆಚ್ಚು ಮರಗಳನ್ನು ನೆಡಿ"
Example 2:
- Source Sentence: "ಕೆಲವರು ಹಾಡುತ್ತಿದ್ದಾರೆ"
- Comparison Sentences:
  - "ಜನರ ಗುಂಪು ಹಾಡುತ್ತಿದೆ"
  - "ಬೆಕ್ಕು ಹಾಲು ಕುಡಿಯುತ್ತಿದೆ"
  - "ಇಬ್ಬರು ಪುರುಷರು ಜಗಳವಾಡುತ್ತಿದ್ದಾರೆ"
Example 3:
- Source Sentence: "ಫೆಡರರ್ ವಿಂಬಲ್ಡನ್ ಪ್ರಶಸ್ತಿ ಗೆದ್ದಿದ್ದಾರೆ"
- Comparison Sentences:
  - "ಫೆಡರರ್ ತಮ್ಮ ವೃತ್ತಿಜೀವನದಲ್ಲಿ ಒಟ್ಟು 20 ಗ್ರ್ಯಾನ್ ಸ್ಲಾಮ್ ಪ್ರಶಸ್ತಿಗಳನ್ನು ಗೆದ್ದಿದ್ದಾರೆ"
  - "ಫೆಡರರ್ ಸೆಪ್ಟೆಂಬರ್‌ನಲ್ಲಿ ನಿವೃತ್ತಿ ಘೋಷಿಸಿದರು"
  - "ಒಬ್ಬ ಮನುಷ್ಯ ಒಂದು ಪಾತ್ರೆಯಲ್ಲಿ ಸ್ವಲ್ಪ ಅಡುಗೆ ಎಣ್ಣೆಯನ್ನು ಸುರಿಯುತ್ತಾನೆ"
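
The widget examples above pair one source sentence with several comparison sentences. A possible way to reproduce that comparison locally is sketched below: `rank_by_similarity` is a hypothetical helper, not part of any library, and it accepts any `encode` callable that maps a list of strings to a list of vectors (for instance the `encode` method of a loaded SentenceTransformer).

```python
import math


def rank_by_similarity(encode, source, candidates):
    """Return (candidate, score) pairs sorted from most to least similar to source."""
    candidates = list(candidates)
    vectors = encode([source] + candidates)
    query, rest = vectors[0], vectors[1:]

    def cosine(u, v):
        dot = sum(float(a) * float(b) for a, b in zip(u, v))
        norm = math.sqrt(sum(float(a) ** 2 for a in u)) * math.sqrt(
            sum(float(b) ** 2 for b in v)
        )
        return dot / norm if norm else 0.0

    scored = [(text, cosine(query, vec)) for text, vec in zip(candidates, rest)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Usage with the real model (assumed model ID, requires a download):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("l3cube-pune/kannada-sentence-similarity-sbert")
# ranked = rank_by_similarity(
#     model.encode,
#     "ಕೆಲವರು ಹಾಡುತ್ತಿದ್ದಾರೆ",
#     ["ಜನರ ಗುಂಪು ಹಾಡುತ್ತಿದೆ", "ಬೆಕ್ಕು ಹಾಲು ಕುಡಿಯುತ್ತಿದೆ", "ಇಬ್ಬರು ಪುರುಷರು ಜಗಳವಾಡುತ್ತಿದ್ದಾರೆ"],
# )
# for text, score in ranked:
#     print(f"{score:.3f}  {text}")
```

Passing `encode` as a parameter keeps the ranking logic independent of how embeddings are produced, so the same function can be exercised with a stub encoder before the model is downloaded.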
