# PunjabiSBERT
PunjabiSBERT is a Punjabi sentence-similarity model. It is a PunjabiBERT model (l3cube-pune/punjabi-bert) trained on the NLI dataset, released as part of project MahaNLP: [https://github.com/l3cube-pune/MarathiNLP](https://github.com/l3cube-pune/MarathiNLP). A multilingual version supporting major Indic languages and cross-lingual tasks is available at [indic-sentence-bert-nli](https://huggingface.co/l3cube-pune/indic-sentence-bert-nli), and a better fine-tuned sentence-similarity model is available at [punjabi-sentence-similarity-sbert](https://huggingface.co/l3cube-pune/punjabi-sentence-similarity-sbert).
## Model Information

| Property | Details |
|---|---|
| Pipeline Tag | sentence-similarity |
| Tags | sentence-transformers, feature-extraction, sentence-similarity, transformers |
| License | cc-by-4.0 |
| Language | pa (Punjabi) |
## Widget Examples

- Example 1
  - Source Sentence: "ਪੇਂਟਿੰਗ ਮੇਰਾ ਸ਼ੌਕ ਹੈ" ("Painting is my hobby")
  - Comparison Sentences:
    - "ਨੱਚਣਾ ਮੇਰਾ ਸ਼ੌਕ ਹੈ" ("Dancing is my hobby")
    - "ਮੇਰੇ ਬਹੁਤ ਸਾਰੇ ਸ਼ੌਕ ਹਨ" ("I have many hobbies")
    - "ਮੈਨੂੰ ਪੇਂਟਿੰਗ ਅਤੇ ਡਾਂਸ ਦੋਵਾਂ ਦਾ ਆਨੰਦ ਆਉਂਦਾ ਹੈ" ("I enjoy both painting and dance")
- Example 2
  - Source Sentence: "ਕੁਝ ਲੋਕ ਜਾ ਰਹੇ ਹਨ" ("Some people are going")
  - Comparison Sentences:
    - "ਲੋਕਾਂ ਦਾ ਇੱਕ ਸਮੂਹ ਜਾ ਰਿਹਾ ਹੈ" ("A group of people is going")
    - "ਇੱਕ ਬਿੱਲੀ ਦੁੱਧ ਪੀ ਰਹੀ ਹੈ" ("A cat is drinking milk")
    - "ਦੋ ਆਦਮੀ ਲੜ ਰਹੇ ਹਨ" ("Two men are fighting")
- Example 3
  - Source Sentence: "ਮੇਰੇ ਘਰ ਵਿੱਚ ਤੁਹਾਡਾ ਸੁਆਗਤ ਹੈ" ("Welcome to my home")
  - Comparison Sentences:
    - "ਮੈਂ ਤੁਹਾਡੇ ਘਰ ਵਿੱਚ ਤੁਹਾਡਾ ਸੁਆਗਤ ਕਰਾਂਗਾ" ("I will welcome you to your home")
    - "ਮੇਰਾ ਘਰ ਕਾਫੀ ਵੱਡਾ ਹੈ" ("My house is quite big")
    - "ਅੱਜ ਮੇਰੇ ਘਰ ਵਿੱਚ ਰਹੋ" ("Stay at my home today")
## Related Papers

More details on the dataset, models, and baseline results can be found in our papers:

```bibtex
@article{deode2023l3cube,
  title={L3Cube-IndicSBERT: A simple approach for learning cross-lingual sentence representations using multilingual BERT},
  author={Deode, Samruddhi and Gadre, Janhavi and Kajale, Aditi and Joshi, Ananya and Joshi, Raviraj},
  journal={arXiv preprint arXiv:2304.11434},
  year={2023}
}

@article{joshi2022l3cubemahasbert,
  title={L3Cube-MahaSBERT and HindSBERT: Sentence BERT Models and Benchmarking BERT Sentence Representations for Hindi and Marathi},
  author={Joshi, Ananya and Kajale, Aditi and Gadre, Janhavi and Deode, Samruddhi and Joshi, Raviraj},
  journal={arXiv preprint arXiv:2211.11187},
  year={2022}
}
```

- [monolingual Indic SBERT paper](https://arxiv.org/abs/2211.11187)
- [multilingual Indic SBERT paper](https://arxiv.org/abs/2304.11434)
## Other Monolingual Indic Sentence BERT Models

- [Marathi SBERT](https://huggingface.co/l3cube-pune/marathi-sentence-bert-nli)
- [Hindi SBERT](https://huggingface.co/l3cube-pune/hindi-sentence-bert-nli)
- [Kannada SBERT](https://huggingface.co/l3cube-pune/kannada-sentence-bert-nli)
- [Telugu SBERT](https://huggingface.co/l3cube-pune/telugu-sentence-bert-nli)
- [Malayalam SBERT](https://huggingface.co/l3cube-pune/malayalam-sentence-bert-nli)
- [Tamil SBERT](https://huggingface.co/l3cube-pune/tamil-sentence-bert-nli)
- [Gujarati SBERT](https://huggingface.co/l3cube-pune/gujarati-sentence-bert-nli)
- [Oriya SBERT](https://huggingface.co/l3cube-pune/odia-sentence-bert-nli)
- [Bengali SBERT](https://huggingface.co/l3cube-pune/bengali-sentence-bert-nli)
- [Punjabi SBERT](https://huggingface.co/l3cube-pune/punjabi-sentence-bert-nli)
- [Indic SBERT (multilingual)](https://huggingface.co/l3cube-pune/indic-sentence-bert-nli)
## Other Monolingual Similarity Models

- [Marathi Similarity](https://huggingface.co/l3cube-pune/marathi-sentence-similarity-sbert)
- [Hindi Similarity](https://huggingface.co/l3cube-pune/hindi-sentence-similarity-sbert)
- [Kannada Similarity](https://huggingface.co/l3cube-pune/kannada-sentence-similarity-sbert)
- [Telugu Similarity](https://huggingface.co/l3cube-pune/telugu-sentence-similarity-sbert)
- [Malayalam Similarity](https://huggingface.co/l3cube-pune/malayalam-sentence-similarity-sbert)
- [Tamil Similarity](https://huggingface.co/l3cube-pune/tamil-sentence-similarity-sbert)
- [Gujarati Similarity](https://huggingface.co/l3cube-pune/gujarati-sentence-similarity-sbert)
- [Oriya Similarity](https://huggingface.co/l3cube-pune/odia-sentence-similarity-sbert)
- [Bengali Similarity](https://huggingface.co/l3cube-pune/bengali-sentence-similarity-sbert)
- [Punjabi Similarity](https://huggingface.co/l3cube-pune/punjabi-sentence-similarity-sbert)
- [Indic Similarity (multilingual)](https://huggingface.co/l3cube-pune/indic-sentence-similarity-sbert)
## Quick Start

### Prerequisites

Using this model is easiest with sentence-transformers installed:

```bash
pip install -U sentence-transformers
```
## Usage Examples

### Basic Usage (Sentence-Transformers)

```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('l3cube-pune/punjabi-sentence-bert-nli')
embeddings = model.encode(sentences)
print(embeddings)
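```

Beyond printing raw embeddings, the same model can drive retrieval. The following is a minimal sketch (not from the original card) that uses sentence-transformers' built-in `util.semantic_search` to find the corpus sentence closest to a query; the corpus and query strings are placeholder examples:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('l3cube-pune/punjabi-sentence-bert-nli')

# Placeholder corpus and query; in practice these would be Punjabi sentences.
corpus = ["This is an example sentence", "Each sentence is converted"]
query = "An example sentence"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Retrieve the single most similar corpus sentence for the query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=1)[0]
for hit in hits:
    print(corpus[hit['corpus_id']], f"(score: {hit['score']:.3f})")
```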
### Advanced Usage (HuggingFace Transformers)

Without sentence-transformers, you can still use the model: pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.

```python
from transformers import AutoTokenizer, AutoModel
import torch

# Mean pooling - take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('l3cube-pune/punjabi-sentence-bert-nli')
model = AutoModel.from_pretrained('l3cube-pune/punjabi-sentence-bert-nli')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
```
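To turn the pooled embeddings into similarity scores, one common pattern (a sketch continuing from the `sentence_embeddings` computed above, not from the original card) is to L2-normalize and take a dot product, which yields pairwise cosine similarities:

```python
import torch.nn.functional as F

# L2-normalize; the dot product of unit vectors is the cosine similarity.
normalized = F.normalize(sentence_embeddings, p=2, dim=1)
cosine_scores = normalized @ normalized.T  # shape: (num_sentences, num_sentences)
print(cosine_scores)
```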
## License

This model is released under the cc-by-4.0 license.