DeCLUTR-base
The "DeCLUTR-base" model is a sentence encoder designed for sentence similarity tasks: it encodes sentences into embeddings that can be compared to compute semantic similarity.
Quick Start
The "DeCLUTR-base" model comes from the paper DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations. It serves as a universal sentence encoder, similar to Google's Universal Sentence Encoder or Sentence Transformers.
⨠Features
- Universal Sentence Encoder: Can be used as a general-purpose sentence encoder for various natural language processing tasks.
- Semantic Similarity Computation: Capable of computing semantic similarities between sentences effectively.
Installation
No specific installation steps are provided. The usage examples below rely on the sentence-transformers, transformers, torch, and scipy Python packages.
Usage Examples
Basic Usage
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("johngiorgi/declutr-base")

# Prepare some text to embed
texts = [
    "A smiling costumed woman is holding an umbrella.",
    "A happy woman in a fairy costume holds an umbrella.",
]

# Embed the text
embeddings = model.encode(texts)

# Compute semantic similarity as 1 minus the cosine distance
semantic_sim = 1 - cosine(embeddings[0], embeddings[1])
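As a rough illustration of what the final line computes: scipy's cosine returns the cosine distance, so subtracting it from 1 gives the cosine similarity. The sketch below reimplements that quantity in plain Python on made-up 3-dimensional vectors. The function name cosine_similarity and the toy vectors are assumptions for illustration only; they are not part of the model or its libraries.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (1 minus the cosine distance)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors standing in for real sentence embeddings (illustrative values only).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Vectors pointing in the same direction score close to 1 regardless of their length, and orthogonal vectors score 0, which is why the measure suits embeddings whose magnitudes are not meaningful on their own.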
With 🤗 Transformers
import torch
from scipy.spatial.distance import cosine
from transformers import AutoModel, AutoTokenizer

# Load the model
tokenizer = AutoTokenizer.from_pretrained("johngiorgi/declutr-base")
model = AutoModel.from_pretrained("johngiorgi/declutr-base")

# Prepare some text to embed
text = [
    "A smiling costumed woman is holding an umbrella.",
    "A happy woman in a fairy costume holds an umbrella.",
]
inputs = tokenizer(text, padding=True, truncation=True, return_tensors="pt")

# Embed the text
with torch.no_grad():
    sequence_output = model(**inputs)[0]

# Mean pool the token-level embeddings, ignoring padding, to get sentence-level embeddings
embeddings = torch.sum(
    sequence_output * inputs["attention_mask"].unsqueeze(-1), dim=1
) / torch.clamp(torch.sum(inputs["attention_mask"], dim=1, keepdims=True), min=1e-9)

# Compute semantic similarity as 1 minus the cosine distance
semantic_sim = 1 - cosine(embeddings[0], embeddings[1])
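The pooling expression in the snippet above is dense, so here is a minimal plain-Python sketch of the same idea for a single sentence: sum the token vectors whose attention mask is 1, then divide by the number of such tokens. The helper name masked_mean_pool and the toy values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def masked_mean_pool(token_embeddings, attention_mask):
    """Average the token vectors of one sentence, skipping padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    for vector, keep in zip(token_embeddings, attention_mask):
        for i in range(dim):
            totals[i] += vector[i] * keep
    # Mirror the clamp in the snippet above so an all-padding row cannot divide by zero.
    count = max(sum(attention_mask), 1e-9)
    return [total / count for total in totals]


# Toy sentence: two real tokens followed by one padding token (values are made up).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = masked_mean_pool(tokens, mask)
```

Because the padding token is multiplied by 0 and excluded from the count, its large values do not affect the pooled vector, which is the reason for weighting by the attention mask instead of taking a plain mean over all positions.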
Documentation
For full details, please see our repo.
License
This model is licensed under the Apache-2.0 license.
BibTeX entry and citation info
@inproceedings{giorgi-etal-2021-declutr,
    title = {{D}e{CLUTR}: Deep Contrastive Learning for Unsupervised Textual Representations},
    author = {Giorgi, John and Nitski, Osvald and Wang, Bo and Bader, Gary},
    year = 2021,
    month = aug,
    booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
    publisher = {Association for Computational Linguistics},
    address = {Online},
    pages = {879--895},
    doi = {10.18653/v1/2021.acl-long.72},
    url = {https://aclanthology.org/2021.acl-long.72}
}
| Property | Details |
|---|---|
| Pipeline Tag | sentence-similarity |
| Tags | sentence-transformers, feature-extraction, sentence-similarity |
| Language | en |
| License | apache-2.0 |
| Datasets | openwebtext |