DeCLUTR-small
The DeCLUTR-small model is a universal sentence encoder for sentence-similarity tasks: it maps sentences to fixed-length embeddings whose cosine similarity reflects how semantically close the sentences are.
Quick Start
DeCLUTR-small was introduced in our paper DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations. It is intended to serve as a universal sentence encoder, similar to Google's Universal Sentence Encoder or Sentence Transformers.
Features
- Universal Encoder: serves as a general-purpose sentence encoder out of the box.
- Semantic Similarity: computes the semantic similarity between pairs of sentences.
Installation
The model can be loaded with either the sentence-transformers or the 🤗 transformers library; install whichever you plan to use, e.g. pip install sentence-transformers, or pip install torch transformers scipy for the Transformers example below.
Usage Examples
Basic Usage
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("johngiorgi/declutr-small")

texts = [
    "A smiling costumed woman is holding an umbrella.",
    "A happy woman in a fairy costume holds an umbrella.",
]

# Embed both sentences
embeddings = model.encode(texts)

# Cosine similarity of the two sentence embeddings
semantic_sim = 1 - cosine(embeddings[0], embeddings[1])
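
The same embeddings can also back a simple semantic search. The snippet below is a minimal sketch that is not part of the original card: the query and corpus sentences are invented for illustration, and the corpus is ranked by cosine similarity to the query.

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("johngiorgi/declutr-small")

# Hypothetical query and corpus, made up for illustration
query = "A woman with an umbrella."
corpus = [
    "A smiling costumed woman is holding an umbrella.",
    "Two dogs are playing fetch in a park.",
    "A chef is preparing a meal in the kitchen.",
]

# Encode and L2-normalize so that dot products equal cosine similarities
query_emb = model.encode([query])[0]
corpus_emb = model.encode(corpus)
query_emb = query_emb / np.linalg.norm(query_emb)
corpus_emb = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)

# Rank corpus sentences by cosine similarity to the query, highest first
scores = corpus_emb @ query_emb
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.3f}  {corpus[idx]}")

Normalizing the vectors up front means ranking reduces to a single matrix-vector product, which scales naturally to larger corpora.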
With 🤗 Transformers
import torch
from scipy.spatial.distance import cosine
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("johngiorgi/declutr-small")
model = AutoModel.from_pretrained("johngiorgi/declutr-small")

text = [
    "A smiling costumed woman is holding an umbrella.",
    "A happy woman in a fairy costume holds an umbrella.",
]

# Tokenize the input texts
inputs = tokenizer(text, padding=True, truncation=True, return_tensors="pt")

# Embed the text
with torch.no_grad():
    sequence_output = model(**inputs)[0]

# Mean pool the token-level embeddings to get sentence-level embeddings
embeddings = torch.sum(
    sequence_output * inputs["attention_mask"].unsqueeze(-1), dim=1
) / torch.clamp(torch.sum(inputs["attention_mask"], dim=1, keepdim=True), min=1e-9)

semantic_sim = 1 - cosine(embeddings[0], embeddings[1])
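
If you need scores for more than one pair of sentences, the SciPy call can be replaced by a pairwise cosine-similarity matrix computed directly in PyTorch. The sketch below is a continuation of the snippet above (it reuses the mean-pooled embeddings tensor) and assumes only the standard torch.nn.functional module.

import torch.nn.functional as F

# L2-normalize the sentence embeddings so dot products are cosine similarities
normalized = F.normalize(embeddings, p=2, dim=1)

# (num_sentences x num_sentences) matrix of pairwise cosine similarities
similarity_matrix = normalized @ normalized.T

# Entry [0, 1] matches the pairwise score computed with SciPy above
print(similarity_matrix[0, 1].item())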
Documentation
Please see our repo for full details.
License
This model is licensed under the Apache-2.0 license.
BibTeX entry and citation info
@inproceedings{giorgi-etal-2021-declutr,
    title = {{D}e{CLUTR}: Deep Contrastive Learning for Unsupervised Textual Representations},
    author = {Giorgi, John and Nitski, Osvald and Wang, Bo and Bader, Gary},
    year = 2021,
    month = aug,
    booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
    publisher = {Association for Computational Linguistics},
    address = {Online},
    pages = {879--895},
    doi = {10.18653/v1/2021.acl-long.72},
    url = {https://aclanthology.org/2021.acl-long.72}
}
Property | Details
Model Type | Sentence Similarity
Training Data | openwebtext
Pipeline Tag | sentence-similarity
Tags | sentence-transformers, feature-extraction, sentence-similarity
License | apache-2.0