🚀 Prompsit/paraphrase-bert-pt
This model scores candidate paraphrases of a given Portuguese phrase, framed as a binary text-classification task.
🚀 Quick Start
This model answers the question: is "phrase B" a paraphrase of "phrase A"? It operates on phrases rather than sentences, so it does not expect punctuation marks or long passages of text.
The resulting probabilities correspond to the following classes:
- 0: Not a paraphrase
- 1: Paraphrase
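As a rough illustration of how the two class probabilities map to a label, the sketch below applies a softmax to a pair of logits and picks the more likely class. The helper names (`softmax`, `predict_label`, `LABELS`) and the logit values are hypothetical and chosen for illustration; they are not part of the model or of any library API.

```python
import math

# Class indices as described above (label strings are an illustrative choice)
LABELS = {0: "not a paraphrase", 1: "paraphrase"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits for one phrase pair (not real model output)
label, confidence = predict_label([-1.2, 0.8])
print(label, round(confidence, 3))
```

In practice the logits would come from the model's output rather than being hard-coded.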
💻 Usage Examples
Basic Usage
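import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Prompsit/paraphrase-bert-pt")
model = AutoModelForSequenceClassification.from_pretrained("Prompsit/paraphrase-bert-pt")

# Encode phrase A and phrase B together as a single pair input
inputs = tokenizer('logo após o homicídio', 'pouco depois do assassinato', return_tensors='pt')
with torch.no_grad():  # inference only, no gradients needed
    logits = model(**inputs).logits

# Softmax over the class dimension gives [P(not a paraphrase), P(paraphrase)]
print(torch.softmax(logits, dim=1))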
Advanced Usage
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Prompsit/paraphrase-bert-pt")
model = AutoModelForSequenceClassification.from_pretrained("Prompsit/paraphrase-bert-pt")

# Each tuple is (phrase A, candidate paraphrase B)
phrase_pairs = [
    ('logo após o homicídio', 'pouco depois do assassinato'),
    ('outro par de frases', 'mais um par de candidatos'),
]

for phrase_a, phrase_b in phrase_pairs:
    inputs = tokenizer(phrase_a, phrase_b, return_tensors='pt')
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits
    # Probabilities for class 0 (not a paraphrase) and class 1 (paraphrase)
    print(torch.softmax(logits, dim=1))
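When several candidates are scored against the same phrase, it can help to rank them by their paraphrase probability. The sketch below shows one way to do that. `rank_candidates`, `score_fn`, and `toy_score` are hypothetical names, and `toy_score` is a simple word-overlap stand-in used only so the example runs without downloading the model; in real use, `score_fn` would wrap the tokenizer and model call and return the class 1 probability.

```python
from typing import Callable, List, Tuple


def rank_candidates(
    source: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score each candidate against the source phrase, highest score first."""
    scored = [(candidate, score_fn(source, candidate)) for candidate in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def toy_score(phrase_a: str, phrase_b: str) -> float:
    """Stand-in scorer (Jaccard word overlap). NOT the model's score."""
    a, b = set(phrase_a.split()), set(phrase_b.split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Illustrative phrases; replace toy_score with a model-backed scorer in practice
ranked = rank_candidates(
    "logo após o homicídio",
    ["antes do jantar", "logo após o crime"],
    toy_score,
)
for candidate, score in ranked:
    print(f"{score:.2f}  {candidate}")
```

Passing the scorer as a callable keeps the ranking logic independent of how the probability is obtained.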
📚 Documentation
This model was fine-tuned from the pretrained "neuralmind/bert-base-portuguese-cased". It was built under a TSI-100905-2019-4 project, co-financed by the Ministry of Economic Affairs and Digital Transformation from the Government of Spain.
🔧 Technical Details
Evaluation Results
We used a test dataset consisting of 16500 pairs of human-tagged phrases. The following metrics were obtained:
metrics = {
    'test_loss': 0.6074697375297546,
    'test_accuracy': 0.7809,
    'test_precision': 0.7157638466220329,
    'test_recall': 0.40551724137931033,
    'test_f1': 0.5177195685670262,
    'test_matthews_correlation': 0.41603913834665324,
    'test_runtime': 16.4585,
    'test_samples_per_second': 607.587,
    'test_steps_per_second': 19.017
}
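The F1 score is the harmonic mean of precision and recall, so the reported value can be cross-checked from the other two metrics. The short sketch below does that with the numbers listed above; the function name `f1_from` is a hypothetical helper, not something from the evaluation code.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the metrics reported above
precision = 0.7157638466220329
recall = 0.40551724137931033
reported_f1 = 0.5177195685670262

f1 = f1_from(precision, recall)
print(f"recomputed F1 = {f1:.6f}, reported F1 = {reported_f1:.6f}")
```

The gap between precision and recall indicates that, on this test set, the model misses more true paraphrases than it wrongly accepts.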
| Property | Details |
|---|---|
| Model Type | Fine-tuned from "neuralmind/bert-base-portuguese-cased" for paraphrase evaluation |
| Training Data | Built under a TSI-100905-2019-4 project, co-financed by the Ministry of Economic Affairs and Digital Transformation from the Government of Spain. Tested on 16500 pairs of human-tagged phrases |