paraphrase-bert-pt Open Source Model - Free Evaluation of Portuguese Phrase Paraphrase Relationships

Paraphrase Bert Pt

Developed by Prompsit

A BERT-based Portuguese phrase paraphrase detection model for evaluating the paraphrase relationship between two phrases.

Text Classification

Transformers

Other #Portuguese Paraphrase Detection #Phrase-level Semantic Matching #BERT Fine-tuned Model

Downloads: 25

Release Time: 3/2/2022

Model Overview

This model is fine-tuned from the pre-trained model 'neuralmind/bert-base-portuguese-cased' and is specifically designed to determine whether two given Portuguese phrases are paraphrases of each other.

Model Features

Portuguese Language Specialization

Optimized specifically for Portuguese phrase paraphrase detection.

Phrase-level Processing

Designed for phrases rather than full sentences; the model does not expect punctuation marks or long text passages.
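Because the model expects short phrases without punctuation, inputs taken from running text may need light cleaning before scoring. The sketch below is a hypothetical helper, not part of the model or of the transformers library: `normalize_phrase`, the set of kept characters and the 12-word `max_words` cut-off are all assumptions.

```python
import re
import unicodedata

# Characters kept even though Unicode classifies them as punctuation,
# so in-word hyphens and apostrophes (e.g. "guarda-chuva") survive.
KEEP = {"-", "'"}

def normalize_phrase(text: str, max_words: int = 12) -> str:
    """Hypothetical pre-processing helper: strip punctuation, tidy whitespace."""
    # Compose accented characters consistently (e.g. "ã" as one code point)
    text = unicodedata.normalize("NFC", text)
    # Replace other punctuation (Unicode categories P*) with a space
    text = "".join(
        " " if unicodedata.category(ch).startswith("P") and ch not in KEEP else ch
        for ch in text
    )
    # Collapse whitespace runs; case is preserved because the model is cased
    text = re.sub(r"\s+", " ", text).strip()
    # max_words is an assumed cut-off; the model card does not give one
    if len(text.split()) > max_words:
        raise ValueError(f"more than {max_words} words; the model expects short phrases")
    return text

print(normalize_phrase("Logo após o homicídio!"))  # Logo após o homicídio
```

Case is deliberately left alone, since the base model is the cased variant of BERTimbau.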

Efficient Inference

The logged evaluation run processed about 607 samples per second (hardware not stated), which may make the model suitable for real-time applications.

Model Capabilities

Phrase Similarity Evaluation

Text Rewriting Detection

Semantic Matching Judgment

Use Cases

Natural Language Processing

Paraphrase Detection

Determine if two Portuguese phrases express the same meaning.

Outputs paraphrase probability score (0-1).
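The classification head returns two logits per phrase pair, one per class, and the probability comes from applying softmax to them. A minimal pure-Python sketch of that step follows; the logit values are made up for illustration. With two classes, softmax is equivalent to a sigmoid of the difference between the paraphrase and non-paraphrase logits.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def paraphrase_probability(logits):
    """Probability of class 1 ("paraphrase") from [class-0 logit, class-1 logit]."""
    return softmax(logits)[1]

logits = [-1.2, 0.8]  # hypothetical logits, not actual model output
p = paraphrase_probability(logits)
# Two-class softmax equals the sigmoid of the logit difference (0.8 - (-1.2) = 2.0)
sigmoid = 1 / (1 + math.exp(-(logits[1] - logits[0])))
print(round(p, 4), round(sigmoid, 4))  # 0.8808 0.8808
```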

Text Rewriting Verification

Verify if rewritten text retains the original meaning.

Provides a paraphrase probability as a quantitative measure of whether the meaning is preserved.
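For rewriting verification, one common pattern is to score each candidate rewrite against the original and keep those whose paraphrase probability clears a threshold. The sketch below shows only that ranking logic. `rank_rewrites`, the 0.5 threshold and the word-overlap `overlap_score` stand-in are assumptions for illustration; in practice `score_fn` would wrap the tokenizer and model calls from the usage examples and return the class-1 probability.

```python
from typing import Callable, List, Tuple

def rank_rewrites(
    original: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not from the model card
) -> List[Tuple[str, float, bool]]:
    """Score candidates against the original, best first, flagging those >= threshold."""
    scored = [(cand, score_fn(original, cand)) for cand in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(cand, prob, prob >= threshold) for cand, prob in scored]

def overlap_score(a: str, b: str) -> float:
    """Stand-in scorer (Jaccard word overlap), used here instead of the model."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

for cand, prob, keeps_meaning in rank_rewrites(
    "logo após o homicídio", ["antes do jantar", "logo após o crime"], overlap_score
):
    print(f"{cand}: {prob:.2f} keeps_meaning={keeps_meaning}")
```

Passing the scorer in as a function keeps the ranking logic independent of how the model is loaded or batched.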

🚀 Prompsit/paraphrase-bert-pt

This model evaluates whether one phrase is a paraphrase of another, treating the question as a text-classification task.

🚀 Quick Start

This model is designed to answer the question: Is "phrase B" a paraphrase of "phrase A"? Note that the model operates on phrases rather than sentences, so it does not expect punctuation marks or long text passages.

The resulting probabilities correspond to the following classes:

0: Not a paraphrase
1: It's a paraphrase
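To turn the two probabilities into one of the classes above, take the index with the highest probability. A minimal sketch: `LABELS` and `label_from_probs` are hypothetical names, and taking the argmax amounts to a 0.5 cut-off on the paraphrase probability, which the card itself does not prescribe.

```python
# Class indices as documented above; the label strings are our own wording.
LABELS = {0: "not a paraphrase", 1: "paraphrase"}

def label_from_probs(probs):
    """Return the label of the most probable class (argmax over the list)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best]

# probs would typically be one row of the softmax output, converted with .tolist()
print(label_from_probs([0.12, 0.88]))  # paraphrase
print(label_from_probs([0.70, 0.30]))  # not a paraphrase
```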

💻 Usage Examples

Basic Usage

```python
# Import necessary libraries
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Initialize tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Prompsit/paraphrase-bert-pt")
model = AutoModelForSequenceClassification.from_pretrained("Prompsit/paraphrase-bert-pt")

# Encode phrase A and phrase B together as one sentence pair
inputs = tokenizer('logo após o homicídio', 'pouco depois do assassinato', return_tensors='pt')

# Get logits; gradients are not needed for inference
with torch.no_grad():
    logits = model(**inputs).logits

# Apply softmax over the class dimension: [P(not paraphrase), P(paraphrase)]
probs = torch.softmax(logits, dim=1)

# Print results
print(probs)
```

Advanced Usage

```python
# The model was fine-tuned from the pretrained "neuralmind/bert-base-portuguese-cased".
# This example scores several phrase pairs in a single batch.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Prompsit/paraphrase-bert-pt")
model = AutoModelForSequenceClassification.from_pretrained("Prompsit/paraphrase-bert-pt")

phrase_pairs = [
    ('logo após o homicídio', 'pouco depois do assassinato'),
    ('outro par de frases', 'mais um par de candidatos'),
]

# Tokenize all pairs at once; padding aligns them to the same length
first, second = zip(*phrase_pairs)
inputs = tokenizer(list(first), list(second), padding=True, return_tensors='pt')

with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=1)

# Report the paraphrase probability (class 1) for each pair
for (a, b), p in zip(phrase_pairs, probs):
    print(f"{a!r} vs {b!r}: P(paraphrase) = {p[1].item():.3f}")
```

📚 Documentation

This model was fine-tuned from the pretrained "neuralmind/bert-base-portuguese-cased". It was built under project TSI-100905-2019-4, co-financed by the Ministry of Economic Affairs and Digital Transformation of the Government of Spain.

🔧 Technical Details

Evaluation Results

We used a test dataset of 16,500 human-tagged phrase pairs and obtained the following metrics:

metrics={
 'test_loss': 0.6074697375297546, 
 'test_accuracy': 0.7809, 
 'test_precision': 0.7157638466220329, 
 'test_recall': 0.40551724137931033, 
 'test_f1': 0.5177195685670262, 
 'test_matthews_correlation': 0.41603913834665324, 
 'test_runtime': 16.4585, 
 'test_samples_per_second': 607.587, 
 'test_steps_per_second': 19.017
}
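The reported figures can be cross-checked with a few lines of arithmetic that reuse only the numbers above. F1 should equal the harmonic mean of precision and recall, and multiplying the runtime by each throughput figure gives the number of samples and evaluation steps in the logged run. That product comes to about 10,000 samples and 313 steps rather than 16,500 pairs; the card does not explain the difference, so the logged metrics may come from a run on a different number of pairs.

```python
# Figures copied from the metrics dictionary above
precision = 0.7157638466220329
recall = 0.40551724137931033
f1 = 0.5177195685670262
runtime_s = 16.4585
samples_per_s = 607.587
steps_per_s = 19.017

# F1 is the harmonic mean of precision and recall
f1_check = 2 * precision * recall / (precision + recall)
print(f"F1 recomputed: {f1_check:.4f} (reported {f1:.4f})")

# Throughput x runtime gives the number of evaluated samples and steps
samples = runtime_s * samples_per_s
steps = runtime_s * steps_per_s
print(f"implied samples: {samples:.0f}, implied steps: {steps:.0f}")
```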

Property	Details
Model Type	Fine-tuned from "neuralmind/bert-base-portuguese-cased" for paraphrase evaluation
Project	Built under project TSI-100905-2019-4, co-financed by the Ministry of Economic Affairs and Digital Transformation of the Government of Spain
Test Data	16,500 human-tagged phrase pairs
