Contradiction-PSB: Open-source Model for Recognizing Contradictory Sentences in Patents (Clustering and Semantic Search)

Developed by nategro

A model for identifying contradictory sentences in patents, based on PatentSBERTa. It maps sentences and paragraphs to a 768-dimensional dense vector space and is suitable for tasks such as clustering and semantic search.

Text Embedding

Transformers

Release date: 10/13/2022

Model Overview

This is a sentence-transformers model fine-tuned for identifying contradictory sentences in patent texts. It produces dense sentence embedding vectors.

Model Features

Patent Text Optimization

Fine-tuned from PatentSBERTa, a model trained on patent text, and aimed at sentence similarity tasks within the patent domain.

High-Dimensional Vector Representation

Capable of mapping sentences and paragraphs into a 768-dimensional dense vector space.

Contradiction Sentence Identification

Intended for identifying contradictory statements in patent texts.

Model Capabilities

Sentence Similarity Calculation

Text Feature Extraction

Semantic Search

Text Clustering

Contradiction Sentence Identification

Use Cases

Patent Analysis

Patent Contradiction Detection

Identify potential contradictory statements in patent documents.

May help make patent examination more efficient and accurate.

Patent Similarity Search

Find patent documents similar to a given patent.

Assist in patent retrieval and prior art investigation.
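Similarity search over embeddings comes down to ranking corpus vectors by cosine similarity to a query vector. The sketch below shows that ranking step in plain NumPy. It assumes the query and corpus embeddings were already produced by the model (for example with model.encode). The function name top_k_similar is hypothetical. sentence-transformers also ships a ready-made helper, util.semantic_search, for the same job.

```python
import numpy as np

def top_k_similar(query_emb: np.ndarray, corpus_embs: np.ndarray, k: int = 3):
    """Return (index, cosine score) pairs for the k corpus rows most similar to the query.

    Hypothetical helper: query_emb has shape (dim,), corpus_embs has shape (n, dim),
    e.g. 768-dimensional vectors from model.encode(...).
    """
    # L2-normalise so that a dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q
    # Sort by descending score and keep the top k.
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

# Toy 2-D vectors stand in for real 768-dimensional embeddings.
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(top_k_similar(np.array([1.0, 0.1]), corpus, k=2))
```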

Text Analysis

Document Clustering

Group large volumes of patent documents by semantic similarity.

Enable efficient organization and retrieval of patent documents.
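To group documents by semantic similarity, you can cluster their embeddings. The sketch below is a minimal k-means in NumPy over L2-normalised embeddings. On unit vectors, Euclidean distance tracks cosine similarity. It is an illustrative assumption, not something the model card prescribes. The function name is hypothetical, and initialising from the first k rows is a deliberate simplification. In practice a library implementation such as scikit-learn's KMeans (with k-means++ initialisation) is the usual choice.

```python
import numpy as np

def kmeans(embeddings: np.ndarray, k: int, n_iter: int = 20) -> np.ndarray:
    """Plain k-means on L2-normalised embeddings; returns a cluster id per row (hypothetical helper)."""
    # Normalise so Euclidean distance tracks cosine similarity.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Simplified deterministic init: use the first k rows as starting centroids.
    centroids = x[:k].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assignment step: each row goes to its nearest centroid.
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = x[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return labels

# Toy 2-D vectors stand in for sentence embeddings from model.encode(...).
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])
print(kmeans(docs, k=2))
```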

🚀 Contradiction-PSB

🚀 Quick Start

The model is published on the Hugging Face Hub as nategro/contradiction-psb. You can load it with sentence-transformers, or with HuggingFace Transformers plus a manual pooling step.

✨ Features

Sentence Embedding: Maps sentences and paragraphs into a 768-dimensional dense vector space.
Multiple Usage Modes: Can be used with sentence-transformers or HuggingFace Transformers.

📦 Installation

The simplest way to use this model is through sentence-transformers, which you can install with:

pip install -U sentence-transformers

💻 Usage Examples

Basic Usage

If you have sentence-transformers installed, you can use the model as follows:

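from sentence_transformers import SentenceTransformer

# Sentences to embed
sentences = ["This is an example sentence", "Each sentence is converted"]

# Download the model from the Hugging Face Hub and encode the sentences
model = SentenceTransformer('nategro/contradiction-psb')
embeddings = model.encode(sentences)  # NumPy array of shape (2, 768)
print(embeddings)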

Advanced Usage

Without sentence-transformers, you can use the model as follows. First, pass your input through the transformer model. Then apply the appropriate pooling operation to the contextualized token embeddings. This model uses CLS pooling, which takes the embedding of the first token:

from transformers import AutoTokenizer, AutoModel
import torch


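def cls_pooling(model_output, attention_mask):
    # CLS pooling: the first token's embedding represents the whole sentence.
    # attention_mask is unused here; it is kept for a uniform pooling signature.
    return model_output[0][:, 0]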


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('nategro/contradiction-psb')
model = AutoModel.from_pretrained('nategro/contradiction-psb')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, cls pooling.
sentence_embeddings = cls_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)
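CLS pooling, which this model's Pooling layer is configured for, ignores the attention mask. Mean pooling, a common alternative in other sentence-transformers models, must use the mask to skip padding tokens. The NumPy sketch below contrasts the two on a toy array of token embeddings. The function names are hypothetical, and the numbers are illustrative only. Using mean pooling with this model would produce vectors that differ from the ones it was trained to emit.

```python
import numpy as np

def cls_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    # CLS pooling: take the first token of each sequence; the mask is not needed.
    return token_embeddings[:, 0]

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    # Mean pooling: average only over real (unmasked) tokens.
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

# One sequence of three tokens with 2-D embeddings; the last token is padding.
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(cls_pool(tokens, mask))   # first-token vector
print(mean_pool(tokens, mask))  # average of the two unmasked tokens
```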

📚 Documentation

Evaluation Results

For an automated evaluation of this model, see the Sentence Embeddings Benchmark: https://seb.sbert.net

Training

The model was trained with the following parameters:

DataLoader: torch.utils.data.dataloader.DataLoader of length 496 with parameters:

{'batch_size': 16, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
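The card does not state the size of the training set. You can bound it from the DataLoader length, assuming the PyTorch default drop_last=False, where a DataLoader of N examples has ceil(N / batch_size) batches:

```latex
\left\lceil \frac{N}{16} \right\rceil = 496
\;\Longrightarrow\; 16 \cdot 495 < N \le 16 \cdot 496
\;\Longrightarrow\; 7921 \le N \le 7936
```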

Loss: sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss
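By default, sentence-transformers' CosineSimilarityLoss takes the cosine similarity between the two embeddings of each training pair and regresses it onto the pair's float label with mean squared error. The NumPy sketch below reproduces that computation on toy vectors. The function name is hypothetical. The card does not document how contradiction labels were encoded as target scores.

```python
import numpy as np

def cosine_similarity_loss(u: np.ndarray, v: np.ndarray, labels: np.ndarray) -> float:
    """Mean squared error between cos(u_i, v_i) and the gold label of each pair (hypothetical helper)."""
    # Row-wise cosine similarity between the two embeddings of each pair.
    cos = (u * v).sum(axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    # Regress the cosine score onto the label with MSE.
    return float(np.mean((cos - labels) ** 2))

u = np.array([[1.0, 0.0], [1.0, 0.0]])
v = np.array([[1.0, 0.0], [0.0, 1.0]])
print(cosine_similarity_loss(u, v, np.array([1.0, 0.0])))  # labels match cosines
print(cosine_similarity_loss(u, v, np.array([0.0, 0.0])))  # first pair is off by 1
```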

Parameters of the fit() method:

{
    "epochs": 1,
    "evaluation_steps": 0,
    "evaluator": "NoneType",
    "max_grad_norm": 1,
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 2e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": 496,
    "warmup_steps": 50,
    "weight_decay": 0.01
}
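The "WarmupLinear" scheduler in sentence-transformers corresponds to Hugging Face's linear schedule with warmup. The learning rate rises linearly from 0 to the base rate over the warmup steps, then decays linearly to 0 by the last training step. Here the total is steps_per_epoch × epochs = 496 × 1 = 496 steps. The sketch below computes the learning rate at a given step using the parameters above. The function name is hypothetical.

```python
def warmup_linear_lr(step: int, base_lr: float = 2e-5,
                     warmup_steps: int = 50, total_steps: int = 496 * 1) -> float:
    """Learning rate after `step` optimizer steps under linear warmup then linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr at the end of warmup to 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

for s in (0, 25, 50, 273, 496):
    print(s, warmup_linear_lr(s))
```

The peak rate of 2e-05 is reached at step 50. It drops to half that, 1e-05, at step 273, midway through the 446 decay steps.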

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)

Citing & Authors

The following pre-trained model was used: AI-Growth-Lab/PatentSBERTa