Reddit-TC-BERT: Open-Source Text Classification Model for Determining Whether Two Sentences Are Related

Reddit Tc Bert

Developed by Fan-s

A text classification model based on bert-base-uncased and fine-tuned on a Reddit dialogue dataset, used to determine whether two sentences are related

Text Classification

Transformers

Open Source License: Apache-2.0

#Text relevance judgment #High-accuracy dialogue analysis #Reddit data fine-tuning

Downloads 23

Release Time: 3/2/2022

Model Overview

This model is a fine-tuned version of bert-base-uncased, suitable for text classification tasks, particularly adept at judging the relevance between two sentences.

Model Features

High accuracy

Achieved 92.67% accuracy on the evaluation set

Fine-tuning optimization

Fine-tuned on Reddit dialogue data, which makes it suitable for judging dialogue relevance

Efficient training

Trained with the Adam optimizer and a linear learning rate scheduler

Model Capabilities

Text classification

Sentence relevance judgment

Natural language understanding

Use Cases

Dialogue systems

Dialogue coherence detection

Determine whether a user's consecutive dialogue turns are related

Classifies sentence pairs with 92.67% accuracy on the evaluation set

Content moderation

Comment relevance check

Detect whether comments are relevant to the topic
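
The comment relevance check above could be wired up roughly as follows. This is a minimal sketch, not part of the original model card: filter_related and fake_predict are hypothetical names, and fake_predict is a word-overlap stand-in used only so the snippet runs without downloading the model. In practice, a function with the same (post, response) -> (label, logits) shape as the predict function in the Basic Usage section would be passed instead.

```python
from typing import Callable, List, Tuple


def filter_related(
    topic: str,
    comments: List[str],
    predict_fn: Callable[[str, str], Tuple[str, object]],
) -> List[str]:
    """Keep only the comments that predict_fn labels as 'matched' with the topic."""
    kept = []
    for comment in comments:
        label, _ = predict_fn(topic, comment)
        if label == "matched":
            kept.append(comment)
    return kept


# Stand-in classifier for illustration only: NOT the real model.
# It calls two texts "matched" when they share at least one word.
def fake_predict(post: str, response: str):
    shared = set(post.lower().split()) & set(response.lower().split())
    return ("matched" if shared else "unmatched"), None


comments = ["gravy needs flour", "my cat is asleep"]
related = filter_related("how to make gravy", comments, fake_predict)
print(related)
```

Passing the classifier in as an argument keeps the filtering logic independent of how the model is loaded.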

🚀 bert-uncased-base

This is a fine-tuned version of bert-base-uncased on a Reddit-dialogue dataset, which can be used for text classification to determine the relationship between two sentences.

🚀 Quick Start

This model is a fine-tuned version of bert-base-uncased on a Reddit-dialogue dataset. It can be used for text classification, specifically to check whether two given sentences are related. On the evaluation set, it achieves the following results:

Loss: 0.2297
Accuracy: 0.9267
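
The model card does not say how these two numbers were computed. A minimal sketch, assuming the usual definitions (accuracy as the fraction of correct argmax predictions, loss as mean cross-entropy over the logits), looks like this. The helper name accuracy_and_loss and the toy logits are made up for illustration and are unrelated to the reported results.

```python
import math


def accuracy_and_loss(logits, labels):
    """Return (accuracy, mean cross-entropy) for rows of logits and integer labels."""
    correct = 0
    total_loss = 0.0
    for row, label in zip(logits, labels):
        # Predicted class is the index of the largest logit
        pred = max(range(len(row)), key=lambda i: row[i])
        correct += int(pred == label)
        # Cross-entropy = log-sum-exp of the logits minus the true-class logit
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total_loss += log_sum_exp - row[label]
    n = len(labels)
    return correct / n, total_loss / n


# Made-up logits for three sentence pairs (class 0 = matched, class 1 = unmatched)
toy_logits = [[2.0, -1.0], [0.5, 1.5], [1.0, 0.0]]
toy_labels = [0, 1, 1]
acc, loss = accuracy_and_loss(toy_logits, toy_labels)
print(f"accuracy={acc:.4f} loss={loss:.4f}")
```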

✨ Features

Fine-tuned on a Reddit-dialogue dataset.
Capable of text classification to determine sentence relationships.

📦 Installation

The original model card lists no installation steps. The usage example imports torch and transformers, so both packages must be installed first.

💻 Usage Examples

Basic Usage

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Class labels, indexed by the model's output class id
label_list = ['matched', 'unmatched']

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Fan-s/reddit-tc-bert", use_fast=True)
model = AutoModelForSequenceClassification.from_pretrained("Fan-s/reddit-tc-bert")
model.eval()  # inference mode (disables dropout)

# Set the input
post = "don't make gravy with asbestos."
response = "i'd expect someone with a culinary background to know that. since we're talking about school dinner ladies, they need to learn this pronto."

# Predict whether the two sentences are matched
def predict(post, response, max_seq_length=128):
    # Encode the two sentences together as one sentence-pair input
    inputs = tokenizer(post, response, padding="max_length", max_length=max_seq_length, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Index of the highest-scoring class for the single pair in the batch
    label_id = torch.argmax(logits, dim=1).item()
    return label_list[label_id], logits

predict_label, logits = predict(post, response)
# matched (per the original model card)
print("predict_label:", predict_label)

Advanced Usage

The original model card provides no advanced usage examples.
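
The sketch below is not from the original model card. The predict function in Basic Usage returns raw logits alongside the label, and a softmax turns those logits into per-label probabilities, which can be handy for thresholding. It assumes the label order in label_list matches the model's output indices, as in the basic example. logits_to_probs is a hypothetical helper name, and the logits shown are made up so the snippet runs without the model.

```python
import math

label_list = ["matched", "unmatched"]  # same order as in the basic example


def logits_to_probs(logits_row):
    """Numerically stable softmax over one row of logits (a list of floats)."""
    m = max(logits_row)
    exps = [math.exp(x - m) for x in logits_row]
    total = sum(exps)
    return {label: e / total for label, e in zip(label_list, exps)}


# Made-up logits; with the real model, pass logits[0].tolist() instead
probs = logits_to_probs([2.0, -1.0])
print(probs)
```

Subtracting the maximum logit before exponentiating avoids overflow for large values without changing the result.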

📚 Documentation

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 320
eval_batch_size: 80
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 5.0
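
To illustrate what a linear scheduler does with the learning rate listed above, here is a minimal sketch. It assumes no warmup (the card does not mention any) and a hypothetical total of 1000 optimizer steps, since the card does not give the real step count. linear_lr is an illustrative helper, not the training code.

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 2e-5,
              warmup_steps: int = 0) -> float:
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # hypothetical: the step count is not stated in the card
schedule = [linear_lr(s, total_steps) for s in (0, 500, 1000)]
print(schedule)
```

Under these assumptions the rate starts at 2e-05, is halved at the midpoint, and reaches zero on the final step.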

Framework versions

Transformers 4.16.0.dev0
Pytorch 1.10.1+cu102
Datasets 1.17.0
Tokenizers 0.11.0

📄 License

This model is licensed under the Apache-2.0 license.
