(distil)BERT-based Sentiment Classification Model: Unleashing the Power of Synthetic Data
This is a sentiment classification model based on (distil)BERT, leveraging synthetic data to provide high-performance sentiment analysis. It can be applied in various scenarios such as social media analysis and customer feedback analysis.
Quick Start
Python Example
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "tabularisai/robust-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def predict_sentiment(text):
    inputs = tokenizer(text.lower(), return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probabilities, dim=-1).item()
    sentiment_map = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}
    return sentiment_map[predicted_class]

texts = [
    "I absolutely loved this movie! The acting was superb and the plot was engaging.",
    "The service at this restaurant was terrible. I'll never go back.",
    "The product works as expected. Nothing special, but it gets the job done.",
    "I'm somewhat disappointed with my purchase. It's not as good as I hoped.",
    "This book changed my life! I couldn't put it down and learned so much."
]

for text in texts:
    sentiment = predict_sentiment(text)
    print(f"Text: {text}")
    print(f"Sentiment: {sentiment}\n")
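The function above returns only the label. As a minimal sketch of the step from logits to a label plus a confidence score, the snippet below repeats the softmax and argmax logic in plain Python so it runs without downloading the model. The helper name `label_with_score` and the example logits are hypothetical; real logits would come from `outputs.logits[0].tolist()`.

```python
import math

# Label order copied from the sentiment_map in the Python example.
SENTIMENT_MAP = {0: "Very Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Very Positive"}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_with_score(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return SENTIMENT_MAP[idx], probs[idx]

# Hypothetical logits for a single text (not produced by the real model).
example_logits = [-2.0, -1.0, 0.5, 3.0, 1.0]
label, score = label_with_score(example_logits)
print(f"Sentiment: {label}, Score: {score:.4f}")
```

Reporting the score next to the label makes it possible to route low-confidence predictions to manual review.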
JavaScript Example
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Tabularis Sentiment Analysis</title>
</head>
<body>
  <div id="output"></div>
  <script type="module">
    import { AutoTokenizer, AutoModel, env } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.6.0';
    env.allowLocalModels = false;
    env.useCDN = true;
    const MODEL_NAME = 'tabularisai/robust-sentiment-analysis';
    function softmax(arr) {
      const max = Math.max(...arr);
      const exp = arr.map(x => Math.exp(x - max));
      const sum = exp.reduce((acc, val) => acc + val);
      return exp.map(x => x / sum);
    }
    async function analyzeSentiment() {
      try {
        const tokenizer = await AutoTokenizer.from_pretrained(MODEL_NAME);
        const model = await AutoModel.from_pretrained(MODEL_NAME);
        const texts = [
          "I absolutely loved this movie! The acting was superb and the plot was engaging.",
          "The service at this restaurant was terrible. I'll never go back.",
          "The product works as expected. Nothing special, but it gets the job done.",
          "I'm somewhat disappointed with my purchase. It's not as good as I hoped.",
          "This book changed my life! I couldn't put it down and learned so much."
        ];
        const output = document.getElementById('output');
        for (const text of texts) {
          const inputs = await tokenizer(text, { return_tensors: 'pt' });
          const result = await model(inputs);
          console.log('Model output:', result);
          if (result.output && result.output.data) {
            const logitsArray = Array.from(result.output.data);
            console.log('Logits array:', logitsArray);
            const probabilities = softmax(logitsArray);
            const predicted_class = probabilities.indexOf(Math.max(...probabilities));
            const sentimentMap = {
              0: "Very Negative",
              1: "Negative",
              2: "Neutral",
              3: "Positive",
              4: "Very Positive"
            };
            const sentiment = sentimentMap[predicted_class];
            const score = probabilities[predicted_class];
            output.innerHTML += `Text: "${text}"<br>`;
            output.innerHTML += `Sentiment: ${sentiment}, Score: ${score.toFixed(4)}<br><br>`;
          } else {
            console.error('Unexpected model output structure:', result);
            output.innerHTML += `Unable to process: "${text}"<br><br>`;
          }
        }
      } catch (error) {
        console.error('Error:', error);
        document.getElementById('output').innerHTML = 'An error occurred. Please check the console for details.';
      }
    }
    analyzeSentiment();
  </script>
</body>
</html>
⨠Features
- Multi - Class Classification: Capable of classifying sentiment into five classes: Very Negative, Negative, Neutral, Positive, and Very Positive.
- Synthetic Data Utilization: Trained on synthetic data to cover a wide range of sentiment expressions.
- Multiple Application Scenarios: Suitable for social media analysis, customer feedback analysis, product reviews classification, brand monitoring, market research, customer service optimization, and competitive intelligence.
Installation
The source does not list installation steps. The Python example imports the transformers and torch packages, so both need to be installed before running it.
Documentation
Model Details
| Property | Details |
|---|---|
| Model Name | tabularisai/robust-sentiment-analysis |
| Model Type | (distil)BERT-based Sentiment Classification Model |
| Base Model | distilbert/distilbert-base-uncased |
| Task | Text Classification (Sentiment Analysis) |
| Language | English |
| Number of Classes | 5 (Very Negative, Negative, Neutral, Positive, Very Positive) |
| Usage | Social media analysis, customer feedback analysis, product reviews classification, brand monitoring, market research, customer service optimization, competitive intelligence |
Model Description
This model is a fine-tuned version of distilbert/distilbert-base-uncased for sentiment analysis, trained only on synthetic data.
Training Data
The model was fine-tuned on synthetic data, which allows for targeted training on a diverse range of sentiment expressions without the limitations often found in real-world datasets.
Training Procedure
The model was fine-tuned on synthetic data using the distilbert/distilbert-base-uncased architecture. The training process involved:
- Dataset: Synthetic data designed to cover a wide range of sentiment expressions
- Training framework: PyTorch Lightning
- Number of epochs: 5
- Performance metric: Achieved train_acc_off_by_one of approximately 0.95 on the validation dataset
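The source does not define the metric `train_acc_off_by_one`. One plausible reading, which is only an assumption here, is that a prediction counts as correct when it lands within one class of the true label on the 0 to 4 scale. The sketch below computes that quantity alongside exact accuracy; the function name and the sample labels are hypothetical.

```python
def accuracy_within(y_true, y_pred, tolerance=0):
    """Fraction of predictions whose class index is within `tolerance` of the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Hypothetical class indices (0 = Very Negative ... 4 = Very Positive), not real model output.
y_true = [0, 1, 2, 3, 4, 4, 2, 0]
y_pred = [1, 1, 2, 4, 4, 2, 2, 0]

exact = accuracy_within(y_true, y_pred)                    # strict match
off_by_one = accuracy_within(y_true, y_pred, tolerance=1)  # adjacent class also counts
print(f"exact={exact:.3f} off_by_one={off_by_one:.3f}")
```

Under this reading, an off-by-one score is always at least as high as exact accuracy, so the two numbers should not be compared directly.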
Intended Use
This model is designed for sentiment analysis tasks, particularly useful for social media monitoring, customer feedback analysis, product review sentiment classification, and brand sentiment tracking.
Ethical Considerations
While efforts have been made to create a balanced and fair model through the use of synthetic data, users should be aware that the model may still exhibit biases. It's crucial to thoroughly test the model in your specific use case and monitor its performance over time.
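One way to act on the testing advice is to score the model on a small labeled sample from your own domain and look at recall per class, since a single aggregate number can hide a class the model handles poorly. This is a sketch under stated assumptions: the function name, gold labels, and predictions are hypothetical, and in practice the predictions would come from `predict_sentiment`.

```python
from collections import Counter

LABELS = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]

def per_class_recall(gold, predicted):
    """Map each label to the share of its gold examples predicted correctly.

    Labels with no gold examples map to None, since recall is undefined for them.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    totals = Counter(gold)
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)
    return {
        label: (correct[label] / totals[label] if totals[label] else None)
        for label in LABELS
    }

# Hypothetical domain sample; replace with your own labeled texts and model predictions.
gold = ["Positive", "Positive", "Neutral", "Negative", "Very Negative", "Neutral"]
predicted = ["Positive", "Very Positive", "Neutral", "Negative", "Negative", "Neutral"]

recall = per_class_recall(gold, predicted)
for label in LABELS:
    print(label, recall[label])
```

Re-running the same check on fresh samples at regular intervals gives a simple way to monitor performance over time.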
Technical Details
The model is based on the distilbert/distilbert-base-uncased architecture. During training, it used PyTorch Lightning as the training framework and was fine-tuned for 5 epochs on synthetic data. It achieved a train_acc_off_by_one of approximately 0.95 on the validation dataset.
License
The model is released under the Apache-2.0 license.
NEWS!
- 2024/12: We uploaded an even better and more robust sentiment model! The error rate is reduced by 10%, and overall accuracy is improved!
Contact
For questions, or for a private and reliable API for this model, please contact info@tabularis.ai.
Citation
A citation will be included.