🚀 Twitter bitcoin-related spam detection
This model classifies bitcoin- or crypto-related tweets as "human", "spam", or "bot", and is meant to remove the bias that other classifiers show against such tweets.
🚀 Quick Start
This model aims to classify tweets about bitcoin or crypto topics as "human", "spam", or "bot". Many classifiers already exist, but they tend to label bitcoin-related tweets as "spam" too readily, because such tweets are often associated with phishing sites or scams. Because this model is trained on a bitcoin-related dataset, it is meant to remove that bias and work effectively with bitcoin-related tweets.
The model is a fine-tuned version of [vinai/bertweet-base](https://huggingface.co/vinai/bertweet-base), a RoBERTa-based model pre-trained on 850M English tweets, and it is trained for spam classification on a bitcoin-related dataset.
✨ Features
- Classify bitcoin- or crypto-related tweets as "human", "spam", or "bot".
- Remove the bias against bitcoin-related tweets when classifying them.
- Based on a fine-tuned version of [vinai/bertweet-base](https://huggingface.co/vinai/bertweet-base).
💻 Usage Examples
Basic Usage
BERTweet was trained on normalized tweets such as:
tweet = "DHEC confirms HTTPURL via @USER :crying_face:"
For better results, normalize the text before applying the spam detection. Normalization converts user mentions and web/URL links into the special tokens @USER and HTTPURL, respectively, and applies a few other preprocessing steps.
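To make the two main substitutions concrete, here is a minimal sketch of the idea using regular expressions. It is a simplified stand-in and not the official TweetNormalizer: the name `simple_normalize` and both patterns are assumptions made for illustration, and it skips the tokenization and emoji handling.

```python
import re

# Simplified stand-in for BERTweet's TweetNormalizer (NOT the official code):
# it only replaces links and user mentions with the special tokens.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
MENTION_RE = re.compile(r"@\w+")

def simple_normalize(tweet: str) -> str:
    """Replace links with HTTPURL and user mentions with @USER."""
    # Handle URLs first so an '@' inside a link is not treated as a mention.
    text = URL_RE.sub("HTTPURL", tweet)
    text = MENTION_RE.sub("@USER", text)
    # Collapse any repeated whitespace.
    return " ".join(text.split())

print(simple_normalize("Join us! @freecoinhunt https://t.co/VIUwLmdy4n"))
```

For real use, the official normalizer is the better choice, since it applies the same preprocessing that BERTweet was trained with.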
To do so, copy or download the TweetNormalizer provided in the BERTweet repository:
# Notebook cells: fetch the BERTweet repository (for TweetNormalizer) and its emoji dependency
!git clone https://github.com/VinAIResearch/BERTweet.git
!pip install emoji
import sys
from transformers import pipeline
# Make TweetNormalizer importable; adjust the path to wherever the repository was cloned
sys.path.append('/content/BERTweet')
from TweetNormalizer import normalizeTweet
# Load the fine-tuned model; inputs are truncated to 128 tokens
classifier = pipeline(
    "text-classification",
    model="sandiumenge/twitter-bitcoin-spam-detection",
    tokenizer="sandiumenge/twitter-bitcoin-spam-detection",
    truncation=True,
    padding=True,
    max_length=128,
)
tweet = "I'm winning iPhone XS,BTC,ETH and other Awards. Join with us!@freecoinhunt https://t.co/VIUwLmdy4n"
# Normalize first, then classify
normalized_tweet = normalizeTweet(tweet)
print(classifier(normalized_tweet))
>> [{'label': 'spam', 'score': 0.9803344011306763}]
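When working with many tweets, the same classifier can be applied to a list and the results filtered by label. The sketch below is a hypothetical helper: `keep_human` and its `min_score` parameter are not part of the model or of transformers. It assumes the classifier returns one dict with `label` and `score` keys per input, as in the output above, and that the label string for genuine tweets is `"human"`.

```python
from typing import Callable, Dict, List

def keep_human(
    tweets: List[str],
    classifier: Callable[[List[str]], List[Dict]],
    normalize: Callable[[str], str] = lambda t: t,
    min_score: float = 0.5,
) -> List[str]:
    """Return the original tweets that the classifier labels as 'human'."""
    # Normalize every tweet before classification, as recommended above.
    normalized = [normalize(t) for t in tweets]
    predictions = classifier(normalized)
    # Keep a tweet only if it is labelled 'human' with enough confidence.
    return [
        tweet
        for tweet, pred in zip(tweets, predictions)
        if pred["label"] == "human" and pred["score"] >= min_score
    ]
```

With the objects from the example above, this would be called as `keep_human(tweets, classifier, normalizeTweet)`.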
Evaluation Results
The model achieves the following results on the evaluation set:
- Loss: 0.4793
- Accuracy: 0.8755
- F1: 0.8767
- Precision: 0.8792
- Recall: 0.8755
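The card does not state how precision, recall, and F1 are averaged across the three classes. The identical accuracy and recall values are what support-weighted averaging would produce, since weighted recall always equals accuracy, but that is an inference rather than something the source confirms. The sketch below shows the computation on a small made-up set of labels; the data is hypothetical and not taken from the evaluation set.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n  # class weight = share of true examples
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical toy labels, only to illustrate the computation.
y_true = ["human", "human", "spam", "spam", "bot", "bot"]
y_pred = ["human", "spam", "spam", "spam", "bot", "human"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

On any input, the `recall` and `accuracy` entries returned by this function coincide, which mirrors the pattern in the reported numbers.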

📄 License
This project is licensed under the MIT license.
| Property | Details |
|----------|---------|
| Model Type | Fine-tuned version of [vinai/bertweet-base](https://huggingface.co/vinai/bertweet-base) |
| Training Data | [sandiumenge/bitcoin-tweets-spam-emotion-sentiment](sandiumenge/bitcoin-tweets-spam-emotion-sentiment) |
| Metrics | accuracy, f1, precision, recall |
| Pipeline Tag | text-classification |
| Model Name | twitter-bitcoin-spam-detection |