🚀 COVID-Twitter-BERT v2 MNLI
This model offers a zero-shot classifier for scenarios where fine-tuning CT-BERT on a specific task is infeasible due to a lack of labelled data.
🚀 Quick Start
The easiest way to try this out is by using the Hugging Face pipeline. This uses the default English hypothesis template, which places the text "This example is " in front of each candidate label.
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="digitalepidemiologylab/covid-twitter-bert-v2-mnli")
You can then use this pipeline to classify sequences into any of the class names you specify.
sequence_to_classify = 'To stop the pandemic it is important that everyone turns up for their shots.'
candidate_labels = ['health', 'sport', 'vaccine', 'guns']
hypothesis_template = 'This example is {}.'
# Note: the multi_class argument was renamed to multi_label in newer transformers releases
classifier(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template, multi_label=True)
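The pipeline returns a dictionary containing the original sequence, the candidate labels sorted from highest to lowest score, and one score per label. With multi_label=True, each label is scored independently as its own entailment question, so the scores do not need to sum to one.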
✨ Features
- This model provides a zero-shot classifier for situations where fine-tuning CT-BERT on a specific task is impossible because of a lack of labelled data.
- The technique is based on Yin et al., which describes a clever way of using pre-trained MNLI models as zero-shot sequence classifiers.
- The model is already fine-tuned on roughly 400,000 generic natural language inference (MNLI) examples.
📚 Documentation
Model description
This model provides a zero-shot classifier to be used in cases where it is not possible to fine-tune CT-BERT on a specific task, due to a lack of labelled data.
The technique is based on Yin et al., who describe a very clever way of using pre-trained MNLI models as zero-shot sequence classifiers. The model is already fine-tuned on roughly 400,000 generic natural language inference (MNLI) examples. We can then use it as a zero-shot classifier by reformulating the classification task as a question.
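Under the hood, each candidate label is inserted into the hypothesis template and scored for entailment against the input text. The sketch below shows this reformulation directly, without the pipeline wrapper. It is a minimal illustration, not the official pipeline code; the entailment class index is checkpoint-specific, so it is looked up in the model config rather than hard-coded, with a fallback that is only an assumption.

```python
# Minimal sketch of the NLI reformulation behind zero-shot classification.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "digitalepidemiologylab/covid-twitter-bert-v2-mnli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "To stop the pandemic it is important that everyone turns up for their shots."
hypothesis = "This example is vaccine."  # candidate label inserted into the template

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# MNLI heads output one logit per NLI class (contradiction / neutral / entailment).
# The class order depends on the checkpoint config, so look it up instead of assuming it.
label2id = {k.lower(): v for k, v in model.config.label2id.items()}
entailment_id = label2id.get("entailment", -1)  # assumption: fall back to the last class
probs = logits.softmax(dim=-1)[0]
print(f"P(entailment) = {probs[entailment_id].item():.3f}")
```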
Let's say we want to classify COVID-19 tweets as vaccine-related or not vaccine-related. The typical way would be to collect a few hundred pre-annotated tweets, organise them into two classes, and then fine-tune the model on this data.
With the zero-shot MNLI classifier, you can instead reformulate your question as the hypothesis "This text is about vaccines", and use it directly at inference, without any training.
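As a concrete sketch of this reformulation using the pipeline created above (the tweet text and labels here are invented examples, not from any dataset):

```python
# Binary vaccine-related vs. not, with no task-specific training.
classifier(
    "Booked my second dose for next week, feeling relieved.",  # invented example tweet
    candidate_labels=["vaccines", "other topics"],
    hypothesis_template="This text is about {}.",
)
```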
Find more info about the model on our GitHub page.
Usage
Please note that how you formulate the hypothesis can give slightly different results. Collecting a training set and fine-tuning on it will most likely give you better accuracy.
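To illustrate this sensitivity, the sketch below scores the same text under two different hypothesis templates; the text and both templates are just examples, and the resulting scores will typically differ slightly between templates.

```python
# Compare how two hypothesis templates affect the scores for the same input.
text = "Masks and distancing remain important until vaccination rates improve."  # invented example
labels = ["vaccine", "health"]

for template in ["This example is {}.", "This text is about {}."]:
    result = classifier(text, labels, hypothesis_template=template, multi_label=True)
    scores = dict(zip(result["labels"], [round(s, 3) for s in result["scores"]]))
    print(template, "->", scores)
```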
Training procedure
The model was fine-tuned on the MNLI task, which contains roughly 400,000 sentence pairs.
📄 License
This project is licensed under the MIT license.
🔗 Additional Information
- Tags: Twitter, COVID-19, text-classification, pytorch, tensorflow, bert
- Datasets: mnli
- Pipeline Tag: zero-shot-classification
- Thumbnail: COVID-Twitter-BERT_small
📖 References
@article{muller2020covid,
  title={COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter},
  author={M{\"u}ller, Martin and Salath{\'e}, Marcel and Kummervold, Per E},
  journal={arXiv preprint arXiv:2005.07503},
  year={2020}
}
or
Martin Müller, Marcel Salathé, and Per E. Kummervold.
COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter.
arXiv preprint arXiv:2005.07503 (2020).