SKEP-Roberta
SKEP-Roberta is a sentiment analysis model. It leverages sentiment knowledge enhanced pre-training to improve performance in sentiment analysis tasks.
Quick Start
To start using the SKEP-Roberta model, you can follow these steps. First, load the tokenizer and the model as shown in the basic usage example below.
Features
- Sentiment Knowledge Enhanced: SKEP proposes Sentiment Knowledge Enhanced Pre-training for sentiment analysis. It designs sentiment masking and three sentiment pre-training objectives to incorporate various types of knowledge into the pre-trained model.
- Model Conversion: The released PyTorch model is converted from the officially released PaddlePaddle SKEP model, and a series of experiments have been conducted to ensure the accuracy of the conversion.
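To give an intuition for the sentiment masking idea mentioned above, here is a minimal toy sketch. It is not the SKEP authors' implementation: the `SENTIMENT_LEXICON` word list and the `sentiment_mask` helper are hypothetical names invented for this illustration, and the real pre-training procedure is described in the SKEP paper.

```python
# Toy illustration of sentiment masking; NOT the SKEP authors' implementation.
# SENTIMENT_LEXICON is a hypothetical hand-written word list used only here.
SENTIMENT_LEXICON = {"good", "great", "bad", "terrible", "love", "hate"}


def sentiment_mask(tokens, lexicon=SENTIMENT_LEXICON, mask_token="<mask>"):
    """Replace every token found in the lexicon with the mask token.

    Returns the masked token list and the (position, original_token) pairs
    that a pre-training objective would ask the model to recover.
    """
    masked, targets = [], []
    for position, token in enumerate(tokens):
        if token.lower() in lexicon:
            masked.append(mask_token)
            targets.append((position, token))
        else:
            masked.append(token)
    return masked, targets


tokens = "The food was great but the service was terrible".split()
masked_tokens, targets = sentiment_mask(tokens)
print(masked_tokens)
print(targets)
```

The point of the sketch is only that sentiment-bearing words, rather than random words, are chosen as masking targets, so the model has to use context to recover sentiment information.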
Installation
The installation process mainly involves loading the model and tokenizer from the pre-trained model repository. You can use the transformers library in Python to achieve this.
Usage Examples
Basic Usage
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("Yaxin/roberta-large-ernie2-skep-en")
model = AutoModel.from_pretrained("Yaxin/roberta-large-ernie2-skep-en")
Advanced Usage
import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained('Yaxin/roberta-large-ernie2-skep-en')
# The special tokens are written out by hand because the text is tokenized manually below.
input_tx = "<s> He like play with student, so he became a <mask> after graduation </s>"
tokenized_text = tokenizer.tokenize(input_tx)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([[0] * len(tokenized_text)])

model = RobertaForMaskedLM.from_pretrained('Yaxin/roberta-large-ernie2-skep-en')
model.eval()
with torch.no_grad():
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
    predictions = outputs[0]  # logits: (batch, sequence length, vocabulary size)

# Most likely token id at every position except the final </s>.
predicted_index = [torch.argmax(predictions[0, i]).item() for i in range(0, (len(tokenized_text) - 1))]
# Convert the ids back to tokens, skipping the leading <s>.
predicted_token = [tokenizer.convert_ids_to_tokens([predicted_index[x]])[0] for x in
                   range(1, (len(tokenized_text) - 1))]
print('Predicted token is:', predicted_token)
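The example above prints the most likely token at every position. If only the prediction at the `<mask>` position is of interest, a small helper along the following lines may be convenient. This is a sketch under assumptions: `top_k_at_mask` is a hypothetical helper, not part of transformers, and it works on plain Python lists so it can be tried without downloading the model. The toy vocabulary and scores below are made up for demonstration.

```python
def top_k_at_mask(tokens, scores, id_to_token, k=3, mask_token="<mask>"):
    """Return the k highest-scoring tokens at each masked position.

    tokens:      list of token strings (e.g. the output of tokenizer.tokenize)
    scores:      one list of vocabulary scores per position
    id_to_token: callable mapping a vocabulary id to its token string
    """
    results = {}
    for position, token in enumerate(tokens):
        if token != mask_token:
            continue
        row = scores[position]
        # Sort vocabulary ids by score, highest first, and keep the top k.
        best_ids = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        results[position] = [id_to_token(i) for i in best_ids]
    return results


# Toy data standing in for real model output (hypothetical values).
vocab = ["teacher", "doctor", "student", "the"]
tokens = ["he", "became", "a", "<mask>"]
scores = [[0.0] * 4, [0.0] * 4, [0.0] * 4, [2.5, 1.0, 0.3, -1.0]]

result = top_k_at_mask(tokens, scores, lambda i: vocab[i], k=2)
print(result)
```

With the real model, the same helper could presumably be called as `top_k_at_mask(tokenized_text, predictions[0].tolist(), lambda i: tokenizer.convert_ids_to_tokens([i])[0])`, reusing the variables from the advanced usage example.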
Documentation
Released Model Info
| Property | Details |
| --- | --- |
| Model Name | skep-roberta-large |
| Language | English |
| Model Structure | Layer:24, Hidden:1024, Heads:24 |
This released PyTorch model is converted from the officially released PaddlePaddle SKEP model, and a series of experiments have been conducted to check the accuracy of the conversion.
- Official PaddlePaddle SKEP repo:
  - https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/skep
  - https://github.com/baidu/Senta
- PyTorch conversion repo: Not released yet
More Detail
For more detailed information about SKEP, please refer to: https://aclanthology.org/2020.acl-main.374.pdf
License
No license information is provided in the original document.
Citation
@article{tian2020skep,
  title={SKEP: Sentiment knowledge enhanced pre-training for sentiment analysis},
  author={Tian, Hao and Gao, Can and Xiao, Xinyan and Liu, Hao and He, Bolei and Wu, Hua and Wang, Haifeng and Wu, Feng},
  journal={arXiv preprint arXiv:2005.05635},
  year={2020}
}
Reference
https://github.com/nghuyong/ERNIE-Pytorch