Roberta Large Ernie2 Skep En

Developed by Yaxin

SKEP (Sentiment Knowledge Enhanced Pre-training) is a pre-training method for sentiment analysis proposed by Baidu in 2020. It uses sentiment masking and three sentiment pre-training objectives to build several types of sentiment knowledge into the model.

Large Language Model

Transformers

English #Sentiment Analysis #Knowledge-Enhanced Pre-training #Multi-task Learning

Release Time: 4/4/2022

Model Overview

SKEP-Roberta is a pre-trained model based on the RoBERTa architecture and optimized for sentiment analysis through sentiment knowledge enhanced pre-training.

Model Features

Sentiment Knowledge Enhancement

Incorporates multi-type knowledge through sentiment masking techniques and three sentiment pre-training objectives.

Based on RoBERTa Architecture

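Adopts the RoBERTa-large architecture with 24 layers and a hidden size of 1024. The released model info lists 24 attention heads; since 1024 is not divisible by 24 and RoBERTa-large normally uses 16 heads, confirm the value against num_attention_heads in the model's config.json.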
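
As a rough sanity check on these dimensions, the sketch below estimates the size of the encoder stack. It assumes the standard RoBERTa-large feed-forward width of 4096 and 16 attention heads, neither of which is confirmed by this card, and it ignores the embedding tables and the language-modeling head. The function names are illustrative.

```python
def encoder_layer_params(hidden: int, ffn: int) -> int:
    """Weights and biases in one Transformer encoder layer."""
    attention = 4 * (hidden * hidden + hidden)  # Q, K, V and output projections
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    layer_norms = 2 * (2 * hidden)  # two LayerNorms, scale and bias each
    return attention + feed_forward + layer_norms


def head_dim(hidden: int, heads: int) -> int:
    """Per-head width; multi-head attention needs hidden % heads == 0."""
    if hidden % heads != 0:
        raise ValueError(f"hidden size {hidden} is not divisible by {heads} heads")
    return hidden // heads


LAYERS, HIDDEN = 24, 1024  # from the model card
FFN = 4096                 # assumption: standard RoBERTa-large value
HEADS = 16                 # assumption: standard RoBERTa-large value

total = LAYERS * encoder_layer_params(HIDDEN, FFN)
print(f"encoder stack: {total:,} parameters")
print(f"per-head width: {head_dim(HIDDEN, HEADS)}")
```

Under these assumptions the encoder layers alone account for roughly 300 million parameters, which gives a sense of the memory needed to load the model.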

PyTorch Conversion

Converted from the official PaddlePaddle version of the SKEP model, with experimental validation of conversion accuracy.

Model Capabilities

Sentiment Analysis

Text Classification

Masked Language Modeling

Use Cases

Sentiment Analysis

Product Review Sentiment Analysis

Analyze the sentiment tendency (positive/negative) of user reviews on products.
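
The usage examples on this card load the checkpoint as an encoder and as a masked language model, so positive/negative classification presumably requires a classification head fine-tuned on labeled reviews. The sketch below shows one way to attach such a head and to turn its logits into a label. The helper names and the (negative, positive) label order are assumptions for illustration, not part of the released model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_label(logits, labels=("negative", "positive")):
    """Map one row of classifier logits to (label, probability)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def load_classifier(name="Yaxin/roberta-large-ernie2-skep-en"):
    """Load the checkpoint with a 2-way head (downloads weights on first call).

    Assumption: the head is newly initialized and must be fine-tuned
    before its predictions mean anything.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
    return tokenizer, model


# Example with made-up logits, as a fine-tuned head might produce them.
print(to_label([-1.2, 2.3]))
```

After fine-tuning, the logits for a review could be obtained with model(**tokenizer(text, return_tensors="pt")).logits[0].tolist() and passed to to_label.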

Social Media Emotion Detection

Identify emotional expressions in social media texts.

Educational Applications

Student Feedback Analysis

Analyze the sentiment of student feedback on courses or teaching.

🚀 SKEP-Roberta

SKEP-Roberta is a sentiment analysis model. It uses sentiment knowledge enhanced pre-training to improve performance on sentiment analysis tasks.

🚀 Quick Start

To get started with SKEP-Roberta, load the tokenizer and the model as shown in the basic usage example below.

✨ Features

Sentiment Knowledge Enhanced: SKEP proposes Sentiment Knowledge Enhanced Pre-training for sentiment analysis. It combines sentiment masking with three sentiment pre-training objectives (sentiment word prediction, word polarity prediction, and aspect-sentiment pair prediction) to build sentiment knowledge into the pre-trained model.
Model Conversion: The released PyTorch model is converted from the officially released PaddlePaddle SKEP model, and a series of experiments was run to check the accuracy of the conversion.
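
To make the idea of sentiment masking concrete, here is a toy sketch. It is not Baidu's implementation: it assumes a tiny hand-written polarity lexicon and whitespace tokenization, and it masks every lexicon hit, whereas a real pre-training pipeline would mine its sentiment knowledge from data and sample which words to mask. All names in the block are hypothetical.

```python
MASK = "<mask>"

# Hypothetical toy lexicon: word -> polarity.
LEXICON = {"great": "positive", "fast": "positive", "terrible": "negative"}


def sentiment_mask(tokens, lexicon=LEXICON, mask=MASK):
    """Replace sentiment words with a mask token.

    Returns the masked tokens plus, for each masked position, the original
    word and its polarity: the targets a model would be trained to recover.
    """
    masked, targets = [], {}
    for position, token in enumerate(tokens):
        polarity = lexicon.get(token.lower())
        if polarity is None:
            masked.append(token)
        else:
            masked.append(mask)
            targets[position] = (token, polarity)
    return masked, targets


tokens = "The screen is great but the battery is terrible".split()
masked, targets = sentiment_mask(tokens)
print(" ".join(masked))
print(targets)
```

The recorded word at each masked position corresponds to a sentiment word prediction target, and its polarity to a word polarity prediction target. Aspect-sentiment pairs are left out of this sketch.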

📦 Installation

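The model itself needs no separate installation step. Install the transformers library together with PyTorch (for example, with pip install transformers torch); the tokenizer and model weights are then downloaded from the pre-trained model repository the first time they are loaded.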

💻 Usage Examples

Basic Usage

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("Yaxin/roberta-large-ernie2-skep-en")
model = AutoModel.from_pretrained("Yaxin/roberta-large-ernie2-skep-en")

Advanced Usage

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import torch
from transformers import RobertaTokenizer, RobertaForMaskedLM

tokenizer = RobertaTokenizer.from_pretrained('Yaxin/roberta-large-ernie2-skep-en')

# <s> and </s> are written out by hand because tokenize() does not add
# special tokens automatically.
input_tx = "<s> He like play with student, so he became a <mask> after graduation </s>"
# input_tx = "<s> He is a <mask> and likes to get along with his students </s>"

tokenized_text = tokenizer.tokenize(input_tx)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)

# Batch of one sequence; every token belongs to segment 0.
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.zeros_like(tokens_tensor)

model = RobertaForMaskedLM.from_pretrained('Yaxin/roberta-large-ernie2-skep-en')
model.eval()

with torch.no_grad():
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
    predictions = outputs[0]  # logits, shape (1, seq_len, vocab_size)

# Most likely token at every position except the leading <s> and trailing </s>.
predicted_index = predictions[0].argmax(dim=-1).tolist()
predicted_token = tokenizer.convert_ids_to_tokens(predicted_index[1:-1])

print('Predicted tokens:', predicted_token)

# The prediction at the <mask> position is usually the one of interest.
mask_position = tokenized_text.index(tokenizer.mask_token)
print('Predicted token for <mask>:', tokenizer.convert_ids_to_tokens(predicted_index[mask_position]))
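
When only the masked position matters, the fill-mask pipeline in transformers is a shorter route. The sketch below assumes the checkpoint loads through that pipeline the same way it loads with RobertaForMaskedLM, which this card does not confirm. The helpers summarize and predict_mask are illustrative names, and the sample data is made up purely to show the shape of the pipeline output.

```python
def summarize(candidates, limit=3):
    """Return (token, score) pairs for the best fill-mask candidates.

    `candidates` is the list of dicts a fill-mask pipeline returns for an
    input with a single mask; each dict has "token_str" and "score" keys.
    """
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    return [(c["token_str"].strip(), round(c["score"], 3)) for c in ranked[:limit]]


def predict_mask(text, name="Yaxin/roberta-large-ernie2-skep-en"):
    """Run the fill-mask pipeline (downloads the model on first call)."""
    from transformers import pipeline

    fill = pipeline("fill-mask", model=name)
    return summarize(fill(text))


# Made-up pipeline output, only to show the shape of the data.
fake = [
    {"token_str": " doctor", "score": 0.12},
    {"token_str": " teacher", "score": 0.61},
]
print(summarize(fake))
```

With the pipeline, the input is written without manual special tokens, for example predict_mask("He is a <mask> and likes to get along with his students").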

📚 Documentation

Released Model Info

Property	Details
Model Name	skep-roberta-large
Language	English
Model Structure	Layer:24, Hidden:1024, Heads:24

This released PyTorch model is converted from the officially released PaddlePaddle SKEP model, and a series of experiments have been conducted to check the accuracy of the conversion.

Official PaddlePaddle SKEP repo:
1. https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/skep
2. https://github.com/baidu/Senta
PyTorch Conversion repo: Not released yet

More Detail

For more detailed information about SKEP, please refer to: https://aclanthology.org/2020.acl-main.374.pdf

📄 License

No license information is provided in the original document.

📚 Citation

@article{tian2020skep,
  title={SKEP: Sentiment knowledge enhanced pre-training for sentiment analysis},
  author={Tian, Hao and Gao, Can and Xiao, Xinyan and Liu, Hao and He, Bolei and Wu, Hua and Wang, Haifeng and Wu, Feng},
  journal={arXiv preprint arXiv:2005.05635},
  year={2020}
}

Reference

https://github.com/nghuyong/ERNIE-Pytorch
