🚀 ONNX Model for Emotion Classification
This model is designed for text classification, specifically multi-class and multi-label emotion classification. It leverages the ONNX format to offer faster inference, especially at small batch sizes, while maintaining high accuracy.
✨ Features
- ONNX versions: available in both full-precision and quantized (INT8) variants.
- Performance: faster inference than the regular Transformers model, especially at small batch sizes.
- Accuracy: metrics comparable to the original Transformers model.
- Size: the quantized model is one-quarter the size of the full-precision model.
📦 Installation
The examples below assume transformers, optimum (with ONNX Runtime support), tokenizers, onnxruntime and numpy are installed.
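A typical setup (suggested here, not taken from the original model card) installs the libraries the examples rely on:

```bash
pip install transformers "optimum[onnxruntime]" tokenizers numpy
```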
💻 Usage Examples
Basic Usage
sentences = ["ONNX is seriously fast for small batches. Impressive"]
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification
model_id = "SamLowe/roberta-base-go_emotions-onnx"
file_name = "onnx/model_quantized.onnx"
model = ORTModelForSequenceClassification.from_pretrained(model_id, file_name=file_name)
tokenizer = AutoTokenizer.from_pretrained(model_id)
onnx_classifier = pipeline(
task="text-classification",
model=model,
tokenizer=tokenizer,
top_k=None,
function_to_apply="sigmoid",
)
model_outputs = onnx_classifier(sentences)
print(model_outputs)
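Because the task is multi-label, a sentence can express several emotions at once. A minimal post-processing sketch, continuing from the example above, that keeps every label scoring above the same fixed 0.5 threshold used in the Metrics sections below:

```python
# model_outputs is a list (one entry per input sentence) of lists of
# {"label": ..., "score": ...} dicts, one dict per emotion label
threshold = 0.5  # the fixed threshold used for the reported metrics
for sentence, scores in zip(sentences, model_outputs):
    predicted = [d["label"] for d in scores if d["score"] >= threshold]
    print(sentence, "->", predicted)
```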
Advanced Usage
```python
from os import cpu_count

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

sentences = ["hello world"]  # e.g. a batch of 1

labels = ['admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse', 'sadness', 'surprise', 'neutral']

tokenizer = Tokenizer.from_pretrained("SamLowe/roberta-base-go_emotions")

# Pad only to the longest sequence in the batch, not to a fixed length
params = {**tokenizer.padding, "length": None}
tokenizer.enable_padding(**params)

tokens_obj = tokenizer.encode_batch(sentences)

def load_onnx_model(model_filepath):
    # Use all CPU cores for both inter- and intra-op parallelism
    _options = ort.SessionOptions()
    _options.inter_op_num_threads, _options.intra_op_num_threads = cpu_count(), cpu_count()
    _providers = ["CPUExecutionProvider"]
    return ort.InferenceSession(path_or_bytes=model_filepath, sess_options=_options, providers=_providers)

model = load_onnx_model("path_to_model_dot_onnx_or_model_quantized_dot_onnx")

output_names = [model.get_outputs()[0].name]
# Explicit int64 arrays keep the input dtypes consistent across platforms
input_feed_dict = {
    "input_ids": np.array([t.ids for t in tokens_obj], dtype=np.int64),
    "attention_mask": np.array([t.attention_mask for t in tokens_obj], dtype=np.int64),
}

logits = model.run(output_names=output_names, input_feed=input_feed_dict)[0]

# Multi-label model, so apply a sigmoid to each logit independently
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

model_outputs = sigmoid(logits)

# Print only the single highest-scoring label per sentence
for probas in model_outputs:
    top_result_index = np.argmax(probas)
    print(labels[top_result_index], "with score:", probas[top_result_index])
```
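Since the quantized model delivers almost all of the full-precision accuracy (see the Documentation section below), its outputs should closely track the full-precision ones. A quick agreement check, continuing from the example above (the local file paths are assumed to point at the two downloaded .onnx files):

```python
full = load_onnx_model("onnx/model.onnx")
quant = load_onnx_model("onnx/model_quantized.onnx")

p_full = sigmoid(full.run(output_names=[full.get_outputs()[0].name], input_feed=input_feed_dict)[0])
p_quant = sigmoid(quant.run(output_names=[quant.get_outputs()[0].name], input_feed=input_feed_dict)[0])

# Small per-label differences are expected from INT8 quantization
print("max abs difference in probabilities:", np.abs(p_full - p_quant).max())
```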
📚 Documentation
Full precision ONNX version
`onnx/model.onnx` is the full-precision ONNX version:
- It has identical accuracy/metrics to the original Transformers model.
- It has the same model size (499MB).
- It is faster in inference than the regular Transformers model, particularly at smaller batch sizes. In tests on an 8-core 11th-gen i7 CPU using ONNXRuntime, it is about 2x to 3x as fast for a batch size of 1 (a rough timing sketch follows below).
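The speed figures quoted here come from the author's own tests. A sketch for reproducing a similar batch-size-1 comparison on your own hardware (a simple harness, not the author's benchmark):

```python
import time

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "SamLowe/roberta-base-go_emotions-onnx"
onnx_pipe = pipeline(
    "text-classification",
    model=ORTModelForSequenceClassification.from_pretrained(model_id, file_name="onnx/model.onnx"),
    tokenizer=AutoTokenizer.from_pretrained(model_id),
    top_k=None,
    function_to_apply="sigmoid",
)
# Plain PyTorch Transformers pipeline for comparison
torch_pipe = pipeline(
    "text-classification",
    model="SamLowe/roberta-base-go_emotions",
    top_k=None,
    function_to_apply="sigmoid",
)

def mean_latency_seconds(pipe, n=50):
    batch = ["ONNX is seriously fast for small batches. Impressive"]  # batch size 1
    pipe(batch)  # warm-up run before timing
    start = time.perf_counter()
    for _ in range(n):
        pipe(batch)
    return (time.perf_counter() - start) / n

print("transformers:", mean_latency_seconds(torch_pipe))
print("onnx        :", mean_latency_seconds(onnx_pipe))
```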
Metrics
Using a fixed threshold of 0.5 to convert the scores to binary predictions for each label:
| Metric | Value |
|-----------|-------|
| Accuracy | 0.474 |
| Precision | 0.575 |
| Recall | 0.396 |
| F1 | 0.450 |
See the SamLowe/roberta-base-go_emotions model card for more details on the gains possible from selecting label-specific thresholds to maximise F1 (or another metric).
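The averaging convention behind the reported precision/recall/F1 is not stated in this section. A sketch of how such multi-label metrics could be computed with scikit-learn, with placeholder random arrays standing in for the real go_emotions test labels and model outputs, and macro averaging taken as an assumption:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Placeholder arrays: in practice y_true comes from the go_emotions test split
# and y_pred from thresholding the model's sigmoid outputs at 0.5
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(100, 28))
probas = rng.random(size=(100, 28))
y_pred = (probas >= 0.5).astype(int)

print("accuracy :", accuracy_score(y_true, y_pred))  # exact-match (subset) accuracy
print("precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("recall   :", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("f1       :", f1_score(y_true, y_pred, average="macro", zero_division=0))
```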
Quantized (INT8) ONNX version
`onnx/model_quantized.onnx` is the INT8-quantized version (one possible production recipe is sketched after the list below):
- It is one-quarter the size (125MB) of the full-precision model.
- It delivers almost all of the accuracy.
- It is faster in inference than both the full-precision ONNX version and the regular Transformers model. On an 8-core 11th-gen i7 CPU using ONNXRuntime, it is about 2x as fast as the full-precision model for a batch size of 1, which makes it circa 5x as fast as the full-precision Transformers model.
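The recipe used to produce `onnx/model_quantized.onnx` is not documented here; dynamic INT8 quantization with Hugging Face Optimum would be one plausible approach along these lines (the output directories and quantization config are assumptions):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the original Transformers model to full-precision ONNX
model = ORTModelForSequenceClassification.from_pretrained(
    "SamLowe/roberta-base-go_emotions", export=True
)
model.save_pretrained("go_emotions_onnx")  # hypothetical local directory

# Dynamic (weight-only) INT8 quantization; this config choice is an assumption
quantizer = ORTQuantizer.from_pretrained("go_emotions_onnx")
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="go_emotions_onnx_int8", quantization_config=qconfig)
```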
Metrics for Quantized (INT8) Model
Using a fixed threshold of 0.5 to convert the scores to binary predictions for each label:
| Metric | Value |
|-----------|-------|
| Accuracy | 0.475 |
| Precision | 0.582 |
| Recall | 0.398 |
| F1 | 0.447 |

Note that these metrics are almost identical to the full-precision metrics.
Example notebook
A notebook showing usage, accuracy and performance in more detail is to follow.
📄 License
This model is released under the MIT license.