# ONNX version of papluca/xlm-roberta-base-language-detection

This model is a conversion of [papluca/xlm-roberta-base-language-detection](https://huggingface.co/papluca/xlm-roberta-base-language-detection) to ONNX format. It provides a practical language detection solution built on the XLM-RoBERTa architecture.
## 🚀 Quick Start
To get started, you need the 🤗 Optimum library installed. Here is a simple example of how to load and run the model:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

# Load the tokenizer and the ONNX model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("laiyer/xlm-roberta-base-language-detection-onnx")
model = ORTModelForSequenceClassification.from_pretrained("laiyer/xlm-roberta-base-language-detection-onnx")

# Build a text-classification pipeline; top_k=None returns scores for all languages
classifier = pipeline(
    task="text-classification",
    model=model,
    tokenizer=tokenizer,
    top_k=None,
)

classifier_output = classifier("Brevity is the soul of wit.")
print(classifier_output)
```
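With `top_k=None`, the pipeline returns one list of label/score dictionaries per input, covering all supported languages. Assuming the labels are the ISO codes inherited from the base model (verifiable via `model.config.id2label`), here is a minimal sketch, reusing `classifier_output` from above, for extracting the top prediction:

```python
# classifier_output holds one list per input text; each entry is a
# {"label": <language code>, "score": <probability>} dict.
top = max(classifier_output[0], key=lambda item: item["score"])
print(f"Detected language: {top['label']} (score: {top['score']:.4f})")
```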
## ✨ Features
- Multilingual Support: Supports 20 languages including Arabic, Bulgarian, German, and more.
- Fine-Tuned Model: Based on [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) fine-tuned on the [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset.
- ONNX Conversion: Converted to ONNX format using the 🤗 Optimum library for efficient inference (see the export sketch after this list).
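You do not need to repeat the conversion to use this model, but for reference, an equivalent export can be produced with 🤗 Optimum's `export=True` loading path. A minimal sketch, with an illustrative output directory name:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Export the base PyTorch model to ONNX on the fly ...
model = ORTModelForSequenceClassification.from_pretrained(
    "papluca/xlm-roberta-base-language-detection", export=True
)
tokenizer = AutoTokenizer.from_pretrained("papluca/xlm-roberta-base-language-detection")

# ... and save the ONNX weights plus tokenizer files locally
model.save_pretrained("xlm-roberta-base-language-detection-onnx")
tokenizer.save_pretrained("xlm-roberta-base-language-detection-onnx")
```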
## 📦 Installation
Loading the model requires the 🤗 Optimum library with ONNX Runtime support. You can install it via pip:

```bash
pip install optimum[onnxruntime]
```
## 📚 Documentation
### Model description
This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on the [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset. It is an XLM-RoBERTa transformer model with a classification head on top (a linear layer on top of the pooled output). For additional information, please refer to the [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) model card or to the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Conneau et al.
### Intended uses & limitations
You can directly use this model as a language detector for sequence classification tasks. It currently supports the following 20 languages:

arabic (ar), bulgarian (bg), german (de), modern greek (el), english (en), spanish (es), french (fr), hindi (hi), italian (it), japanese (ja), dutch (nl), polish (pl), portuguese (pt), russian (ru), swahili (sw), thai (th), turkish (tr), urdu (ur), vietnamese (vi), and chinese (zh).
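Since the pipeline reports these ISO codes as labels (an assumption worth verifying against `model.config.id2label`), a small lookup table can make the output human-readable:

```python
# ISO 639-1 code -> language name, for the 20 supported languages
LANGUAGE_NAMES = {
    "ar": "Arabic", "bg": "Bulgarian", "de": "German", "el": "Modern Greek",
    "en": "English", "es": "Spanish", "fr": "French", "hi": "Hindi",
    "it": "Italian", "ja": "Japanese", "nl": "Dutch", "pl": "Polish",
    "pt": "Portuguese", "ru": "Russian", "sw": "Swahili", "th": "Thai",
    "tr": "Turkish", "ur": "Urdu", "vi": "Vietnamese", "zh": "Chinese",
}

print(LANGUAGE_NAMES["sw"])  # Swahili
```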
### Usage

#### Optimum

The Python example in the Quick Start section shows how to load and run the model with the 🤗 Optimum library. Pipelines also accept batches of texts, as sketched below.
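A minimal batching sketch, reusing the `classifier` pipeline from the Quick Start:

```python
texts = [
    "Je ne parle pas anglais.",  # French
    "Das ist ein Beispiel.",     # German
]

# The pipeline returns one list of label/score dicts per input text
for text, scores in zip(texts, classifier(texts)):
    best = max(scores, key=lambda item: item["score"])
    print(f"{text!r} -> {best['label']} ({best['score']:.2f})")
```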
#### LLM Guard

You can also use this model with LLM Guard's [Language scanner](https://llm-guard.com/input_scanners/language/), as sketched below.
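A minimal sketch of the scanner, assuming the interface described in the linked LLM Guard documentation (a `valid_languages` parameter and a `(sanitized_prompt, is_valid, risk_score)` return triple):

```python
from llm_guard.input_scanners import Language

# Flag prompts that are not written in one of the allowed languages
scanner = Language(valid_languages=["en", "de"])
sanitized_prompt, is_valid, risk_score = scanner.scan("Bonjour tout le monde!")
print(is_valid, risk_score)  # expected to flag the prompt, since it is French
```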
## 🔧 Technical Details
| Property | Details |
|----------|---------|
| Model Type | XLM-RoBERTa transformer model with a classification head |
| Training Data | [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset |
| Base Model | [papluca/xlm-roberta-base-language-detection](https://huggingface.co/papluca/xlm-roberta-base-language-detection) |
## 📄 License
This model is released under the MIT license.