# 🚀 ONNX version of papluca/xlm-roberta-base-language-detection
This model is a conversion of the original language-detection model to ONNX format, offering efficient language-detection capabilities.
## 🚀 Quick Start
This model is a converted version of [papluca/xlm-roberta-base-language-detection](https://huggingface.co/papluca/xlm-roberta-base-language-detection) to ONNX format using the 🤗 Optimum library.
## ✨ Features
- Multilingual Support: Supports 20 languages, including Arabic, Bulgarian, and German.
- Fine-tuned Model: Based on [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) and fine-tuned on the [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset.
- Sequence Classification: Can be used directly for language detection in sequence classification tasks.
## 📦 Installation
Loading the model requires the 🤗 Optimum library to be installed.
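One way to install it (assuming a recent Python environment; the `onnxruntime` extra pulls in the ONNX Runtime backend needed for inference):

```shell
pip install "optimum[onnxruntime]"
```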
## 💻 Usage Examples
### Basic Usage
```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("laiyer/xlm-roberta-base-language-detection-onnx")
model = ORTModelForSequenceClassification.from_pretrained("laiyer/xlm-roberta-base-language-detection-onnx")
classifier = pipeline(
    task="text-classification",
    model=model,
    tokenizer=tokenizer,
    top_k=None,
)
classifier_output = classifier("Brevity is the soul of wit.")
print(classifier_output)
```
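With `top_k=None`, the pipeline returns one list of `{"label", "score"}` dicts per input, covering all supported languages. A small helper (the name `top_language` is just for illustration) can pick out the best-scoring code:

```python
def top_language(scores):
    """Return the label of the highest-scoring prediction.

    `scores` is a list of {"label": ..., "score": ...} dicts, as produced
    by a text-classification pipeline called with top_k=None.
    """
    best = max(scores, key=lambda s: s["score"])
    return best["label"]

# Example with made-up scores in the model's label format (ISO 639-1 codes):
sample = [
    {"label": "en", "score": 0.98},
    {"label": "de", "score": 0.01},
    {"label": "fr", "score": 0.01},
]
print(top_language(sample))  # prints: en
```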
### Advanced Usage
This model can also be used in the [Language scanner](https://llm-guard.com/input_scanners/language/).
## 📚 Documentation
### Model description
This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on the [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset. It is an XLM-RoBERTa transformer model with a classification head on top (i.e. a linear layer on top of the pooled output). For additional information, please refer to the [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) model card or to the paper *Unsupervised Cross-lingual Representation Learning at Scale* by Conneau et al.
### Intended uses & limitations
You can directly use this model as a language detector, i.e. for sequence classification tasks. Currently, it supports the following 20 languages:
arabic (ar), bulgarian (bg), german (de), modern greek (el), english (en), spanish (es), french (fr), hindi (hi), italian (it), japanese (ja), dutch (nl), polish (pl), portuguese (pt), russian (ru), swahili (sw), thai (th), turkish (tr), urdu (ur), vietnamese (vi), and chinese (zh)
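For programmatic filtering, the supported ISO 639-1 codes listed above can be kept in a set (the names `SUPPORTED_LANGUAGES` and `is_supported` are just for illustration):

```python
# ISO 639-1 codes of the 20 supported languages, taken from the list above.
SUPPORTED_LANGUAGES = {
    "ar", "bg", "de", "el", "en", "es", "fr", "hi", "it", "ja",
    "nl", "pl", "pt", "ru", "sw", "th", "tr", "ur", "vi", "zh",
}

def is_supported(code: str) -> bool:
    """Check whether a language code is one the model can emit."""
    return code.lower() in SUPPORTED_LANGUAGES

print(len(SUPPORTED_LANGUAGES))  # prints: 20
```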
## 🔧 Technical Details
- Model Type: XLM-RoBERTa transformer model with a classification head.
- Base Model: [papluca/xlm-roberta-base-language-detection](https://huggingface.co/papluca/xlm-roberta-base-language-detection)
- Training Data: [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset
- Metrics: Accuracy, F1
## 📄 License
This model is released under the MIT license.
## 👥 Community
Join our Slack to give us feedback, connect with the maintainers and fellow users, ask questions, or engage in discussions about LLM security!
