# ONNX version of papluca/xlm-roberta-base-language-detection

This model is a conversion of [papluca/xlm-roberta-base-language-detection](https://huggingface.co/papluca/xlm-roberta-base-language-detection) to ONNX format. It provides a practical language detection solution built on the XLM-RoBERTa architecture.
## 🚀 Quick Start
To get started, you need the 🤗 Optimum library installed. Here is a simple example of how to load and run the model:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

# Load the tokenizer and the ONNX model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("laiyer/xlm-roberta-base-language-detection-onnx")
model = ORTModelForSequenceClassification.from_pretrained("laiyer/xlm-roberta-base-language-detection-onnx")

# Build a text-classification pipeline; top_k=None returns scores for all languages
classifier = pipeline(
    task="text-classification",
    model=model,
    tokenizer=tokenizer,
    top_k=None,
)

classifier_output = classifier("Brevity is the soul of wit.")
print(classifier_output)
```
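With `top_k=None`, the pipeline returns one list of label/score dictionaries per input, covering all supported languages. Assuming the labels are the ISO codes inherited from the base model (verifiable via `model.config.id2label`), here is a minimal sketch, reusing `classifier_output` from above, for extracting the top prediction:

```python
# classifier_output holds one list per input text; each entry is a
# {"label": <language code>, "score": <probability>} dict.
top = max(classifier_output[0], key=lambda item: item["score"])
print(f"Detected language: {top['label']} (score: {top['score']:.4f})")
```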
## ✨ Features
- Multilingual Support: Supports 20 languages including Arabic, Bulgarian, German, and more.
- Fine-Tuned Model: Based on [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) fine-tuned on the [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset.
- ONNX Conversion: Converted to ONNX format using the 🤗 Optimum library for efficient inference (see the export sketch after this list).
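You do not need to repeat the conversion to use this model, but for reference, an equivalent export can be produced with 🤗 Optimum's `export=True` loading path. A minimal sketch, with an illustrative output directory name:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Export the base PyTorch model to ONNX on the fly ...
model = ORTModelForSequenceClassification.from_pretrained(
    "papluca/xlm-roberta-base-language-detection", export=True
)
tokenizer = AutoTokenizer.from_pretrained("papluca/xlm-roberta-base-language-detection")

# ... and save the ONNX weights plus tokenizer files locally
model.save_pretrained("xlm-roberta-base-language-detection-onnx")
tokenizer.save_pretrained("xlm-roberta-base-language-detection-onnx")
```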
## 📦 Installation
Loading the model requires the 🤗 Optimum library with ONNX Runtime support. You can install it via pip:

```bash
pip install optimum[onnxruntime]
```
## 📚 Documentation
### Model description
This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on the [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset. It is an XLM-RoBERTa transformer model with a classification head on top (a linear layer on top of the pooled output). For additional information, please refer to the [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) model card or to the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Conneau et al.
### Intended uses & limitations
You can directly use this model as a language detector for sequence classification tasks. It currently supports the following 20 languages:

arabic (ar), bulgarian (bg), german (de), modern greek (el), english (en), spanish (es), french (fr), hindi (hi), italian (it), japanese (ja), dutch (nl), polish (pl), portuguese (pt), russian (ru), swahili (sw), thai (th), turkish (tr), urdu (ur), vietnamese (vi), and chinese (zh).
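Since the pipeline reports these ISO codes as labels (an assumption worth verifying against `model.config.id2label`), a small lookup table can make the output human-readable:

```python
# ISO 639-1 code -> language name, for the 20 supported languages
LANGUAGE_NAMES = {
    "ar": "Arabic", "bg": "Bulgarian", "de": "German", "el": "Modern Greek",
    "en": "English", "es": "Spanish", "fr": "French", "hi": "Hindi",
    "it": "Italian", "ja": "Japanese", "nl": "Dutch", "pl": "Polish",
    "pt": "Portuguese", "ru": "Russian", "sw": "Swahili", "th": "Thai",
    "tr": "Turkish", "ur": "Urdu", "vi": "Vietnamese", "zh": "Chinese",
}

print(LANGUAGE_NAMES["sw"])  # Swahili
```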
### Usage

#### Optimum

The Python example in the Quick Start section shows how to load and run the model with the 🤗 Optimum library. Pipelines also accept batches of texts, as sketched below.
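A minimal batching sketch, reusing the `classifier` pipeline from the Quick Start:

```python
texts = [
    "Je ne parle pas anglais.",  # French
    "Das ist ein Beispiel.",     # German
]

# The pipeline returns one list of label/score dicts per input text
for text, scores in zip(texts, classifier(texts)):
    best = max(scores, key=lambda item: item["score"])
    print(f"{text!r} -> {best['label']} ({best['score']:.2f})")
```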
#### LLM Guard

You can also use this model with LLM Guard's [Language scanner](https://llm-guard.com/input_scanners/language/), as sketched below.
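A minimal sketch of the scanner, assuming the interface described in the linked LLM Guard documentation (a `valid_languages` parameter and a `(sanitized_prompt, is_valid, risk_score)` return triple):

```python
from llm_guard.input_scanners import Language

# Flag prompts that are not written in one of the allowed languages
scanner = Language(valid_languages=["en", "de"])
sanitized_prompt, is_valid, risk_score = scanner.scan("Bonjour tout le monde!")
print(is_valid, risk_score)  # expected to flag the prompt, since it is French
```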
## 🔧 Technical Details
| Property | Details |
|----------|---------|
| Model Type | XLM-RoBERTa transformer model with a classification head |
| Training Data | [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset |
| Base Model | [papluca/xlm-roberta-base-language-detection](https://huggingface.co/papluca/xlm-roberta-base-language-detection) |
## 📄 License
This model is released under the MIT license.