XLMRoberta - Alexa - Intents - Classificationオープンソースモデル - 51言語に対応したユーザー発話の意図識別

ホーム

Xlmroberta Alexa Intents Classification

qanastekによって開発

XLM - RoBERTaに基づく多言語意図分類モデルで、51言語をサポートし、ユーザーの発話の意図カテゴリを識別するために使用されます。

テキスト分類

Transformers

#多言語意図識別 #スマートホーム制御 #音声アシスタントNLU

ダウンロード数 2,413

リリース時間 : 5/4/2022

モデル概要

このモデルは多言語意図分類器で、XLM - RoBERTaアーキテクチャに基づき、MASSIVEデータセットで訓練され、60種類の異なるユーザー意図を識別でき、スマートアシスタントなどの自然言語理解シーンに適しています。

モデル特徴

多言語対応

51言語の意図分類をサポートし、世界の主要言語を網羅します。

広範な意図カバレッジ

60種類の異なるユーザー意図を識別でき、スマートアシスタントの多様なニーズを満たします。

高精度分類

複数の意図カテゴリでF1スコアが0.9を超え、優れた性能を発揮します。

モデル能力

多言語意図識別

自然言語理解

テキスト分類

使用事例

スマートアシスタント

アラーム設定

ユーザーのアラーム設定要求を識別します。

F1スコア0.8921

音楽再生

ユーザーの音楽再生要求を識別します。

F1スコア0.8763

天気検索

ユーザーの天気検索要求を識別します。

F1スコア0.9439

スマートホーム制御

照明制御

ユーザーのスマートホーム照明制御要求を識別します。

Hue照明制御F1スコア0.9075

機器のオンオフ

ユーザーのスマートホーム機器のオンオフ要求を識別します。

Wemo機器制御F1スコア0.9143

🚀 XLMRoberta-Alexa-Intents-Classification

このモデルは、多言語のテキスト分類と意図分類を行うためのTransformerベースのモデルです。多言語に対応し、様々な自然言語理解タスクに適用できます。

🚀 クイックスタート

このモデルをHuggingFace Transformers Pipelineで使用するには、以下の手順に従ってください。

インストール

transformers ライブラリをインストールする必要があります。

pip install transformers

コード例

from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

model_name = 'qanastek/XLMRoberta-Alexa-Intents-Classification'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)

res = classifier("réveille-moi à neuf heures du matin le vendredi")
print(res)

出力例

[{'label': 'alarm_set', 'score': 0.9998375177383423}]

✨ 主な機能

多言語対応：51言語に対応しており、多言語のテキスト分類と意図分類が可能です。
高精度：多言語の意図予測とスロット注釈に対して高精度な結果を提供します。

📦 インストール

transformers ライブラリをインストールする必要があります。

pip install transformers

💻 使用例

基本的な使用法

from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

model_name = 'qanastek/XLMRoberta-Alexa-Intents-Classification'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)

res = classifier("réveille-moi à neuf heures du matin le vendredi")
print(res)

高度な使用法

高度なシナリオでは、モデルを微調整することができます。ただし、その詳細はここでは説明しません。

📚 ドキュメント

訓練データ

MASSIVE は、51言語にわたる100万以上の発話を含む並列データセットです。意図予測とスロット注釈の自然言語理解タスクに対する注釈が付けられています。発話は60の意図をカバーし、55のスロットタイプを含みます。MASSIVEは、一般的なインテリジェントボイスアシスタントのシングルショット対話で構成されるSLURPデータセットをローカライズすることで作成されました。

意図の種類

audio_volume_other
play_music
iot_hue_lighton
general_greet
calendar_set
audio_volume_down
social_query
audio_volume_mute
iot_wemo_on
iot_hue_lightup
audio_volume_up
iot_coffee
takeaway_query
qa_maths
play_game
cooking_query
iot_hue_lightdim
iot_wemo_off
music_settings
weather_query
news_query
alarm_remove
social_post
recommendation_events
transport_taxi
takeaway_order
music_query
calendar_query
lists_query
qa_currency
recommendation_movies
general_joke
recommendation_locations
email_querycontact
lists_remove
play_audiobook
email_addcontact
lists_createoradd
play_radio
qa_stock
alarm_query
email_sendemail
general_quirky
music_likeness
cooking_recipe
email_query
datetime_query
transport_traffic
play_podcasts
iot_hue_lightchange
calendar_remove
transport_query
transport_ticket
qa_factoid
iot_cleaning
alarm_set
datetime_convert
iot_hue_lightoff
qa_definition
music_dislikeness

評価結果

                          precision    recall  f1-score   support

             alarm_query     0.9661    0.9037    0.9338      1734
            alarm_remove     0.9484    0.9608    0.9545      1071
               alarm_set     0.8611    0.9254    0.8921      2091
       audio_volume_down     0.8657    0.9537    0.9075       561
       audio_volume_mute     0.8608    0.9130    0.8861      1632
      audio_volume_other     0.8684    0.5392    0.6653       306
         audio_volume_up     0.7198    0.8446    0.7772       663
          calendar_query     0.7555    0.8229    0.7878      6426
         calendar_remove     0.8688    0.9441    0.9049      3417
            calendar_set     0.9092    0.9014    0.9053     10659
           cooking_query     0.0000    0.0000    0.0000         0
          cooking_recipe     0.9282    0.8592    0.8924      3672
        datetime_convert     0.8144    0.7686    0.7909       765
          datetime_query     0.9152    0.9305    0.9228      4488
        email_addcontact     0.6482    0.8431    0.7330       612
             email_query     0.9629    0.9319    0.9472      6069
      email_querycontact     0.6853    0.8032    0.7396      1326
         email_sendemail     0.9530    0.9381    0.9455      5814
           general_greet     0.1026    0.3922    0.1626        51
            general_joke     0.9305    0.9123    0.9213       969
          general_quirky     0.6984    0.5417    0.6102      8619
            iot_cleaning     0.9590    0.9359    0.9473      1326
              iot_coffee     0.9304    0.9749    0.9521      1836
     iot_hue_lightchange     0.8794    0.9374    0.9075      1836
        iot_hue_lightdim     0.8695    0.8711    0.8703      1071
        iot_hue_lightoff     0.9440    0.9229    0.9334      2193
         iot_hue_lighton     0.4545    0.5882    0.5128       153
         iot_hue_lightup     0.9271    0.8315    0.8767      1377
            iot_wemo_off     0.9615    0.8715    0.9143       918
             iot_wemo_on     0.8455    0.7941    0.8190       510
       lists_createoradd     0.8437    0.8356    0.8396      1989
             lists_query     0.8918    0.8335    0.8617      2601
            lists_remove     0.9536    0.8601    0.9044      2652
       music_dislikeness     0.7725    0.7157    0.7430       204
          music_likeness     0.8570    0.8159    0.8359      1836
             music_query     0.8667    0.8050    0.8347      1785
          music_settings     0.4024    0.3301    0.3627       306
              news_query     0.8343    0.8657    0.8498      6324
          play_audiobook     0.8172    0.8125    0.8149      2091
               play_game     0.8666    0.8403    0.8532      1785
              play_music     0.8683    0.8845    0.8763      8976
           play_podcasts     0.8925    0.9125    0.9024      3213
              play_radio     0.8260    0.8935    0.8585      3672
             qa_currency     0.9459    0.9578    0.9518      1989
           qa_definition     0.8638    0.8552    0.8595      2907
              qa_factoid     0.7959    0.8178    0.8067      7191
                qa_maths     0.8937    0.9302    0.9116      1275
                qa_stock     0.7995    0.9412    0.8646      1326
   recommendation_events     0.7646    0.7702    0.7674      2193
recommendation_locations     0.7489    0.8830    0.8104      1581
   recommendation_movies     0.6907    0.7706    0.7285      1020
             social_post     0.9623    0.9080    0.9344      4131
            social_query     0.8104    0.7914    0.8008      1275
          takeaway_order     0.7697    0.8458    0.8059      1122
          takeaway_query     0.9059    0.8571    0.8808      1785
         transport_query     0.8141    0.7559    0.7839      2601
          transport_taxi     0.9222    0.9403    0.9312      1173
        transport_ticket     0.9259    0.9384    0.9321      1785
       transport_traffic     0.6919    0.9660    0.8063       765
           weather_query     0.9387    0.9492    0.9439      7956

                accuracy                         0.8617    151674
               macro avg     0.8162    0.8273    0.8178    151674
            weighted avg     0.8639    0.8617    0.8613    151674