XLMRoberta-Alexa-Intents-Classification open-source model: user utterance intent recognition in 51 languages

Xlmroberta Alexa Intents Classification

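Developed by qanastek

A multilingual intent classification model based on XLM-RoBERTa. It supports 51 languages and identifies the intent category of a user utterance.

Text classification

Transformers

#MultilingualIntentRecognition #SmartHomeControl #VoiceAssistantNLU

Downloads: 2,413

Released: 5/4/2022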

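Model overview

This model is a multilingual intent classifier built on the XLM-RoBERTa architecture and trained on the MASSIVE dataset. It recognizes 60 distinct user intents and is suited to natural language understanding scenarios such as smart assistants.

Model features

Multilingual support

Classifies intents in 51 languages, covering the world's major languages

Broad intent coverage

Recognizes 60 distinct user intents, meeting the varied needs of smart assistants

High classification accuracy

Reaches an F1 score above 0.9 on 22 of the 60 intent classes; overall accuracy is 0.8617 and macro-average F1 is 0.8178

Model capabilities

Multilingual intent recognition

Natural language understanding

Text classification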

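Use cases

Smart assistants

Setting alarms

Recognizes requests to set an alarm

F1 score (alarm_set): 0.8921

Playing music

Recognizes requests to play music

F1 score (play_music): 0.8763

Weather queries

Recognizes requests for weather information

F1 score (weather_query): 0.9439

Smart home control

Lighting control

Recognizes requests to control smart home lighting

F1 score for Hue light changes (iot_hue_lightchange): 0.9075

Switching devices

Recognizes requests to switch smart home devices on or off

F1 score for switching Wemo devices off (iot_wemo_off): 0.9143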

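🚀 XLMRoberta-Alexa intent classification model

This project is a multilingual text intent classification model based on XLM-RoBERTa. It handles 51 languages and performs intent prediction for natural language understanding, supporting applications such as voice assistants. Slot annotation is part of the MASSIVE training data, but this classifier predicts only the intent label.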

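🚀 Quick start

Environment setup

Install transformers with the command below. The example also needs a PyTorch installation, which AutoModelForSequenceClassification relies on.

pip install transformers

Code example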

from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub
model_name = 'qanastek/XLMRoberta-Alexa-Intents-Classification'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)

# French input: "wake me up at nine in the morning on Friday"
res = classifier("réveille-moi à neuf heures du matin le vendredi")
print(res)

Output

[{'label': 'alarm_set', 'score': 0.9998375177383423}]
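
An application usually has to decide whether a prediction is confident enough to act on. The following is a minimal sketch that works on the list-of-dicts output shown above; the helper name `pick_intent` and the 0.5 default threshold are hypothetical choices for illustration, not part of the model or of transformers.

```python
from typing import Dict, List, Optional


def pick_intent(predictions: List[Dict], min_score: float = 0.5) -> Optional[str]:
    """Return the highest-scoring label, or None if it is below min_score.

    `predictions` is assumed to have the same shape as the pipeline output
    above: a list of {'label': str, 'score': float} dicts.
    """
    if not predictions:
        return None
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= min_score else None


# The sample below is the output printed in the quick start.
sample = [{'label': 'alarm_set', 'score': 0.9998375177383423}]
print(pick_intent(sample))                 # alarm_set
print(pick_intent(sample, min_score=1.0))  # None
```

A reasonable threshold depends on the application; the per-intent results in the evaluation table suggest that weaker classes such as general_greet or music_settings may deserve a stricter cut-off than alarm_set.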

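✨ Key features

Multilingual support: covers 51 languages, including Afrikaans, Amharic, Arabic, and many others.
Rich intent classification: recognizes 60 distinct intents spanning audio control, music playback, scheduling, weather queries, and other domains.
High-quality training data: trained on the MASSIVE dataset, which contains over 1 million utterances across 51 languages.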

💻 Usage examples

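Advanced usage

With the classifier built in the Quick start, you can change the input text to test other intents:

# Try a different input text (Chinese: "I want to listen to Jay Chou's songs")
res = classifier("我想听周杰伦的歌")
print(res)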
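
To test several utterances at once, a transformers pipeline can be called with a list of strings, in which case it returns one result dict per input. The sketch below wraps that call; `classify_many` is a hypothetical helper name, and it accepts any callable with the same input and output shape, which makes it easy to try without downloading the model.

```python
from typing import Callable, Dict, List, Tuple


def classify_many(
    classifier: Callable[[List[str]], List[Dict]],
    texts: List[str],
) -> List[Tuple[str, str, float]]:
    """Classify a batch of utterances and pair each text with its result.

    `classifier` is expected to behave like the pipeline from the quick
    start when given a list: one {'label', 'score'} dict per input text.
    """
    results = classifier(texts)
    return [(text, r["label"], r["score"]) for text, r in zip(texts, results)]
```

For example, `classify_many(classifier, ["我想听周杰伦的歌", "what's the weather tomorrow"])` would return one `(text, label, score)` tuple per utterance, using the `classifier` object built in the Quick start.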

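📚 Detailed documentation

Training data

MASSIVE is a parallel dataset of over 1 million utterances across 51 languages, built for intent prediction and slot annotation in natural language understanding. The utterances cover 60 intents and 55 slot types. MASSIVE was created by localizing the SLURP dataset, which consists of single-turn interactions with a general-purpose voice assistant.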
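
To make the difference between an utterance and its slot annotation concrete, here is a small sketch that extracts slots from an annotated string. It assumes a bracketed `[slot : value]` annotation style, which is how MASSIVE's annotated utterances are commonly shown; the sample sentence, the regular expression, and the helper names are illustrative assumptions, and this classifier itself does not produce slots.

```python
import re
from typing import List, Tuple

# Assumed annotation style: "[slot_name : slot value]" embedded in the text.
SLOT_PATTERN = re.compile(r"\[\s*([^\]:]+?)\s*:\s*([^\]]+?)\s*\]")


def extract_slots(annotated: str) -> List[Tuple[str, str]]:
    """Return (slot_name, value) pairs found in an annotated utterance."""
    return SLOT_PATTERN.findall(annotated)


def strip_annotations(annotated: str) -> str:
    """Replace each bracketed annotation with its plain value."""
    return SLOT_PATTERN.sub(lambda m: m.group(2), annotated)


# Hypothetical annotated utterance, written for this example.
example = "wake me up at [time : nine am] on [date : friday]"
print(extract_slots(example))      # [('time', 'nine am'), ('date', 'friday')]
print(strip_annotations(example))  # wake me up at nine am on friday
```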

Intent classes

The model can recognize the following 60 intents:

audio_volume_other
play_music
iot_hue_lighton
general_greet
calendar_set
audio_volume_down
social_query
audio_volume_mute
iot_wemo_on
iot_hue_lightup
audio_volume_up
iot_coffee
takeaway_query
qa_maths
play_game
cooking_query
iot_hue_lightdim
iot_wemo_off
music_settings
weather_query
news_query
alarm_remove
social_post
recommendation_events
transport_taxi
takeaway_order
music_query
calendar_query
lists_query
qa_currency
recommendation_movies
general_joke
recommendation_locations
email_querycontact
lists_remove
play_audiobook
email_addcontact
lists_createoradd
play_radio
qa_stock
alarm_query
email_sendemail
general_quirky
music_likeness
cooking_recipe
email_query
datetime_query
transport_traffic
play_podcasts
iot_hue_lightchange
calendar_remove
transport_query
transport_ticket
qa_factoid
iot_cleaning
alarm_set
datetime_convert
iot_hue_lightoff
qa_definition
music_dislikeness
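
The intent names above share a small set of prefixes (alarm, audio, calendar, iot, and so on), which is convenient when routing predictions to coarse-grained handlers. The sketch below groups labels by the text before the first underscore; treating that prefix as the domain is an assumption based on the naming pattern in the list, and `group_by_scenario` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_scenario(intents: Iterable[str]) -> Dict[str, List[str]]:
    """Group intent labels by the prefix before the first underscore."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for intent in intents:
        # "iot_hue_lighton" -> scenario "iot", action "hue_lighton"
        scenario, _, action = intent.partition("_")
        groups[scenario].append(action)
    return dict(groups)


# A few labels taken from the list above.
labels = ["alarm_set", "alarm_query", "iot_hue_lighton", "iot_wemo_off", "audio_volume_up"]
print(group_by_scenario(labels))
# {'alarm': ['set', 'query'], 'iot': ['hue_lighton', 'wemo_off'], 'audio': ['volume_up']}
```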

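Evaluation results

The table below reports per-intent precision, recall, F1 score, and support (number of evaluation examples). cooking_query has a support of 0, so its scores appear as 0.0000.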

                          precision    recall  f1-score   support

             alarm_query     0.9661    0.9037    0.9338      1734
            alarm_remove     0.9484    0.9608    0.9545      1071
               alarm_set     0.8611    0.9254    0.8921      2091
       audio_volume_down     0.8657    0.9537    0.9075       561
       audio_volume_mute     0.8608    0.9130    0.8861      1632
      audio_volume_other     0.8684    0.5392    0.6653       306
         audio_volume_up     0.7198    0.8446    0.7772       663
          calendar_query     0.7555    0.8229    0.7878      6426
         calendar_remove     0.8688    0.9441    0.9049      3417
            calendar_set     0.9092    0.9014    0.9053     10659
           cooking_query     0.0000    0.0000    0.0000         0
          cooking_recipe     0.9282    0.8592    0.8924      3672
        datetime_convert     0.8144    0.7686    0.7909       765
          datetime_query     0.9152    0.9305    0.9228      4488
        email_addcontact     0.6482    0.8431    0.7330       612
             email_query     0.9629    0.9319    0.9472      6069
      email_querycontact     0.6853    0.8032    0.7396      1326
         email_sendemail     0.9530    0.9381    0.9455      5814
           general_greet     0.1026    0.3922    0.1626        51
            general_joke     0.9305    0.9123    0.9213       969
          general_quirky     0.6984    0.5417    0.6102      8619
            iot_cleaning     0.9590    0.9359    0.9473      1326
              iot_coffee     0.9304    0.9749    0.9521      1836
     iot_hue_lightchange     0.8794    0.9374    0.9075      1836
        iot_hue_lightdim     0.8695    0.8711    0.8703      1071
        iot_hue_lightoff     0.9440    0.9229    0.9334      2193
         iot_hue_lighton     0.4545    0.5882    0.5128       153
         iot_hue_lightup     0.9271    0.8315    0.8767      1377
            iot_wemo_off     0.9615    0.8715    0.9143       918
             iot_wemo_on     0.8455    0.7941    0.8190       510
       lists_createoradd     0.8437    0.8356    0.8396      1989
             lists_query     0.8918    0.8335    0.8617      2601
            lists_remove     0.9536    0.8601    0.9044      2652
       music_dislikeness     0.7725    0.7157    0.7430       204
          music_likeness     0.8570    0.8159    0.8359      1836
             music_query     0.8667    0.8050    0.8347      1785
          music_settings     0.4024    0.3301    0.3627       306
              news_query     0.8343    0.8657    0.8498      6324
          play_audiobook     0.8172    0.8125    0.8149      2091
               play_game     0.8666    0.8403    0.8532      1785
              play_music     0.8683    0.8845    0.8763      8976
           play_podcasts     0.8925    0.9125    0.9024      3213
              play_radio     0.8260    0.8935    0.8585      3672
             qa_currency     0.9459    0.9578    0.9518      1989
           qa_definition     0.8638    0.8552    0.8595      2907
              qa_factoid     0.7959    0.8178    0.8067      7191
                qa_maths     0.8937    0.9302    0.9116      1275
                qa_stock     0.7995    0.9412    0.8646      1326
   recommendation_events     0.7646    0.7702    0.7674      2193
recommendation_locations     0.7489    0.8830    0.8104      1581
   recommendation_movies     0.6907    0.7706    0.7285      1020
             social_post     0.9623    0.9080    0.9344      4131
            social_query     0.8104    0.7914    0.8008      1275
          takeaway_order     0.7697    0.8458    0.8059      1122
          takeaway_query     0.9059    0.8571    0.8808      1785
         transport_query     0.8141    0.7559    0.7839      2601
          transport_taxi     0.9222    0.9403    0.9312      1173
        transport_ticket     0.9259    0.9384    0.9321      1785
       transport_traffic     0.6919    0.9660    0.8063       765
           weather_query     0.9387    0.9492    0.9439      7956

                accuracy                         0.8617    151674
               macro avg     0.8162    0.8273    0.8178    151674
            weighted avg     0.8639    0.8617    0.8613    151674
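
For readers unfamiliar with the columns, the sketch below shows how F1 follows from precision and recall and how macro and weighted averages differ. It uses three rows copied from the table; the function names are illustrative, and returning 0 when precision and recall are both 0 is an assumed convention that matches how the cooking_query row is displayed.

```python
from typing import List, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_avg(rows: List[Tuple[str, float, int]]) -> float:
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for _, f1, _ in rows) / len(rows)


def weighted_avg(rows: List[Tuple[str, float, int]]) -> float:
    """Mean of per-class F1 weighted by support (examples per class)."""
    total = sum(support for _, _, support in rows)
    return sum(f1 * support for _, f1, support in rows) / total


# (intent, f1-score, support) for three rows of the table above.
rows = [
    ("alarm_set", 0.8921, 2091),
    ("general_greet", 0.1626, 51),
    ("weather_query", 0.9439, 7956),
]

# alarm_set: precision 0.8611, recall 0.9254
print(round(f1_score(0.8611, 0.9254), 4))  # 0.8921
print(round(macro_avg(rows), 4), round(weighted_avg(rows), 4))
```

On this subset the weighted average is much higher than the macro average because the weak general_greet class has only 51 examples. The same effect explains why the full table's weighted-average F1 (0.8613) exceeds its macro-average F1 (0.8178).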