XLMRoberta-Alexa-Intents-NER-NLU open-source model - intent and entity recognition in 51 languages

Xlmroberta Alexa Intents NER NLU

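Developed by qanastek

A multilingual natural language understanding model based on XLM-RoBERTa that supports intent recognition and named entity recognition in 51 languages

Sequence labeling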

Transformers

#multilingual slot filling #voice assistant NLU #high-accuracy intent recognition

Downloads: 18

Released: 5/8/2022

Model overview

A multilingual sequence labeling model built for natural language understanding in voice assistants. It covers 60 intents and 55 slot types.

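Model features

Multilingual support

Supports intent recognition and named entity recognition in 51 languages

Broad entity coverage

Recognizes 55 types of named entities

High-accuracy recognition

Reaches F1 scores above 0.85 on several entity types, such as 0.8593 for time and 0.8995 for date

Optimized for voice assistants

Tuned for voice assistant scenarios and covers 60 common intents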

Model capabilities

Intent recognition

Named entity recognition

Slot filling

Multilingual processing

Voice command understanding

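Use cases

Smart voice assistants

Setting alarms

Identifies the time and date of an alarm the user wants to set

F1 score of 0.8593 for time and 0.8995 for date

Media playback control

Identifies the song, artist, or podcast the user asks to play

F1 score of 0.7757 for artist names and 0.6433 for song names

Information lookup

Identifies the stock, weather, or location information the user asks about

F1 score of 0.8075 for business names and 0.8417 for place names

Multilingual applications

Cross-lingual command understanding

Understands user commands that express the same intent in different languages

Recognizes the same set of intents across 51 languages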

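🚀 XLMRoberta-Alexa-Intents-NER-NLU project

XLMRoberta-Alexa-Intents-NER-NLU is a model for multilingual natural language understanding. It supports 51 languages and handles tasks such as intent prediction and slot filling, which helps applications like smart voice assistants interpret user input.

🚀 Quick start

Environment setup

Install the transformers library with the command below. Loading the model also requires a deep learning backend such as PyTorch.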

pip install transformers

Code example

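from transformers import AutoTokenizer, AutoModelForTokenClassification, TokenClassificationPipeline

# Load the tokenizer and the token classification model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained('qanastek/XLMRoberta-Alexa-Intents-NER-NLU')
model = AutoModelForTokenClassification.from_pretrained('qanastek/XLMRoberta-Alexa-Intents-NER-NLU')

# Wrap both in a pipeline that returns a prediction for each token tagged with a slot label
predict = TokenClassificationPipeline(model=model, tokenizer=tokenizer)

# French input: "wake me up at nine in the morning on Friday"
res = predict("réveille-moi à neuf heures du matin le vendredi")
print(res)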

Example output

[{'word': '▁neuf', 'score': 0.9911066293716431, 'entity': 'B-time', 'index': 6, 'start': 15, 'end': 19},
{'word': '▁heures', 'score': 0.9200698733329773, 'entity': 'I-time', 'index': 7, 'start': 20, 'end': 26},
{'word': '▁du', 'score': 0.8476170897483826, 'entity': 'I-time', 'index': 8, 'start': 27, 'end': 29},
{'word': '▁matin', 'score': 0.8271021246910095, 'entity': 'I-time', 'index': 9, 'start': 30, 'end': 35},
{'word': '▁vendredi', 'score': 0.9813069701194763, 'entity': 'B-date', 'index': 11, 'start': 39, 'end': 47}]
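
The pipeline returns one entry per sub-word token, so a slot such as the time above is spread over several entries. As a minimal sketch, the entries can be merged back into whole slot values by following the B-/I- prefixes and the character offsets. The helper name `merge_bio` and the one-character gap rule are assumptions made for illustration; they are not part of the model or of the transformers library.

```python
# Input sentence and predictions copied from the example output above,
# trimmed to the fields this sketch uses.
text = "réveille-moi à neuf heures du matin le vendredi"
predictions = [
    {"word": "▁neuf", "entity": "B-time", "start": 15, "end": 19},
    {"word": "▁heures", "entity": "I-time", "start": 20, "end": 26},
    {"word": "▁du", "entity": "I-time", "start": 27, "end": 29},
    {"word": "▁matin", "entity": "I-time", "start": 30, "end": 35},
    {"word": "▁vendredi", "entity": "B-date", "start": 39, "end": 47},
]


def merge_bio(text, preds):
    """Group consecutive B-/I- predictions into (slot, surface text) pairs.

    Hypothetical helper: an I- token extends the previous span only when it
    has the same slot name and starts at most one character after that span.
    """
    spans = []  # each item is [slot, start, end]
    for p in preds:
        prefix, _, slot = p["entity"].partition("-")
        extends_previous = (
            prefix == "I"
            and spans
            and spans[-1][0] == slot
            and p["start"] - spans[-1][2] <= 1
        )
        if extends_previous:
            spans[-1][2] = p["end"]  # stretch the current span
        else:
            spans.append([slot, p["start"], p["end"]])  # open a new span
    return [(slot, text[start:end]) for slot, start, end in spans]


spans = merge_bio(text, predictions)
print(spans)
# [('time', 'neuf heures du matin'), ('date', 'vendredi')]
```

The transformers pipeline also accepts an `aggregation_strategy` argument that groups tokens in a similar way; the manual version here only makes the grouping logic visible.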

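📚 Detailed documentation

Training data

MASSIVE is a parallel dataset of more than 1 million utterances across 51 languages, annotated for natural language understanding tasks (intent prediction and slot filling). The utterances span 60 intents and include 55 slot types. MASSIVE was created by localizing the SLURP dataset, which consists of general-purpose, single-turn voice assistant interactions.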
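
To make the slot annotation concrete, the sketch below converts one annotated utterance into token-level B-/I-/O tags like the ones the model predicts. It assumes the bracketed `[slot : value]` annotation style used in the public MASSIVE release; the sample utterance, the regular expression, and the function name `annot_to_bio` are illustrative assumptions, not taken from this model card.

```python
import re

# Assumed annotation style: "... [slot_name : slot value] ..."
SLOT_PATTERN = re.compile(r"\[\s*([^:\]]+?)\s*:\s*([^\]]+?)\s*\]")


def annot_to_bio(annot_utt):
    """Turn a bracket-annotated utterance into parallel token and tag lists."""
    tokens, tags = [], []
    pos = 0
    for match in SLOT_PATTERN.finditer(annot_utt):
        # Words before the bracket are outside any slot.
        for word in annot_utt[pos:match.start()].split():
            tokens.append(word)
            tags.append("O")
        slot = match.group(1)
        # First word of the slot value gets B-, the rest get I-.
        for i, word in enumerate(match.group(2).split()):
            tokens.append(word)
            tags.append(("B-" if i == 0 else "I-") + slot)
        pos = match.end()
    # Words after the last bracket are also outside any slot.
    for word in annot_utt[pos:].split():
        tokens.append(word)
        tags.append("O")
    return tokens, tags


# Hypothetical sample utterance in the assumed format.
tokens, tags = annot_to_bio("wake me up at [time : nine am] on [date : friday]")
for token, tag in zip(tokens, tags):
    print(f"{token:8s} {tag}")
```

Word-level tags like these still have to be aligned with the tokenizer's sub-word pieces before training; that step is left out of the sketch.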

Named entities

O
currency_name
personal_info
app_name
list_name
alarm_type
cooking_type
time_zone
media_type
change_amount
transport_type
drink_type
news_topic
artist_name
weather_descriptor
transport_name
player_setting
email_folder
music_album
coffee_type
meal_type
song_name
date
movie_type
movie_name
game_name
business_type
music_descriptor
joke_type
music_genre
device_type
house_place
place_name
sport_type
podcast_name
game_type
timeofday
business_name
time
definition_word
audiobook_author
event_name
general_frequency
relation
color_type
audiobook_name
food_type
person
transport_agency
email_address
podcast_descriptor
order_type
ingredient
transport_descriptor
playlist_name
radio_name

Evaluation results

                      precision    recall  f1-score   support

                   O     0.9537    0.9498    0.9517   1031927
          alarm_type     0.8214    0.1800    0.2953       511
            app_name     0.3448    0.5318    0.4184       660
         artist_name     0.7143    0.8487    0.7757     11413
    audiobook_author     0.7038    0.2971    0.4178      1232
      audiobook_name     0.7271    0.5381    0.6185      5090
       business_name     0.8301    0.7862    0.8075     15385
       business_type     0.7009    0.6196    0.6577      4600
       change_amount     0.8179    0.9104    0.8617      1663
         coffee_type     0.6147    0.8322    0.7071       876
          color_type     0.6999    0.9176    0.7941      2890
        cooking_type     0.7037    0.5184    0.5970      1003
       currency_name     0.8479    0.9686    0.9042      6501
                date     0.8667    0.9348    0.8995     49866
     definition_word     0.9043    0.8135    0.8565      8333
         device_type     0.8502    0.8825    0.8661     11631
          drink_type     0.0000    0.0000    0.0000       131
       email_address     0.9715    0.9747    0.9731      3986
        email_folder     0.5913    0.9740    0.7359       884
          event_name     0.7659    0.7630    0.7645     38625
           food_type     0.6502    0.8697    0.7441     12353
           game_name     0.8974    0.6275    0.7386      4518
   general_frequency     0.8012    0.8673    0.8329      3173
         house_place     0.9337    0.9168    0.9252      7067
          ingredient     0.5481    0.0491    0.0901      1161
           joke_type     0.8147    0.9101    0.8598      1435
           list_name     0.8411    0.7275    0.7802      8188
           meal_type     0.6072    0.8926    0.7227      2282
          media_type     0.8578    0.8522    0.8550     17751
          movie_name     0.4598    0.1856    0.2645       431
          movie_type     0.2673    0.4341    0.3309       364
         music_album     0.0000    0.0000    0.0000       146
    music_descriptor     0.2906    0.3979    0.3359      1053
         music_genre     0.7999    0.7483    0.7732      5908
          news_topic     0.7052    0.5702    0.6306      9265
          order_type     0.6374    0.8845    0.7409      2614
              person     0.8173    0.9376    0.8733     33708
       personal_info     0.7035    0.7444    0.7234      1976
          place_name     0.8616    0.8228    0.8417     38881
      player_setting     0.6429    0.6212    0.6319      5409
       playlist_name     0.5852    0.5293    0.5559      3671
  podcast_descriptor     0.7486    0.5413    0.6283      4951
        podcast_name     0.6858    0.5675    0.6211      3339
          radio_name     0.8196    0.8013    0.8103      9892
            relation     0.6662    0.8569    0.7496      6477
           song_name     0.5617    0.7527    0.6433      7251
          sport_type     0.0000    0.0000    0.0000         0
                time     0.9032    0.8195    0.8593     35456
           time_zone     0.8368    0.4467    0.5824      2823
           timeofday     0.7931    0.8459    0.8187      6140
    transport_agency     0.7876    0.7764    0.7820      1051
transport_descriptor     0.5738    0.2756    0.3723       254
      transport_name     0.8497    0.5149    0.6412      1010
      transport_type     0.9303    0.8980    0.9139      6363
  weather_descriptor     0.8584    0.7466    0.7986     11702

            accuracy                         0.9092   1455270
           macro avg     0.6940    0.6668    0.6613   1455270
        weighted avg     0.9111    0.9092    0.9086   1455270
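
As a quick consistency check on the table, each f1-score is the harmonic mean of the precision and recall in the same row. The sketch below recomputes it for a few rows copied from the report; small differences in the last digit are expected because the inputs are already rounded to four decimals. The layout resembles scikit-learn's `classification_report`, although the source does not say which tool produced it. If it follows that convention, `macro avg` is the unweighted mean over labels and `weighted avg` weights each label by its support, which would explain why the macro F1 (0.6613) sits well below the weighted F1 (0.9086): the dominant O label and other frequent labels score high, while several rare labels score near zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1-score) copied from the report above.
rows = {
    "time": (0.9032, 0.8195, 0.8593),
    "date": (0.8667, 0.9348, 0.8995),
    "artist_name": (0.7143, 0.8487, 0.7757),
    "drink_type": (0.0000, 0.0000, 0.0000),
}

for label, (precision, recall, reported) in rows.items():
    computed = f1_score(precision, recall)
    print(f"{label:12s} computed={computed:.4f} reported={reported:.4f}")
```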