Model Overview
Model Features
Model Capabilities
Use Cases
đ Massively Multilingual Speech (MMS) - Finetuned LID
This model is fine - tuned for speech language identification (LID), enabling accurate recognition of spoken languages across a wide range of linguistic diversity.
đ Quick Start
This MMS checkpoint can be used with Transformers to identify the spoken language of an audio. It can recognize 4017 languages.
First, we install transformers and some other libraries:
pip install torch accelerate torchaudio datasets
pip install --upgrade transformers
Note: In order to use MMS you need to have at least transformers >= 4.30
installed. If the 4.30
version is not yet available on PyPI make sure to install transformers
from source:
pip install git+https://github.com/huggingface/transformers.git
Next, we load a couple of audio samples via datasets
. Make sure that the audio data is sampled to 16000 kHz.
from datasets import load_dataset, Audio
# English
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]
# Arabic
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "ar", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
ar_sample = next(iter(stream_data))["audio"]["array"]
Next, we load the model and processor:
from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
import torch
model_id = "facebook/mms-lid-4017"
processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)
Now we process the audio data, pass the processed audio data to the model to classify it into a language, just like we usually do for Wav2Vec2 audio classification models such as ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition:
# English
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs).logits
lang_id = torch.argmax(outputs, dim=-1)[0].item()
detected_lang = model.config.id2label[lang_id]
# 'eng'
# Arabic
inputs = processor(ar_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs).logits
lang_id = torch.argmax(outputs, dim=-1)[0].item()
detected_lang = model.config.id2label[lang_id]
# 'ara'
To see all the supported languages of a checkpoint, you can print out the language ids as follows:
processor.id2label.values()
For more details about the architecture, please have a look at the official docs.
⨠Features
- Multilingual Support: This model supports 4017 languages, allowing for extensive language recognition across the globe.
- Fine - Tuned for LID: It is fine - tuned for speech language identification, providing accurate results for spoken language classification.
đĻ Installation
pip install torch accelerate torchaudio datasets
pip install --upgrade transformers
If the 4.30
version of transformers
is not available on PyPI, install it from source:
pip install git+https://github.com/huggingface/transformers.git
đģ Usage Examples
Basic Usage
# First, install necessary libraries
pip install torch accelerate torchaudio datasets
pip install --upgrade transformers
# Load audio samples
from datasets import load_dataset, Audio
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]
# Load model and processor
from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
import torch
model_id = "facebook/mms-lid-4017"
processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)
# Process audio and classify language
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs).logits
lang_id = torch.argmax(outputs, dim=-1)[0].item()
detected_lang = model.config.id2label[lang_id]
Advanced Usage
# You can loop through multiple audio samples to classify languages in batch
from datasets import load_dataset, Audio
from transformers import Wav2Vec2ForSequenceClassification, AutoFeatureExtractor
import torch
model_id = "facebook/mms-lid-4017"
processor = AutoFeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_id)
languages = ["en", "ar"]
for lang in languages:
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", lang, split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
sample = next(iter(stream_data))["audio"]["array"]
inputs = processor(sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs).logits
lang_id = torch.argmax(outputs, dim=-1)[0].item()
detected_lang = model.config.id2label[lang_id]
print(f"Detected language for {lang}: {detected_lang}")
đ Documentation
This checkpoint is a model fine - tuned for speech language identification (LID) and part of Facebook's Massive Multilingual Speech project. It is based on the Wav2Vec2 architecture and classifies raw audio input to a probability distribution over 4017 output classes (each class representing a language). The checkpoint consists of 1 billion parameters and has been fine - tuned from facebook/mms-1b on 4017 languages.
đ§ Technical Details
- Model Architecture: Based on the Wav2Vec2 architecture.
- Parameters: The checkpoint consists of 1 billion parameters.
- Fine - Tuning: Fine - tuned from facebook/mms-1b on 4017 languages.
đ License
This model is licensed under cc-by-nc-4.0
.
đ Additional Information
Tags
- mms
Supported Languages
This model supports 4017 languages. Unclick the following to toggle all supported languages of this checkpoint in ISO 639-3 code. You can find more details about the languages and their ISO 649 - 3 codes in the MMS Language Coverage Overview.
Click to toggle
- ara
- cmn
- eng
- spa
- fra
- mlg
- swe
- ful
- por
- vie
- sun
- zlm
- ben
- kor
- tuk
- hin
- asm
- ind
- urd
- swh
- aze
- hau
- som
- mon
- tel
- bod
- rus
- tat
- tgl
- slv
- tur
- mar
- heb
- tha
- ron
- yor
- bel
- mal
- cat
- amh
- bul
- hat
- mkd
- pol
- nld
- hun
- tam
- hrv
- fas
- afr
- nya
- cym
- isl
- orm
- kmr
- lin
- jav
- snd
- nob
- uzb
- bos
- deu
- lit
- mya
- lat
- grn
- kaz
- npi
- kik
- ell
- sqi
- yue
- cak
- hye
- kat
- kan
- jpn
- pan
- lav
- guj
- ces
- tgk
- khm
- bak
- ukr
- che
- fao
- mam
- xog
- glg
- ltz
- quc
- aka
- lao
- crh
- sna
- mlt
- poh
- sin
- cfm
- ixl
- aiw
- mri
- tuv
- gag
- pus
- ita
- srp
- lug
- eus
- kia
- nno
- nhx
- gur
- ory
- luo
- sxn
- xsm
- cmo
- kbp
- slk
- ewe
- dtp
- fin
- acr
- ayo
- quy
- saq
- quh
- rif
- bre
- bqc
- tzj
- mos
- bwq
- yao
- cac
- xon
- new
- yid
- hne
- dan
- hus
- dyu
- uig
- pse
- bam
- bus
- ttq
- ngl
- est
- txa
- tso
- gng
- seh
- wlx
- sck
- rjs
- ntm
- lok
- tcc
- mup
- dga
- lis
- kru
- cnh
- bxk
- mnk
- amf
- dos
- guh
- rmc
- rel
- zne
- teo
- mzi
- tpi
- ycl
- bis
- xsr
- ddn
- thl
- wal
- ctg
- onb
- uhn
- gbo
- vmw
- beh
- mip
- lnd
- khg
- bfz
- ifa
- gna
- rol
- nzi
- ceb
- lcp
- kml
- sxb
- nym
- acn
- bfo
- mhy
- adx
- mqj
- bbc
- pmf
- dsh
- bfy
- sid
- nko
- grc
- bno
- bfa
- pxm
- sda
- oku
- mbu
- pwg
- qxl
- ndv
- nmz
- soy
- vut
- tzh
- mcr
- box
- iri
- nxq
- ayr
- ikk
- bgq
- bbo
- gof
- bmq
- kdt
- cla
- asa
- lew
- war
- muv
- bqp
- kfx
- zpu
- txu
- xal
- fon
- maj
- mag
- kle
- alp
- hlb
- any
- poe
- bjw
- rro
- pil
- rej
- lbw
- bdu
- dgi
- mgo
- mkl
- mco
- maa
- vif
- btd
- kcg
- tng
- pls
- kdl
- tzo
- pui
- pap
- lns
- kyb
- ksb
- akp
- zar
- gil
- blt
- ctd
- mhx
- mtg
- gud
- hnn
- kek
- mxt
- frd
- bmv
- krc
- suz
- ndy
- pny
- ava
- cwa
- icr
- mcp
- hyw
- bov
- hlt
- zim
- dnw
- naw
- udm
- xed
- kqp
- kpv
- bkd
- xnj
- atb
- cwe
- lje
- nog
- kij
- ttc
- sch
- mqn
- mbj
- btx
- atg
- ife
- bgw
- gri
- trs
- kjh
- bhz
- moz
- mjv
- kyf
- chv
- ati
- ybb
- did
- gau
- dnj
- bzh
- kbo
- cle
- pib
- crs
- had
- nhy
- yba
- zpz
- yka
- tgo
- dgk
- mgd
- twb
- lon
- cek
- tuo
- cab
- muy
- rug
- taq
- tex
- tlj
- sne
- smo
- nsu
- enx
- bqj
- nin
- cnl
- btt
- nlc
- nlg
- mcq
- tly
- mge
- prk
- ium
- zpt
- aeu
- eka
- mfk
- akb
- mxb
- cso
- kak
- yre
- obo
- tgj
- abi
- yas
- men
- nga
- blh
- kdc
- cmr
- zyp
- bom
- aia
- zpg
- yea
- xuo
- ubl
- hwc
- xtm
- mhr
- avn
- log
- xsb
- kri
- idd
- mnw
- plw
- nuj
- ted
- guq
- sbp
- lln
- blx
- tmc
- knb
- kwf
- met
- rkt
- mib
- miy
- lsi
- zaj
- mih
- myv
- luc
- rnl
- tob
- mpm
- kfw
- kne
- asg
- pps
- ake
- amk
- flr
- trn
- tom
- yat
- tna
- xmm
- poi
- qxr
- myy
- bep
- tte
- zmz
- kqe
- sjm
- kwd
- abp
- kmd
- tih
- pez
- urt
- mim
- knj
- gqr
- kvn
- suc
- med
- ury
- kpq
- tbl
- mto
- kzf
- lex
- bdh
- zpc
- hoc
- mbc
- moa
- krs
- mej
- snp
- nlk
- wsg
- zaq
- far
- rap
- bmr
- gwr
- not
- yaz
- ess
- pss
- cgc
- dbq
- gub
- kje
- azg
- ktj
- sil
- kqr
- tnt
- pjt
- cul
- tmf
- tav
- stn
- cjo
- mil
- kir
- cbi
- dav
- gai
- sey
- ppk
- xtd
- pis
- qvh
- cbr
- mai
- omw
- tao
- prt
- tnr
- tlb
- kin
- ami
- agu
- cok
- san
- kaq
- lif
- arl
- tvw
- atq
- iba
- knk
- wap
- kog
- rub
- tuf
- zga
- crt
- jun
- yal
- ksr
- boj
- run
- tye
- cpu
- ngu
- huu
- mcd
- byr
- dah
- cpb
- nas
- nij
- pkb
- gux
- dig
- gog
- gbm
- kmu
- tbk
- nhe
- snn
- cui
- lid
- hnj
- ojb
- ubu
- nyy
- sho
- mpd
- tir
- kdj
- gvc
- urb
- awa
- bcc
- cof
- cot
- bgr
- sus
- nan
- ame
- kno
- nyn
- nyf
- bor
- dnt
- grt
- xte
- mdy
- hak
- guo
- ses
- suk
- mqf
- bjr
- bem
- keo
- guk
- mtj
- bbb
- crk
- lam
- kue
- khq
- kus
- lsm
- bwu
- dug
- nsk
- sbd
- kdh
- crq
- sah
- mur
- shn
- spy
- cko
- aha
- mfz
- rmy
- nim
- gjn
- kde
- bsq
- spp
- esi
- kqn
- zyb
- oci
- nnw
- qva
- cly
- rim
- oss
- vag
- bru
- dag
- ade
- gum
- law
- yaa
- tem
- hap
- kaa
- raw
- mpx
- kff
- lhu
- taj
- dyo
- hui
- kbr
- mpg
- mwq
- guc
- niy
- nus
- mzj
- mnx
- tbz
- bib
- quz
- mev
- kma
- ptu
- wme
- lef
- mfi
- bky
- mdm
- mgh
- bvc
- bim
- eip
- mnb
- fij
- maw
- dip
- qul
- bgc
- mxv
- thf
- bud
- dzo
- lom
- ztq
- urk
- mfq
- ach
- las
- kyz
- nia
- sgb
- tpm
- tbt
- dgo
- qvo
- zab
- dik
- pbb
- cas
- kac
- dop
- pcm
- shk
- xnr
- zpo
- ktb
- bba
- sba
- myb
- quw
- emp
- ctu
- gbk
- guw
- blz
- nst
- cnt
- ilo
- cme
- yan
- srx
- qvm
- mhi
- mzw
- fal
- zao
- set
- csk
- wol
- nnb
- zas
- zaw
- cap
- mgq
- yam
- sig
- kam
- biv
- laj
- otq
- pce
- mwv
- mak
- bvz
- kfb
- alz
- dwr
- hif
- hag
- kao
- rav
- mor
- lme
- nav
- lob
- cax
- cdj
- upv
- dhi
- knf
- mad
- kfy
- alt
- tgw
- ceg
- wwa
- ljp
- myk
- acd
- jow
- sag
- ntr
- kbq
- jiv
- mxq
- ahk
- kab
- mie
- car
- nfr
- mfe
- cni
- led
- mbb
- twu
- nag
- cya
- kum
- tsz
- cco
- mnf
- prf
- bgt
- nhu
- mzm
- trq
- ken
- ker
- bpr
- cou
- kyq
- pkr
- xpe
- zpl
- kyc
- enb
- yva
- zad
- bcl
- bex
- huv
- sas
- ruf
- srn
- vun
- gor
- tik
- xtn
- gmv
- kez
- sld
- kss
- vid
- old
- nod
- kxm
- lia
- izr
- ozm
- bfd
- acf
- thk
- mah
- sgw
- mfh
- daa
- yuz
- ifb
- jmc
- nyo
- anv
- cbt
- myx
- zai
- nhw
- tby
- ncu
- nhi
- adj
- wba
- usp
- lgg
- irk
- iou
- tca
- mjl
- ote
- kpz
- bdq
- qub
- jam
- agr
- zpi
- sml
- soq
- mvp
- kxc
- bsc
- hay
- dyi
- ilb
- itv
- hil
- bkv
- poy
- tgp
- awb
- cuk
- miz
- bmu
- txq
- gyr
- kdi
- zpm
- adh
- npl
- tue
- mrw
- lee
- bss
- pam
- aaz
- kqy
- pau
- key
- cpa
- alj
- kkj
- tap
- sbl
- qvw
- yua
- ziw
- xrb
- msy
- mcu
- sur
- heh
- con
- lwo
- gej
- gnd
- ace
- zos
- agd
- bci
- cce
- toc
- mbt
- shi
- tll
- cjp
- kjb
- toi
- pbi
- ann
- krl
- bht
- vmy
- bst
- gkn
- klv
- nwb
- bng
- shp
- pag
- jbu
- klu
- gso
- kyu
- mio
- ngp
- zaa
- eza
- omi
- izz
- loq
- pww
- udu
- miq
- tnk
- min
- pab
- cuc
- mca
- agn
- lem
- bav
- bzj
- jac
- gbi
- pko
- noa
- dts
- bnp
- gxx
- haw
- ood
- qxh
- bts
- crn
- krj
- umb
- sgj
- tbc
- tpp
- zty
- kki
- rai
- qwh
- kub
- ndj
- hns
- chz
- ksp
- qvn
- gde
- mfy
- bjv
- udg
- mpp
- sja
- cbs
- ese
- ded
- rng
- bao
- muh
- mif
- cwt
- wmw
- ign
- acu
- ndp
- mir
- bzi
- bps
- ycn
- snw
- jnj
- ifu
- iqw
- djk
- lip
- gvl
- kdn
- mzk
- tnn
- toh
- apb
- qxn
- nnq
- rmo
- xsu
- ncj
- nyu
- mop
- mrj
- tpt
- wob
- ifk
- mog
- ter
- bcw
- boa
- stp
- hig
- mit
- maz
- way
- tee
- ban
- srm
- pao
- pbc
- mas
- mda
- nse
- gym
- tri
- hto
- mfx
- hno
- bgd
- cbc
- mqb
- yli
- gwi
- tac
- cbv
- bxg
- npy
- qvs
- ura
- nch
- hub
- coe
- ibg
- pir
- mbh
- mey
- meq
- zae
- neb
- ldi
- ify
- qvz
- zca
- gam
- pad
- jvn
- kwi
- tfr
- ata
- bxh
- mox
- nab
- ndz
- sri
- guu
- quf
- csy
- yad
- cbu
- mza
- inb
- qve
- qvc
- waw
- saj
- caa
- wbi
- alw
- lgl
- jic
- lac
- apr
- azz
- cnw
- tos
- qxo
- ibo
- des
- nca
- mkw
- avu
- otn
- stb
- kby
- xho
- bcq
- pae
- aui
- lnl
- tbg
- tnc
- guz
- ksw
- syl
- tyv
- lww
- zul
- lai
- mww
- mcb
- loz
- beq
- mer
- mwt
- arn
- ore
- bza
- lun
- lbj
- apf
- bto
- mnh
- sab
- kxf
- pov
- nbw
- ckb
- bdg
- epo
- sfw
- knc
- tzm
- top
- lus
- ige
- tum
- gvr
- bjz
- csh
- xdy
- bho
- abk
- ijc
- nso
- vai
- neq
- gkp
- dje
- bev
- jen
- lub
- ndc
- lrc
- qug
- aca
- bax
- bum
- srr
- tiv
- sea
- maf
- pci
- xkl
- rhg
- dgr
- bft
- ngc
- tew
- lua
- kck
- awn
- cag
- lag
- tdy
- ada
- soe
- swk
- mni
- pdt
- ebu
- bwr
- etu
- krw
- gaa
- isn
- sru
- mkn
- gle
- ubr
- mzz
- mug
- kqs
- ipi
- ssn
- ida
- kvj
- knp
- trc
- zza
- nzb
- mcn
- wed
- lol
- lic
- jaa
- zpq
- skr
- dww
- rml
- ggu
- hdy
- ktu
- mgw
- lmp
- mfa
- enl
- cje
- ijn
- mwm
- vmk
- mua
- ngb
- dur
- nup
- tsc
- bkm
- kpm
- ayz
- wim
- idu
- ksf
- kqf
- kea
- urh
- ksz
- mro
- ego
- gya
- kfc
- nnc
- mrt
- ndi
- ena
- ogo
- tui
- bhi
- bzw
- elm
- okr
- its
- adi
- geb
- dow
- kng
- mhw
- mgr
- ast
- igb
- kfi
- dzg
- mzl
- gvs
- ncl
- rao
- kmb
- krr
- sat
- unr
- ald
- bhb
- glk
- gnn
- iso
- sef
- bin
- sgc
- coh
- dua
- aoi
- swp
- viv
- acz
- nbq
- wbp
- gvo
- giz
- tod
- mwf
- khe
- dks
- kaj
- wlo
- ady
- ntj
- emk
- suj
- lzz
- snf
- tvs
- wnc
- jra
- zav
- bbj
- mhu
- kel
- njz
- tuy
- efi
- lgr
- bmk
- dhg
- lgm
- tdh
Datasets
- google/fleurs
Metrics
- acc







