🚀 Massively Multilingual Speech (MMS) - Finetuned ASR - L1107
This checkpoint is a model fine-tuned for multilingual Automatic Speech Recognition (ASR). It is part of Facebook's Massive Multilingual Speech project and can transcribe over 1000 languages.
🚀 Quick Start
This MMS checkpoint can be used with Transformers to transcribe audio in 1107 different languages. The following is a step-by-step guide on how to use it:
1. Install Dependencies
First, install transformers and a few other required libraries:
pip install torch accelerate torchaudio datasets
pip install --upgrade transformers
Note: To use MMS you need transformers >= 4.30 installed. If version 4.30 is not yet available on PyPI, install transformers from source:
pip install git+https://github.com/huggingface/transformers.git
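To confirm that the installed release meets the 4.30 minimum, a small check along these lines can help. It is only a sketch: `meets_minimum` is a hypothetical helper name, not part of Transformers, and it compares just the major and minor version components.

```python
import re
from importlib import metadata


def meets_minimum(version: str, minimum: tuple = (4, 30)) -> bool:
    """Return True if `version` (e.g. "4.31.0.dev0") is at least `minimum`.

    Hypothetical helper: only the leading major.minor numbers are compared.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "OK" if meets_minimum(installed) else "too old for MMS"
        print(f"transformers {installed}: {status}")
```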
2. Load Audio Samples
Next, load a couple of audio samples via datasets. Make sure that the audio data is sampled at 16,000 Hz (16 kHz).
from datasets import load_dataset, Audio
# English
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]
# French
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "fr", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
fr_sample = next(iter(stream_data))["audio"]["array"]
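Before passing a sample to the processor, it can be worth checking the decoded audio. The sketch below assumes the dictionary layout that datasets uses for a decoded audio column (`array` and `sampling_rate` keys). `check_sample` is a hypothetical helper, and the toy input is a plain list rather than real audio.

```python
TARGET_SAMPLING_RATE = 16_000  # Hz, the rate used in the steps above


def check_sample(audio: dict, expected_rate: int = TARGET_SAMPLING_RATE) -> float:
    """Validate a decoded audio dict and return its duration in seconds.

    Hypothetical helper. `audio` is assumed to have the keys "array"
    (a 1-D sequence of samples) and "sampling_rate" (an int, in Hz).
    """
    rate = audio["sampling_rate"]
    if rate != expected_rate:
        raise ValueError(f"Expected {expected_rate} Hz audio, got {rate} Hz")
    num_samples = len(audio["array"])
    if num_samples == 0:
        raise ValueError("Audio array is empty")
    return num_samples / rate


# Toy example: 2 seconds of silence at 16 kHz.
silence = {"array": [0.0] * 32_000, "sampling_rate": 16_000}
print(check_sample(silence))  # 2.0
```

With the streaming dataset above, the same check could be applied to `next(iter(stream_data))["audio"]` before extracting `["array"]`.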
3. Load the Model and Processor
from transformers import Wav2Vec2ForCTC, AutoProcessor
import torch
model_id = "facebook/mms-1b-l1107"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
4. Process Audio and Transcribe
Now process the audio data, pass it to the model, and decode the model output, just as for other Wav2Vec2 models such as [facebook/wav2vec2-base-960h](https://huggingface.co/facebook/wav2vec2-base-960h):
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits
ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)
# 'joe keton disapproved of films and buster also had reservations about the media'
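The `argmax` step picks one token id per audio frame, and `processor.decode` then turns that frame-level sequence into text. For a CTC model this conventionally means merging consecutive repeats and dropping the blank token. The pure-Python sketch below illustrates that collapse rule on toy ids. `ctc_collapse` and the blank id of 0 are assumptions made for illustration; the real tokenizer additionally maps ids to characters and handles word delimiters.

```python
def ctc_collapse(frame_ids, blank_id=0):
    """Greedy CTC collapse: merge consecutive repeats, then drop blanks.

    Illustrative helper, not the Transformers implementation.
    """
    collapsed = []
    previous = None
    for token_id in frame_ids:
        # Keep a token only when it starts a new run and is not the blank.
        if token_id != previous and token_id != blank_id:
            collapsed.append(token_id)
        previous = token_id
    return collapsed


# A blank between the two runs of 3 keeps them as separate tokens.
frames = [0, 3, 3, 0, 3, 5, 5, 0]
print(ctc_collapse(frames))  # [3, 3, 5]
```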
5. Switch Languages
We can keep the same model in memory and simply switch out the language adapters by calling the convenient load_adapter() function for the model and set_target_lang() for the tokenizer. We pass the target language as an input: "fra" for French.
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")
inputs = processor(fr_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits
ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)
# "ce dernier est volé tout au long de l'histoire romaine"
The language can be switched in the same way for all other supported languages. To list the supported language codes, inspect:
processor.tokenizer.vocab.keys()
For more details, please have a look at the official docs.
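Some languages are listed only with a script or dialect qualifier, so a bare ISO 639-3 code is not always a valid target. The sketch below shows one way to check a code against the vocabulary keys and suggest qualified variants. `resolve_language` is a hypothetical helper, and the toy dictionary stands in for `processor.tokenizer.vocab`, which is assumed here to be keyed by language identifier, as the call above suggests.

```python
def resolve_language(vocab, code):
    """Return `code` if it is a key of `vocab`, otherwise raise with hints.

    Hypothetical helper; `vocab` is any mapping keyed by language identifier.
    """
    if code in vocab:
        return code
    # Look for qualified variants such as "kmr-script_latin".
    variants = sorted(key for key in vocab if key.startswith(code + "-"))
    if variants:
        raise ValueError(f"{code!r} needs a qualifier; try one of {variants}")
    raise ValueError(f"{code!r} is not supported by this checkpoint")


# Toy stand-in for processor.tokenizer.vocab.
toy_vocab = {"eng": {}, "fra": {}, "kmr-script_latin": {}, "kmr-script_arabic": {}}
print(resolve_language(toy_vocab, "fra"))  # fra
```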
✨ Features
- Multilingual Support: This model supports 1107 languages, providing broad language coverage for ASR tasks.
- Adapter-Based: It uses adapter models to handle different languages efficiently.
- Fine-Tuned: Fine-tuned for better performance on multilingual ASR tasks.
📦 Installation
The installation steps are included in the Quick Start section. You need to install the necessary libraries via pip and ensure that transformers is at least version 4.30.
📚 Documentation
Supported Languages
This model supports 1107 languages. The full list of languages supported by this checkpoint follows, given as [ISO 639-3 codes](https://en.wikipedia.org/wiki/ISO_639-3). You can find more details about the languages and their ISO 639-3 codes in the MMS Language Coverage Overview.
- abi
- abp
- aca
- acd
- ace
- acf
- ach
- acn
- acr
- acu
- ade
- adh
- adj
- adx
- aeu
- agd
- agg
- agn
- agr
- agu
- agx
- aha
- ahk
- aia
- aka
- akb
- ake
- akp
- alj
- alp
- alt
- alz
- ame
- amf
- amh
- ami
- amk
- ann
- any
- aoz
- apb
- apr
- ara
- arl
- asa
- asg
- asm
- ata
- atb
- atg
- ati
- atq
- ava
- avn
- avu
- awa
- awb
- ayo
- ayr
- ayz
- azb
- azg
- azj-script_cyrillic
- azj-script_latin
- azz
- bak
- bam
- ban
- bao
- bav
- bba
- bbb
- bbc
- bbo
- bcc-script_arabic
- bcc-script_latin
- bcl
- bcw
- bdg
- bdh
- bdq
- bdu
- bdv
- beh
- bem
- ben
- bep
- bex
- bfa
- bfo
- bfy
- bfz
- bgc
- bgq
- bgr
- bgt
- bgw
- bha
- bht
- bhz
- bib
- bim
- bis
- biv
- bjr
- bjv
- bjw
- bjz
- bkd
- bkv
- blh
- blt
- blx
- blz
- bmq
- bmr
- bmu
- bmv
- bng
- bno
- bnp
- boa
- bod
- boj
- bom
- bor
- bov
- box
- bpr
- bps
- bqc
- bqi
- bqj
- bqp
- bru
- bsc
- bsq
- bss
- btd
- bts
- btt
- btx
- bud
- bul
- bus
- bvc
- bvz
- bwq
- bwu
- byr
- bzh
- bzi
- bzj
- caa
- cab
- cac-dialect_sanmateoixtatan
- cac-dialect_sansebastiancoatan
- cak-dialect_central
- cak-dialect_santamariadejesus
- cak-dialect_santodomingoxenacoj
- cak-dialect_southcentral
- cak-dialect_western
- cak-dialect_yepocapa
- cap
- car
- cas
- cat
- cax
- cbc
- cbi
- cbr
- cbs
- cbt
- cbu
- cbv
- cce
- cco
- cdj
- ceb
- ceg
- cek
- cfm
- cgc
- chf
- chv
- chz
- cjo
- cjp
- cjs
- cko
- ckt
- cla
- cle
- cly
- cme
- cmo-script_khmer
- cmo-script_latin
- cmr
- cnh
- cni
- cnl
- cnt
- coe
- cof
- cok
- con
- cot
- cou
- cpa
- cpb
- cpu
- crh
- crk-script_latin
- crk-script_syllabics
- crn
- crq
- crs
- crt
- csk
- cso
- ctd
- ctg
- cto
- ctu
- cuc
- cui
- cuk
- cul
- cwa
- cwe
- cwt
- cya
- cym
- daa
- dah
- dar
- dbj
- dbq
- ddn
- ded
- des
- deu
- dga
- dgi
- dgk
- dgo
- dgr
- dhi
- did
- dig
- dik
- dip
- div
- djk
- dnj-dialect_blowowest
- dnj-dialect_gweetaawueast
- dnt
- dnw
- dop
- dos
- dsh
- dso
- dtp
- dts
- dug
- dwr
- dyi
- dyo
- dyu
- dzo
- eip
- eka
- ell
- emp
- enb
- eng
- enx
- ese
- ess
- eus
- evn
- ewe
- eza
- fal
- fao
- far
- fas
- fij
- fin
- flr
- fmu
- fon
- fra
- frd
- ful
- gag-script_cyrillic
- gag-script_latin
- gai
- gam
- gau
- gbi
- gbk
- gbm
- gbo
- gde
- geb
- gej
- gil
- gjn
- gkn
- gld
- glk
- gmv
- gna
- gnd
- gng
- gof-script_latin
- gog
- gor
- gqr
- grc
- gri
- grn
- grt
- gso
- gub
- guc
- gud
- guh
- guj
- guk
- gum
- guo
- guq
- guu
- gux
- gvc
- gvl
- gwi
- gwr
- gym
- gyr
- had
- hag
- hak
- hap
- hat
- hau
- hay
- heb
- heh
- hif
- hig
- hil
- hin
- hlb
- hlt
- hne
- hnn
- hns
- hoc
- hoy
- hto
- hub
- hui
- hun
- hus-dialect_centralveracruz
- hus-dialect_westernpotosino
- huu
- huv
- hvn
- hwc
- hyw
- iba
- icr
- idd
- ifa
- ifb
- ife
- ifk
- ifu
- ify
- ign
- ikk
- ilb
- ilo
- imo
- inb
- ind
- iou
- ipi
- iqw
- iri
- irk
- isl
- itl
- itv
- ixl-dialect_sangasparchajul
- ixl-dialect_sanjuancotzal
- ixl-dialect_santamarianebaj
- izr
- izz
- jac
- jam
- jav
- jbu
- jen
- jic
- jiv
- jmc
- jmd
- jun
- juy
- jvn
- kaa
- kab
- kac
- kak
- kan
- kao
- kaq
- kay
- kaz
- kbo
- kbp
- kbq
- kbr
- kby
- kca
- kcg
- kdc
- kde
- kdh
- kdi
- kdj
- kdl
- kdn
- kdt
- kek
- ken
- keo
- ker
- key
- kez
- kfb
- kff-script_telugu
- kfw
- kfx
- khg
- khm
- khq
- kia
- kij
- kik
- kin
- kir
- kjb
- kje
- kjg
- kjh
- kki
- kkj
- kle
- klu
- klv
- klw
- kma
- kmd
- kml
- kmr-script_arabic
- kmr-script_cyrillic
- kmr-script_latin
- kmu
- knb
- kne
- knf
- knj
- knk
- kno
- kog
- kor
- kpq
- kps
- kpv
- kpy
- kpz
- kqe
- kqp
- kqr
- kqy
- krc
- kri
- krj
- krl
- krr
- krs
- kru
- ksb
- ksr
- kss
- ktb
- ktj
- kub
- kue
- kum
- kus
- kvn
- kvw
- kwd
- kwf
- kwi
- kxc
- kxf
- kxm
- kxv
- kyb
- kyc
- kyf
- kyg
- kyo
- kyq
- kyu
- kyz
- kzf
- lac
- laj
- lam
- lao
- las
- lat
- lav
- law
- lbj
- lbw
- lcp
- lee
- lef
- lem
- lew
- lex
- lgg
- lgl
- lhu
- lia
- lid
- lif
- lip
- lis
- lje
- ljp
- llg
- lln
- lme
- lnd
- lns
- lob
- lok
- lom
- lon
- loq
- lsi
- lsm
- luc
- lug
- lwo
- lww
- lzz
- maa-dialect_sanantonio
- maa-dialect_sanjeronimo
- mad
- mag
- mah
- mai
- maj
- mak
- mal
- mam-dialect_central
- mam-dialect_northern
- mam-dialect_southern
- mam-dialect_western
- maq
- mar
- maw
- maz
- mbb
- mbc
- mbh
- mbj
- mbt
- mbu
- mbz
- mca
- mcb
- mcd
- mco
- mcp
- mcq
- mcu
- mda
- mdv
- mdy
- med
- mee
- mej
- men
- meq
- met
- mev
- mfe
- mfh
- mfi
- mfk
- mfq
- mfy
- mfz
- mgd
- mge
- mgh
- mgo
- mhi
- mhr
- mhu
- mhx
- mhy
- mib
- mie
- mif
- mih
- mil
- mim
- min
- mio
- mip
- miq
- mit
- miy
- miz
- mjl
- mjv
- mkl
- mkn
- mlg
- mmg
- mnb
- mnf
- mnk
- mnw
- mnx
- moa
- mog
- mon
- mop
- mor
- mos
- mox
- moz
- mpg
- mpm
- mpp
- mpx
- mqb
- mqf
- mqj
- mqn
- mrw
- msy
- mtd
- mtj
- mto
- muh
- mup
- mur
- muv
- muy
- mvp
- mwq
- mwv
- mxb
- mxq
- mxt
- mxv
- mya
- myb
- myk
- myl
- myv
- myx
- myy
- mza
- mzi
- mzj
- mzk
- mzm
- mzw
- nab
- nag
- nan
- nas
- naw
- nca
- nch
- ncj
- ncl
- ncu
- ndj
- ndp
- ndv
- ndy
- ndz
- neb
- new
- nfa
- nfr
- nga
- ngl
- ngp
- ngu
- nhe
- nhi
- nhu
- nhw
- nhx
- nhy
- nia
- nij
- nim
- nin
- nko
- nlc
- nld
- nlg
- nlk
- nmz
- nnb
- nnq
- nnw
- noa
- nod
- nog
- not
- npl
- npy
- nst
- nsu
- ntm
- ntr
- nuj
- nus
- nuz
- nwb
- nxq
- nya
- nyf
- nyn
- nyo
- nyy
- nzi
- obo
- ojb-script_latin
- ojb-script_syllabics
- oku
- old
- omw
- onb
- ood
- orm
- ory
- oss
- ote
- otq
- ozm
- pab
- pad
- pag
- pam
- pan
- pao
- pap
- pau
- pbb
- pbc
- pbi
- pce
- pcm
- peg
- pez
- pib
- pil
- pir
- pis
- pjt
- pkb
- pls
- plw
- pmf
- pny
- poh-dialect_eastern
- poh-dialect_western
- poi
- pol
- por
- poy
- ppk
- pps
- prf
- prk
- prt
- pse
- pss
- ptu
- pui
- pwg
- pww
- pxm
- qub
- quc-dialect_central
- quc-dialect_east
- quc-dialect_north
- quf
- quh
- qul
- quw
- quy
- quz
- qvc
- qve
- qvh
- qvm
- qvn
- qvo
- qvs
- qvw
- qvz
- qwh
- qxh
- qxl
- qxn
- qxo
- qxr
- rah
- rai
- rap
- rav
- raw
- rej
- rel
- rgu
- rhg
- rif-script_arabic
- rif-script_latin
- ril
- rim
- rjs
- rkt
- rmc-script_cyrillic
- rmc-script_latin
- rmo
- rmy-script_cyrillic
- rmy-script_latin
- rng
- rnl
- rol
- ron
- rop
- rro
- rub
- ruf
- rug
- run
- rus
- sab
- sag
- sah
- saj
- saq
- sas
- sba
- sbd
- sbl
- sbp
- sch
- sck
- sda
- sea
- seh
- ses
- sey
- sgb
- sgj
- sgw
- shi
- shk
- shn
- sho
- shp
- sid
- sig
- sil
- sja
- sjm
- sld
- slu
- sml
- smo
- sna
- sne
- snn
- snp
- snw
- som
- soy
- spa
- spp
- spy
- sqi
- sri
- srm
- srn
- srx
- stn
- stp
- suc
- suk
- sun
- sur
- sus
- suv
- suz
- swe
- swh
- sxb
- sxn
- sya
- syl
- sza
- tac
- taj
- tam
- tao
- tap
- taq
- tat
- tav
- tbc
- tbg
- tbk
- tbl
- tby
- tbz
- tca
- tcc
- tcs
- tcz
- tdj
- ted
- tee
- tel
- tem
- teo
- ter
- tes
- tew
- tex
- tfr
- tgj
- tgk
- tgl
- tgo
- tgp
- tha
- thk
- thl
- tih
- tik
- tir
- tkr
- tlb
- tlj
- tly
- tmc
- tmf
- tna
- tng
- tnk
- tnn
- tnp
- tnr
- tnt
- tob
- toc
- toh
- tom
- tos
- tpi
- tpm
- tpp
- tpt
- trc
- tri
- trn
- trs
- tso
- tsz
- ttc
- tte
- ttq-script_tifinagh
- tue
- tuf
- tuk-script_arabic
- tuk-script_latin
- tuo
- tur
- tvw
- twb
- twe
- twu
- txa
- txq
- txu
- tye
- tzh-dialect_bachajon
- tzh-dialect_tenejapa
- tzj-dialect_eastern
- tzj-dialect_western
- tzo-dialect_chamula
- tzo-dialect_chenalho
- ubl
- ubu
- udm
- udu
- uig-script_arabic
- uig-script_cyrillic
- ukr
- unr
- upv
- ura
- urb
- urd-script_arabic
- urd-script_devanagari
- urd-script_latin
- urk
- urt
- ury
- usp
- uzb-script_cyrillic
- vag
- vid
- vie
- vif
- vmw
- vmy
- vun
- vut
- wal-script_ethiopic
- wal-script_latin
- wap
- war
- waw
- way
- wba
- wlo
- wlx
- wmw
- wob
- wsg
- wwa
- xal
- xdy
- xed
- xer
- xmm
- xnj
- xnr
- xog
- xon
- xrb
- xsb
- xsm
- xsr
- xsu
- xta
- xtd
- xte
- xtm
- xtn
- xua
- xuo
- yaa
- yad
- yal
- yam
- yao
- yas
- yat
- yaz
- yba
- ybb
- ycl
- ycn
- yea
- yka
- yli
- yor
- yre
- yua
- yuz
- yva
- zaa
- zab
- zac
- zad
- zae
- zai
- zam
- zao
- zaq
- zar
- zas
- zav
- zaw
- zca
- zga
- zim
- ziw
- zlm
- zmz
- zne
- zos
- zpc
- zpg
- zpi
- zpl
- zpm
- zpo
- zpt
- zpu
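Entries in the list above are either a bare ISO 639-3 code or a code followed by a `script_` or `dialect_` qualifier. A small parser makes that structure explicit, for example to group variants by base language. `parse_language_id` and `LanguageId` are hypothetical names written for this sketch, not part of Transformers.

```python
from typing import NamedTuple, Optional


class LanguageId(NamedTuple):
    iso_code: str                    # e.g. "cak"
    qualifier_kind: Optional[str]    # "script", "dialect", or None
    qualifier_value: Optional[str]   # e.g. "central", or None


def parse_language_id(identifier: str) -> LanguageId:
    """Split an identifier such as "cak-dialect_central" into its parts."""
    base, separator, qualifier = identifier.partition("-")
    base = base.strip()
    if not separator:
        return LanguageId(base, None, None)
    # The qualifier has the form "<kind>_<value>".
    kind, _, value = qualifier.strip().partition("_")
    return LanguageId(base, kind, value or None)


print(parse_language_id("cak-dialect_central"))
print(parse_language_id("eng"))
```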
Model Details
This checkpoint is based on the Wav2Vec2 architecture and uses adapter models to transcribe 1000+ languages. The checkpoint consists of 1 billion parameters and has been fine-tuned from [facebook/mms-1b](https://huggingface.co/facebook/mms-1b) on 1107 languages.
Additional Links
- [Massive Multilingual Speech project](https://research.facebook.com/publications/scaling-speech-technology-to-1000-languages/)
- Wav2Vec2 architecture
- Official docs
📄 License
This project is licensed under the cc-by-nc-4.0 license.

