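🚀 Massively Multilingual Speech (MMS) - Fine-tuned ASR - L1107
This checkpoint is a model fine-tuned for multilingual automatic speech recognition (ASR) and is part of Facebook's Massively Multilingual Speech project. It is based on the Wav2Vec2 architecture and uses adapter models to transcribe more than 1000 languages. The checkpoint has 1 billion parameters and was obtained by fine-tuning facebook/mms-1b on 1107 languages.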
🚀 Quick start
This MMS checkpoint can be used with Transformers to transcribe audio in 1107 different languages. Here is a simple example.
First, install transformers and a few other libraries:
pip install torch accelerate torchaudio datasets
pip install --upgrade transformers
⚠️ Important
MMS requires transformers >= 4.30. If version 4.30 has not yet been released on PyPI, install transformers from source:
pip install git+https://github.com/huggingface/transformers.git
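One way to confirm the requirement before loading the model is to compare the installed version against 4.30. The following is a minimal sketch using only the standard library; the helper names `version_tuple`, `meets_minimum`, and `transformers_is_recent_enough` are illustrative and not part of transformers, and pre-release suffixes such as `.dev0` or `rc1` are simply ignored.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric components, e.g. '4.31.0.dev0' -> (4, 31, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:  # stop at the first non-numeric component such as 'dev0'
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.30") -> bool:
    """Return True if `installed` is at least `minimum` (pre-release tags ignored)."""
    have, need = version_tuple(installed), version_tuple(minimum)
    width = max(len(have), len(need))
    # Pad with zeros so that '4.30' and '4.30.0' compare as equal.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def transformers_is_recent_enough(minimum: str = "4.30") -> bool:
    """Check the installed transformers distribution, if there is one."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)


if __name__ == "__main__":
    print("transformers >= 4.30:", transformers_is_recent_enough())
```

Because a source install typically reports a development version such as `4.31.0.dev0`, the sketch treats it as satisfying the minimum.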
Next, load a few audio samples with datasets. Make sure the audio is sampled at 16,000 Hz (16 kHz).
from datasets import load_dataset, Audio
# English
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]
# French
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "fr", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
fr_sample = next(iter(stream_data))["audio"]["array"]
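Each decoded `audio` entry carries a `sampling_rate` next to the `array`, so the 16 kHz requirement can be checked before the waveform reaches the processor. The following is a minimal sketch that assumes this dictionary layout; `require_16khz` is a hypothetical helper, and the dictionary at the bottom is made-up stand-in data, not a real Common Voice sample.

```python
EXPECTED_RATE = 16_000  # Hz, the sampling rate this checkpoint expects


def require_16khz(audio: dict) -> float:
    """Validate the sampling rate and return the clip duration in seconds."""
    rate = audio["sampling_rate"]
    if rate != EXPECTED_RATE:
        raise ValueError(
            f"expected {EXPECTED_RATE} Hz audio, got {rate} Hz; "
            "cast the column with Audio(sampling_rate=16000) first"
        )
    return len(audio["array"]) / rate


# Stand-in for next(iter(stream_data))["audio"]: two seconds of silence.
fake_audio = {"array": [0.0] * 32_000, "sampling_rate": 16_000}
print(require_16khz(fake_audio))  # 2.0
```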
Then load the model and processor:
from transformers import Wav2Vec2ForCTC, AutoProcessor
import torch
model_id = "facebook/mms-1b-l1107"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
Now we process the audio, pass it to the model, and transcribe the model output, just as we would for other Wav2Vec2 models such as facebook/wav2vec2-base-960h:
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits
ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)
# 'joe keton disapproved of films and buster also had reservations about the media'
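The `argmax` followed by `processor.decode` above amounts to greedy CTC decoding: the most likely token is picked for every audio frame, consecutive repeats are merged, and the blank (padding) token is dropped. The following is a simplified sketch of that collapse step on a made-up six-token vocabulary; `ctc_greedy_decode` and `toy_vocab` are illustrative names, and the real tokenizer handles more cases (special tokens, per-language vocabularies).

```python
from itertools import groupby


def ctc_greedy_decode(ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse per-frame token ids into text, CTC style."""
    # 1. Merge runs of identical ids: [2, 2, 3] -> [2, 3].
    collapsed = [key for key, _ in groupby(ids)]
    # 2. Drop the blank token, which only separates genuine repeats.
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # 3. Join characters and turn the word delimiter into spaces.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical vocabulary; id 0 plays the role of the CTC blank.
toy_vocab = {0: "<pad>", 1: "|", 2: "h", 3: "e", 4: "l", 5: "o"}

# One id per audio frame. The blank between the two runs of 4 keeps the double "l".
frame_ids = [2, 2, 3, 0, 4, 4, 0, 4, 5, 5]
print(ctc_greedy_decode(frame_ids, toy_vocab))  # hello
```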
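We can now keep the same model in memory and switch language adapters simply by calling load_adapter() on the model and set_target_lang() on the tokenizer. The target language is passed as the argument; for French it is "fra".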
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")
inputs = processor(fr_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits
ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)
# "ce dernier est volé tout au long de l'histoire romaine"
The same approach switches the model to any other supported language. To list the supported language codes, run:
processor.tokenizer.vocab.keys()
For more details, see the official documentation.
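Because the tokenizer and the adapter have to be switched together, it can be convenient to wrap both calls in one function and validate the language code first. The following is a minimal sketch, assuming `processor.tokenizer.vocab` is keyed by language code as in the listing above; `find_language_codes` and `switch_language` are hypothetical helpers, not part of transformers.

```python
def find_language_codes(codes, prefix: str) -> list:
    """Return the supported codes that start with `prefix`, sorted."""
    return sorted(code for code in codes if code.startswith(prefix))


def switch_language(model, processor, lang: str) -> None:
    """Switch tokenizer vocabulary and adapter weights to the same language."""
    supported = processor.tokenizer.vocab.keys()
    if lang not in supported:
        close = find_language_codes(supported, lang[:3])
        raise ValueError(f"unsupported language code {lang!r}; similar codes: {close}")
    processor.tokenizer.set_target_lang(lang)
    model.load_adapter(lang)


# A few codes copied from the supported-language list, for illustration only.
sample_codes = ["ukr", "uig-script_cyrillic", "uig-script_arabic", "fra", "eng"]
print(find_language_codes(sample_codes, "uig"))
# ['uig-script_arabic', 'uig-script_cyrillic']
```

With the model and processor loaded as in the quick start, `switch_language(model, processor, "fra")` would replace the two separate calls, and a code such as `"uig"` would fail early with a hint about the script-specific variants.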
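✨ Main features
- Multilingual support: automatic speech recognition for 1107 languages.
- Wav2Vec2-based architecture: uses adapter models for multilingual transcription.
- Large parameter count: 1 billion parameters, fine-tuned on 1107 languages.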
💻 Usage examples
Basic usage
# First, install the required libraries
pip install torch accelerate torchaudio datasets
pip install --upgrade transformers
# Load the dataset
from datasets import load_dataset, Audio
# English
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "en", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
en_sample = next(iter(stream_data))["audio"]["array"]
# Load the model and processor
from transformers import Wav2Vec2ForCTC, AutoProcessor
import torch
model_id = "facebook/mms-1b-l1107"
processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
# Process the audio and transcribe it
inputs = processor(en_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits
ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)
# 'joe keton disapproved of films and buster also had reservations about the media'
Advanced usage
# Switch the language adapter to handle audio in another language
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")
# Load a French audio sample
stream_data = load_dataset("mozilla-foundation/common_voice_13_0", "fr", split="test", streaming=True)
stream_data = stream_data.cast_column("audio", Audio(sampling_rate=16000))
fr_sample = next(iter(stream_data))["audio"]["array"]
# Process the French audio and transcribe it
inputs = processor(fr_sample, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs).logits
ids = torch.argmax(outputs, dim=-1)[0]
transcription = processor.decode(ids)
# "ce dernier est volé tout au long de l'histoire romaine"
📚 Detailed documentation
More detailed documentation for this model is available at the following links:
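🔧 Technical details
- Model architecture: based on the Wav2Vec2 architecture, with adapter models for multilingual transcription.
- Training data: fine-tuned on datasets covering 1107 languages; part of the data comes from google/fleurs.
- Evaluation metric: word error rate (WER).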
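Word error rate is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. A self-contained sketch of that definition follows; it is not the evaluation script used for this checkpoint, and real evaluations usually normalize the text (casing, punctuation) first. The function name and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over words, divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the distance between the reference prefix seen so far and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the): 2 edits over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```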
📄 License
This model is released under the CC-BY-NC 4.0 license.
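📦 Supported languages
This model supports 1107 languages. The ISO 639-3 codes of all languages supported by this checkpoint are listed below. More details about the languages and their ISO 639-3 codes can be found in the MMS Language Coverage Overview.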
- abi
- abp
- aca
- acd
- ace
- acf
- ach
- acn
- acr
- acu
- ade
- adh
- adj
- adx
- aeu
- agd
- agg
- agn
- agr
- agu
- agx
- aha
- ahk
- aia
- aka
- akb
- ake
- akp
- alj
- alp
- alt
- alz
- ame
- amf
- amh
- ami
- amk
- ann
- any
- aoz
- apb
- apr
- ara
- arl
- asa
- asg
- asm
- ata
- atb
- atg
- ati
- atq
- ava
- avn
- avu
- awa
- awb
- ayo
- ayr
- ayz
- azb
- azg
- azj-script_cyrillic
- azj-script_latin
- azz
- bak
- bam
- ban
- bao
- bav
- bba
- bbb
- bbc
- bbo
- bcc-script_arabic
- bcc-script_latin
- bcl
- bcw
- bdg
- bdh
- bdq
- bdu
- bdv
- beh
- bem
- ben
- bep
- bex
- bfa
- bfo
- bfy
- bfz
- bgc
- bgq
- bgr
- bgt
- bgw
- bha
- bht
- bhz
- bib
- bim
- bis
- biv
- bjr
- bjv
- bjw
- bjz
- bkd
- bkv
- blh
- blt
- blx
- blz
- bmq
- bmr
- bmu
- bmv
- bng
- bno
- bnp
- boa
- bod
- boj
- bom
- bor
- bov
- box
- bpr
- bps
- bqc
- bqi
- bqj
- bqp
- bru
- bsc
- bsq
- bss
- btd
- bts
- btt
- btx
- bud
- bul
- bus
- bvc
- bvz
- bwq
- bwu
- byr
- bzh
- bzi
- bzj
- caa
- cab
- cac-dialect_sanmateoixtatan
- cac-dialect_sansebastiancoatan
- cak-dialect_central
- cak-dialect_santamariadejesus
- cak-dialect_santodomingoxenacoj
- cak-dialect_southcentral
- cak-dialect_western
- cak-dialect_yepocapa
- cap
- car
- cas
- cat
- cax
- cbc
- cbi
- cbr
- cbs
- cbt
- cbu
- cbv
- cce
- cco
- cdj
- ceb
- ceg
- cek
- cfm
- cgc
- chf
- chv
- chz
- cjo
- cjp
- cjs
- cko
- ckt
- cla
- cle
- cly
- cme
- cmo-script_khmer
- cmo-script_latin
- cmr
- cnh
- cni
- cnl
- cnt
- coe
- cof
- cok
- con
- cot
- cou
- cpa
- cpb
- cpu
- crh
- crk-script_latin
- crk-script_syllabics
- crn
- crq
- crs
- crt
- csk
- cso
- ctd
- ctg
- cto
- ctu
- cuc
- cui
- cuk
- cul
- cwa
- cwe
- cwt
- cya
- cym
- daa
- dah
- dar
- dbj
- dbq
- ddn
- ded
- des
- deu
- dga
- dgi
- dgk
- dgo
- dgr
- dhi
- did
- dig
- dik
- dip
- div
- djk
- dnj-dialect_blowowest
- dnj-dialect_gweetaawueast
- dnt
- dnw
- dop
- dos
- dsh
- dso
- dtp
- dts
- dug
- dwr
- dyi
- dyo
- dyu
- dzo
- eip
- eka
- ell
- emp
- enb
- eng
- enx
- ese
- ess
- eus
- evn
- ewe
- eza
- fal
- fao
- far
- fas
- fij
- fin
- flr
- fmu
- fon
- fra
- frd
- ful
- gag-script_cyrillic
- gag-script_latin
- gai
- gam
- gau
- gbi
- gbk
- gbm
- gbo
- gde
- geb
- gej
- gil
- gjn
- gkn
- gld
- glk
- gmv
- gna
- gnd
- gng
- gof-script_latin
- gog
- gor
- gqr
- grc
- gri
- grn
- grt
- gso
- gub
- guc
- gud
- guh
- guj
- guk
- gum
- guo
- guq
- guu
- gux
- gvc
- gvl
- gwi
- gwr
- gym
- gyr
- had
- hag
- hak
- hap
- hat
- hau
- hay
- heb
- heh
- hif
- hig
- hil
- hin
- hlb
- hlt
- hne
- hnn
- hns
- hoc
- hoy
- hto
- hub
- hui
- hun
- hus-dialect_centralveracruz
- hus-dialect_westernpotosino
- huu
- huv
- hvn
- hwc
- hyw
- iba
- icr
- idd
- ifa
- ifb
- ife
- ifk
- ifu
- ify
- ign
- ikk
- ilb
- ilo
- imo
- inb
- ind
- iou
- ipi
- iqw
- iri
- irk
- isl
- itl
- itv
- ixl-dialect_sangasparchajul
- ixl-dialect_sanjuancotzal
- ixl-dialect_santamarianebaj
- izr
- izz
- jac
- jam
- jav
- jbu
- jen
- jic
- jiv
- jmc
- jmd
- jun
- juy
- jvn
- kaa
- kab
- kac
- kak
- kan
- kao
- kaq
- kay
- kaz
- kbo
- kbp
- kbq
- kbr
- kby
- kca
- kcg
- kdc
- kde
- kdh
- kdi
- kdj
- kdl
- kdn
- kdt
- kek
- ken
- keo
- ker
- key
- kez
- kfb
- kff-script_telugu
- kfw
- kfx
- khg
- khm
- khq
- kia
- kij
- kik
- kin
- kir
- kjb
- kje
- kjg
- kjh
- kki
- kkj
- kle
- klu
- klv
- klw
- kma
- kmd
- kml
- kmr-script_arabic
- kmr-script_cyrillic
- kmr-script_latin
- kmu
- knb
- kne
- knf
- knj
- knk
- kno
- kog
- kor
- kpq
- kps
- kpv
- kpy
- kpz
- kqe
- kqp
- kqr
- kqy
- krc
- kri
- krj
- krl
- krr
- krs
- kru
- ksb
- ksr
- kss
- ktb
- ktj
- kub
- kue
- kum
- kus
- kvn
- kvw
- kwd
- kwf
- kwi
- kxc
- kxf
- kxm
- kxv
- kyb
- kyc
- kyf
- kyg
- kyo
- kyq
- kyu
- kyz
- kzf
- lac
- laj
- lam
- lao
- las
- lat
- lav
- law
- lbj
- lbw
- lcp
- lee
- lef
- lem
- lew
- lex
- lgg
- lgl
- lhu
- lia
- lid
- lif
- lip
- lis
- lje
- ljp
- llg
- lln
- lme
- lnd
- lns
- lob
- lok
- lom
- lon
- loq
- lsi
- lsm
- luc
- lug
- lwo
- lww
- lzz
- maa-dialect_sanantonio
- maa-dialect_sanjeronimo
- mad
- mag
- mah
- mai
- maj
- mak
- mal
- mam-dialect_central
- mam-dialect_northern
- mam-dialect_southern
- mam-dialect_western
- maq
- mar
- maw
- maz
- mbb
- mbc
- mbh
- mbj
- mbt
- mbu
- mbz
- mca
- mcb
- mcd
- mco
- mcp
- mcq
- mcu
- mda
- mdv
- mdy
- med
- mee
- mej
- men
- meq
- met
- mev
- mfe
- mfh
- mfi
- mfk
- mfq
- mfy
- mfz
- mgd
- mge
- mgh
- mgo
- mhi
- mhr
- mhu
- mhx
- mhy
- mib
- mie
- mif
- mih
- mil
- mim
- min
- mio
- mip
- miq
- mit
- miy
- miz
- mjl
- mjv
- mkl
- mkn
- mlg
- mmg
- mnb
- mnf
- mnk
- mnw
- mnx
- moa
- mog
- mon
- mop
- mor
- mos
- mox
- moz
- mpg
- mpm
- mpp
- mpx
- mqb
- mqf
- mqj
- mqn
- mrw
- msy
- mtd
- mtj
- mto
- muh
- mup
- mur
- muv
- muy
- mvp
- mwq
- mwv
- mxb
- mxq
- mxt
- mxv
- mya
- myb
- myk
- myl
- myv
- myx
- myy
- mza
- mzi
- mzj
- mzk
- mzm
- mzw
- nab
- nag
- nan
- nas
- naw
- nca
- nch
- ncj
- ncl
- ncu
- ndj
- ndp
- ndv
- ndy
- ndz
- neb
- new
- nfa
- nfr
- nga
- ngl
- ngp
- ngu
- nhe
- nhi
- nhu
- nhw
- nhx
- nhy
- nia
- nij
- nim
- nin
- nko
- nlc
- nld
- nlg
- nlk
- nmz
- nnb
- nnq
- nnw
- noa
- nod
- nog
- not
- npl
- npy
- nst
- nsu
- ntm
- ntr
- nuj
- nus
- nuz
- nwb
- nxq
- nya
- nyf
- nyn
- nyo
- nyy
- nzi
- obo
- ojb-script_latin
- ojb-script_syllabics
- oku
- old
- omw
- onb
- ood
- orm
- ory
- oss
- ote
- otq
- ozm
- pab
- pad
- pag
- pam
- pan
- pao
- pap
- pau
- pbb
- pbc
- pbi
- pce
- pcm
- peg
- pez
- pib
- pil
- pir
- pis
- pjt
- pkb
- pls
- plw
- pmf
- pny
- poh-dialect_eastern
- poh-dialect_western
- poi
- pol
- por
- poy
- ppk
- pps
- prf
- prk
- prt
- pse
- pss
- ptu
- pui
- pwg
- pww
- pxm
- qub
- quc-dialect_central
- quc-dialect_east
- quf
- quh
- qul
- quw
- quy
- quz
- qvc
- qve
- qvh
- qvm
- qvn
- qvo
- qvs
- qvw
- qvz
- qwh
- qxh
- qxl
- qxn
- qxo
- qxr
- rah
- rai
- rap
- rav
- raw
- rej
- rel
- rgu
- rhg
- rif-script_arabic
- rif-script_latin
- ril
- rim
- rjs
- rkt
- rmc-script_cyrillic
- rmc-script_latin
- rmo
- rmy-script_cyrillic
- rmy-script_latin
- rng
- rnl
- rol
- ron
- rop
- rro
- rub
- ruf
- rug
- run
- rus
- sab
- sag
- sah
- saj
- saq
- sas
- sba
- sbd
- sbl
- sbp
- sch
- sck
- sda
- sea
- seh
- ses
- sey
- sgb
- sgj
- sgw
- shi
- shk
- shn
- sho
- shp
- sid
- sig
- sil
- sja
- sjm
- sld
- slu
- sml
- smo
- sna
- sne
- snn
- snp
- snw
- som
- soy
- spa
- spp
- spy
- sqi
- sri
- srm
- srn
- srx
- stn
- stp
- suc
- suk
- sun
- sur
- sus
- suv
- suz
- swe
- swh
- sxb
- sxn
- sya
- syl
- sza
- tac
- taj
- tam
- tao
- tap
- taq
- tat
- tav
- tbc
- tbg
- tbk
- tbl
- tby
- tbz
- tca
- tcc
- tcs
- tcz
- tdj
- ted
- tee
- tel
- tem
- teo
- ter
- tes
- tew
- tex
- tfr
- tgj
- tgk
- tgl
- tgo
- tgp
- tha
- thk
- thl
- tih
- tik
- tir
- tkr
- tlb
- tlj
- tly
- tmc
- tmf
- tna
- tng
- tnk
- tnn
- tnp
- tnr
- tnt
- tob
- toc
- toh
- tom
- tos
- tpi
- tpm
- tpp
- tpt
- trc
- tri
- trn
- trs
- tso
- tsz
- ttc
- tte
- ttq-script_tifinagh
- tue
- tuf
- tuk-script_arabic
- tuk-script_latin
- tuo
- tur
- tvw
- twb
- twe
- twu
- txa
- txq
- txu
- tye
- tzh-dialect_bachajon
- tzh-dialect_tenejapa
- tzj-dialect_eastern
- tzj-dialect_western
- tzo-dialect_chamula
- tzo-dialect_chenalho
- ubl
- ubu
- udm
- udu
- uig-script_arabic
- uig-script_cyrillic
- ukr
- unr
- upv
- ura
- urb
- urd-script_arabic
- urd-script_devanagari
- urd-script_latin
- urk
- urt
- ury
- usp
- uzb-script_cyrillic
- vag
- vid
- vie
- vif
- vmw
- vmy
- vun
- vut
- wal-script_ethiopic
- wal-script_latin
- wap
- war
- waw
- way
- wba
- wlo
- wlx
- wmw
- wob
- wsg
- wwa
- xal
- xdy
- xed
- xer
- xmm
- xnj
- xnr
- xog
- xon
- xrb
- xsb
- xsm
- xsr
- xsu
- xta
- xtd
- xte
- xtm
- xtn
- xua
- xuo
- yaa
- yad
- yal
- yam
- yao
- yas
- yat
- yaz
- yba
- ybb
- ycl
- ycn
- yea
- yka
- yli
- yor
- yre
- yua
- yuz
- yva
- zaa
- zab
- zac
- zad
- zae
- zai
- zam
- zao
- zaq
- zar
- zas
- zav
- zaw
- zca
- zga
- zim
- ziw
- zlm
- zmz
- zne
- zos
- zpc
- zpg
- zpi
- zpl
- zpm
- zpo
- zpt
- zpu
- zpz
- ztq
- zty
- zyb
- zyp
- zza
📋 Model details
Property | Details |
---|---|
Developed by | Vineel Pratap et al. |
Model type | Multilingual automatic speech recognition model |
Supported languages | 1000+ languages; see Supported languages |
License | CC-BY-NC 4.0 |
Number of parameters | 1 billion |
Audio sampling rate | 16,000 Hz (16 kHz) |
Citation (BibTeX) | @article{pratap2023mms,<br> title={Scaling Speech Technology to 1,000+ Languages},<br> author={Vineel Pratap and Andros Tjandra and Bowen Shi and Paden Tomasello and Arun Babu and Sayani Kundu and Ali Elkahky and Zhaoheng Ni and Apoorv Vyas and Maryam Fazel-Zarandi and Alexei Baevski and Yossi Adi and Xiaohui Zhang and Wei-Ning Hsu and Alexis Conneau and Michael Auli},<br> journal={arXiv},<br> year={2023}<br>}<br> |
🔗 More links



