Zabanshenas Roberta Base Mix
Zabanshenas is a Transformer-based solution for identifying the most probable language of written documents/text.
Downloads 23
Release Time : 3/2/2022
Model Overview
Zabanshenas is a Persian word with two meanings: someone who studies linguistics and a method for identifying the type of written language. This model supports detection for over 200 languages.
Model Features
Multilingual Support
Supports detection for over 200 languages, including many niche and low-resource languages.
High Accuracy
Achieves F1 scores above 90% for most languages.
Transformer-based
Utilizes advanced Transformer architecture to provide robust language identification capabilities.
Model Capabilities
Text Language Detection
Multilingual Recognition
Low-resource Language Support
Use Cases
Content Management
Multilingual Content Classification
Automatically identifies the language of user-generated content for classification and routing.
Accurately recognizes 200+ languages.
Localization Services
Automatic Language Detection
Provides language detection for input text in translation services.
High accuracy supports translation service workflows.
đ Zabanshenas - Language Detector
Zabanshenas is a Transformer-based solution designed to identify the most likely language of a written document or text. The term "Zabanshenas" is Persian, with two meanings:
- A person who studies linguistics.
- A method to identify the type of written language.
đ Quick Start
Follow Zabanshenas repo for more information!
⨠Features
Zabanshenas can handle a wide range of languages, including but not limited to:
multilingual
,ace
,afr
,als
,amh
,ang
,ara
,arg
,arz
,asm
,ast
,ava
,aym
,azb
,aze
,bak
,bar
,bcl
,bel
,ben
,bho
,bjn
,bod
,bos
,bpy
,bre
,bul
,bxr
,cat
,cbk
,cdo
,ceb
,ces
,che
,chr
,chv
,ckb
,cor
,cos
,crh
,csb
,cym
,dan
,deu
,diq
,div
,dsb
,dty
,egl
,ell
,eng
,epo
,est
,eus
,ext
,fao
,fas
,fin
,fra
,frp
,fry
,fur
,gag
,gla
,gle
,glg
,glk
,glv
,grn
,guj
,hak
,hat
,hau
,hbs
,heb
,hif
,hin
,hrv
,hsb
,hun
,hye
,ibo
,ido
,ile
,ilo
,ina
,ind
,isl
,ita
,jam
,jav
,jbo
,jpn
,kaa
,kab
,kan
,kat
,kaz
,kbd
,khm
,kin
,kir
,koi
,kok
,kom
,kor
,krc
,ksh
,kur
,lad
,lao
,lat
,lav
,lez
,lij
,lim
,lin
,lit
,lmo
,lrc
,ltg
,ltz
,lug
,lzh
,mai
,mal
,mar
,mdf
,mhr
,min
,mkd
,mlg
,mlt
,nan
,mon
,mri
,mrj
,msa
,mwl
,mya
,myv
,mzn
,nap
,nav
,nci
,nds
,nep
,new
,nld
,nno
,nob
,nrm
,nso
,oci
,olo
,ori
,orm
,oss
,pag
,pam
,pan
,pap
,pcd
,pdc
,pfl
,pnb
,pol
,por
,pus
,que
,roh
,ron
,rue
,rup
,rus
,sah
,san
,scn
,sco
,sgs
,sin
,slk
,slv
,sme
,sna
,snd
,som
,spa
,sqi
,srd
,srn
,srp
,stq
,sun
,swa
,swe
,szl
,tam
,tat
,tcy
,tel
,tet
,tgk
,tgl
,tha
,ton
,tsn
,tuk
,tur
,tyv
,udm
,uig
,ukr
,urd
,uzb
,vec
,vep
,vie
,vls
,vol
,vro
,war
,wln
,wol
,wuu
,xho
,xmf
,yid
,yor
,zea
,zho
It also supports languages in BCP 47 format:
be-tarask
,map-bms
,nds-nl
,roa-tara
,zh-yue
đ Documentation
Evaluation
The following tables summarize the scores obtained by the model overall and per each class.
By Paragraph
Language | Precision | Recall | F1-Score |
---|---|---|---|
Achinese (ace) | 1.000000 | 0.982143 | 0.990991 |
Afrikaans (afr) | 1.000000 | 1.000000 | 1.000000 |
Alemannic German (als) | 1.000000 | 0.946429 | 0.972477 |
Amharic (amh) | 1.000000 | 0.982143 | 0.990991 |
Old English (ang) | 0.981818 | 0.964286 | 0.972973 |
Arabic (ara) | 0.846154 | 0.982143 | 0.909091 |
Aragonese (arg) | 1.000000 | 1.000000 | 1.000000 |
Egyptian Arabic (arz) | 0.979592 | 0.857143 | 0.914286 |
Assamese (asm) | 0.981818 | 0.964286 | 0.972973 |
Asturian (ast) | 0.964912 | 0.982143 | 0.973451 |
Avar (ava) | 0.941176 | 0.905660 | 0.923077 |
Aymara (aym) | 0.964912 | 0.982143 | 0.973451 |
South Azerbaijani (azb) | 0.965517 | 1.000000 | 0.982456 |
Azerbaijani (aze) | 1.000000 | 1.000000 | 1.000000 |
Bashkir (bak) | 1.000000 | 0.978261 | 0.989011 |
Bavarian (bar) | 0.843750 | 0.964286 | 0.900000 |
Central Bikol (bcl) | 1.000000 | 0.982143 | 0.990991 |
Belarusian (Taraschkewiza) (be - tarask) | 1.000000 | 0.875000 | 0.933333 |
Belarusian (bel) | 0.870968 | 0.964286 | 0.915254 |
Bengali (ben) | 0.982143 | 0.982143 | 0.982143 |
Bhojpuri (bho) | 1.000000 | 0.928571 | 0.962963 |
Banjar (bjn) | 0.981132 | 0.945455 | 0.962963 |
Tibetan (bod) | 1.000000 | 0.982143 | 0.990991 |
Bosnian (bos) | 0.552632 | 0.375000 | 0.446809 |
Bishnupriya (bpy) | 1.000000 | 0.982143 | 0.990991 |
Breton (bre) | 1.000000 | 0.964286 | 0.981818 |
Bulgarian (bul) | 1.000000 | 0.964286 | 0.981818 |
Buryat (bxr) | 0.946429 | 0.946429 | 0.946429 |
Catalan (cat) | 0.982143 | 0.982143 | 0.982143 |
Chavacano (cbk) | 0.914894 | 0.767857 | 0.834951 |
Min Dong (cdo) | 1.000000 | 0.982143 | 0.990991 |
Cebuano (ceb) | 1.000000 | 1.000000 | 1.000000 |
Czech (ces) | 1.000000 | 1.000000 | 1.000000 |
Chechen (che) | 1.000000 | 1.000000 | 1.000000 |
Cherokee (chr) | 1.000000 | 0.963636 | 0.981481 |
Chuvash (chv) | 0.938776 | 0.958333 | 0.948454 |
Central Kurdish (ckb) | 1.000000 | 1.000000 | 1.000000 |
Cornish (cor) | 1.000000 | 1.000000 | 1.000000 |
Corsican (cos) | 1.000000 | 0.982143 | 0.990991 |
Crimean Tatar (crh) | 1.000000 | 0.946429 | 0.972477 |
Kashubian (csb) | 1.000000 | 0.963636 | 0.981481 |
Welsh (cym) | 1.000000 | 1.000000 | 1.000000 |
Danish (dan) | 1.000000 | 1.000000 | 1.000000 |
German (deu) | 0.828125 | 0.946429 | 0.883333 |
Dimli (diq) | 0.964912 | 0.982143 | 0.973451 |
Dhivehi (div) | 1.000000 | 1.000000 | 1.000000 |
Lower Sorbian (dsb) | 1.000000 | 0.982143 | 0.990991 |
Doteli (dty) | 0.940000 | 0.854545 | 0.895238 |
Emilian (egl) | 1.000000 | 0.928571 | 0.962963 |
Modern Greek (ell) | 1.000000 | 1.000000 | 1.000000 |
English (eng) | 0.588889 | 0.946429 | 0.726027 |
Esperanto (epo) | 1.000000 | 0.982143 | 0.990991 |
Estonian (est) | 0.963636 | 0.946429 | 0.954955 |
Basque (eus) | 1.000000 | 0.982143 | 0.990991 |
Extremaduran (ext) | 0.982143 | 0.982143 | 0.982143 |
Faroese (fao) | 1.000000 | 1.000000 | 1.000000 |
Persian (fas) | 0.948276 | 0.982143 | 0.964912 |
Finnish (fin) | 1.000000 | 1.000000 | 1.000000 |
French (fra) | 0.710145 | 0.875000 | 0.784000 |
Arpitan (frp) | 1.000000 | 0.946429 | 0.972477 |
Western Frisian (fry) | 0.982143 | 0.982143 | 0.982143 |
Friulian (fur) | 1.000000 | 0.982143 | 0.990991 |
Gagauz (gag) | 0.981132 | 0.945455 | 0.962963 |
Scottish Gaelic (gla) | 0.982143 | 0.982143 | 0.982143 |
Irish (gle) | 0.949153 | 1.000000 | 0.973913 |
Galician (glg) | 1.000000 | 1.000000 | 1.000000 |
Gilaki (glk) | 0.981132 | 0.945455 | 0.962963 |
Manx (glv) | 1.000000 | 1.000000 | 1.000000 |
Guarani (grn) | 1.000000 | 0.964286 | 0.981818 |
Gujarati (guj) | 1.000000 | 0.982143 | 0.990991 |
Hakka Chinese (hak) | 0.981818 | 0.964286 | 0.972973 |
Haitian Creole (hat) | 1.000000 | 1.000000 | 1.000000 |
Hausa (hau) | 1.000000 | 0.945455 | 0.971963 |
Serbo - Croatian (hbs) | 0.448276 | 0.464286 | 0.456140 |
Hebrew (heb) | 1.000000 | 0.982143 | 0.990991 |
Fiji Hindi (hif) | 0.890909 | 0.890909 | 0.890909 |
Hindi (hin) | 0.981481 | 0.946429 | 0.963636 |
Croatian (hrv) | 0.500000 | 0.636364 | 0.560000 |
Upper Sorbian (hsb) | 0.955556 | 1.000000 | 0.977273 |
Hungarian (hun) | 1.000000 | 1.000000 | 1.000000 |
Armenian (hye) | 1.000000 | 0.981818 | 0.990826 |
Igbo (ibo) | 0.918033 | 1.000000 | 0.957265 |
Ido (ido) | 1.000000 | 1.000000 | 1.000000 |
Interlingue (ile) | 1.000000 | 0.962264 | 0.980769 |
Iloko (ilo) | 0.947368 | 0.964286 | 0.955752 |
Interlingua (ina) | 1.000000 | 1.000000 | 1.000000 |
Indonesian (ind) | 0.761905 | 0.872727 | 0.813559 |
Icelandic (isl) | 1.000000 | 1.000000 | 1.000000 |
Italian (ita) | 0.861538 | 1.000000 | 0.925620 |
Jamaican Patois (jam) | 1.000000 | 0.946429 | 0.972477 |
Javanese (jav) | 0.964912 | 0.982143 | 0.973451 |
Lojban (jbo) | 1.000000 | 1.000000 | 1.000000 |
Japanese (jpn) | 1.000000 | 1.000000 | 1.000000 |
Karakalpak (kaa) | 0.965517 | 1.000000 | 0.982456 |
Kabyle (kab) | 1.000000 | 0.964286 | 0.981818 |
Kannada (kan) | 0.982143 | 0.982143 | 0.982143 |
Georgian (kat) | 1.000000 | 0.964286 | 0.981818 |
Kazakh (kaz) | 0.980769 | 0.980769 | 0.980769 |
Kabardian (kbd) | 1.000000 | 0.982143 | 0.990991 |
Central Khmer (khm) | 0.960784 | 0.875000 | 0.915888 |
Kinyarwanda (kin) | 0.981132 | 0.928571 | 0.954128 |
Kirghiz (kir) | 1.000000 | 1.000000 | 1.000000 |
Komi - Permyak (koi) | 0.962264 | 0.910714 | 0.935780 |
Konkani (kok) | 0.964286 | 0.981818 | 0.972973 |
Komi (kom) | 1.000000 | 0.962264 | 0.980769 |
Korean (kor) | 1.000000 | 1.000000 | 1.000000 |
Karachay - Balkar (krc) | 1.000000 | 0.982143 | 0.990991 |
Ripuarisch (ksh) | 1.000000 | 0.964286 | 0.981818 |
Kurdish (kur) | 1.000000 | 0.964286 | 0.981818 |
Ladino (lad) | 1.000000 | 1.000000 | 1.000000 |
Lao (lao) | 0.961538 | 0.909091 | 0.934579 |
Latin (lat) | 0.877193 | 0.943396 | 0.909091 |
Latvian (lav) | 0.963636 | 0.946429 | 0.954955 |
Lezghian (lez) | 1.000000 | 0.964286 | 0.981818 |
Ligurian (lij) | 1.000000 | 0.964286 | 0.981818 |
Limburgan (lim) | 0.938776 | 1.000000 | 0.968421 |
Lingala (lin) | 0.980769 | 0.927273 | 0.953271 |
Lithuanian (lit) | 0.982456 | 1.000000 | 0.991150 |
Lombard (lmo) | 1.000000 | 1.000000 | 1.000000 |
Northern Luri (lrc) | 1.000000 | 0.928571 | 0.962963 |
Latgalian (ltg) | 1.000000 | 0.982143 | 0.990991 |
Luxembourgish (ltz) | 0.949153 | 1.000000 | 0.973913 |
Luganda (lug) | 1.000000 | 1.000000 | 1.000000 |
Literary Chinese (lzh) | 1.000000 | 1.000000 | 1.000000 |
Maithili (mai) | 0.931034 | 0.964286 | 0.947368 |
Malayalam (mal) | 1.000000 | 0.982143 | 0.990991 |
Banyumasan (map - bms) | 0.977778 | 0.785714 | 0.871287 |
Marathi (mar) | 0.949153 | 1.000000 | 0.973913 |
Moksha (mdf) | 0.980000 | 0.890909 | 0.933333 |
Eastern Mari (mhr) | 0.981818 | 0.964286 | 0.972973 |
Minangkabau (min) | 1.000000 | 1.000000 | 1.000000 |
Macedonian (mkd) | 1.000000 | 0.981818 | 0.990826 |
Malagasy (mlg) | 0.981132 | 1.000000 | 0.990476 |
Maltese (mlt) | 0.982456 | 1.000000 | 0.991150 |
Min Nan Chinese (nan) | 1.000000 | 1.000000 | 1.000000 |
Mongolian (mon) | 1.000000 | 0.981818 | 0.990826 |
Maori (mri) | 1.000000 | 1.000000 | 1.000000 |
Western Mari (mrj) | 0.982456 | 1.000000 | 0.991150 |
Malay (msa) | 0.862069 | 0.892857 | 0.877193 |
Mirandese (mwl) | 1.000000 | 0.982143 | 0.990991 |
Burmese... (The table seems incomplete here, but we keep it as is) | ... | ... | ... |
đ License
This project is licensed under the apache - 2.0
license.
Distilbert Base Uncased Finetuned Sst 2 English
Apache-2.0
Text classification model fine-tuned on the SST-2 sentiment analysis dataset based on DistilBERT-base-uncased, with 91.3% accuracy
Text Classification English
D
distilbert
5.2M
746
Xlm Roberta Base Language Detection
MIT
Multilingual detection model based on XLM-RoBERTa, supporting text classification in 20 languages
Text Classification
Transformers Supports Multiple Languages

X
papluca
2.7M
333
Roberta Hate Speech Dynabench R4 Target
This model improves online hate detection through dynamic dataset generation, focusing on learning from worst-case scenarios to enhance detection effectiveness.
Text Classification
Transformers English

R
facebook
2.0M
80
Bert Base Multilingual Uncased Sentiment
MIT
A multilingual sentiment analysis model fine-tuned based on bert-base-multilingual-uncased, supporting sentiment analysis of product reviews in 6 languages
Text Classification Supports Multiple Languages
B
nlptown
1.8M
371
Emotion English Distilroberta Base
A fine-tuned English text emotion classification model based on DistilRoBERTa-base, capable of predicting Ekman's six basic emotions and neutral category.
Text Classification
Transformers English

E
j-hartmann
1.1M
402
Robertuito Sentiment Analysis
Spanish tweet sentiment analysis model based on RoBERTuito, supporting POS(positive)/NEG(negative)/NEU(neutral) three-class sentiment classification
Text Classification Spanish
R
pysentimiento
1.0M
88
Finbert Tone
FinBERT is a BERT model pre-trained on financial communication texts, specializing in the field of financial natural language processing. finbert-tone is its fine-tuned version for financial sentiment analysis tasks.
Text Classification
Transformers English

F
yiyanghkust
998.46k
178
Roberta Base Go Emotions
MIT
A multi-label sentiment classification model based on RoBERTa-base, trained on the go_emotions dataset, supporting recognition of 28 emotion labels.
Text Classification
Transformers English

R
SamLowe
848.12k
565
Xlm Emo T
XLM-EMO is a multilingual sentiment analysis model fine-tuned based on the XLM-T model, supporting 19 languages and specifically designed for sentiment prediction in social media texts.
Text Classification
Transformers Other

X
MilaNLProc
692.30k
7
Deberta V3 Base Mnli Fever Anli
MIT
DeBERTa-v3 model trained on MultiNLI, Fever-NLI, and ANLI datasets, excelling in zero-shot classification and natural language inference tasks
Text Classification
Transformers English

D
MoritzLaurer
613.93k
204
Featured Recommended AI Models
Š 2025AIbase