
XLMR BERTovski

Developed by MaCoCu
A language model pretrained on large-scale Bulgarian and Macedonian texts, part of the MaCoCu project
Release Time: 8/11/2022

Model Overview

XLMR-BERTovski is a Bulgarian and Macedonian language model built by continued pretraining of XLM-RoBERTa-large, intended as a base model for natural language processing tasks in both languages.
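Since the model is a continued-pretraining checkpoint of XLM-RoBERTa-large, it can in principle be loaded with the standard Hugging Face `transformers` masked-LM classes. The Hub id `MaCoCu/XLMR-BERTovski` below is an assumption based on the project name; check the MaCoCu project page for the exact identifier.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumed Hub id (not stated in this card); verify before use.
MODEL_ID = "MaCoCu/XLMR-BERTovski"

def load_bertovski(model_id: str = MODEL_ID):
    """Load the tokenizer and masked-LM head.

    The weights are those of XLM-RoBERTa-large after continued
    pretraining on Bulgarian and Macedonian text, so the model is
    typically fine-tuned further for tagging, NER, etc.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)
    return tokenizer, model
```

The function is only defined here, not called, because the checkpoint is a large download; fine-tuning would swap `AutoModelForMaskedLM` for a task head such as `AutoModelForTokenClassification`.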

Model Features

Large-scale bilingual pretraining
Trained on 74 GB of Bulgarian and Macedonian text, totaling over 7 billion tokens
Optimized data sampling
The smaller Macedonian corpus was sampled twice as often, balancing training between the two languages
High-quality training data
Data from the .bg and .mk domains was strictly filtered to exclude low-quality machine-translated content
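The doubled-sampling scheme described above can be sketched in plain Python. The corpus contents and sizes here are toy placeholders, not the actual MaCoCu data:

```python
# Illustrative sketch of the oversampling idea: the smaller
# Macedonian corpus is repeated so it contributes more evenly
# to the combined training stream.

def build_training_stream(bg_docs, mk_docs, mk_factor=2):
    """Return a combined document list in which each Macedonian
    document appears mk_factor times (the card's doubled sampling)."""
    return list(bg_docs) + list(mk_docs) * mk_factor

# Toy corpora: 6 Bulgarian documents, 2 Macedonian documents.
bg = ["bg_doc_%d" % i for i in range(6)]
mk = ["mk_doc_%d" % i for i in range(2)]

stream = build_training_stream(bg, mk)
# Macedonian now contributes 4 of 10 documents instead of 2 of 8.
```

In practice such oversampling is usually done at the batch-sampling level rather than by literally duplicating files, but the effect on the language mix is the same.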

Model Capabilities

Part-of-speech tagging (UPOS/XPOS)
Named entity recognition (NER)
Common sense reasoning (COPA)
Bulgarian text processing
Macedonian text processing

Use Cases

Language analysis
Bulgarian part-of-speech tagging
Performing part-of-speech tagging on Bulgarian texts
Test set accuracy reached 99.5% (UPOS)
Macedonian named entity recognition
Identifying named entities in Macedonian texts
Test set F1 score reached 96.3%
Language understanding
Common sense reasoning tasks
Solving COPA common sense reasoning problems in Bulgarian and Macedonian
Accuracy reached 54.6% and 55.6% respectively
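The two headline metrics above, token-level accuracy for UPOS tagging and F1 for NER, can be sketched with toy label sequences (the real benchmark data is not reproduced here):

```python
# Minimal sketch of the evaluation metrics reported in this card:
# token-level accuracy (tagging) and micro-F1 over non-O labels
# (a simplified token-level NER convention; official NER scoring
# is usually entity-span based).

def accuracy(gold, pred):
    """Fraction of positions where the predicted tag matches gold."""
    assert len(gold) == len(pred)
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def micro_f1(gold, pred, outside="O"):
    """F1 over non-O token labels."""
    tp = sum(g == p != outside for g, p in zip(gold, pred))
    fp = sum(p != outside and g != p for g, p in zip(gold, pred))
    fn = sum(g != outside and g != p for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# Toy gold/predicted sequences.
gold = ["B-PER", "I-PER", "O", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "B-LOC", "B-LOC"]
```

The reported numbers (99.5% UPOS accuracy, 96.3% NER F1, 54.6%/55.6% COPA accuracy) would come from applying metrics like these to the respective test sets.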