BERTovski

Developed by MaCoCu
BERTovski is a large pre-trained language model trained on Bulgarian and Macedonian texts using the RoBERTa architecture, developed within the MaCoCu project.
Release date: 8/11/2022

Model Overview

BERTovski is a natural language processing model focused on Bulgarian and Macedonian, suitable for various language tasks such as part-of-speech tagging and named entity recognition.

Model Features

Multilingual Support
Focused on Bulgarian and Macedonian while also supporting multilingual processing tasks.
High-Quality Training Data
Training data is rigorously filtered to include only high-quality texts from native .bg/.mk domains, avoiding low-quality machine-translated content.
Balanced Data Distribution
Balances corpus proportions by duplicating the smaller Macedonian corpus, so that both languages are represented more equally during training.

Model Capabilities

Part-of-speech tagging
Named entity recognition
Commonsense reasoning
Text understanding
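As a RoBERTa-style masked language model, BERTovski can be queried directly for masked-token prediction or fine-tuned for downstream tasks such as the tagging benchmarks below. A minimal sketch with the Hugging Face `transformers` library follows; the Hub model id `MaCoCu/BERTovski` is an assumption, so check the MaCoCu project page for the exact identifier before use.

```python
def load_fill_mask(model_id: str = "MaCoCu/BERTovski"):
    """Return a fill-mask pipeline for the given model id.

    The default id is an assumption; requires `pip install transformers`
    (the import is kept inside the function so the module loads without it).
    """
    from transformers import pipeline
    return pipeline("fill-mask", model=model_id)


if __name__ == "__main__":
    fill_mask = load_fill_mask()
    # RoBERTa-style models use the <mask> token.
    # Bulgarian: "Sofia is the capital of <mask>."
    for pred in fill_mask("София е столицата на <mask>."):
        print(pred["token_str"], round(pred["score"], 3))
```

Fine-tuning for part-of-speech tagging or named entity recognition would instead load the checkpoint with `AutoModelForTokenClassification` and train on the respective labeled dataset.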

Use Cases

Natural Language Processing
Bulgarian Part-of-Speech Tagging
Performs part-of-speech tagging on the Universal Dependencies dataset.
Test set accuracy: 99.1%
Macedonian Named Entity Recognition
Performs named entity recognition on the babushka-bench dataset.
Test set accuracy: 94.6%
Language Understanding
Commonsense Reasoning
Performs commonsense reasoning on the COPA test set.
Bulgarian: 51.7%, Macedonian: 51.8%