
RobeCzech Base

Developed by ufal
RobeCzech is a monolingual RoBERTa language representation model trained on Czech data, developed by the Institute of Formal and Applied Linguistics at Charles University in Prague.
Downloads 2,911
Release Time: 3/2/2022

Model Overview

This model is primarily used for masked language modeling and Czech text processing, and serves as a pretrained base for a range of downstream natural language processing tasks.
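
Because the checkpoint is a standard RoBERTa model, masked-language-model inference can be run with the Hugging Face transformers library. The snippet below is a minimal sketch; it assumes the checkpoint is published on the Hugging Face Hub under the model ID ufal/robeczech-base and that transformers is installed.

```python
# Minimal fill-mask sketch; assumes the checkpoint is available on the
# Hugging Face Hub as "ufal/robeczech-base" and `transformers` is installed.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="ufal/robeczech-base")

# Use the tokenizer's own mask token so the example works regardless of
# the exact mask symbol the tokenizer defines.
mask = fill_mask.tokenizer.mask_token
for prediction in fill_mask(f"Praha je hlavní město {mask}."):
    print(prediction["token_str"], round(prediction["score"], 4))
```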

Model Features

Improved Tokenizer
Version 1.1 includes significant tokenizer improvements: gaps in the token numbering are filled and every token is assigned a unique ID, improving model stability and compatibility (see the tokenizer sketch after this list).
Czech Language Optimization
Specially trained on Czech data, optimizing language representation capabilities for Czech-related natural language processing tasks.
Document Structure Preservation
Training preserves complete document structure, aiding the model in understanding contextual information.
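
To illustrate the tokenizer behavior described above, the following sketch loads the tokenizer and inspects the IDs it assigns to a Czech sentence. It assumes the default revision of ufal/robeczech-base on the Hugging Face Hub already contains the version 1.1 tokenizer fixes.

```python
# Hedged sketch: load the RobeCzech tokenizer and inspect the token IDs
# it assigns to a Czech sentence. Assumes the default Hub revision of
# "ufal/robeczech-base" ships the version 1.1 tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ufal/robeczech-base")

encoding = tokenizer("Ahoj, jak se máš?")
print(encoding["input_ids"])                                    # one unique ID per token
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))   # the corresponding subword tokens
```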

Model Capabilities

Masked Language Modeling
Morphological Tagging
Lemmatization
Dependency Parsing
Named Entity Recognition
Semantic Parsing
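
The tagging, lemmatization, parsing, and NER capabilities listed above are typically obtained by fine-tuning the encoder or by feeding its contextual embeddings to task-specific heads. The sketch below shows how such embeddings could be extracted; it is an illustrative example assuming the ufal/robeczech-base model ID, not the exact setup used in the reported evaluations.

```python
# Hedged sketch: extract contextual embeddings that a downstream tagger,
# parser, or NER head could consume. Assumes "ufal/robeczech-base" on the Hub.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ufal/robeczech-base")
model = AutoModel.from_pretrained("ufal/robeczech-base")
model.eval()

inputs = tokenizer("Univerzita Karlova sídlí v Praze.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One vector per subword token; shape: (batch, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```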

Use Cases

Natural Language Processing
Morphological Analysis and Lemmatization
Performs Czech morphological analysis and lemmatization using frozen word embeddings.
Tagging accuracy reaches 98.50 (POS tagging) and 91.42 (fine-grained POS).
Named Entity Recognition
Identifies named entities in Czech text.
F1 scores reach 87.82 for nested entities and 87.47 for flat entities.
Semantic Parsing
Performs semantic parsing on Czech text.
Average F1 score reaches 92.36.