
RoBERTa Base CA (roberta-base-ca)

Developed by PlanTL-GOB-ES
A Catalan pre-trained language model based on the RoBERTa architecture, developed under the Spanish government's PlanTL programme
Downloads 15.56k
Release date: 3/2/2022

Model Overview

A transformer model pre-trained on Catalan text with a masked language modeling (MLM) objective, suitable for a range of natural language processing tasks

Model Features

Specialized corpus training
Trained on high-quality Catalan corpora, including government gazettes, news, and Wikipedia
Comprehensive performance evaluation
Outperforms multilingual models such as mBERT and XLM-RoBERTa on the Catalan Language Understanding Benchmark (CLUB)
Efficient pre-training
Trained on 1.8 billion tokens in 48 hours on 16 NVIDIA V100 GPUs
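
The pre-training figures above imply a per-GPU throughput that is easy to sanity-check. A rough back-of-the-envelope estimate (real throughput also depends on batch size, sequence length, and parallelism overhead):

```python
# Sanity-check the pre-training throughput from the stated figures:
# 1.8 billion tokens, 48 hours, 16 V100 GPUs.
tokens = 1.8e9
hours = 48
gpus = 16

tokens_per_gpu_hour = tokens / (hours * gpus)
print(f"{tokens_per_gpu_hour:,.0f} tokens per GPU-hour")  # prints 2,343,750 tokens per GPU-hour
```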

Model Capabilities

Masked word prediction
Text classification
Named entity recognition
Semantic similarity calculation
Question answering
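
As an MLM-pretrained model, it can be queried for masked-word predictions directly. A minimal sketch using the Hugging Face transformers fill-mask pipeline, assuming the Hub model ID `PlanTL-GOB-ES/roberta-base-ca` (the exact checkpoint name is an assumption, as this page does not state it):

```python
# Minimal fill-mask sketch for a RoBERTa-style Catalan model.
# Assumption: the checkpoint is published on the Hugging Face Hub
# as "PlanTL-GOB-ES/roberta-base-ca".
from transformers import pipeline


def mask_sentence(sentence: str, word: str, mask_token: str = "<mask>") -> str:
    """Replace the first occurrence of `word` with the RoBERTa mask token."""
    return sentence.replace(word, mask_token, 1)


if __name__ == "__main__":
    fill_mask = pipeline("fill-mask", model="PlanTL-GOB-ES/roberta-base-ca")
    text = mask_sentence("El Govern de la Generalitat de Catalunya.", "Govern")
    # Print the top 3 candidate fillers with their scores.
    for pred in fill_mask(text)[:3]:
        print(pred["token_str"], round(pred["score"], 3))
```

The same checkpoint can be loaded with `AutoModel` / `AutoTokenizer` and fine-tuned for the downstream tasks listed above.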

Use Cases

Government text processing
Government gazette analysis
Automated understanding of Catalan government gazettes (DOGC)
News media
News classification
Topic classification for Catalan News Agency (ACN) articles
74.16% accuracy (TeCla dataset)
Education & research
Language understanding evaluation
Serves as base model for CLUB benchmark tests
Achieves an F1 score of 88.13 on the NER task, outperforming the multilingual baselines
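
NER predictions from token-classification models are typically emitted as per-token BIO tags. A small illustrative sketch (plain Python, not the model's actual post-processing) of grouping such tags into entity spans:

```python
def group_bio(tokens, tags):
    """Group per-token BIO tags into (entity_text, entity_type) spans."""
    spans, current, ctype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((" ".join(current), ctype))
            current, ctype = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == ctype:
            current.append(tok)
        else:
            if current:
                spans.append((" ".join(current), ctype))
            current, ctype = [], None
    if current:
        spans.append((" ".join(current), ctype))
    return spans


# Example: "Generalitat de Catalunya" tagged as an organisation.
print(group_bio(
    ["La", "Generalitat", "de", "Catalunya", "informa"],
    ["O", "B-ORG", "I-ORG", "I-ORG", "O"],
))  # → [('Generalitat de Catalunya', 'ORG')]
```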