
RoBERTa-large-bne-capitel-pos

Developed by PlanTL-GOB-ES
A RoBERTa-large model trained on data from the Spanish National Library (BNE), fine-tuned for Spanish POS tagging on the CAPITEL dataset
Downloads: 186
Released: 3/2/2022

Model Overview

This model is specifically designed for Spanish POS tagging tasks, pre-trained on a large-scale Spanish corpus and fine-tuned on the CAPITEL-POS dataset

Model Features

Large-scale pre-training data
Pre-trained on 570 GB of cleaned and deduplicated Spanish text, sourced from data crawled by the Spanish National Library between 2009 and 2019
High-performance POS tagging
Achieves an F1 score of 98.56 on the CAPITEL-POS test set, outperforming other Spanish language models
Domain-specific optimization
Fine-tuned using the IberLEF 2020 CAPITEL competition dataset, suitable for processing professional Spanish texts

Model Capabilities

Spanish POS tagging
Text token classification
Natural language processing
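Like other models published on the Hugging Face Hub, the tagger can be loaded through the `transformers` token-classification pipeline. A minimal sketch follows; the model id is taken from this card, the example sentence is illustrative, and running the script downloads the model weights on first use.

```python
# Sketch: Spanish POS tagging with the Hugging Face token-classification
# pipeline. Model id from this card; first run downloads the weights.
from transformers import pipeline

MODEL_ID = "PlanTL-GOB-ES/roberta-large-bne-capitel-pos"

def tag_pos(text: str):
    """Return (token, POS-tag) pairs for a Spanish sentence."""
    tagger = pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",  # merge subword pieces into words
    )
    return [(t["word"], t["entity_group"]) for t in tagger(text)]

if __name__ == "__main__":
    # Example sentence (illustrative): "El gato duerme en la cocina."
    for word, tag in tag_pos("El gato duerme en la cocina."):
        print(word, tag)
```

The `aggregation_strategy="simple"` option groups subword tokens back into whole words, so each Spanish word receives a single tag.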

Use Cases

Text analysis
- News text analysis: analyze the POS distribution of Spanish news texts; accurately identifies the various parts of speech they contain
- Academic research: supports Spanish linguistics research and teaching with professional-level POS tagging

NLP applications
- Information extraction systems: serves as a preprocessing component for information extraction pipelines, improving the accuracy of downstream tasks