
SRoBERTa-F

Developed by Andrija
A RoBERTa model trained on 43GB of Croatian and Serbian text, supporting masked language modeling tasks.
Release Time: 3/2/2022

Model Overview

This is a RoBERTa model optimized for Croatian and Serbian, primarily used for natural language processing tasks, especially masked language modeling.
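To illustrate getting started, the model can be loaded with the Hugging Face transformers library. This is a minimal sketch; the Hub identifier "Andrija/SRoBERTa-F" is an assumption inferred from the developer and model names listed above.

```python
# Minimal loading sketch; "Andrija/SRoBERTa-F" is an assumed Hub identifier.
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "Andrija/SRoBERTa-F"  # assumed, inferred from the names above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
```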

Model Features

Multi-source Training Data
Integrates multiple high-quality datasets including Leipzig, OSCAR, srWac, hrWac, cc100-hr, and cc100-sr, totaling 43GB of text data.
Continuous Training Potential
Training metrics had not plateaued by the end of training, indicating the model could be improved with further training.
Bilingual Support
Specifically optimized for the Croatian and Serbian languages.

Model Capabilities

Text Understanding
Language Modeling
Contextual Prediction

Use Cases

Natural Language Processing
Text Completion
Predicts masked words in a sentence. Example: given 'Ovo je početak <mask>.' ('This is the beginning of <mask>.'), the model predicts the most likely words for the masked position (see the fill-mask sketch after this list).
Language Model Fine-tuning
Used as a base model for fine-tuning on downstream NLP tasks (see the fine-tuning sketch below).
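A minimal sketch of the text-completion use case, reusing the example sentence above and the assumed Hub identifier "Andrija/SRoBERTa-F":

```python
# Fill-mask sketch; "Andrija/SRoBERTa-F" is an assumed Hub identifier.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="Andrija/SRoBERTa-F")

# Predict the masked word in the example sentence from the use case above.
for prediction in fill_mask("Ovo je početak <mask>."):
    print(prediction["token_str"], prediction["score"])
```

Each prediction is a candidate token for the masked position together with its probability score.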
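For the fine-tuning use case, a common pattern is to load the pretrained encoder with a freshly initialized task-specific head. The sketch below assumes a hypothetical binary text-classification task and the same assumed model id:

```python
# Fine-tuning sketch; the model id and the 2-label task are assumptions.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Andrija/SRoBERTa-F"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The masked-LM head is replaced by a new classification head,
# which is then trained on the downstream dataset.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
```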