
SinhalaBERTo

Developed by keshan
This is a relatively small model trained on the deduplicated Sinhala portion of the OSCAR dataset, providing a pretraining foundation for the low-resource Sinhala language.
Downloads: 34
Release Time: 3/2/2022

Model Overview

This is a Sinhala language model based on the RoBERTa architecture. It was pretrained with a masked language modeling objective and is primarily intended as a pretraining foundation for downstream tasks.
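A minimal usage sketch with the Hugging Face transformers library is shown below. The hub ID keshan/SinhalaBERTo and the example sentence are assumptions for illustration; substitute the actual checkpoint path if it differs.

from transformers import pipeline

# Fill-mask inference; the hub ID is assumed and the sentence is illustrative only.
fill_mask = pipeline("fill-mask", model="keshan/SinhalaBERTo")

# RoBERTa-style models use "<mask>" as the mask token.
for prediction in fill_mask("මම ගෙදර <mask>."):
    print(prediction["token_str"], round(prediction["score"], 4))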

Model Features

Low-resource language support
Specially optimized for the resource-scarce Sinhala language.
Lightweight architecture
Uses a streamlined RoBERTa architecture with 6 hidden layers, suitable for resource-constrained environments (see the configuration sketch after this list).
Large-scale pretraining data
Trained on the deduplicated OSCAR Sinhala dataset, giving broad coverage of written Sinhala.
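The layer count can be checked directly from the model configuration. A short sketch, assuming the checkpoint is available on the Hugging Face Hub as keshan/SinhalaBERTo:

from transformers import AutoConfig

# Load only the configuration; no model weights are needed for this check.
config = AutoConfig.from_pretrained("keshan/SinhalaBERTo")
print(config.model_type)         # expected: "roberta"
print(config.num_hidden_layers)  # expected: 6, per this card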

Model Capabilities

Text infilling
Language modeling
Context prediction
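These capabilities all reduce to masked-token prediction. The sketch below, again assuming the keshan/SinhalaBERTo hub ID and an illustrative Sinhala sentence, shows how to obtain raw predictions at a mask position when you need logits rather than the pipeline's post-processed output:

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "keshan/SinhalaBERTo"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Build an input containing a single mask token.
text = f"මම ගෙදර {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the mask position and take the five most likely replacement tokens.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_pos].topk(5).indices[0].tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))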

Use Cases

Natural Language Processing
Text completion
Automatically completes missing parts of Sinhala sentences.
Predicts masked tokens in context.
Language model fine-tuning
Serves as a pretrained base model for downstream NLP tasks.
Provides a transfer learning starting point for Sinhala NLP applications (a fine-tuning sketch follows below).
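A hedged sketch of fine-tuning the checkpoint for a downstream classification task is given below. The hub ID, label count, and training settings are placeholders rather than values from this card, and the dataset preparation is left as an outline:

from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "keshan/SinhalaBERTo"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)

# A fresh classification head is added on top of the pretrained encoder;
# num_labels=2 is a placeholder for a binary task.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# train_dataset and eval_dataset would be your own tokenized Sinhala datasets.
# trainer = Trainer(
#     model=model,
#     args=TrainingArguments(output_dir="sinhala-classifier", num_train_epochs=3),
#     train_dataset=train_dataset,
#     eval_dataset=eval_dataset,
# )
# trainer.train()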