Roberta Base Turkish Uncased
Developed by TURKCELL
This is a RoBERTa base model for Turkish. The pre-training data is sourced from Turkish Wikipedia, the Turkish OSCAR corpus, and several news websites.
Downloads: 109
Release date: 12/7/2023
Model Overview
This model is a case-insensitive (uncased) RoBERTa model for Turkish, mainly used for Turkish text understanding and generation tasks.
Model Features
Large-scale pre-training data
Trained on 38 GB of Turkish text containing 329,720,508 sentences.
High-performance hardware training
Trained on Intel Xeon Gold processors and NVIDIA Tesla V100 GPUs.
Turkish optimization
Specifically optimized for Turkish, with pre-training data drawn from Turkish Wikipedia and Turkish news sources.
Model Capabilities
Turkish text understanding
Masked language modeling
Text fill-in tasks
Use Cases
Natural language processing
Text fill-in
Predict the masked words in a sentence
For example, it can predict the masked word in 'iki ülke arasında <mask> başladı'.
Text generation
Generate coherent Turkish text based on the context
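The fill-in use case above can be sketched with the Hugging Face `fill-mask` pipeline. The repository id below is assumed from the model name and may need adjusting to the actual Hub path:

```python
from transformers import pipeline

# Repo id assumed from the model name; verify the actual Hub path before use.
fill = pipeline("fill-mask", model="TURKCELL/roberta-base-turkish-uncased")

# Predict candidates for the masked word in the example sentence.
preds = fill("iki ülke arasında <mask> başladı")
for p in preds:
    # Each prediction carries the filled token and its probability.
    print(p["token_str"], round(p["score"], 3))
```

Each returned prediction is a dict with the candidate token, its score, and the completed sentence, ranked by probability.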