🚀 Turkish GPT-2 Model
This repository provides a GPT-2 model trained on a diverse collection of Turkish texts. It serves as a starting point for fine-tuning on other Turkish text datasets.
🚀 Quick Start
The model can be used as follows:
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# AutoModelWithLMHead is deprecated; AutoModelForCausalLM is the current class.
tokenizer = AutoTokenizer.from_pretrained("redrussianarmy/gpt2-turkish-cased")
model = AutoModelForCausalLM.from_pretrained("redrussianarmy/gpt2-turkish-cased")
```
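Once the tokenizer and model are loaded, text can be generated directly with `generate()`. A minimal sketch follows; the sampling parameters (`max_length`, `top_k`, `top_p`) are illustrative defaults chosen here, not values recommended by the model authors:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("redrussianarmy/gpt2-turkish-cased")
model = AutoModelForCausalLM.from_pretrained("redrussianarmy/gpt2-turkish-cased")

# Encode a Turkish prompt and sample a continuation.
input_ids = tokenizer.encode("Akşamüstü yolda ilerlerken, ", return_tensors="pt")
output = model.generate(
    input_ids,
    max_length=60,   # illustrative cap on total output length
    do_sample=True,  # sample instead of greedy decoding
    top_k=50,
    top_p=0.95,
)
text = tokenizer.decode(output[0], skip_special_tokens=True)
print(text)
```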
Advanced Usage
Here's an example that uses the Transformers `pipeline` API for text generation:

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="redrussianarmy/gpt2-turkish-cased",
    tokenizer="redrussianarmy/gpt2-turkish-cased",
)
text = pipe("Akşamüstü yolda ilerlerken, ", max_length=800)[0]["generated_text"]
print(text)
```
✨ Features
- The model is trained on diverse Turkish texts, making it a suitable starting point for fine-tuning on other Turkish text data.
- Both PyTorch and TensorFlow compatible weights are provided.
📦 Installation
To clone the model repository:

```bash
git lfs install
git clone https://huggingface.co/redrussianarmy/gpt2-turkish-cased
```
📚 Documentation
Training corpora
The model was trained on a Turkish corpus taken from the OSCAR corpus. A 52K byte-level BPE vocabulary was built from this training corpus with Hugging Face's Tokenizers library. GPT-2 for Turkish was then trained on two NVIDIA RTX 2080 Ti GPUs over the complete training corpus for five epochs.
Logs during training:
https://tensorboard.dev/experiment/3AWKv8bBTaqcqZP5frtGkw/#scalars
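The byte-level BPE step described above can be reproduced in miniature with the Tokenizers library. The toy corpus and small vocabulary size below are placeholders for illustration, not the actual OSCAR training data or the 52K vocabulary used for this model:

```python
from tokenizers import ByteLevelBPETokenizer

# A tiny in-memory stand-in for the real OSCAR Turkish corpus.
corpus = [
    "Akşamüstü yolda ilerlerken hava kararmaya başladı.",
    "Türkçe metinler üzerinde eğitilmiş bir dil modeli.",
    "Byte-level BPE, metni bayt dizileri üzerinden parçalara ayırır.",
]

tokenizer = ByteLevelBPETokenizer()
# The real model used vocab_size=52000 trained on the full corpus.
tokenizer.train_from_iterator(corpus, vocab_size=500, min_frequency=1)

encoding = tokenizer.encode("Akşamüstü yolda ilerlerken")
print(encoding.tokens)
```

Because the tokenizer is byte-level, it needs no unknown token: any input string decomposes into the 256 base byte symbols plus learned merges.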
Model weights
Both PyTorch and TensorFlow compatible weights are available.
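Assuming the TensorFlow weights published in the repository, the model can also be loaded with the TF model classes; `TFAutoModelForCausalLM` is the current replacement for the deprecated `TFAutoModelWithLMHead`:

```python
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("redrussianarmy/gpt2-turkish-cased")
# Loads the TensorFlow weights; pass from_pt=True to convert the
# PyTorch checkpoint instead if only that one is available locally.
model = TFAutoModelForCausalLM.from_pretrained("redrussianarmy/gpt2-turkish-cased")
```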
📄 License
No license has currently been specified for this model.
📞 Contact (Bugs, Feedback, Contribution and more)
For questions about the GPT2-Turkish model, just open an issue here 🤗