Kanarya-750M: Turkish Language Model
Kanarya is a pre-trained Turkish GPT-J 750M model. As part of the Turkish Data Depository initiative, the Kanarya family offers two versions: Kanarya-2B (the larger one) and Kanarya-0.7B (the smaller one, i.e. the 750M model described here). Both are trained on a large-scale Turkish text corpus filtered from the OSCAR and mC4 datasets. The training data, sourced from news, articles, and websites, forms a diverse and high-quality dataset. These models are trained using a JAX/Flax implementation of the [GPT-J](https://github.com/kingoflolz/mesh-transformer-jax) architecture. They are pre-trained only and designed to be fine-tuned on a wide array of Turkish NLP tasks.

✨ Features
- Pre-trained Turkish Model: Specifically tailored for the Turkish language, offering a solid foundation for various NLP tasks.
- Two Model Versions: The Kanarya family provides flexibility with different model sizes to suit different requirements.
- Diverse Training Data: Trained on data from multiple sources, ensuring a broad understanding of the Turkish language.
- GPT-J Architecture: Utilizes the well-known GPT-J architecture implemented in JAX/Flax.
📦 Installation
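No official installation steps are provided. Assuming the checkpoint is published in a `transformers`-compatible format on the Hugging Face Hub, installing the `transformers` library together with a backend such as PyTorch (for example `pip install transformers torch`) should be enough to load it; treat this as a sketch rather than guidance from the original authors.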
📚 Documentation
Model Details
| Property | Details |
|----------|---------|
| Model Name | Kanarya-750M |
| Model Size | 750M parameters |
| Training Data | OSCAR, mC4 |
| Language | Turkish |
| Layers | 12 |
| Hidden Size | 2048 |
| Number of Heads | 16 |
| Context Size | 2048 |
| Positional Embeddings | Rotary |
| Vocabulary Size | 32,768 |
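As a minimal sketch, the hyperparameters above would map onto a `GPTJConfig` from the `transformers` library roughly as follows, assuming the checkpoint uses the standard GPT-J layout; the `rotary_dim` value is an assumption, since it is not listed in the table.

```python
from transformers import GPTJConfig

# Hypothetical reconstruction of the Kanarya-750M configuration from the table above.
# Field names follow transformers' GPTJConfig, not the original JAX/Flax training code.
config = GPTJConfig(
    vocab_size=32_768,   # Vocabulary Size
    n_positions=2048,    # Context Size
    n_embd=2048,         # Hidden Size
    n_layer=12,          # Layers
    n_head=16,           # Number of Heads
    rotary_dim=64,       # Rotary positional embeddings; the dimension is an assumption
)
```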
Intended Use
This model is pre-trained on Turkish text data and is meant to be fine-tuned for a wide range of Turkish NLP tasks, such as text generation, translation, summarization, etc. It should not be used for downstream tasks without fine-tuning.
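As a rough illustration, the sketch below fine-tunes the model as a causal language model with the `transformers` `Trainer`. The Hub id `asafaya/kanarya-750m` and the plain-text file `train.txt` are assumptions for the example; adapt the data loading and hyperparameters to your actual task.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "asafaya/kanarya-750m"  # hypothetical Hub id; replace with the real one

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # GPT-style tokenizers usually lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical task-specific Turkish corpus, one document per line.
dataset = load_dataset("text", data_files={"train": "train.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="kanarya-750m-finetuned", num_train_epochs=1),
    train_dataset=tokenized["train"],
    # mlm=False produces standard next-token (causal LM) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```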
Limitations and Ethical Considerations
The model, despite being trained on a high-quality and diverse Turkish text corpus, may generate toxic, biased, or unethical content. Users are strongly advised to use the model responsibly and ensure the generated content is appropriate for the use case. Please report any issues.
Citation
If you use the model, please cite the following paper:
@inproceedings{safaya-etal-2022-mukayese,
    title = "Mukayese: {T}urkish {NLP} Strikes Back",
    author = "Safaya, Ali and
      Kurtulu{\c{s}}, Emirhan and
      Goktogan, Arda and
      Yuret, Deniz",
    editor = "Muresan, Smaranda and
      Nakov, Preslav and
      Villavicencio, Aline",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-acl.69",
    doi = "10.18653/v1/2022.findings-acl.69",
    pages = "846--863",
}
Acknowledgments
During this work, Ali Safaya was supported by a KUIS AI Center fellowship. Additionally, the pre-training of these models was carried out at the TUBITAK ULAKBIM High Performance and Grid Computing Center ([TRUBA](https://www.truba.gov.tr/index.php/en/main-page/) resources).
💻 Usage Examples
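Below is a minimal text-generation sketch using the `transformers` library, assuming the checkpoint is published on the Hugging Face Hub under a hypothetical id such as `asafaya/kanarya-750m`. Since the model is pre-trained only, expect plain text continuations rather than instruction-following behaviour.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "asafaya/kanarya-750m"  # hypothetical Hub id; replace with the real one

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Kanarya is pre-trained only, so plain text continuation is the most meaningful
# check before fine-tuning on a downstream task.
prompt = "Türkiye'nin en kalabalık şehri"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since GPT-J is natively supported by `transformers`, the checkpoint could in principle also be loaded with the Flax classes (e.g. `FlaxAutoModelForCausalLM`) to stay in the JAX ecosystem, provided Flax weights are published.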
📄 License
The model is licensed under the Apache 2.0 License. It is free to use for any purpose, including commercial use. We encourage users to contribute to the model and report any issues. However, the model is provided "as is" without warranty of any kind.