Tatar mGPT 1.3B
Tatar mGPT 1.3B is a language model for the Tatar language. As the name suggests, it has 1.3 billion parameters. Tatar belongs to the Turkic language family; it is a sonorous language spoken by approximately 5.3 million people. Here are some key facts about it:
- It is primarily spoken by the Tatars, mainly in the Republic of Tatarstan, Russia.
- It has historically been written in both Arabic and Cyrillic scripts, and the Latin script is becoming more popular.
- The Tatars have a rich history, especially their association with the Golden Horde from the 13th to the 15th centuries.
Quick Start
The model is ready to use for Tatar-language tasks, so you can start working with it right away.
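A minimal sketch of loading the model and generating a continuation with the Hugging Face `transformers` library. It rests on assumptions this card does not state: the repository id `ai-forever/mGPT-1.3B-tatar` is inferred from the naming of the sibling models listed further down, `transformers` and `torch` are taken to be installed, and the sampling values are illustrative rather than recommended settings.

```python
# Assumption: the repo id follows the naming pattern of the sibling models.
MODEL_ID = "ai-forever/mGPT-1.3B-tatar"


def generation_kwargs(max_new_tokens: int = 40) -> dict:
    """Sampling settings for a short continuation (illustrative values)."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "top_p": 0.95,
        "temperature": 0.8,
    }


def generate(prompt: str, model_id: str = MODEL_ID) -> str:
    """Download the checkpoint (several GB) and continue `prompt`."""
    # Imported lazily so the helper above works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    # Tokenize the prompt and sample a continuation without tracking gradients.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, **generation_kwargs())
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Казан шәһәре")` (Tatar for "the city of Kazan") should return the prompt followed by a sampled continuation. Because sampling is enabled, the output differs from run to run.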
⨠Features
- Tatar Language Focus: Tailored specifically for the Tatar language, enabling high-quality processing of Tatar text.
- Derived from mGPT-XL: Built on the [mGPT-XL (1.3B)](https://huggingface.co/ai-forever/mGPT) model, which was pre-trained on a diverse set of languages.
Technical Details
This model is one of the derivatives of the base [mGPT-XL (1.3B)](https://huggingface.co/ai-forever/mGPT) model. The base model was initially trained on 61 languages from 25 language families using Wikipedia and the C4 corpus.
We found additional data for 23 languages, most of which are considered minor languages, and decided to fine-tune the base model further on it. Tatar mGPT 1.3B was trained for an additional 5000 steps with a batch_size of 4 and a context window of 2048 tokens on one A100 GPU.
The final perplexity of this model on the validation set is 3.69.
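To put these numbers in perspective, the sketch below works out two derived quantities. Both depend on assumptions the card does not confirm: the token count assumes every sequence is filled to the full 2048-token window with no gradient accumulation, so it is an upper bound, and the loss conversion assumes perplexity is the exponential of the mean per-token cross-entropy in nats.

```python
import math

# Values quoted in this section.
STEPS = 5000
BATCH_SIZE = 4
CONTEXT_TOKENS = 2048
FINAL_PERPLEXITY = 3.69


def max_tokens_seen(steps: int, batch_size: int, context_tokens: int) -> int:
    """Upper bound on tokens processed, assuming fully packed sequences."""
    return steps * batch_size * context_tokens


def loss_from_perplexity(perplexity: float) -> float:
    """Cross-entropy in nats, assuming perplexity = exp(mean token loss)."""
    return math.log(perplexity)


tokens = max_tokens_seen(STEPS, BATCH_SIZE, CONTEXT_TOKENS)
loss = loss_from_perplexity(FINAL_PERPLEXITY)

print(f"tokens seen (upper bound): {tokens:,}")  # 40,960,000
print(f"validation loss (nats): {loss:.3f}")     # 1.306
```

Under these assumptions, fine-tuning covered at most about 41 million tokens, and a perplexity of 3.69 corresponds to a validation loss of roughly 1.31 nats per token.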
Chart of the training loss and perplexity:

Documentation
Other mGPT-1.3B models
Here are some other mGPT-1.3B models for different languages:
- [mGPT-1.3B Armenian](https://huggingface.co/ai-forever/mGPT-1.3B-armenian)
- [mGPT-1.3B Azerbaijan](https://huggingface.co/ai-forever/mGPT-1.3B-azerbaijan)
- [mGPT-1.3B Bashkir](https://huggingface.co/ai-forever/mGPT-1.3B-bashkir)
- [mGPT-1.3B Belorussian](https://huggingface.co/ai-forever/mGPT-1.3B-belorussian)
- [mGPT-1.3B Bulgarian](https://huggingface.co/ai-forever/mGPT-1.3B-bulgarian)
- [mGPT-1.3B Buryat](https://huggingface.co/ai-forever/mGPT-1.3B-buryat)
- [mGPT-1.3B Chuvash](https://huggingface.co/ai-forever/mGPT-1.3B-chuvash)
- [mGPT-1.3B Georgian](https://huggingface.co/ai-forever/mGPT-1.3B-georgian)
- [mGPT-1.3B Kalmyk](https://huggingface.co/ai-forever/mGPT-1.3B-kalmyk)
- [mGPT-1.3B Kazakh](https://huggingface.co/ai-forever/mGPT-1.3B-kazakh)
- [mGPT-1.3B Kirgiz](https://huggingface.co/ai-forever/mGPT-1.3B-kirgiz)
- [mGPT-1.3B Mari](https://huggingface.co/ai-forever/mGPT-1.3B-mari)
- [mGPT-1.3B Mongol](https://huggingface.co/ai-forever/mGPT-1.3B-mongol)
- [mGPT-1.3B Ossetian](https://huggingface.co/ai-forever/mGPT-1.3B-ossetian)
- [mGPT-1.3B Persian](https://huggingface.co/ai-forever/mGPT-1.3B-persian)
- [mGPT-1.3B Romanian](https://huggingface.co/ai-forever/mGPT-1.3B-romanian)
- [mGPT-1.3B Tajik](https://huggingface.co/ai-forever/mGPT-1.3B-tajik)
- [mGPT-1.3B Turkmen](https://huggingface.co/ai-forever/mGPT-1.3B-turkmen)
- [mGPT-1.3B Tuvan](https://huggingface.co/ai-forever/mGPT-1.3B-tuvan)
- [mGPT-1.3B Ukrainian](https://huggingface.co/ai-forever/mGPT-1.3B-ukranian)
- [mGPT-1.3B Uzbek](https://huggingface.co/ai-forever/mGPT-1.3B-uzbek)
- [mGPT-1.3B Yakut](https://huggingface.co/ai-forever/mGPT-1.3B-yakut)
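The links in the list above share one naming pattern, so a small helper can build a repository id or URL from a language slug. This is a sketch only: the slugs are copied from the links as spelled there (including `ukranian` and `kirgiz`), while the helper names `repo_id` and `model_url` are hypothetical and not part of any library.

```python
ORG = "ai-forever"
BASE_NAME = "mGPT-1.3B"

# Language slugs exactly as they appear in the repository links.
LANGUAGE_SLUGS = (
    "armenian", "azerbaijan", "bashkir", "belorussian", "bulgarian",
    "buryat", "chuvash", "georgian", "kalmyk", "kazakh", "kirgiz",
    "mari", "mongol", "ossetian", "persian", "romanian", "tajik",
    "turkmen", "tuvan", "ukranian", "uzbek", "yakut",
)


def repo_id(slug: str) -> str:
    """Return the Hugging Face repo id for a listed language slug."""
    if slug not in LANGUAGE_SLUGS:
        raise ValueError(f"unknown language slug: {slug!r}")
    return f"{ORG}/{BASE_NAME}-{slug}"


def model_url(slug: str) -> str:
    """Return the model page URL for a listed language slug."""
    return f"https://huggingface.co/{repo_id(slug)}"


print(model_url("bashkir"))
```

The repo id returned by `repo_id` can be passed as the model name to `from_pretrained` in the `transformers` library to load a sibling model.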
License
This model is released under the MIT license.
Usage Tip
If you find a bug or have additional data for training the model on your language, please send us feedback. The model will be improved continuously, so stay tuned!