Yakut mGPT 1.3B
Yakut mGPT 1.3B is a language model built specifically for the Yakut language. As the name suggests, it has 1.3 billion parameters and targets text generation and related language tasks in Yakut.
The Yakut language belongs to the Turkic language family and has around half a million speakers. Some key facts about it:
- It is also referred to as Sakha.
- It is spoken in the Sakha Republic in Russia.
- Despite its Turkic origin, it has been influenced by Tungusic languages.
⨠Features
- Specialized for the Yakut language, catering to the needs of its speakers.
- Derived from a well - trained base model, ensuring a solid foundation.
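As an illustration of how a model like this is typically used, here is a minimal sketch of text generation with the Hugging Face `transformers` library. The repository name `ai-forever/mGPT-1.3B-yakut` is an assumption based on the naming of the sibling models listed in this card, and the sampling settings are illustrative defaults, not recommended values from the authors.

```python
# Assumed repository name, inferred from the naming of the sibling mGPT-1.3B models.
MODEL_ID = "ai-forever/mGPT-1.3B-yakut"


def generate(prompt: str, max_new_tokens: int = 50) -> str:
    """Return a sampled continuation of `prompt` from the Yakut model."""
    # Imported lazily so that defining this helper does not load the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # Tokenize the prompt into PyTorch tensors (input_ids and attention_mask).
    inputs = tokenizer(prompt, return_tensors="pt")

    # Nucleus sampling; these values are illustrative, not tuned.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.95,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Саха тыла")` downloads the weights on first use and returns the prompt followed by a sampled continuation.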
Documentation
Technical Details
This model is one of the derivatives of the base [mGPT-XL (1.3B)](https://huggingface.co/ai-forever/mGPT) model. The base model was initially trained on 61 languages from 25 language families using Wikipedia and the C4 corpus.
We found additional data for 23 languages, most of them considered minor languages, and decided to fine-tune the base model further. Yakut mGPT 1.3B was trained for an additional 2000 steps with a batch_size of 4 and a context window of 2048 tokens on a single A100 GPU.
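To give a sense of scale, the fine-tuning settings above imply a rough upper bound on how many tokens the model saw. The sketch below assumes no gradient accumulation and that every sequence fills the whole context window; neither detail is stated in this card, so treat the result as an estimate.

```python
# Fine-tuning settings quoted in this card.
STEPS = 2000
BATCH_SIZE = 4
CONTEXT_TOKENS = 2048

# Assumption: one batch per step (no gradient accumulation).
sequences_seen = STEPS * BATCH_SIZE

# Assumption: every sequence is packed to the full context window,
# so this is an upper bound on the tokens processed.
max_tokens_seen = sequences_seen * CONTEXT_TOKENS

print(f"sequences: {sequences_seen:,}")
print(f"tokens (upper bound): {max_tokens_seen:,}")
```

Under these assumptions the fine-tuning run covers at most about 16.4 million tokens, which is small compared with the pretraining of the base model.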
The final perplexity of this model on the validation set is 10.65.
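For readers who want to relate this number to a loss value, the sketch below uses the common definition of perplexity as the exponential of the mean per-token cross-entropy in nats. This card does not say how the reported perplexity was computed, so the definition is an assumption here, and the helper names are hypothetical.

```python
import math


def perplexity(token_losses):
    """Perplexity as exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(sum(token_losses) / len(token_losses))


def loss_from_perplexity(ppl):
    """Invert the definition above: mean cross-entropy implied by a perplexity."""
    return math.log(ppl)


# Perplexity reported in this card for the validation set.
reported_ppl = 10.65
implied_loss = loss_from_perplexity(reported_ppl)

print(f"implied mean cross-entropy: {implied_loss:.3f} nats per token")
```

Under this definition, a perplexity of 10.65 corresponds to a mean validation loss of roughly 2.37 nats per token.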
Chart of the training loss and perplexity:

Other mGPT-1.3B models
- [mGPT-1.3B Armenian](https://huggingface.co/ai-forever/mGPT-1.3B-armenian)
- [mGPT-1.3B Azerbaijan](https://huggingface.co/ai-forever/mGPT-1.3B-azerbaijan)
- [mGPT-1.3B Bashkir](https://huggingface.co/ai-forever/mGPT-1.3B-bashkir)
- [mGPT-1.3B Belorussian](https://huggingface.co/ai-forever/mGPT-1.3B-belorussian)
- [mGPT-1.3B Bulgarian](https://huggingface.co/ai-forever/mGPT-1.3B-bulgarian)
- [mGPT-1.3B Buryat](https://huggingface.co/ai-forever/mGPT-1.3B-buryat)
- [mGPT-1.3B Chuvash](https://huggingface.co/ai-forever/mGPT-1.3B-chuvash)
- [mGPT-1.3B Georgian](https://huggingface.co/ai-forever/mGPT-1.3B-georgian)
- [mGPT-1.3B Kalmyk](https://huggingface.co/ai-forever/mGPT-1.3B-kalmyk)
- [mGPT-1.3B Kazakh](https://huggingface.co/ai-forever/mGPT-1.3B-kazakh)
- [mGPT-1.3B Kirgiz](https://huggingface.co/ai-forever/mGPT-1.3B-kirgiz)
- [mGPT-1.3B Mari](https://huggingface.co/ai-forever/mGPT-1.3B-mari)
- [mGPT-1.3B Mongol](https://huggingface.co/ai-forever/mGPT-1.3B-mongol)
- [mGPT-1.3B Ossetian](https://huggingface.co/ai-forever/mGPT-1.3B-ossetian)
- [mGPT-1.3B Persian](https://huggingface.co/ai-forever/mGPT-1.3B-persian)
- [mGPT-1.3B Romanian](https://huggingface.co/ai-forever/mGPT-1.3B-romanian)
- [mGPT-1.3B Tajik](https://huggingface.co/ai-forever/mGPT-1.3B-tajik)
- [mGPT-1.3B Tatar](https://huggingface.co/ai-forever/mGPT-1.3B-tatar)
- [mGPT-1.3B Turkmen](https://huggingface.co/ai-forever/mGPT-1.3B-turkmen)
- [mGPT-1.3B Tuvan](https://huggingface.co/ai-forever/mGPT-1.3B-tuvan)
- [mGPT-1.3B Ukranian](https://huggingface.co/ai-forever/mGPT-1.3B-ukranian)
- [mGPT-1.3B Uzbek](https://huggingface.co/ai-forever/mGPT-1.3B-uzbek)
License
This project is licensed under the MIT license.
Usage Tip
If you find a bug or have additional data for training the model on your language, please send us your feedback. The model will be improved over time, so stay tuned!