Ukrainian mGPT 1.3B
This is a language model designed for the Ukrainian language. As the name suggests, the model has 1.3 billion parameters.
Ukrainian belongs to the Indo-European language family. It's a highly melodious language spoken by approximately 40 million people. Here are some key facts about it:
- It is one of the East Slavic languages, along with Russian and Belarusian.
- It is the official language of Ukraine and is written using a version of the Cyrillic script.
- Ukrainian has a rich literary history and maintains a vibrant cultural presence, particularly in poetry and music.
Features
- Tailored for the Ukrainian language.
- Derived from a multilingual base model, offering a solid foundation.
- Further tuned with additional data for better performance.
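
A minimal sketch of how a model like this is typically loaded and sampled with the Hugging Face `transformers` library. The repository ID is an assumption inferred from the naming of the sibling models listed in this card (it is not stated here), and the sampling settings are illustrative rather than recommended values. Confirm the exact ID on the Hub before running.

```python
# Sketch only: MODEL_ID and the sampling settings are assumptions,
# not values taken from this model card.
MODEL_ID = "ai-forever/mGPT-1.3B-ukranian"  # assumed repo ID; verify the spelling on the Hub


def generate(prompt: str, model_id: str = MODEL_ID, max_new_tokens: int = 50) -> str:
    """Return the prompt followed by a sampled continuation.

    The weights are downloaded on first use.
    """
    # Imported lazily so that defining this function does not require
    # transformers or torch to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,   # sample instead of greedy decoding
            top_p=0.95,       # illustrative nucleus-sampling threshold
            temperature=0.8,  # illustrative temperature
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Столиця України")` would return the prompt followed by generated text. Because sampling is enabled, the continuation differs from run to run.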
Documentation
Technical Details
This model is derived from the base [mGPT-XL (1.3B)](https://huggingface.co/ai-forever/mGPT) model (see the list below). The base model was initially trained on 61 languages from 25 language families using Wikipedia and the C4 corpus.
We found additional data for 23 languages, most of which are considered minor, and used it to further fine-tune the base model. Ukrainian mGPT 1.3B was trained for an additional 10,000 steps with a batch size of 4 and a context window of 2048 tokens on a single A100 GPU.
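
As a rough sanity check, the figures above imply an upper bound on the number of tokens seen during fine-tuning. The sketch below assumes that each step consumes exactly one batch of full-length sequences (no gradient accumulation, no padding), which the card does not state.

```python
# Figures taken from the paragraph above.
STEPS = 10_000
BATCH_SIZE = 4          # sequences per step
CONTEXT_WINDOW = 2_048  # tokens per sequence


def max_tokens_seen(steps: int, batch_size: int, context_window: int) -> int:
    """Upper bound on tokens processed.

    Assumption: every sequence fills the whole context window and each
    step consumes exactly one batch.
    """
    return steps * batch_size * context_window


tokens = max_tokens_seen(STEPS, BATCH_SIZE, CONTEXT_WINDOW)
print(f"{tokens:,} tokens")  # 81,920,000 tokens
```

Under these assumptions the fine-tuning run covers at most about 82 million tokens.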
The final perplexity of this model on the validation set is 7.1.
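
For readers who want to relate this number to a loss curve: perplexity is conventionally the exponential of the mean per-token cross-entropy loss. The sketch below assumes that convention with the loss measured in nats. The card does not say how the reported value was computed.

```python
import math

VALIDATION_PERPLEXITY = 7.1  # value reported above


def loss_from_perplexity(perplexity: float) -> float:
    """Mean per-token cross-entropy in nats, assuming perplexity = exp(loss)."""
    return math.log(perplexity)


def perplexity_from_loss(loss: float) -> float:
    """Inverse of loss_from_perplexity."""
    return math.exp(loss)


loss = loss_from_perplexity(VALIDATION_PERPLEXITY)
print(f"{loss:.3f}")  # 1.960
```

Under that assumption, a perplexity of 7.1 corresponds to a validation loss of roughly 1.96 nats per token.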
Chart of the training loss and perplexity:

Other mGPT-1.3B models
- [mGPT-1.3B Armenian](https://huggingface.co/ai-forever/mGPT-1.3B-armenian)
- [mGPT-1.3B Azerbaijan](https://huggingface.co/ai-forever/mGPT-1.3B-azerbaijan)
- [mGPT-1.3B Bashkir](https://huggingface.co/ai-forever/mGPT-1.3B-bashkir)
- [mGPT-1.3B Belorussian](https://huggingface.co/ai-forever/mGPT-1.3B-belorussian)
- [mGPT-1.3B Bulgarian](https://huggingface.co/ai-forever/mGPT-1.3B-bulgarian)
- [mGPT-1.3B Buryat](https://huggingface.co/ai-forever/mGPT-1.3B-buryat)
- [mGPT-1.3B Chuvash](https://huggingface.co/ai-forever/mGPT-1.3B-chuvash)
- [mGPT-1.3B Georgian](https://huggingface.co/ai-forever/mGPT-1.3B-georgian)
- [mGPT-1.3B Kalmyk](https://huggingface.co/ai-forever/mGPT-1.3B-kalmyk)
- [mGPT-1.3B Kazakh](https://huggingface.co/ai-forever/mGPT-1.3B-kazakh)
- [mGPT-1.3B Kirgiz](https://huggingface.co/ai-forever/mGPT-1.3B-kirgiz)
- [mGPT-1.3B Mari](https://huggingface.co/ai-forever/mGPT-1.3B-mari)
- [mGPT-1.3B Mongol](https://huggingface.co/ai-forever/mGPT-1.3B-mongol)
- [mGPT-1.3B Ossetian](https://huggingface.co/ai-forever/mGPT-1.3B-ossetian)
- [mGPT-1.3B Persian](https://huggingface.co/ai-forever/mGPT-1.3B-persian)
- [mGPT-1.3B Romanian](https://huggingface.co/ai-forever/mGPT-1.3B-romanian)
- [mGPT-1.3B Tajik](https://huggingface.co/ai-forever/mGPT-1.3B-tajik)
- [mGPT-1.3B Tatar](https://huggingface.co/ai-forever/mGPT-1.3B-tatar)
- [mGPT-1.3B Turkmen](https://huggingface.co/ai-forever/mGPT-1.3B-turkmen)
- [mGPT-1.3B Tuvan](https://huggingface.co/ai-forever/mGPT-1.3B-tuvan)
- [mGPT-1.3B Uzbek](https://huggingface.co/ai-forever/mGPT-1.3B-uzbek)
- [mGPT-1.3B Yakut](https://huggingface.co/ai-forever/mGPT-1.3B-yakut)
License
This project is licensed under the MIT license.
Usage Tip
If you find a bug or have additional data to train a model for your language, please provide us with feedback.
The model will be improved over time. Stay tuned!