Persian mGPT 1.3B
A 1.3B-parameter language model fine-tuned for Persian.
Persian is an Indo-European language with a long poetic tradition, spoken by around 110 million people. Some key facts:
- Also known as Farsi, it is predominantly spoken in Iran, Afghanistan, and Tajikistan.
- It has a rich literary heritage with renowned poets like Rumi, Hafez, and Ferdowsi.
- It uses the Persian script, a variant of the Arabic script.
Quick Start
This model provides language modeling for Persian text and can be used directly for text generation and related NLP tasks.
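A minimal loading-and-generation sketch with the Hugging Face `transformers` library follows. The repository name `ai-forever/mGPT-1.3B-persian` is an assumption inferred from the naming pattern of the sibling models listed in this card; it is not stated here. The sampling settings (`top_p`, `max_new_tokens`) are illustrative defaults, not values recommended by the authors.

```python
# Assumed repository name, inferred from the sibling models; not stated in this card.
MODEL_ID = "ai-forever/mGPT-1.3B-persian"


def generate(prompt: str, max_new_tokens: int = 50) -> str:
    """Continue a Persian prompt with the fine-tuned model."""
    # Imported lazily so that defining this function does not require
    # the heavy dependencies to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    # Tokenize the prompt into PyTorch tensors (input_ids and attention_mask).
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,  # illustrative value
            do_sample=True,
            top_p=0.95,  # illustrative value
        )
    # The returned sequence contains the prompt followed by the continuation.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("تهران پایتخت")` downloads the weights on first use and returns the prompt followed by a sampled continuation, so the output differs from run to run.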
⨠Features
- Specialized for Persian: Specifically tuned to handle the nuances of the Persian language.
- Derived from a Strong Base: Built upon the robust [mGPT - XL (1.3B)](https://huggingface.co/ai - forever/mGPT) model.
Technical Details
This model is one of the derivatives of the base [mGPT-XL (1.3B)](https://huggingface.co/ai-forever/mGPT) model. The base model was initially trained on 61 languages from 25 language families using Wikipedia and the C4 corpus.
We sourced additional data for 23 languages, many of which are considered minor, and fine-tuned the base model further on each of them. Persian mGPT 1.3B was trained for an additional 200 steps with a batch size of 4 and a context window of 2048 tokens on a single A100 GPU.
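To put the fine-tuning figures above in perspective, the following back-of-the-envelope calculation gives an upper bound on the number of tokens seen. It assumes one batch per step with no gradient accumulation and that every sequence fills the whole context window; neither assumption is confirmed by this card.

```python
# Hyperparameters quoted in the paragraph above.
STEPS = 200
BATCH_SIZE = 4
CONTEXT_WINDOW = 2048  # tokens per sequence

# Assumption: each step consumes exactly one batch of full-length sequences.
sequences_seen = STEPS * BATCH_SIZE
max_tokens_seen = sequences_seen * CONTEXT_WINDOW

print(sequences_seen)   # 800
print(max_tokens_seen)  # 1638400
```

Under these assumptions the fine-tuning run covers at most about 1.6 million tokens.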
The final perplexity of this model on the validation set is 33.44.
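As a reading aid, the sketch below relates perplexity to the average per-token loss. It assumes the usual convention that perplexity is the exponential of the mean token-level cross-entropy measured in nats; the card does not state how its figure was computed.

```python
import math


def perplexity(token_nlls):
    """Perplexity as exp of the mean negative log-likelihood per token (nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Per-token loss that corresponds to the reported perplexity of 33.44.
loss = math.log(33.44)
print(round(loss, 2))  # 3.51
```

Under that convention, a perplexity of 33.44 corresponds to an average loss of roughly 3.51 nats per token.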
[Figure: training loss and perplexity]

Other mGPT-1.3B Models
- [mGPT-1.3B Armenian](https://huggingface.co/ai-forever/mGPT-1.3B-armenian)
- [mGPT-1.3B Azerbaijan](https://huggingface.co/ai-forever/mGPT-1.3B-azerbaijan)
- [mGPT-1.3B Bashkir](https://huggingface.co/ai-forever/mGPT-1.3B-bashkir)
- [mGPT-1.3B Belorussian](https://huggingface.co/ai-forever/mGPT-1.3B-belorussian)
- [mGPT-1.3B Bulgarian](https://huggingface.co/ai-forever/mGPT-1.3B-bulgarian)
- [mGPT-1.3B Buryat](https://huggingface.co/ai-forever/mGPT-1.3B-buryat)
- [mGPT-1.3B Chuvash](https://huggingface.co/ai-forever/mGPT-1.3B-chuvash)
- [mGPT-1.3B Georgian](https://huggingface.co/ai-forever/mGPT-1.3B-georgian)
- [mGPT-1.3B Kalmyk](https://huggingface.co/ai-forever/mGPT-1.3B-kalmyk)
- [mGPT-1.3B Kazakh](https://huggingface.co/ai-forever/mGPT-1.3B-kazakh)
- [mGPT-1.3B Kirgiz](https://huggingface.co/ai-forever/mGPT-1.3B-kirgiz)
- [mGPT-1.3B Mari](https://huggingface.co/ai-forever/mGPT-1.3B-mari)
- [mGPT-1.3B Mongol](https://huggingface.co/ai-forever/mGPT-1.3B-mongol)
- [mGPT-1.3B Ossetian](https://huggingface.co/ai-forever/mGPT-1.3B-ossetian)
- [mGPT-1.3B Romanian](https://huggingface.co/ai-forever/mGPT-1.3B-romanian)
- [mGPT-1.3B Tajik](https://huggingface.co/ai-forever/mGPT-1.3B-tajik)
- [mGPT-1.3B Tatar](https://huggingface.co/ai-forever/mGPT-1.3B-tatar)
- [mGPT-1.3B Turkmen](https://huggingface.co/ai-forever/mGPT-1.3B-turkmen)
- [mGPT-1.3B Tuvan](https://huggingface.co/ai-forever/mGPT-1.3B-tuvan)
- [mGPT-1.3B Ukrainian](https://huggingface.co/ai-forever/mGPT-1.3B-ukranian)
- [mGPT-1.3B Uzbek](https://huggingface.co/ai-forever/mGPT-1.3B-uzbek)
- [mGPT-1.3B Yakut](https://huggingface.co/ai-forever/mGPT-1.3B-yakut)
License
This project is licensed under the MIT license.
Usage Tip
If you find a bug or have additional data for training the model on your language, please send us feedback. The model will continue to be improved over time.