🚀 Maral 7B Alpha 1
Maral 7B Alpha 1 is a new large language model specializing in the Persian language, based on Mistral and trained on an Alpaca Persian dataset. It can also generate English answers.
🚀 Quick Start
Maral is a large language model focusing on the Persian language. Based on Mistral and trained on the Alpaca Persian dataset, it aims to revitalize the Persian language in the AI era. Also, it can produce English answers due to its base model.
What does "Maral" mean?
Maral is the Persian name for the red deer, a deer species native to Iran. The name was chosen out of environmental concern and to reflect the model's Iranian origin as a Persian LLM.
✨ Features
- Specialized in the Persian language.
- Capable of generating English answers.
📦 Installation
pip install transformers accelerate bitsandbytes
NOTE: The bitsandbytes library is only needed for the 8-bit version. Otherwise, it's not necessary.
💻 Usage Examples
Prompt Format
This model requires the Guanaco format:
### Human: <prompt>
### Assistant: <answer>
In your code, you can write prompts like this:
prompt = "در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"  # "Who was the President of the United States in 1996?"
prompt = f"### Human:{prompt}\n### Assistant:"
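If you build many prompts, a small helper keeps the format in one place. This is only a sketch; build_prompt is a hypothetical name, not part of the model's code:

def build_prompt(question: str) -> str:
    # Wrap a question in the Guanaco-style format the model expects (hypothetical helper).
    return f"### Human:{question}\n### Assistant:"

prompt = build_prompt("در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟")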
4-bit Quantization
If you want to use 4-bit quantization, we provide a PEFT adapter here. You can also find Google Colab notebooks here.
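Alternatively, you can quantize the base model on the fly with bitsandbytes instead of using the PEFT adapter. The snippet below is a minimal sketch based on the BitsAndBytesConfig API in recent transformers versions; the quantization settings are illustrative assumptions, not values from the Maral team:

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_name_or_id = "MaralGPT/Maral-7B-alpha-1"

# Illustrative 4-bit (NF4) quantization settings with bfloat16 compute.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name_or_id,
    quantization_config=quantization_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)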
Inference on a Big GPU
If you have a large GPU like an A100:
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "MaralGPT/Maral-7B-alpha-1"

# Load the model in bfloat16 and let accelerate place it on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)

# Wrap the question in the Guanaco prompt format described above.
prompt = "در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"
prompt = f"### Human:{prompt}\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,
    temperature=0.5,
    max_new_tokens=300,
    pad_token_id=tokenizer.eos_token_id,
)
outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
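Because the decoded output repeats the prompt, you usually want only the text after the ### Assistant: marker. A minimal post-processing sketch, continuing the example above and assuming the prompt format is used unchanged:

# Keep only the model's answer, i.e., everything after the assistant marker.
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
answer = decoded.split("### Assistant:")[-1].strip()
print(answer)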
Inference on a Small GPU (Consumer Hardware/Free Colab)
The code is similar, but with some differences:
- Make sure bitsandbytes is installed correctly.
- Load the model in 8-bit with:
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, load_in_8bit=True, torch_dtype=torch.bfloat16, device_map="auto")
On the free version of Google Colab, you may encounter RAM issues. Passing low_cpu_mem_usage=True when loading the model might help; a consolidated loading sketch is shown below.
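Putting these notes together, here is a minimal 8-bit loading sketch. It assumes the same model ID and prompt/generation code as the big-GPU example above; low_cpu_mem_usage=True is included as suggested, and the exact flags may need adjusting for your transformers version:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name_or_id = "MaralGPT/Maral-7B-alpha-1"

# 8-bit quantization via bitsandbytes; low_cpu_mem_usage reduces peak CPU RAM while loading.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_id,
    load_in_8bit=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)

Tokenization and generation then work exactly as in the big-GPU example.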
📚 Documentation
Known Issues
- The model produces GPT-3.5-level answers in terms of grammar (especially in Persian) but can hallucinate heavily. This could be mitigated with a better dataset and improved training procedures (e.g., DPO).
- It can generate misinforming answers, especially for Persian reasoning problems.
- The model is large and requires substantial resources. We may provide GPTQ or GGUF versions.
- The prompt format works, but since eos_token and bos_token are not changed, the model may generate unnecessary information.
- The model may repeat itself. Keeping the temperature below 1 (ideally between 0.5 and 0.7) temporarily solves this; see the sketch below.
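For example, a generation configuration in the recommended temperature range might look like this; the top_k value is an illustrative assumption, not a recommendation from the Maral team:

from transformers import AutoTokenizer, GenerationConfig

tokenizer = AutoTokenizer.from_pretrained("MaralGPT/Maral-7B-alpha-1")

# Keep temperature in the 0.5-0.7 range to reduce self-repetition.
generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.7,
    top_k=40,  # illustrative sampling value, not from the original example
    max_new_tokens=300,
    pad_token_id=tokenizer.eos_token_id,
)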
Our Team
Special Thanks
- Mistral Team for providing the best open-source base model.
- Sina Rashidi for translating the Alpaca dataset to Persian.
- Jupyto team for providing the infrastructure.
📄 License
This project is licensed under the MIT license.