🚀 Oolel: A High-Performing Open LLM for Wolof
Despite numerous open-source innovations in large language models, African languages have remained underrepresented. Soynade Research is transforming this landscape with Oolel, the first open-source language model for Wolof.

Built on the Qwen 2.5 architecture, Oolel combines state-of-the-art AI technology with deep Wolof linguistic expertise. With carefully curated, high-quality data, we trained and optimized Oolel for the following tasks:
- RAG: supporting Wolof queries with context in English, French, or Wolof (see example 6 under Advanced Usage)
- Bidirectional translation between English and Wolof
- Natural text generation in Wolof
- Math in Wolof
- Other standard NLP tasks, such as:
  - Summarization
  - Text editing
🚀 Quick Start
✨ Features
- Based on the Qwen 2.5 architecture, integrating advanced AI technology and Wolof language knowledge.
- Capable of handling multiple NLP tasks in Wolof, including RAG, translation, text generation, and math.
📦 Installation
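Oolel runs on the Hugging Face transformers library with PyTorch. A minimal setup (assuming a CUDA-capable environment; the authors do not pin exact versions):

```bash
pip install torch transformers
```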
💻 Usage Examples
Basic Usage
Note: it is important to include a system prompt.

The snippet below uses `apply_chat_template` to show how to load the tokenizer and model and how to generate a response.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda"

# Load the model in bfloat16 and let transformers place it on available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "soynade-research/Oolel-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("soynade-research/Oolel-v0.1")

def generate_response(messages, max_new_tokens=1024, temperature=0.1):
    # Render the chat messages into the model's expected prompt format.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sampling must be enabled for temperature to take effect
        temperature=temperature,
    )
    # Keep only the newly generated tokens, dropping the echoed prompt.
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
Advanced Usage
1. Translation Tasks
```python
system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Translate to Wolof: Bassirou Diomaye Faye is the new Senegalese president. He is 44 years old"}
]
print(generate_response(messages))
```
2. Code Generation

```python
system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [
    {"role": "system", "content": system_prompt},
    # "Write a Python class that shows how to use dataframes in Pandas"
    {"role": "user", "content": "Bindal ab klaas Python buy wone ni ñuy jëfandikoo dataframe yi ci Pandas"}
]
print(generate_response(messages))
```
3. Problem Solving
```python
from pprint import pprint

system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [
    {"role": "system", "content": system_prompt},
    # "Can you show me how to solve this problem: Fatou bought 3 kilos of rice,
    # 2 kilos of oil, and 5 kilos of sugar. Rice is 500 CFA per kilo, oil is
    # 1200 CFA per kilo, and sugar is 750 CFA per kilo. How much must she pay?"
    {"role": "user", "content": "Ndax nga mën ma won ni ñuy resolver problème bii: Fatou dafa jënd 3 kilo ceeb, 2 kilo diw ak 5 kilo sukër. Ceeb gi wenn kilo 500 CFA la, diw gi 1200 CFA kilo bi, sukër gi 750 CFA kilo bi. Ñaata la wara fay?"}
]
pprint(generate_response(messages))
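```

For reference, the correct total the model should arrive at is 3 × 500 + 2 × 1200 + 5 × 750 = 1500 + 2400 + 3750 = 7650 CFA.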
4. Text Generation (e.g. story generation)
```python
system_prompt = "You are a skilled Wolof storyteller (Gewël) with deep knowledge of African folktales and traditions. Write engaging stories in Wolof that reflect African cultural values and wisdom."
messages = [
    {"role": "system", "content": system_prompt},
    # "Write a folktale about the lion that ate the cat"
    {"role": "user", "content": "Bindal ab léeb ci gaynde gi lekk muus mi"}
]

# A higher temperature encourages more varied, creative storytelling.
print(generate_response(messages, temperature=0.9))
```
5. Multi-turn Conversations

Oolel is not optimized for multi-turn conversations, but you can try it!
```python
messages = [
    # "Tell me, what is ECOWAS? What does it work on?"
    {"role": "user", "content": "Wax ma clan mooy CEDEAO ? Ci lan la liggeey?"},
    # Rough gloss: "ECOWAS is the organization that brings together the countries of
    # the West African region. It focuses on economic and political cooperation and
    # agreements between countries."
    {"role": "assistant", "content": "CEDEAO mooy 'organisation' gu boole reew yi nekk ci pennc Afrika bi. Mu ngi sukkandiku ci wàll économie, politig, ak déggoo diggante reew yi"},
    # "How many countries are members?"
    {"role": "user", "content": "ñaata reew ñoo ci bokk?"}
]
print(generate_response(messages))
```
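6. Retrieval-Augmented Generation (RAG)

Oolel supports Wolof queries over retrieved context in English, French, or Wolof. A minimal sketch, reusing the `generate_response` helper above; the `Context:`/`Question:` layout, the sample passage, and the Wolof question are illustrative, not a documented prompt format:

```python
system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."

# Hypothetical retrieved passage (in English) and a Wolof question about it.
context = (
    "Senegal is a country in West Africa. Its capital is Dakar, "
    "and it became independent in 1960."
)
question = "Lan mooy péeyu Senegal?"  # "What is the capital of Senegal?"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"},
]
print(generate_response(messages))
```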
📄 License
| Property | Details    |
|----------|------------|
| License  | apache-2.0 |
Authors

Oolel is developed by Soynade Research.