🚀 Oolel: A High-Performing Open LLM for Wolof
Despite numerous open-source innovations in large language models, African languages have remained underrepresented. Soynade Research is transforming this landscape with Oolel, the first open-source language model for Wolof.

Built on the Qwen 2.5 architecture, Oolel combines state-of-the-art AI technology with deep Wolof linguistic expertise. With carefully curated, high-quality data, we trained and optimized Oolel for the following tasks:
- RAG: supporting Wolof queries with context in English, French, or Wolof (see example 6 under Advanced Usage)
- Bidirectional translation between English and Wolof
- Natural text generation in Wolof
- Math in Wolof
- Other standard NLP tasks, such as:
  - Summarization
  - Text editing
🚀 Quick Start
✨ Features
- Based on the Qwen 2.5 architecture, integrating advanced AI technology and Wolof language knowledge.
- Capable of handling multiple NLP tasks in Wolof, including RAG, translation, text generation, and math.
📦 Installation
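Oolel runs on the Hugging Face transformers library with PyTorch. A minimal setup (assuming a CUDA-capable environment; the authors do not pin exact versions):

```bash
pip install torch transformers
```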
💻 Usage Examples
Basic Usage
Note: it is important to include a system prompt.

The snippet below uses `apply_chat_template` to show how to load the tokenizer and model and how to generate a response.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda"

# Load the model in bfloat16 and let transformers place it on available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "soynade-research/Oolel-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("soynade-research/Oolel-v0.1")

def generate_response(messages, max_new_tokens=1024, temperature=0.1):
    # Render the chat messages into the model's expected prompt format.
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sampling must be enabled for temperature to take effect
        temperature=temperature,
    )
    # Keep only the newly generated tokens, dropping the echoed prompt.
    generated_ids = [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
Advanced Usage
1. Translation Tasks
```python
system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Translate to Wolof: Bassirou Diomaye Faye is the new Senegalese president. He is 44 years old"}
]
print(generate_response(messages))
```
2. Code Generation

```python
system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [
    {"role": "system", "content": system_prompt},
    # "Write a Python class that shows how to use dataframes in Pandas"
    {"role": "user", "content": "Bindal ab klaas Python buy wone ni ñuy jëfandikoo dataframe yi ci Pandas"}
]
print(generate_response(messages))
```
3. Problem Solving
```python
from pprint import pprint

system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."
messages = [
    {"role": "system", "content": system_prompt},
    # "Can you show me how to solve this problem: Fatou bought 3 kilos of rice,
    # 2 kilos of oil, and 5 kilos of sugar. Rice is 500 CFA per kilo, oil is
    # 1200 CFA per kilo, and sugar is 750 CFA per kilo. How much must she pay?"
    {"role": "user", "content": "Ndax nga mën ma won ni ñuy resolver problème bii: Fatou dafa jënd 3 kilo ceeb, 2 kilo diw ak 5 kilo sukër. Ceeb gi wenn kilo 500 CFA la, diw gi 1200 CFA kilo bi, sukër gi 750 CFA kilo bi. Ñaata la wara fay?"}
]
pprint(generate_response(messages))
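```

For reference, the correct total the model should arrive at is 3 × 500 + 2 × 1200 + 5 × 750 = 1500 + 2400 + 3750 = 7650 CFA.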
4. Text Generation (e.g. story generation)
```python
system_prompt = "You are a skilled Wolof storyteller (Gewël) with deep knowledge of African folktales and traditions. Write engaging stories in Wolof that reflect African cultural values and wisdom."
messages = [
    {"role": "system", "content": system_prompt},
    # "Write a folktale about the lion that ate the cat"
    {"role": "user", "content": "Bindal ab léeb ci gaynde gi lekk muus mi"}
]

# A higher temperature encourages more varied, creative storytelling.
print(generate_response(messages, temperature=0.9))
```
5. Multi-turn Conversations

Oolel is not optimized for multi-turn conversations, but you can try it!
```python
messages = [
    # "Tell me, what is ECOWAS? What does it work on?"
    {"role": "user", "content": "Wax ma clan mooy CEDEAO ? Ci lan la liggeey?"},
    # Rough gloss: "ECOWAS is the organization that brings together the countries of
    # the West African region. It focuses on economic and political cooperation and
    # agreements between countries."
    {"role": "assistant", "content": "CEDEAO mooy 'organisation' gu boole reew yi nekk ci pennc Afrika bi. Mu ngi sukkandiku ci wàll économie, politig, ak déggoo diggante reew yi"},
    # "How many countries are members?"
    {"role": "user", "content": "ñaata reew ñoo ci bokk?"}
]
print(generate_response(messages))
```
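6. Retrieval-Augmented Generation (RAG)

Oolel supports Wolof queries over retrieved context in English, French, or Wolof. A minimal sketch, reusing the `generate_response` helper above; the `Context:`/`Question:` layout, the sample passage, and the Wolof question are illustrative, not a documented prompt format:

```python
system_prompt = "You're a Wolof AI assistant. Please always provide detailed and useful answers to the user queries."

# Hypothetical retrieved passage (in English) and a Wolof question about it.
context = (
    "Senegal is a country in West Africa. Its capital is Dakar, "
    "and it became independent in 1960."
)
question = "Lan mooy péeyu Senegal?"  # "What is the capital of Senegal?"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"},
]
print(generate_response(messages))
```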
📄 License
| Property | Details    |
|----------|------------|
| License  | apache-2.0 |
Authors

Oolel is developed by Soynade Research.