🚀 Bee1reason-arabic-Qwen-14B: A Qwen3 14B Model Fine-tuned for Arabic Logical Reasoning
Bee1reason-arabic-Qwen-14B is a Large Language Model (LLM) fine-tuned for enhanced Arabic logical reasoning while maintaining general conversational abilities.
🚀 Quick Start
Bee1reason-arabic-Qwen-14B is a merged 16-bit model that can be loaded and used directly with the transformers library. Here is a simple example:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch
model_id = "beetlware/Bee1reason-arabic-Qwen-14B"
# Load the Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the Model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # or torch.float16 if bfloat16 is not supported
    device_map="auto",  # Distributes the model across available devices (GPU/CPU)
)
# Ensure the model is in evaluation mode for inference
model.eval()
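The snippet above only loads the model. As a minimal sketch of a first generation call, following the same pattern as the usage examples further below (the Arabic prompt is just an illustrative example):
messages = [{"role": "user", "content": "مرحباً، عرّف بنفسك."}]  # "Hello, introduce yourself."
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64, pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens (skip the prompt)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))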
✨ Features
- Built on unsloth/Qwen3-14B: Leverages the power and performance of the Qwen3 14-billion-parameter base model.
- Fine-tuned for Arabic Logical Reasoning: Trained on a dataset containing Arabic logical reasoning tasks.
- Conversational Format: The model follows a conversational format, expecting user and assistant roles. It was trained on data that may include "thinking steps" (often within <think>...</think> tags) before the final answer, which is beneficial for tasks requiring explanation or complex inference.
- Unsloth Efficiency: The Unsloth library was used for the fine-tuning process, enabling faster training and reduced GPU memory consumption.
- Merged 16-bit Model: The final weights are a full float16-precision model, ready for direct use without needing to apply LoRA adapters to a separate base model.
📦 Installation
Install vLLM
pip install vllm
(vLLM installation may have specific CUDA and PyTorch version requirements; refer to the vLLM documentation for the latest installation prerequisites.)
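The transformers examples in this card additionally assume the following packages are installed. This list is inferred from the imports used above; accelerate is required for device_map="auto":
pip install transformers torch accelerate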
💻 Usage Examples
Basic Usage
Loading the model for basic usage is identical to the Quick Start example above. The examples below assume the model and tokenizer have already been loaded as shown there.
Advanced Usage - Inference with Thinking Steps
user_prompt_with_thinking_request = "استخدم التفكير المنطقي خطوة بخطوة: إذا كان لدي 4 تفاحات والشجرة فيها 20 تفاحة، فكم تفاحة لدي إجمالاً؟" # "Use step-by-step logical thinking: If I have 4 apples and the tree has 20 apples, how many apples do I have in total?"
messages_with_thinking = [
    {"role": "user", "content": user_prompt_with_thinking_request}
]
# Apply the chat template
# Qwen3 uses a specific chat template; tokenizer.apply_chat_template is the correct way to format it.
chat_prompt_with_thinking = tokenizer.apply_chat_template(
    messages_with_thinking,
    tokenize=False,
    add_generation_prompt=True  # Important for adding the assistant's generation prompt
)
inputs_with_thinking = tokenizer(chat_prompt_with_thinking, return_tensors="pt").to(model.device)
print("\n--- Inference with Thinking Request (Example) ---")
streamer_think = TextStreamer(tokenizer, skip_prompt=True)
with torch.no_grad():  # Important: disables gradient tracking during inference
    outputs_think = model.generate(
        **inputs_with_thinking,
        max_new_tokens=512,
        temperature=0.6,  # Settings recommended by the Qwen team for reasoning
        top_p=0.95,
        top_k=20,
        pad_token_id=tokenizer.eos_token_id,
        streamer=streamer_think
    )
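The decoded output may contain the model's reasoning inside <think>...</think> tags. A minimal sketch for separating the reasoning from the final answer, assuming at most one such block appears:
full_text = tokenizer.decode(
    outputs_think[0][inputs_with_thinking["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
if "</think>" in full_text:
    # Everything before </think> is the reasoning; the rest is the answer
    thinking, final_answer = full_text.split("</think>", 1)
    thinking = thinking.replace("<think>", "").strip()
else:
    thinking, final_answer = "", full_text
print("Final answer:", final_answer.strip())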
Advanced Usage - Normal Inference
# --- Example for Normal Inference (Conversation without explicit thinking request) ---
user_prompt_normal = "ما هي عاصمة مصر؟" # "What is the capital of Egypt?"
messages_normal = [
    {"role": "user", "content": user_prompt_normal}
]
chat_prompt_normal = tokenizer.apply_chat_template(
    messages_normal,
    tokenize=False,
    add_generation_prompt=True
)
inputs_normal = tokenizer(chat_prompt_normal, return_tensors="pt").to(model.device)
print("\n\n--- Normal Inference (Example) ---")
streamer_normal = TextStreamer(tokenizer, skip_prompt=True)
with torch.no_grad():
    outputs_normal = model.generate(
        **inputs_normal,
        max_new_tokens=100,
        temperature=0.7,  # Settings recommended for normal chat
        top_p=0.8,
        top_k=20,
        pad_token_id=tokenizer.eos_token_id,
        streamer=streamer_normal
    )
Usage with vLLM
Run the vLLM OpenAI-Compatible Server
python -m vllm.entrypoints.openai.api_server \
    --model beetlware/Bee1reason-arabic-Qwen-14B \
    --tokenizer beetlware/Bee1reason-arabic-Qwen-14B \
    --dtype bfloat16 \
    --max-model-len 2048
# Optional: --tensor-parallel-size N     (if you have multiple GPUs)
# Optional: --gpu-memory-utilization 0.9 (to adjust GPU memory usage)
- Replace --dtype bfloat16 with float16 if needed.
- --max-model-len should match the max_seq_length used in training (2048 for this model).
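Once the server is running, a quick sanity check is to list the served models via the standard OpenAI-compatible endpoint:
curl http://localhost:8000/v1/models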
Send Requests to the vLLM Server
import openai
client = openai.OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM server address
    api_key="dummy_key"  # vLLM doesn't require an actual API key by default
)
completion = client.chat.completions.create(
    model="beetlware/Bee1reason-arabic-Qwen-14B",  # Model name as served by vLLM
    messages=[
        {"role": "user", "content": "اشرح نظرية النسبية العامة بكلمات بسيطة."}  # "Explain the theory of general relativity in simple terms."
    ],
    max_tokens=256,
    temperature=0.7,
    stream=True  # Enable streaming
)
print("Streaming response from VLLM:")
full_response = ""
for chunk in completion:
    if chunk.choices[0].delta.content is not None:
        token = chunk.choices[0].delta.content
        print(token, end="", flush=True)
        full_response += token
print("\n--- End of stream ---")
📚 Documentation
Model Overview
Bee1reason-arabic-Qwen-14B is a Large Language Model (LLM) fine-tuned from the unsloth/Qwen3-14B base model (itself based on Qwen/Qwen3-14B). The model has been specifically tailored to enhance logical and deductive reasoning capabilities in Arabic while maintaining its general conversational abilities. The fine-tuning process utilized LoRA (Low-Rank Adaptation) with the Unsloth library for high training efficiency. The LoRA weights were then merged into the base model to produce this standalone 16-bit (float16) precision model.
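The final merge step can be illustrated with a minimal Unsloth sketch. The checkpoint path below is hypothetical, as the actual training notebook is not reproduced here:
from unsloth import FastLanguageModel

# Load the LoRA-adapted model (hypothetical checkpoint path)
model, tokenizer = FastLanguageModel.from_pretrained("path/to/lora_checkpoint", max_seq_length=2048)
# Merge the adapters into the base weights and save at float16 precision
model.save_pretrained_merged("Bee1reason-arabic-Qwen-14B", tokenizer, save_method="merged_16bit")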
Training Data
The model was primarily fine-tuned on a custom Arabic logical reasoning dataset, beetlware/arabic-reasoning-dataset-logic, available on the Hugging Face Hub. This dataset includes various types of reasoning tasks (deduction, induction, abduction), with each task comprising the question text, a proposed answer, and a detailed solution including thinking steps.
This data was converted into a conversational format for training, typically with:
- User Role: Containing the problem/question text.
- Assistant Role: Containing the detailed solution, including thinking steps (often within <think>...</think> tags) followed by the final answer.
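For illustration, a single converted training sample might look like the following; the content shown here is a hypothetical example, not an actual dataset row:
sample = [
    {"role": "user", "content": "إذا كان كل البشر فانين وسقراط إنسان، فهل سقراط فانٍ؟"},  # "If all humans are mortal and Socrates is human, is Socrates mortal?"
    {"role": "assistant", "content": "<think>كل البشر فانون، وسقراط إنسان، إذن سقراط فانٍ.</think> نعم، سقراط فانٍ."},  # reasoning inside <think> tags, then the final answer
]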
Fine-tuning Details
- Base Model: unsloth/Qwen3-14B
- Fine-tuning Technique: LoRA (Low-Rank Adaptation); see the configuration sketch after this list
  - r (rank): 32
  - lora_alpha: 32
  - target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
  - lora_dropout: 0
  - bias: "none"
- Libraries Used: Unsloth (for efficient model loading and PEFT application) and Hugging Face TRL (SFTTrainer)
- Max Sequence Length (max_seq_length): 2048 tokens
- Training Parameters (example from the notebook):
  - per_device_train_batch_size: 2
  - gradient_accumulation_steps: 4 (simulating a total batch size of 8)
  - warmup_steps: 5
  - max_steps: 30 (in the notebook; adjustable for a full run)
  - learning_rate: 2e-4 (recommended to reduce to 2e-5 for longer training runs)
  - optim: "adamw_8bit"
- Final Save: LoRA weights were merged with the base model and saved in merged_16bit (float16) precision.
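As referenced in the list above, these hyperparameters correspond to a LoRA setup along the following lines. This is a minimal sketch using the Unsloth API, not the actual training notebook:
from unsloth import FastLanguageModel

# Load the base model at the training sequence length
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-14B",
    max_seq_length=2048,
)
# Attach LoRA adapters with the hyperparameters listed above
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0,
    bias="none",
)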
🔧 Technical Details
The fine-tuning process of Bee1reason-arabic-Qwen-14B used LoRA (Low-Rank Adaptation) with the Unsloth library. LoRA allows for efficient fine-tuning by training only a small set of low-rank matrices, which significantly reduces the number of trainable parameters and GPU memory consumption. The Unsloth library further optimizes the training process, enabling faster training and better resource utilization.
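As a back-of-the-envelope illustration of the parameter savings (the hidden size below is an assumed, illustrative value, not a documented figure):
# Trainable parameters for a single d x d projection matrix
d = 5120   # assumed hidden size for a 14B-class model (illustrative)
r = 32     # LoRA rank used for this fine-tune
full_ft = d * d   # parameters updated by full fine-tuning of this matrix
lora = 2 * d * r  # LoRA trains only A (d x r) and B (r x d)
print(f"full: {full_ft:,} | LoRA: {lora:,} | ratio: {lora / full_ft:.2%}")  # ~1.25%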
The model was trained on a custom Arabic logical reasoning dataset, carefully designed to enhance the model's logical and deductive reasoning capabilities in Arabic. The data was formatted into a conversational style with user and assistant roles, and in some cases included "thinking steps" to help the model learn to reason step by step.
📄 License
This model is licensed under the Apache-2.0 license.
Limitations and Potential Biases
The model's performance is highly dependent on the quality and diversity of the training data, and it may exhibit biases present in that data. Despite fine-tuning for logical reasoning, the model might still make errors on very complex or unfamiliar reasoning tasks. It may also "hallucinate" or produce incorrect information, especially for topics not well covered in its training data. Since fine-tuning focused on Arabic, capabilities in other languages may be limited.
Additional Information
- Developed by: loai abdalslam / Beetleware
- Upload/Release Date: 21-5-2025
- Contact / Issue Reporting: loai.abdalsalm@beetleware.com
Beetleware
We are a software house and digital transformation service provider that was founded six years ago and is based in Saudi Arabia.
© 2025 Beetleware. All rights reserved.
Our Offices
- KSA Office
- Phone: (+966) 54 597 3282
- Email: ahmed.taha@beetleware.com
- Egypt Office
- Phone: (+2) 010 67 256 306
- Email: ahmed.abullah@beetleware.com
- Oman Office
- Phone: (+968) 9522 8632
Uploaded model
- Developed by: beetlware AI Team
- License: Apache-2.0
- Finetuned from model: unsloth/qwen3-14b-unsloth-bnb-4bit
This Qwen3 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

