Mizan-Rerank-v1 Open-Source Model - Efficient and Accurate Re-ranking of Long Arabic Texts

Mizan Rerank V1

Developed by ALJIACHI

A revolutionary open-source model capable of reranking long Arabic texts with exceptional efficiency and accuracy.

Supports multiple languages | License: Apache-2.0 (open source) | Tags: Arabic text reranking, long text processing, efficient inference


Release date: 3/31/2025

Model Overview

A leading open-source model based on the Transformer architecture, designed specifically for reranking Arabic search results, with a strong balance between performance and efficiency.

Model Features

Lightweight and Efficient

Only 149 million parameters, far fewer than the 278-568 million parameters of competing models.

Long Text Processing

Supports texts of up to 8192 tokens using a sliding-window technique.
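The card does not describe how the sliding window is implemented. One common way to score passages that exceed a cross-encoder's context window is to split the passage into overlapping token windows, score each window against the query, and keep the best score (max pooling). The sketch below shows only that windowing logic and is not necessarily what Mizan-Rerank-v1 does internally; `score_window` is a hypothetical stand-in for a model call, and the window and stride sizes are yours to choose (leave room for the query and special tokens).

```python
from typing import Callable, List, Sequence


def sliding_windows(tokens: Sequence[str], window: int, stride: int) -> List[Sequence[str]]:
    """Split tokens into overlapping windows of at most `window` tokens.

    Consecutive windows start `stride` tokens apart, so they overlap by
    `window - stride` tokens. The final window always reaches the end.
    """
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("require 0 < stride <= window")
    if len(tokens) <= window:
        return [tokens]
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
        start += stride
    return windows


def max_window_score(
    query: str,
    tokens: Sequence[str],
    score_window: Callable[[str, Sequence[str]], float],
    *,
    window: int,
    stride: int,
) -> float:
    """Score every window against the query and keep the highest score."""
    return max(score_window(query, w) for w in sliding_windows(tokens, window, stride))


# Toy usage: a fake scorer that rewards windows containing the token "j".
tokens = list("abcdefghij")
print(sliding_windows(tokens, window=4, stride=2))
print(max_window_score("q", tokens, lambda q, w: float("j" in w), window=4, stride=2))
```

Max pooling assumes that a passage is relevant if any part of it is; averaging window scores is an alternative when relevance is spread across the whole text.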

High-Speed Inference

3 times faster than similar models.

Arabic-Optimized

Fine-tuned specifically for the linguistic characteristics of Arabic.

Resource-Efficient

Up to 75% less memory consumption than competing models (1 GB vs. 4 GB for bge-reranker-v2-m3 on an RTX 4090).

Model Capabilities

Arabic Text Reranking

Long Text Processing

Efficient Inference

Use Cases

Information Retrieval

Arabic Search Engine

Improves the ranking quality of Arabic search results.

Achieved an nDCG@10 score of 0.8865 on the MIRACL dataset.

Digital Library

Optimizes the ranking of Arabic document retrieval results.

Achieved an nDCG@10 score of 1.0000 on the reranking dataset.

Educational Technology

E-Learning Platform

Provides precise ranking for Arabic learning resources.

🚀 Mizan-Rerank-v1


🚀 Quick Start

Mizan-Rerank-v1 is a leading open-source model based on the Transformer architecture, designed specifically for reranking Arabic search results. With only 149 million parameters, it balances performance and efficiency, matching or outperforming larger models on the benchmarks below while using significantly fewer resources.

✨ Features

Lightweight & Efficient: 149M parameters vs. 278-568M for competing models
Long Text Processing: Handles up to 8192 tokens using a sliding-window technique
High-Speed Inference: 3x faster than comparable models
Arabic Language Optimization: Specifically fine-tuned for the nuances of Arabic
Resource Efficient: Up to 75% less memory consumption than competing models

📊 Performance Benchmarks

Hardware Performance (RTX 4090 24GB)

Model	RAM Usage
Mizan-Rerank-v1	1 GB
bge-reranker-v2-m3	4 GB
jina-reranker-v2-base-multilingual	2.5 GB
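The relative memory savings follow directly from this table. The short sketch below recomputes them: the 75% figure quoted in the feature list corresponds to the comparison with bge-reranker-v2-m3, while the saving relative to jina-reranker-v2-base-multilingual works out to 60%.

```python
# Memory usage in GB, copied from the RTX 4090 table above.
memory_gb = {
    "Mizan-Rerank-v1": 1.0,
    "bge-reranker-v2-m3": 4.0,
    "jina-reranker-v2-base-multilingual": 2.5,
}


def memory_savings(baseline: str, model: str = "Mizan-Rerank-v1") -> float:
    """Fractional memory reduction of `model` relative to `baseline`."""
    return 1.0 - memory_gb[model] / memory_gb[baseline]


for name in memory_gb:
    if name != "Mizan-Rerank-v1":
        print(f"{name}: {memory_savings(name):.0%} less memory")
```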

MIRACL Dataset Results (nDCG@10)

Model	Score
Mizan-Rerank-v1	0.8865
bge-reranker-v2-m3	0.8863
jina-reranker-v2-base-multilingual	0.8481
Namaa-ARA-Reranker-V1	0.7941
Namaa-Reranker-v1	0.7176
ms-marco-MiniLM-L12-v2	0.1750
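nDCG@10 divides the discounted cumulative gain of the model's top-10 ranking by that of the ideal ordering, so 1.0 means the relevant passages were ranked in the best possible order. The card does not say which implementation produced these numbers; the sketch below uses the common formulation with linear gains and a log2(rank + 1) discount, and evaluation toolkits differ in details (for example exponential gains), so it is not guaranteed to reproduce the reported scores exactly.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items (ranks are 1-based)."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """nDCG@k: DCG of the model's ranking divided by DCG of the ideal ranking.

    `ranked_relevances` holds the graded relevance of each candidate in the
    order the reranker returned them. The ideal ranking is built from the
    same list, i.e. all judged passages are assumed to be among the candidates.
    """
    ideal = sorted(ranked_relevances, reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked_relevances, k) / idcg if idcg > 0 else 0.0


# Relevant passage ranked first: perfect score.
print(ndcg_at_k([1, 0, 0]))  # 1.0
# Relevant passage ranked third instead of first.
print(round(ndcg_at_k([0, 0, 1]), 4))  # 0.5
```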

Reranking and Triplet Datasets (nDCG@10)

Model	Reranking Dataset	Triplet Dataset
Mizan-Rerank-v1	1.0000	1.0000
bge-reranker-v2-m3	1.0000	0.9998
jina-reranker-v2-base-multilingual	1.0000	1.0000
Namaa-ARA-Reranker-V1	1.0000	0.9989
Namaa-Reranker-v1	1.0000	0.9994
ms-marco-MiniLM-L12-v2	0.8906	0.9087

🔧 Technical Details

Mizan-Rerank-v1 was trained on a diverse corpus of 741,159,981 tokens from:

Authentic Arabic open-source datasets
Manually crafted and processed text
Purpose-generated synthetic data

This comprehensive training approach enables deep understanding of Arabic linguistic contexts.

📚 Documentation

How It Works

Query reception: The model receives a user query and candidate texts
Content analysis: Analyzes semantic relationships between query and each text
Relevance scoring: Assigns a relevance score to each text
Reranking: Sorts results by descending relevance score
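The four steps map onto a small generic function. In this sketch `score` is a placeholder for any query-passage scorer, such as `get_relevance_score` in the usage example; the word-overlap scorer is a hypothetical toy that only makes the example runnable and is not how the model scores text.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and return them best first."""
    # Steps 1-3: take the query and candidate texts, score each (query, text) pair.
    scored = [(text, score(query, text)) for text in candidates]
    # Step 4: sort by descending score. Python's sort is stable even with
    # reverse=True, so candidates with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


def word_overlap(query: str, text: str) -> float:
    """Toy scorer (not the model): fraction of query words found in the text."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(text.lower().split())) / len(query_words)


results = rerank(
    "benefits of vitamin d",
    ["Taxes vary from country to country.", "The benefits of vitamin D for bones."],
    word_overlap,
)
for text, value in results:
    print(f"Score: {value:.4f} | Passage: {text}")
```

Keeping the scorer as a parameter separates the ranking logic from the model, so the same function works with a cross-encoder, a cheaper first-stage scorer, or a test stub.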

💻 Usage Examples

Basic Usage

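```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load model and tokenizer
model_name = "ALJIACHI/Mizan-Rerank-v1"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

# Compute the relevance score for one query-passage pair
def get_relevance_score(query, passage):
    # Encode query and passage together as one cross-encoder input, truncated to 8192 tokens
    inputs = tokenizer(query, passage, return_tensors="pt", truncation=True, max_length=8192)
    with torch.no_grad():  # no gradients are needed for scoring
        outputs = model(**inputs)
    # .item() assumes the head outputs a single logit per pair; this is the raw logit
    return outputs.logits.item()

# Example usage
# Query: "What is the interpretation of the verse 'And We made from water every living thing'?"
query = "ما هو تفسير الآية وجعلنا من الماء كل شيء حي"
passages = [
    # "The verse means that water is an essential element in the life of all living things..."
    "تعني الآية أن الماء هو عنصر أساسي في حياة جميع الكائنات الحية، وهو ضروري لاستمرار الحياة.",
    # "Planets outside the solar system containing frozen water have been discovered."
    "تم اكتشاف كواكب خارج المجموعة الشمسية تحتوي على مياه متجمدة.",
    # "The Holy Quran mentions lightning and thunder in several different places."
    "تحدث القرآن الكريم عن البرق والرعد في عدة مواضع مختلفة."
]

# Score each passage
scores = [(passage, get_relevance_score(query, passage)) for passage in passages]

# Rerank passages by descending score
reranked_passages = sorted(scores, key=lambda x: x[1], reverse=True)

# Print results
for passage, score in reranked_passages:
    print(f"Score: {score:.4f} | Passage: {passage}")
```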
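`get_relevance_score` returns a raw logit, which can be any real number, whereas the scores in the practical examples lie between 0 and 1. Assuming the model has a single output logit, applying a logistic sigmoid is a common way to map logits into that range; the card does not state that this is how its example scores were produced. Because the sigmoid is monotonic, it does not change the ranking. The logits below are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function mapping a raw logit to [0, 1]."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # avoids overflow for large negative logits
    return z / (1.0 + z)


def normalize_scores(pairs: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Convert (passage, logit) pairs into (passage, sigmoid score) pairs."""
    return [(passage, sigmoid(logit)) for passage, logit in pairs]


# Hypothetical logits, e.g. values returned by get_relevance_score.
example = [("relevant passage", 7.0), ("unrelated passage", -9.0)]
for passage, score in normalize_scores(example):
    print(f"Score: {score:.4f} | Passage: {passage}")
```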

Practical Examples

Example 1

Question: What is the new tax law in 2024?

Text	Score
The official newspaper published a new law in 2024 stating a 5% increase in taxes on large companies.	0.9989
Taxes are an important source of national income and their rates vary from country to country.	0.0001
The government launched a new renewable energy project in 2024.	0.0001

Example 2

Question: What is the interpretation of the verse "And We made from water every living thing"?

Text	Score
The verse means that water is an essential element in the life of all living things and is necessary for the continuation of life.	0.9996
Planets outside the solar system containing frozen water have been discovered.	0.0000
The Holy Quran mentions lightning and thunder in several different places.	0.0000

Example 3

Question: What are the benefits of vitamin D?

Text	Score
Vitamin D helps strengthen bone health and the immune system and plays an important role in calcium absorption.	0.9991
Vitamin D is used as a preservative in some food industries.	0.9941
Vitamin D can be obtained through sun exposure or by taking dietary supplements.	0.9938
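In Example 1 the relevant passage is separated from the others by almost the full score range, but in Example 3 all three passages score above 0.99, so only their order distinguishes them. If scores are used to filter results rather than just to order them, the cut-off has to be tuned on your own data. The sketch below applies a hypothetical threshold of 0.5 to the scores from Examples 1 and 3; the value is illustrative, not a recommendation from the model authors.

```python
from typing import List, Tuple

# Scores copied from Practical Examples 1 and 3 (passages shortened).
example_1 = [
    ("New 2024 law: 5% tax increase on large companies", 0.9989),
    ("Taxes are an important source of national income", 0.0001),
    ("New renewable energy project in 2024", 0.0001),
]
example_3 = [
    ("Vitamin D strengthens bones and the immune system", 0.9991),
    ("Vitamin D is used as a preservative", 0.9941),
    ("Vitamin D from sun exposure or supplements", 0.9938),
]


def filter_by_threshold(
    scored: List[Tuple[str, float]], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Keep passages whose score meets the (hypothetical) threshold, best first."""
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


print(len(filter_by_threshold(example_1)))  # 1
print(len(filter_by_threshold(example_3)))  # 3
```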

📈 Applications

Mizan-Rerank-v1 opens new horizons for Arabic NLP applications:

Specialized Arabic search engines
Archiving systems and digital libraries
Conversational AI applications
E-learning platforms
Information retrieval systems

📝 Citation

If you use Mizan-Rerank-v1 in your research, please cite:

@software{Mizan_Rerank_v1_2025,
  author = {Ali Aljiachi},
  title = {Mizan-Rerank-v1: A Revolutionary Arabic Text Reranking Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Aljiachi/Mizan-Rerank-v1}
}

@misc{modernbert,
      title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference}, 
      author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
      year={2024},
      eprint={2412.13663},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.13663}, 
}

📄 License

We release the Mizan-Rerank model weights under the Apache 2.0 license.
