AraModernBert-Base-STS Open Source Arabic Model - Easily Calculate Semantic Similarity and Generate Text Embeddings

AraModernBert-Base-STS

Developed by NAMAA-Space

This is an Arabic sentence transformer model fine-tuned from AraModernBert-Base-V1.0 for semantic similarity computation and text embedding generation.

Text Embedding

Safetensors

Language: Arabic
Open Source License: Apache-2.0
Tags: #Arabic Semantic Similarity #768-dimensional Dense Vectors #MTEB Benchmark Optimization

Downloads 118

Release Time: 3/9/2025

Model Overview

The model generates 768-dimensional dense vectors suitable for tasks such as semantic similarity computation, search, paraphrase mining, text clustering, and classification, with special optimization for Arabic text processing.

Model Features

Powerful Arabic Embeddings

768-dimensional dense vector representations specifically optimized for Arabic.

Efficient Semantic Understanding

Trained with multiple negatives ranking loss (MultipleNegativesRankingLoss) to improve the accuracy of semantic similarity scores.

Multi-task Adaptability

Supports various downstream applications such as search, clustering, and classification.
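The ranking loss mentioned under the features above can be illustrated with a small pure-Python sketch. It is not the training code used for this model: the toy vectors, the helper names, and the `scale` value are assumptions made only to show the idea, in which each positive in a batch doubles as a negative for every other anchor.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def in_batch_negatives_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for
    anchors[i] and every other positives[j] acts as a negative.
    `scale` is an assumed temperature chosen for illustration."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the whole batch
        largest = max(logits)
        log_sum_exp = largest + math.log(sum(math.exp(x - largest) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Toy 2-dimensional "embeddings" (hypothetical values, not model output)
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each positive sits near its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]  # each positive sits near the other anchor

loss_aligned = in_batch_negatives_loss(anchors, aligned)
loss_swapped = in_batch_negatives_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

The loss is close to zero when each anchor is nearest to its own positive and grows when the pairing is wrong, which is the signal that pushes matching sentences together during fine-tuning.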

Model Capabilities

Semantic Similarity Computation

Text Embedding Generation

Arabic Text Processing

Cross-language Semantic Matching

Use Cases

Information Retrieval

Intelligent Search Engine

Build an Arabic search engine that matches on meaning rather than on keywords.

Improves search result relevance and accuracy.

Dialogue Systems

Arabic Chatbot

Enhance semantic understanding in dialogue systems.

Improves dialogue coherence and contextual understanding.

Knowledge Management

Document Clustering

Semantic clustering of Arabic documents.

Automatically discovers related document collections.
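As a rough illustration of the document clustering use case, the sketch below groups embeddings with a simple greedy threshold rule. This is only one possible approach and not something the model card prescribes; the function name `greedy_cluster`, the `threshold` of 0.8, and the toy vectors are all assumptions. In practice the vectors would be the model's 768-dimensional embeddings, and a library clustering algorithm would likely be preferable.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def greedy_cluster(embeddings, threshold=0.8):
    """Put each item into the first cluster whose seed (first member) is at
    least `threshold` similar to it; otherwise start a new cluster."""
    seeds = []     # index of the first member of each cluster
    clusters = []  # one list of item indices per cluster
    for idx, emb in enumerate(embeddings):
        for cluster_id, seed in enumerate(seeds):
            if cosine(emb, embeddings[seed]) >= threshold:
                clusters[cluster_id].append(idx)
                break
        else:
            # No existing cluster was similar enough
            seeds.append(idx)
            clusters.append([idx])
    return clusters

# Toy 2-dimensional vectors standing in for document embeddings (hypothetical)
doc_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.0, 0.9]]
clusters = greedy_cluster(doc_embeddings, threshold=0.8)
print(clusters)
```

The result depends on item order and on the threshold, so both should be treated as tuning choices rather than fixed values.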

🚀 SentenceTransformer based on NAMAA-Space/AraModernBert-Base-V1.0

This SentenceTransformer is fine-tuned from [NAMAA-Space/AraModernBert-Base-V1.0](https://huggingface.co/NAMAA-Space/AraModernBert-Base-V1.0), offering powerful Arabic embeddings suitable for various use cases.

This SentenceTransformer provides 768-dimensional dense vectors. It excels in semantic similarity, search, paraphrase mining, clustering, text classification, and more. It is optimized for speed and efficiency without sacrificing performance. Whether you're building intelligent search engines, chatbots, or AI-powered knowledge graphs, this model can deliver precise and in-depth representations of Arabic text. Try it out to take Arabic NLP to the next level! 🔥✨

🚀 Quick Start

This SentenceTransformer is fine-tuned from [NAMAA-Space/AraModernBert-Base-V1.0](https://huggingface.co/NAMAA-Space/AraModernBert-Base-V1.0), offering strong Arabic embeddings useful for multiple use cases.

✨ Features

🔹 768-dimensional dense vectors 🎯
🔹 Excels in: Semantic Similarity, Search, Paraphrase Mining, Clustering, Text Classification & More!
🔹 Optimized for speed & efficiency without sacrificing performance

📦 Installation

First, install the Sentence Transformers library with the following command:

pip install -U sentence-transformers

💻 Usage Examples

Basic Usage

After installing the library, you can load the model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("NAMAA-Space/AraModernBert-Base-STS")
# Run inference
sentences = [
    'الذكاء الاصطناعي يغير طريقة تفاعلنا مع التكنولوجيا.',
    'التكنولوجيا تتطور بسرعة بفضل الذكاء الاصطناعي.',
    'الذكاء الاصطناعي يسهم في تطوير التطبيقات الذكية.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
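Building on the example above, a semantic search step can be sketched as ranking corpus embeddings by cosine similarity to a query embedding. The helper names `normalize` and `top_k` and the toy 3-dimensional vectors are hypothetical; with the real model, the vectors would come from `model.encode(...)`, and `model.similarity(...)` could replace the manual scoring.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query_emb, corpus_embs, k=2):
    """Return (corpus_index, score) pairs for the k most similar corpus entries."""
    query = normalize(query_emb)
    scored = []
    for index, emb in enumerate(corpus_embs):
        doc = normalize(emb)
        scored.append((index, sum(a * b for a, b in zip(query, doc))))
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors standing in for model.encode(...) output (hypothetical values)
corpus_embs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_emb = [1.0, 0.0, 0.0]

hits = top_k(query_emb, corpus_embs, k=2)
for index, score in hits:
    print(index, round(score, 3))
```

Normalizing once and then using dot products is a common design choice for larger corpora, since the normalized corpus vectors can be cached and reused across queries.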

📚 Documentation

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
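The pooling configuration above enables `pooling_mode_mean_tokens`, meaning the sentence vector is the average of the token vectors. A minimal sketch of that operation follows, assuming padding positions are excluded through an attention mask and that each sentence has at least one real token; the function name `mean_pool` and the toy values are illustrative and not taken from the library.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    kept = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            kept += 1
            for d in range(dim):
                summed[d] += vec[d]
    return [s / kept for s in summed]

# Two real tokens plus one padding token (toy 2-dimensional vectors)
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

sentence_vec = mean_pool(tokens, mask)
print(sentence_vec)  # [2.0, 3.0]
```

In the real model each token vector has 768 dimensions, so the pooled sentence vector does too.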

Evaluation

Metrics

Semantic Similarity

Datasets: STS17 and STS22.v2
Evaluated with EmbeddingSimilarityEvaluator

Metric	STS17	STS22.v2
pearson_cosine	0.8249	0.5259
spearman_cosine	0.831	0.6169
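The two metrics in the table are correlations between the model's cosine similarity scores and human similarity labels. The sketch below shows how such numbers can be computed in principle; it is not the EmbeddingSimilarityEvaluator implementation, the score lists are invented for illustration, and the rank helper assumes there are no tied values.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def ranks(values):
    """Rank of each value (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = float(rank)
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: human labels on a 0-5 scale and predicted cosine scores
gold = [0.0, 1.5, 3.0, 4.2, 5.0]
predicted = [0.10, 0.35, 0.30, 0.80, 0.95]

pearson_cosine = pearson(predicted, gold)
spearman_cosine = spearman(predicted, gold)
print(round(pearson_cosine, 4), round(spearman_cosine, 4))
```

Spearman only looks at the ordering of the scores, so it is less sensitive than Pearson to how the cosine values are scaled.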

Framework Versions

Property	Details
Python	3.10.12
Sentence Transformers	3.4.1
Transformers	4.49.0
PyTorch	2.1.0+cu118
Accelerate	1.4.0
Datasets	2.21.0
Tokenizers	0.21.0

📄 License

This project is licensed under the Apache-2.0 license.

📄 Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
