
AraModernBert Base V1.0

Developed by NAMAA-Space
AraModernBert is an advanced Arabic language model built on the ModernBERT architecture, combining Transformer design innovations with large-scale training on 100GB of Arabic text.
Downloads: 660
Release date: 2/1/2025

Model Overview

This model is specifically designed for Arabic language understanding, suitable for various NLP tasks such as text embedding, information retrieval, and text classification.

Model Features

Cross-tokenization
Uses cross-tokenization to initialize the embedding layer for masked language modeling (MLM), improving downstream performance.
Long-context support
Supports a context window of 8,192 tokens, making it suitable for processing long texts.
Dedicated Arabic tokenizer
Uses a custom tokenizer with a 50,280-token vocabulary, specifically optimized for Arabic text.
Alternating attention mechanism
Features a hybrid attention architecture with global attention every 3 layers and a local window of 128 tokens.
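The alternating global/local pattern described above can be sketched as attention masks. This is an illustrative sketch only: the exact details (whether the window is one- or two-sided, and which layer indices are global) are assumptions, not taken from the model's config.

```python
import numpy as np

def attention_mask(seq_len: int, layer_idx: int,
                   global_every: int = 3, window: int = 128) -> np.ndarray:
    """Boolean mask: mask[i, j] == True means token i may attend to token j.

    Layers whose index is a multiple of `global_every` use full (global)
    attention; the remaining layers use a local sliding window.
    """
    if layer_idx % global_every == 0:
        # Global layer: every token attends to every other token.
        return np.ones((seq_len, seq_len), dtype=bool)
    idx = np.arange(seq_len)
    # Local layer: token i only sees tokens within window//2 positions.
    return np.abs(idx[:, None] - idx[None, :]) <= window // 2

g = attention_mask(256, layer_idx=0)  # global layer
l = attention_mask(256, layer_idx=1)  # band-limited local layer
```

The local layers keep attention cost linear in sequence length, which is what makes the 8,192-token context window affordable.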

Model Capabilities

Arabic text understanding
Masked language modeling
Semantic text similarity computation
Text classification
Named entity recognition

Use Cases

Text analysis
Semantic text similarity
Computes the semantic similarity between two Arabic texts.
STS17: 0.831, STS22: 0.617
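Similarity is typically computed by mean-pooling token embeddings into one sentence vector per text and comparing them with cosine similarity. A minimal sketch using stand-in arrays (in practice the token embeddings would come from the model's last hidden state, and the attention mask from its tokenizer):

```python
import numpy as np

def mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions (mask == 0)."""
    m = mask[:, None].astype(hidden.dtype)   # (seq_len, 1)
    return (hidden * m).sum(axis=0) / m.sum()

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in embeddings for two sentences of 5 tokens, hidden size 8.
rng = np.random.default_rng(0)
h1, m1 = rng.normal(size=(5, 8)), np.array([1, 1, 1, 1, 0])
h2, m2 = rng.normal(size=(5, 8)), np.array([1, 1, 1, 0, 0])
sim = cosine(mean_pool(h1, m1), mean_pool(h2, m2))
```

The STS17/STS22 figures above are benchmark scores for exactly this kind of pairwise similarity task.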
Text classification
Classifies Arabic texts.
Accuracy 94.32%, F1 score 94.31%
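For reference, accuracy and F1 can be computed as below. Note this sketch uses macro-averaged F1; whether the 94.31% figure above is macro-, micro-, or weighted-averaged is not stated in the card.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Per-class F1 (harmonic mean of precision and recall), averaged."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Toy labels, not model output.
y_true = ["pos", "pos", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```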
Information retrieval
Retrieval-Augmented Generation (RAG)
Used as a retrieval component for Arabic question-answering systems.
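The retrieval step of such a RAG pipeline reduces to nearest-neighbor search over passage embeddings. A minimal sketch with stand-in 2-D vectors; in a real pipeline each corpus row would be the pooled AraModernBert embedding of one passage, and the top-ranked passages would be passed to a generator model:

```python
import numpy as np

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 2) -> list:
    """Return indices of the k corpus vectors most similar to the query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per passage
    return np.argsort(-scores)[:k].tolist()

# Toy passage embeddings (one row per passage).
corpus = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.7, 0.7]])
hits = top_k(np.array([1.0, 0.1]), corpus, k=2)
```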