
BERT Base Arabic CAMeLBERT MSA

Developed by CAMeL-Lab
CAMeLBERT is a collection of pre-trained models for Arabic NLP tasks. This model is the Modern Standard Arabic (MSA) variant, trained on 12.6 billion tokens of MSA text.
Downloads: 1,212
Release Time: 3/2/2022

Model Overview

A BERT model pre-trained on Modern Standard Arabic text. It supports masked language modeling out of the box and can be fine-tuned for downstream NLP tasks.
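The masked language modeling objective mentioned above follows BERT's standard pretraining recipe: roughly 15% of tokens are selected for prediction, and of those, 80% are replaced with [MASK], 10% with a random token, and 10% left unchanged. A minimal sketch of that generic masking procedure (not CAMeLBERT-specific code):

```python
# Sketch of BERT-style MLM masking (generic recipe, not the exact
# CAMeLBERT training code). Tokens here are plain strings.
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (masked_tokens, labels); labels[i] is the original token
    where a prediction is required, else None."""
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)            # model must recover this token
            r = rng.random()
            if r < 0.8:
                masked.append("[MASK]")   # 80%: replace with [MASK]
            elif r < 0.9:
                masked.append(rng.choice(vocab))  # 10%: random token
            else:
                masked.append(tok)        # 10%: keep unchanged
        else:
            labels.append(None)           # no loss computed here
            masked.append(tok)
    return masked, labels
```

The 10% random / 10% unchanged split keeps the model from relying on [MASK] appearing at every prediction position, since [MASK] never occurs at fine-tuning time.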

Model Features

Multi-dialect support
Provides dedicated models for Classical Arabic (CA), Dialectal Arabic (DA), and Modern Standard Arabic (MSA) variants
Scalable data sizes
Offers pre-trained models trained on the full dataset down to 1/16 of it, to accommodate different computational budgets
Specialized preprocessing
Employs an Arabic-specific preprocessing pipeline, including diacritic handling and character normalization
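The preprocessing feature above can be illustrated with the steps commonly applied in Arabic NLP pipelines: stripping diacritics and tatweel, and normalizing character variants. This is a sketch of those common steps, not the exact CAMeLBERT pipeline:

```python
# Sketch of common Arabic preprocessing (assumed steps, not the exact
# CAMeLBERT pipeline): dediacritization plus character normalization.
import re

# Tanwin, short vowels, shadda, sukun (U+064B-U+0652) and dagger alif (U+0670)
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # elongation character, purely typographic

def normalize_arabic(text):
    text = DIACRITICS.sub("", text)        # strip diacritics
    text = text.replace(TATWEEL, "")       # drop tatweel
    # Alef variants (madda/hamza forms) -> bare alef
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)
    # Alef maksura -> yeh
    text = text.replace("\u0649", "\u064A")
    return text
```

Normalization like this collapses spelling variants so the tokenizer sees a smaller, more consistent vocabulary.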

Model Capabilities

Arabic text understanding
Masked language modeling
Named entity recognition
POS tagging
Sentiment analysis
Dialect identification

Use Cases

Text analysis
Arabic news classification
Topic classification for MSA news texts
Achieved a 93% F1 score on the ArSAS dataset
Linguistic research
Classical poetry classification
Identifying period and style of classical Arabic poetry
80.9% accuracy on the APCD dataset (best result, achieved by the CA variant)
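The F1 scores cited in these use cases are standard classification metrics. As a reference point, a minimal sketch of per-class F1 and its macro average (the published evaluations may use a different averaging variant):

```python
# Minimal macro-F1 sketch for multi-class classification. Per-class F1 is
# the harmonic mean of precision and recall; macro-F1 averages over classes.
def macro_f1(y_true, y_pred):
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)
```

Macro averaging weights every class equally, which matters on imbalanced label sets such as topic or dialect categories.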