SaudiBERT
Developed by faisalq
SaudiBERT is the first large-scale pre-trained language model specifically focused on Saudi dialect text, trained on a massive corpus of Saudi tweets and forum posts.
Downloads 233
Release Date: 4/1/2024
Model Overview
This model targets the Saudi Arabian dialect and is suited to social media and forum text from Saudi Arabia. It supports tasks such as masked language modeling.
Model Features
Specialized for Saudi Dialect
The first large-scale pre-trained language model specifically designed for the Saudi Arabian dialect.
Large-Scale Corpus
Trained on a 26.3 GB corpus comprising 141 million Saudi tweets and 70 million forum sentences.
Social Media Optimization
Particularly suitable for processing Twitter and forum texts from Saudi Arabia.
Model Capabilities
Saudi Dialect Text Understanding
Masked Language Modeling
Social Media Text Processing
Use Cases
Social Media Analysis
Saudi Twitter Text Completion
Example: 'اللي ما يعرف الصقر [MASK].' (Those who don't know the falcon...)
The model predicts the word hidden behind [MASK] in Saudi dialect text.
Dialect Research
Saudi Dialect Language Model Research
Used to study linguistic features and usage of the Saudi Arabian dialect.
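The text completion use case above can be sketched with the Hugging Face transformers fill-mask pipeline. This is a minimal sketch, not the developer's documented usage: the Hub ID "faisalq/SaudiBERT" is an assumption inferred from the developer and model names on this page, and the helper names (check_single_mask, top_tokens, predict_masked) are hypothetical. Running predict_masked requires network access plus the transformers and torch packages.

```python
from typing import Dict, List

# Assumption: the model is published on the Hugging Face Hub under this ID.
# The page itself only names the developer ("faisalq") and the model.
MODEL_ID = "faisalq/SaudiBERT"
MASK_TOKEN = "[MASK]"


def check_single_mask(text: str, mask_token: str = MASK_TOKEN) -> str:
    """Return the text unchanged if it contains exactly one mask token."""
    count = text.count(mask_token)
    if count != 1:
        raise ValueError(f"expected exactly one {mask_token}, found {count}")
    return text


def top_tokens(predictions: List[Dict], k: int = 5) -> List[str]:
    """Extract the k highest-scoring token strings from fill-mask output."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p["token_str"].strip() for p in ranked[:k]]


def predict_masked(text: str, k: int = 5) -> List[str]:
    """Load the model (downloads on first use) and fill the masked position."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    return top_tokens(fill_mask(check_single_mask(text), top_k=k), k)
```

Calling predict_masked("اللي ما يعرف الصقر [MASK].") would return the model's top candidate words for the masked position. The actual predictions depend on the published weights and are not shown here.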