TavBERT-ar
Developed by tau
Release Time: 4/9/2022
Model Overview
A character-level Arabic BERT-style masked language model pretrained by masking character segments, similar to SpanBERT (Joshi et al., 2020).
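To make the objective concrete: a contiguous run of characters is replaced with mask symbols, and the model is trained to recover the original characters from context. The snippet below is only an illustrative sketch of that masking scheme, not the authors' pretraining code; the [MASK] string, the span-length choice, and the function name are assumptions.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol; the real mask token is model-specific

def mask_char_span(text, max_span=10, seed=0):
    """Mask one contiguous span of characters, SpanBERT-style but at the character level."""
    rng = random.Random(seed)
    chars = list(text)
    span_len = rng.randint(1, min(max_span, len(chars)))
    start = rng.randint(0, len(chars) - span_len)
    masked = chars[:start] + [MASK] * span_len + chars[start + span_len:]
    target = "".join(chars[start:start + span_len])
    return "".join(masked), target

sentence = "اللغة العربية جميلة"  # "the Arabic language is beautiful"
masked_text, target = mask_char_span(sentence)
print(masked_text)  # the sentence with a contiguous run of characters replaced by [MASK]
print(target)       # the characters the model is trained to recover
```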
Model Features
Character-level Pretraining
Utilizes character-level masked language modeling to better handle the complex morphology of Arabic (see the tokenizer sketch after this list)
SpanBERT-based Approach
Adopts a SpanBERT-like method for pretraining, masking character segments rather than individual tokens
Large-scale Training Data
Trained on the Arabic portion of OSCAR (32GB text, 67 million sentences)
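The character-level behaviour can be seen directly in the tokenizer. A minimal sketch, assuming the checkpoint is published on the Hugging Face Hub under the id tau/tavbert-ar (inferred from the developer name) and loads with the standard AutoTokenizer:

```python
from transformers import AutoTokenizer

# Hub identifier is an assumption; substitute the actual checkpoint id if it differs.
tokenizer = AutoTokenizer.from_pretrained("tau/tavbert-ar")

tokens = tokenizer.tokenize("العربية")  # "Arabic"
print(tokens)  # expected: roughly one token per character rather than whole-word subwords
```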
Model Capabilities
Arabic Text Understanding
Masked Character Prediction (see the example after this list)
Language Model Fine-tuning
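Masked character prediction can be tried with the standard transformers fill-mask pipeline. A minimal sketch, assuming the checkpoint is available on the Hugging Face Hub as tau/tavbert-ar (an assumption based on the developer name):

```python
from transformers import pipeline

# The Hub identifier is an assumption; replace it with the actual checkpoint id if it differs.
MODEL_ID = "tau/tavbert-ar"

fill = pipeline("fill-mask", model=MODEL_ID)

# Tokenization is character-level, so one mask token stands for one missing character.
text = "اللغة العربية جميل" + fill.tokenizer.mask_token  # mask the sentence's final character
for prediction in fill(text, top_k=5):
    print(prediction["token_str"], round(prediction["score"], 3))
```

Because each mask token corresponds to a single character, predicting a longer missing segment means inserting several mask tokens.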
Use Cases
Natural Language Processing
Arabic Text Completion
Predicts masked Arabic character segments from the surrounding context
Downstream Task Fine-tuning
Can serve as a base model for Arabic text classification, question answering, and other tasks
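For downstream fine-tuning, the checkpoint can be loaded with a task-specific head through the usual transformers classes. A minimal classification sketch, again assuming the tau/tavbert-ar Hub id and using toy in-memory examples in place of a real labelled corpus:

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

MODEL_ID = "tau/tavbert-ar"  # assumed Hub identifier for the checkpoint
NUM_LABELS = 2               # e.g. a binary sentiment task; adjust per task

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=NUM_LABELS)

# Toy examples standing in for a real labelled Arabic dataset.
train_data = Dataset.from_dict({
    "text": ["هذا المنتج رائع", "هذا المنتج سيئ"],  # "this product is great" / "this product is bad"
    "label": [1, 0],
})

def encode(batch):
    # Character-level tokenization yields long sequences; truncate to the model's limit.
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_data = train_data.map(encode, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tavbert-ar-cls",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_data,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```

Since character-level inputs are several times longer than subword inputs for the same text, sequence length and batch size typically need more headroom than with a subword model.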