🚀 Arabic BERT Model
AraBERTMo is an Arabic pre-trained language model based on Google's BERT architecture, offering multiple variants and achieving good results in tasks like Fill-Mask.
🚀 Quick Start
You can use this model by installing torch or tensorflow and the Hugging Face transformers library. You can then use it directly by initializing it like this:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Ebtihal/AraBertMo_base_V2")
model = AutoModelForMaskedLM.from_pretrained("Ebtihal/AraBertMo_base_V2")
```
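Once loaded, the model can also be exercised through the standard transformers fill-mask pipeline. A minimal sketch (the pipeline call is generic transformers usage, not something documented in the original card; the prompt is taken from the widget examples below):

```python
from transformers import pipeline

# Build a fill-mask pipeline directly from the model name.
unmasker = pipeline("fill-mask", model="Ebtihal/AraBertMo_base_V2")

# Predict the masked token in one of the widget prompts below.
for prediction in unmasker("السلام عليكم ورحمة[MASK] وبركاتة"):
    print(prediction["token_str"], prediction["score"])
```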
✨ Features
- Multiple Variants: AraBERTMo_base now comes in 10 new variants.
- Available on HuggingFace: All models are available on the HuggingFace model page under the Ebtihal name.
- PyTorch Format: Checkpoints are available in PyTorch format.
📦 Installation
To use this model, you need to install torch or tensorflow and the Hugging Face transformers library.
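A typical setup, assuming pip and the PyTorch backend (the exact versions used with this model are not documented here), might look like:

```bash
pip install torch transformers
```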
📚 Documentation
Pretraining Corpus
The AraBertMo_base_V2 model was pre-trained on ~3 million words from:
- OSCAR: Arabic version "unshuffled_deduplicated_ar".
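For reference, a corpus like this can be streamed with the Hugging Face datasets library. This is a minimal sketch under the assumption that the standard "oscar" dataset script and its "unshuffled_deduplicated_ar" configuration are used; the exact corpus preparation for pretraining is not documented in this card:

```python
from itertools import islice

from datasets import load_dataset

# Stream the Arabic OSCAR split so the full corpus is not downloaded to disk.
dataset = load_dataset("oscar", "unshuffled_deduplicated_ar", split="train", streaming=True)

# Preview a few documents.
for example in islice(dataset, 3):
    print(example["text"][:100])
```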
Training Results
This model achieves the following results:
| Task | Num examples | Num Epochs | Batch Size | Steps | Wall time | Training loss |
|------|--------------|------------|------------|-------|-----------|---------------|
| Fill-Mask | 20020 | 2 | 64 | 626 | 19m 2s | 8.437 |
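For context, the reported hyperparameters map onto a transformers TrainingArguments configuration roughly as follows. This is a hypothetical sketch, not the actual training script (which is not published here); the output_dir name is made up:

```python
from transformers import TrainingArguments

# Hypothetical mapping of the reported hyperparameters (2 epochs,
# batch size 64) onto TrainingArguments.
args = TrainingArguments(
    output_dir="arabertmo-fill-mask",  # illustrative name only
    num_train_epochs=2,
    per_device_train_batch_size=64,
)
```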
Model Usage
You can use this model by initializing it as shown in the code example in the "Quick Start" section.
Research Background
This model was built for master's degree research within an organization.
💻 Usage Examples
Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Ebtihal/AraBertMo_base_V2")
model = AutoModelForMaskedLM.from_pretrained("Ebtihal/AraBertMo_base_V2")
```
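For lower-level control, the same Fill-Mask prediction can be run manually with PyTorch. A sketch assuming the torch backend; the top-1 decoding is illustrative, and the prompt again comes from the widget examples below:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Ebtihal/AraBertMo_base_V2")
model = AutoModelForMaskedLM.from_pretrained("Ebtihal/AraBertMo_base_V2")

text = "السلام عليكم ورحمة[MASK] وبركاتة"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring token.
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```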
📄 License
No license information provided in the original document.
Additional Information
- Language: Arabic
- Tags: Fill-Mask
- Datasets: OSCAR
- Widget Examples:
- " السلام عليكم ورحمة[MASK] وبركاتة"
- " اهلا وسهلا بكم في [MASK] من سيربح المليون"
- " مرحبا بك عزيزي الزائر [MASK] موقعنا "