
bert-large-arabertv02

Developed by aubmindlab
AraBERT is an Arabic pre-trained language model based on the BERT architecture, optimized for Arabic natural language understanding tasks.
Downloads: 2,444
Release date: 3/2/2022

Model Overview

AraBERT is an Arabic pre-trained language model based on Google's BERT architecture. It is available in base and large sizes and supports a wide range of Arabic NLP tasks. The v2 release improves the preprocessing and vocabulary strategies and is trained on a substantially larger corpus.

Model Features

Optimized Arabic preprocessing
Uses the Farasa segmenter for prefix/suffix segmentation and improves the handling of numbers and punctuation.
Extended training data
Trained on an Arabic corpus of 200M sentences (77 GB, 8.6 billion words), roughly 3.5 times larger than the v1 training set.
Multiple size options
Available in two sizes: base (136M parameters) and large (371M parameters).
Hugging Face integration
All models are hosted on the Hugging Face Hub and support both PyTorch and TensorFlow; see the loading sketch after this list.
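As a minimal sketch of the Hugging Face integration above, the snippet below loads the model for masked-token prediction with the transformers library. The model ID aubmindlab/bert-large-arabertv02 is the repository this card describes; the example sentences are illustrative, and the preprocessor step assumes the authors' arabert package as documented in their repository.

```python
# Minimal sketch: load bert-large-arabertv02 from the Hugging Face Hub
# and run masked-token prediction. Requires: pip install transformers torch
from transformers import pipeline

# fill-mask works directly on the pretrained masked-language model
fill_mask = pipeline("fill-mask", model="aubmindlab/bert-large-arabertv02")

# Illustrative Arabic sentence with a masked token ("the weather today is [MASK]")
for prediction in fill_mask("الطقس اليوم [MASK]"):
    print(prediction["token_str"], round(prediction["score"], 3))

# Optional: the authors' arabert package (pip install arabert) ships the
# matching preprocessor, which applies the number/punctuation handling
# described above before tokenization.
from arabert.preprocess import ArabertPreprocessor

prep = ArabertPreprocessor(model_name="aubmindlab/bert-large-arabertv02")
print(prep.preprocess("السعر 100 دولار!"))  # "the price is 100 dollars!"
```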

Model Capabilities

Arabic text understanding
Sentiment analysis
Named entity recognition
Question answering

Use Cases

Sentiment analysis
Social media sentiment monitoring
Analyzes the sentiment of Arabic social media posts.
Performs well on datasets such as HARD and ASTD-Balanced.
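A hedged sketch of how one might set the model up for sentiment classification: AutoModelForSequenceClassification attaches a randomly initialized classification head to the pretrained encoder, so the two-class setup below is an illustrative assumption, and the head must be fine-tuned (e.g. on HARD or ASTD-Balanced) before its predictions are meaningful.

```python
# Sketch: wrap the pretrained encoder with a sequence-classification head.
# The head starts random; fine-tune before relying on the outputs.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "aubmindlab/bert-large-arabertv02"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=2  # assumption: binary positive/negative sentiment
)

# Illustrative post ("the service was excellent")
inputs = tokenizer("كانت الخدمة ممتازة", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities (untrained head)
```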
Information extraction
Named entity recognition
Identifies entities such as person and place names in Arabic text.
Evaluated on the ANERcorp dataset.
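Similarly, a token-classification head can be attached for NER. The BIO label set below over person/location/organization types is an assumption chosen to resemble ANERcorp-style annotation, not a released checkpoint; the head again requires fine-tuning.

```python
# Sketch: token-classification setup for Arabic NER.
# Labels follow a BIO scheme over ANERcorp-style entity types (assumption).
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]
model_id = "aubmindlab/bert-large-arabertv02"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(
    model_id,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)
# Fine-tune on ANERcorp before use; the classification head starts random.
```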
Intelligent question answering
Arabic question answering
Question answering applications built on the Arabic-SQuAD and ARCD datasets.
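For extractive question answering in the Arabic-SQuAD/ARCD style, a span-prediction head is placed on top of the encoder. The sketch below shows the setup; since bert-large-arabertv02 is a pretrained encoder rather than a QA checkpoint, the predicted span is meaningful only after fine-tuning, and the question/context pair is illustrative.

```python
# Sketch: extractive QA (SQuAD-style span prediction) on top of the encoder.
# The start/end span head is randomly initialized until fine-tuned on
# Arabic-SQuAD / ARCD.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_id = "aubmindlab/bert-large-arabertv02"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForQuestionAnswering.from_pretrained(model_id)

question = "ما هي عاصمة لبنان؟"    # "What is the capital of Lebanon?"
context = "بيروت هي عاصمة لبنان."  # "Beirut is the capital of Lebanon."
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# Pick the most likely answer span from the start/end logits
start = out.start_logits.argmax()
end = out.end_logits.argmax()
print(tokenizer.decode(inputs["input_ids"][0][start : end + 1]))
```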