bert-base-arabertv2

Developed by aubmindlab
AraBERT is an Arabic pre-trained language model based on the BERT architecture, optimized for Arabic language understanding tasks and available in multiple variants.
Downloads: 24.20k
Released: 3/2/2022

Model Overview

AraBERT is a pre-trained language model designed specifically for Arabic. Built on Google's BERT architecture, it performs strongly across a wide range of Arabic NLP tasks.
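
As a concrete starting point, the sketch below loads the checkpoint with the Hugging Face transformers library. It assumes the model is published on the Hub as aubmindlab/bert-base-arabertv2; the sample sentence is illustrative only.

```python
# Minimal sketch: load AraBERT and extract contextual embeddings.
# Assumes `transformers` and `torch` are installed and the checkpoint
# is available on the Hugging Face Hub as aubmindlab/bert-base-arabertv2.
from transformers import AutoTokenizer, AutoModel

model_name = "aubmindlab/bert-base-arabertv2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Encode an Arabic sentence and inspect the hidden states.
inputs = tokenizer("مرحبا بكم", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size=768)
```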

Model Features

Arabic-optimized Segmentation
Uses the Farasa segmenter to pre-segment Arabic prefixes and suffixes, improving language-understanding accuracy (a preprocessing sketch follows this list).
Large-scale Training Data
Trained on 77 GB of Arabic text (200 million sentences / 8.6 billion words) drawn from large public corpora such as Wikipedia and OSCAR.
Multi-version Support
Offers base and large model sizes, as well as variants with or without Farasa pre-segmentation, to suit different application needs.
Outstanding Downstream Task Performance
Surpasses baseline models like mBERT in various Arabic NLP tasks such as sentiment analysis, NER, and question answering.
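
The Farasa-based pre-segmentation mentioned above is typically applied through the companion arabert Python package. The sketch below assumes that package (pip install arabert) and its ArabertPreprocessor class; the example sentence is illustrative.

```python
# Hedged sketch: apply the same preprocessing/segmentation used during
# pre-training. Assumes the `arabert` package (pip install arabert);
# Farasa segmentation may additionally require the `farasapy` dependency.
from arabert.preprocess import ArabertPreprocessor

prep = ArabertPreprocessor(model_name="aubmindlab/bert-base-arabertv2")

text = "ولن نبالغ إذا قلنا إن هاتف أو كمبيوتر المكتب في زمننا هذا ضروري"
print(prep.preprocess(text))
# Prefixes/suffixes are split off with "+" markers, e.g. "و+ لن ..."
```

Feeding preprocessed text to the tokenizer keeps inference consistent with how the model saw its training data.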

Model Capabilities

Arabic text understanding
Sentiment analysis
Named entity recognition
Question answering systems
Text classification
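
Because the base checkpoint is a masked language model, its raw capability can be probed directly with a fill-mask pipeline. The prompt below is an illustrative assumption; in practice the input should first go through the AraBERT preprocessor shown earlier.

```python
# Minimal sketch: probe the pre-trained masked-language-model head.
from transformers import pipeline

fill = pipeline("fill-mask", model="aubmindlab/bert-base-arabertv2")
# "[MASK]" is the standard mask token for BERT-style models.
for pred in fill("عاصمة لبنان هي [MASK] ."):
    print(pred["token_str"], round(pred["score"], 3))
```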

Use Cases

Sentiment Analysis
Arabic Social Media Sentiment Analysis
Classifies sentiment polarity in Arabic social media texts.
Performs strongly on benchmark datasets such as HARD and ASTD.
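
A sequence-classification head is the usual way to build such a classifier. The sketch below is a hedged outline: the two-label setup and the sample sentence are assumptions, and the head must be fine-tuned (e.g. on HARD or ASTD) before its predictions are meaningful.

```python
# Hedged sketch: AraBERT with an (untrained) sentiment classification head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "aubmindlab/bert-base-arabertv2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=2 (positive/negative) is an assumption; fine-tune before use.
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=2)

inputs = tokenizer("الخدمة كانت ممتازة", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # class probabilities after fine-tuning
```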
Information Extraction
Arabic Named Entity Recognition
Identifies entities such as person names and locations in Arabic texts.
Achieves good results on the ANERcorp dataset.
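
NER is typically framed as token classification on top of the encoder. In the hedged sketch below, the BIO label set is a hypothetical subset and the head is untrained, so it would need fine-tuning on ANERcorp or similar data first.

```python
# Hedged sketch: token classification for NER; labels are hypothetical.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]  # assumed BIO subset
model_name = "aubmindlab/bert-base-arabertv2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels))  # fine-tune on ANERcorp first

inputs = tokenizer("يعيش محمد في بيروت", return_tensors="pt")
with torch.no_grad():
    preds = model(**inputs).logits.argmax(dim=-1)[0]
for tok, p in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]),
                  preds):
    print(tok, labels[p])
```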
Question Answering Systems
Arabic Reading Comprehension
Answers questions based on Arabic articles.
Performs well on datasets like Arabic-SQuAD and ARCD.
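
Extractive QA uses a span-prediction head over the context. The sketch below is a hedged outline with an assumed question/context pair; the span head must first be fine-tuned on data such as Arabic-SQuAD or ARCD to produce sensible answers.

```python
# Hedged sketch: extractive QA with a (to-be-fine-tuned) span head.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_name = "aubmindlab/bert-base-arabertv2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "ما هي عاصمة لبنان؟"  # assumed example question
context = "بيروت هي عاصمة لبنان وأكبر مدنها."  # assumed example context
inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
start = out.start_logits.argmax()
end = out.end_logits.argmax()
print(tokenizer.decode(inputs["input_ids"][0][start:end + 1]))
```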