
bert-base-arabertv01

Developed by aubmindlab
An Arabic pre-trained language model based on the BERT architecture, supporting multiple Arabic NLP tasks
Downloads: 293
Release Time: 3/2/2022

Model Overview

AraBERT is an Arabic pre-trained language model based on Google's BERT architecture, designed specifically for Arabic natural language understanding tasks. The model comes in two versions, v0.1 and v1; the main difference is that v1 uses the Farasa tokenizer to segment prefixes and suffixes as a pre-processing step.
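As a BERT-style model, AraBERT can be queried directly for masked-token prediction. A minimal sketch using the Hugging Face `transformers` library, assuming the checkpoint is published under the `aubmindlab/bert-base-arabertv01` identifier and that the weights can be downloaded on first use:

```python
# Sketch: masked-token prediction with AraBERT v0.1 via Hugging Face transformers.
# Assumes the model id "aubmindlab/bert-base-arabertv01" and network access
# to fetch the weights the first time the pipeline is built.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="aubmindlab/bert-base-arabertv01")

# An Arabic sentence with one masked token; the model ranks candidate fillers.
results = unmasker("عاصمة لبنان هي [MASK] .")
for r in results:
    print(r["token_str"], round(r["score"], 3))
```

Each result carries the predicted token and its probability, so the same call doubles as a quick sanity check that the checkpoint loaded correctly.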

Model Features

Arabic optimization
Optimized for the characteristics of Arabic, including its character set and word segmentation
Multi-version support
Two versions, v0.1 and v1, are provided; v1 uses the Farasa tokenizer for finer-grained pre-processing
Large-scale pre-training
Trained on an Arabic corpus of 77 million sentences (23 GB, 2.7 billion words)
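The prefix/suffix segmentation that distinguishes v1 can be illustrated with a toy example. The sketch below is not Farasa: it is a deliberately simplified, hypothetical splitter that peels off a few common Arabic prefixes, only to show the kind of `prefix+ stem` output the v1 pre-processing produces.

```python
# Toy illustration of prefix segmentation (NOT the real Farasa segmenter).
# Farasa performs full morphological analysis; this hypothetical sketch only
# strips a few common prefixes to show the "prefix+ stem" output style.
COMMON_PREFIXES = ["وال", "ال", "و", "ب", "ل"]  # illustrative subset

def toy_segment(word: str) -> str:
    """Split off the longest matching prefix, if the remainder is long enough."""
    for prefix in sorted(COMMON_PREFIXES, key=len, reverse=True):
        if word.startswith(prefix) and len(word) > len(prefix) + 1:
            return prefix + "+ " + word[len(prefix):]
    return word

print(toy_segment("والكتاب"))  # "and the book" -> وال+ كتاب
print(toy_segment("قرأ"))      # no prefix match -> returned unchanged
```

Marking morpheme boundaries this way shrinks the effective vocabulary the model must learn, which is the motivation behind the v1 pre-processing step.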

Model Capabilities

Masked token prediction
Sentiment analysis
Named entity recognition
Question answering

Use Cases

Sentiment analysis
Arabic social media sentiment analysis
Analyzes the sentiment of Arabic social media posts
Performs strongly on six Arabic sentiment analysis datasets, including HARD and ASTD-Balanced
Information extraction
Arabic named entity recognition
Identifies entities such as person and place names in Arabic text
Performs well on the ANERcorp dataset
Question answering
Arabic question answering
Build an Arabic question-answering system
Performs well on the Arabic-SQuAD and ARCD datasets
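Downstream tasks like those above reuse the pre-trained encoder with a task-specific head. A hedged sketch of loading AraBERT for extractive question answering with `transformers`, assuming the `aubmindlab/bert-base-arabertv01` model id; note that the span-prediction head is freshly initialized here, so it must be fine-tuned (e.g. on Arabic-SQuAD or ARCD) before it produces meaningful answers:

```python
# Sketch: attaching a question-answering head to the AraBERT encoder.
# Assumes the model id "aubmindlab/bert-base-arabertv01"; the span-prediction
# head is randomly initialized and needs fine-tuning on a dataset such as
# Arabic-SQuAD or ARCD before the outputs are meaningful.
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_id = "aubmindlab/bert-base-arabertv01"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForQuestionAnswering.from_pretrained(model_id)

# Encode a (question, context) pair the way BERT-style QA models expect:
# [CLS] question [SEP] context [SEP]
inputs = tokenizer("من كتب الرسالة؟", "كتب أحمد الرسالة أمس.", return_tensors="pt")
outputs = model(**inputs)

# One start logit and one end logit per input token; the answer span is the
# (start, end) pair with the highest combined score after fine-tuning.
print(outputs.start_logits.shape, outputs.end_logits.shape)
```

The same pattern applies to the other use cases: swap in `AutoModelForSequenceClassification` for sentiment analysis or `AutoModelForTokenClassification` for named entity recognition.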