T5 Darija Summarization
A dataset containing 19,806 Moroccan Arabic dialect news articles and their headlines for automatic text summarization tasks
Text Generation
Tags: Transformers, Supports Multiple Languages, Moroccan Arabic Summarization, Dialectal Text Processing, News Auto-Summarization

Downloads 170
Release Time: 3/2/2022
Model Overview
This dataset consists of Moroccan Arabic dialect news articles scraped from Goud.ma and published between 2018 and 2020. It is intended primarily for research on automatic summarization of Moroccan Arabic dialect.
Model Features
Large-Scale Moroccan Dialect Dataset
Contains 19,806 news articles, making it one of the largest Moroccan Arabic dialect summarization datasets available
Bilingual Mixed Content
Articles contain mixed content in Moroccan Arabic dialect (Darija) and Modern Standard Arabic (MSA), with headlines exclusively in Darija
Clear Temporal Scope
All articles date from between January 1, 2018 and December 31, 2020, giving the corpus a well-defined three-year time span
Model Capabilities
Moroccan Arabic dialect text summarization
Mixed-language text processing
News content analysis
Use Cases
Natural Language Processing
Moroccan Dialect Summarization Model Training
Use this dataset to train automatic summarization models for Moroccan Arabic dialect
Dialect Linguistics Research
Analyze the grammatical structures and lexical usage of Moroccan Arabic dialect
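For the summarization training use case, the article/headline records first have to be turned into source/target pairs. The following is a minimal sketch of that preprocessing step, not the dataset authors' pipeline. The field names `article` and `headline`, the `summarize: ` prefix (a common T5 convention), and the word-level truncation limit are all assumptions made for illustration.

```python
import random


def build_pairs(records, max_words=512, prefix="summarize: "):
    """Turn raw records into (source, target) pairs for a seq2seq summarizer.

    Assumes each record is a dict with hypothetical keys "article" and
    "headline"; adjust these to the real column names of the dataset.
    """
    pairs = []
    for rec in records:
        # Collapse runs of whitespace left over from scraping.
        article = " ".join(rec.get("article", "").split())
        headline = " ".join(rec.get("headline", "").split())
        if not article or not headline:
            continue  # skip records missing either side of the pair
        # Crude word-level truncation; a real pipeline would truncate
        # with the model's tokenizer instead.
        words = article.split(" ")[:max_words]
        pairs.append((prefix + " ".join(words), headline))
    return pairs


def split_pairs(pairs, valid_fraction=0.1, seed=0):
    """Shuffle deterministically and hold out a validation slice."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = max(1, int(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]
```

The resulting pairs can then be tokenized and passed to whichever sequence-to-sequence training loop is in use. The fixed seed keeps the train/validation split reproducible across runs.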
News Analysis
Moroccan News Trend Analysis
Analyze trending social topics in Morocco from 2018 to 2020 based on the news content in the dataset
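As a rough illustration of the trend analysis use case, the sketch below counts the most frequent headline terms per year within the 2018 to 2020 window. It assumes each record carries a hypothetical ISO-formatted `date` field alongside a `headline` field; the source does not say whether publication dates are included in the dataset, so treat both field names as placeholders. Whitespace tokenization is also a simplification for dialectal Arabic text.

```python
from collections import Counter
from datetime import date

# Temporal scope stated for the dataset.
START, END = date(2018, 1, 1), date(2020, 12, 31)


def headline_terms_by_year(records, top_k=5, min_len=3):
    """Return the top_k most frequent headline tokens for each year.

    "date" and "headline" are hypothetical field names used for illustration.
    """
    counts = {}
    for rec in records:
        published = date.fromisoformat(rec["date"])
        if not (START <= published <= END):
            continue  # ignore anything outside the stated scope
        # Drop very short tokens, which are mostly particles.
        tokens = [t for t in rec["headline"].split() if len(t) >= min_len]
        counts.setdefault(published.year, Counter()).update(tokens)
    return {year: c.most_common(top_k) for year, c in sorted(counts.items())}
```

A more careful analysis would add stop-word filtering and spelling normalization, since Darija has no standardized orthography.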