T5 Darija Summarization

Developed by Kamel
A dataset containing 19,806 Moroccan Arabic dialect news articles and their headlines for automatic text summarization tasks
Downloads 170
Release Time: 3/2/2022

Model Overview

This dataset includes Moroccan Arabic dialect news articles scraped from Goud.ma between 2018 and 2020, primarily for research on automatic summarization of Moroccan Arabic dialect.

Model Features

Large-Scale Moroccan Dialect Dataset
Contains 19,806 news articles, making it one of the largest Moroccan Arabic dialect summarization datasets available
Bilingual Mixed Content
Articles contain mixed content in Moroccan Arabic dialect (Darija) and Modern Standard Arabic (MSA), with headlines exclusively in Darija
Clear Temporal Scope
All articles were collected between January 1, 2018 and December 31, 2020, so the period the dataset covers is clearly bounded
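
The temporal scope described above can be checked with a simple date filter. The following is a minimal sketch, not the dataset's actual tooling: the record layout and the `published` field name are assumptions made for illustration, and the sample records are invented placeholders.

```python
from datetime import date

# Bounds taken from the stated scope: January 1, 2018 to December 31, 2020.
START = date(2018, 1, 1)
END = date(2020, 12, 31)


def in_scope(published: date) -> bool:
    """Return True if the date falls inside the stated window (inclusive)."""
    return START <= published <= END


# Placeholder records; "headline" and "published" are hypothetical field names.
records = [
    {"headline": "h1", "published": date(2017, 12, 31)},
    {"headline": "h2", "published": date(2018, 1, 1)},
    {"headline": "h3", "published": date(2020, 12, 31)},
    {"headline": "h4", "published": date(2021, 1, 1)},
]

# Keep only records whose date lies within the window.
kept = [r for r in records if in_scope(r["published"])]
```

Both boundary dates are treated as inclusive here, which matches the wording "between January 1, 2018 and December 31, 2020" only under that reading.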

Model Capabilities

Moroccan Arabic dialect text summarization
Mixed-language text processing
News content analysis

Use Cases

Natural Language Processing
Moroccan Dialect Summarization Model Training
Using this dataset to train automatic summarization models for Moroccan Arabic dialect
Dialect Linguistics Research
Analyzing grammatical structures and lexical usage characteristics of Moroccan Arabic dialect
News Analysis
Moroccan News Trend Analysis
Analyzing prominent social topics in Morocco from 2018 to 2020 based on the news content in the dataset
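
For the summarization training use case above, each article and its headline can be turned into a source/target pair. The sketch below assumes records with `article` and `headline` fields (hypothetical names) and uses the `summarize: ` task prefix that is conventional for T5-style models; whether this particular model was trained with that prefix is not stated here, so treat it as an assumption. The character limit is also an illustrative choice.

```python
def normalize(text: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def to_pair(record: dict, prefix: str = "summarize: ", max_chars: int = 2000) -> dict:
    """Build one training example: prefixed article as source, headline as target.

    `article` and `headline` are assumed field names, not confirmed by the source.
    """
    article = normalize(record["article"])
    headline = normalize(record["headline"])
    # Truncate long articles by characters; a real pipeline would more likely
    # truncate by tokens using the model's tokenizer.
    return {"source": prefix + article[:max_chars], "target": headline}


# Placeholder record standing in for one article/headline row.
example = {"article": "  a  b\n c ", "headline": " h  x "}
pair = to_pair(example)
```

The resulting `source` and `target` strings would then be tokenized and passed to whichever sequence-to-sequence training loop is in use.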