bert-base-arabic-camelbert-da-poetry: an open-source model for Arabic poetry classification, built on a dialectal Arabic model

Bert Base Arabic Camelbert Da Poetry

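Developed by CAMeL-Lab

A poetry classification model fine-tuned from the CAMeLBERT Dialectal Arabic model and trained on the APCD dataset

Text classification

Transformers

Arabic

License: Apache-2.0

Tags: #arabic-poetry-classification #dialectal-arabic-processing #two-part-verse-recognition

Downloads: 16

Released: 3/2/2022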

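Model overview

A BERT model for classifying Arabic poetry. It assigns each verse a category label; the labels in the usage example, البسيط and السلسلة, are names of poetic meters.

Model features

Built on a dialectal Arabic model

Fine-tuned from a model pre-trained specifically on dialectal Arabic (DA).

Poetry category recognition

Assigns each verse a category label with a confidence score, for example البسيط (al-Basit) or السلسلة (al-Silsila).

Two-part verse handling

Expects each verse as two halves joined by the [SEP] token.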
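
As a minimal sketch of this input format, the helper below joins the two halves of a verse with the [SEP] token. The name `format_verse` and the empty-half check are assumptions made for illustration; they are not part of the model or of transformers.

```python
SEP_TOKEN = "[SEP]"  # separator placed between the two halves of a verse


def format_verse(first_half: str, second_half: str) -> str:
    """Join two verse halves into one input string.

    Hypothetical helper for illustration; not part of transformers.
    """
    first, second = first_half.strip(), second_half.strip()
    if not first or not second:
        raise ValueError("both halves of the verse must be non-empty")
    return f"{first} {SEP_TOKEN} {second}"


formatted = format_verse("قم للمعلم وفه التبجيلا", "كاد المعلم ان يكون رسولا")
print(formatted)
```

Stripping whitespace before joining keeps the separator spacing consistent regardless of how the halves were typed or extracted.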

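Model capabilities

Arabic text classification

Poetry category recognition

Two-part verse analysis

Use cases

Literary analysis

Classical poetry classification

Classifies classical Arabic poems by category

Performs well on the APCD dataset, on which it was fine-tuned

Education

Supports poetry analysis in Arabic literature teaching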

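🚀 CAMeLBERT-DA Poetry Classification Model

The CAMeLBERT-DA Poetry Classification Model is built by fine-tuning the CAMeLBERT Dialectal Arabic (DA) model. It classifies Arabic poetry and can support related natural language processing tasks.

🚀 Quick start

You can use the CAMeLBERT-DA Poetry Classification Model as part of a transformers pipeline. The model will also be available in CAMeL Tools soon.

How to use

To use the model with a transformers pipeline: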

>>> from transformers import pipeline
>>> poetry = pipeline('text-classification', model='CAMeL-Lab/bert-base-arabic-camelbert-da-poetry')
>>> # A list of verses where each verse consists of two parts.
>>> verses = [
...     ['الخيل والليل والبيداء تعرفني', 'والسيف والرمح والقرطاس والقلم'],
...     ['قم للمعلم وفه التبجيلا', 'كاد المعلم ان يكون رسولا']
... ]
>>> # A function that joins the two halves of a verse with the [SEP] token.
>>> join_verse = lambda halves: ' [SEP] '.join(halves)
>>> # Apply this to all the verses in the list.
>>> verses = [join_verse(verse) for verse in verses]
>>> # Classify the joined verses (one prediction per verse).
>>> poetry(verses)
[{'label': 'البسيط', 'score': 0.9874765276908875},
 {'label': 'السلسلة', 'score': 0.6877778172492981}]
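
The pipeline returns one dictionary per input, with a `label` and a `score`. The sketch below pairs each verse with its prediction and flags predictions whose score falls under a cut-off. The function name `summarize` and the `MIN_SCORE` value of 0.8 are assumptions chosen for illustration, and the predictions are copied from the output above so the snippet runs without downloading the model.

```python
from typing import Dict, List, Tuple

MIN_SCORE = 0.8  # assumed cut-off for illustration; tune it for your use case


def summarize(verses: List[str], predictions: List[Dict]) -> List[Tuple[str, str, bool]]:
    """Pair each verse with its predicted label and a 'confident' flag."""
    if len(verses) != len(predictions):
        raise ValueError("expected one prediction per verse")
    return [
        (verse, pred["label"], pred["score"] >= MIN_SCORE)
        for verse, pred in zip(verses, predictions)
    ]


# Joined verses and predictions copied from the example above.
verses = [
    "الخيل والليل والبيداء تعرفني [SEP] والسيف والرمح والقرطاس والقلم",
    "قم للمعلم وفه التبجيلا [SEP] كاد المعلم ان يكون رسولا",
]
predictions = [
    {"label": "البسيط", "score": 0.9874765276908875},
    {"label": "السلسلة", "score": 0.6877778172492981},
]

summary = summarize(verses, predictions)
for verse, label, confident in summary:
    print(label, "confident" if confident else "low confidence")
```

With a live pipeline, `predictions` would simply be the list returned by `poetry(verses)`.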

⚠️ Important note

Downloading the model requires transformers>=3.5.0. Otherwise, you can download the model manually.
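
One way to check this requirement before loading the model is sketched below, using only the standard library. The helper names (`parse_version`, `meets_minimum`, `installed_transformers_version`) are hypothetical, and the comparison deliberately ignores pre-release suffixes such as `.dev0`, which is a simplification.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

MIN_TRANSFORMERS = "3.5.0"  # minimum version stated in the note above


def parse_version(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components, padded to three parts."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group().split("."))
    # Pad so that '3.5' compares equal to '3.5.0'.
    return parts + (0,) * max(0, 3 - len(parts))


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Compare two dotted versions numerically (pre-release tags are ignored)."""
    return parse_version(installed) >= parse_version(minimum)


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


found = installed_transformers_version()
if found is None:
    print("transformers is not installed")
elif not meets_minimum(found):
    print(f"transformers {found} is older than {MIN_TRANSFORMERS}")
else:
    print(f"transformers {found} is new enough")
```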

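✨ Key features

Fine-tuned base: built by fine-tuning the CAMeLBERT Dialectal Arabic (DA) model, so it starts from that model's pre-trained representations.
Task-specific training data: fine-tuned on the APCD dataset for the poetry classification task.
Multiple ways to use it: available through a transformers pipeline, and soon to be integrated into CAMeL Tools.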

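📚 Documentation

Model description

The CAMeLBERT-DA Poetry Classification Model is a poetry classification model built by fine-tuning the CAMeLBERT Dialectal Arabic (DA) model. For the fine-tuning, we used the APCD dataset.

Our fine-tuning procedure and the hyperparameters we used can be found in our paper "The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models". Our fine-tuning code can be found here.

Citation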

@inproceedings{inoue-etal-2021-interplay,
    title = "The Interplay of Variant, Size, and Task Type in {A}rabic Pre-trained Language Models",
    author = "Inoue, Go  and
      Alhafni, Bashar  and
      Baimukan, Nurpeiis  and
      Bouamor, Houda  and
      Habash, Nizar",
    booktitle = "Proceedings of the Sixth Arabic Natural Language Processing Workshop",
    month = apr,
    year = "2021",
    address = "Kyiv, Ukraine (Online)",
    publisher = "Association for Computational Linguistics",
    abstract = "In this paper, we explore the effects of language variants, data sizes, and fine-tuning task types in Arabic pre-trained language models. To do so, we build three pre-trained language models across three variants of Arabic: Modern Standard Arabic (MSA), dialectal Arabic, and classical Arabic, in addition to a fourth language model which is pre-trained on a mix of the three. We also examine the importance of pre-training data size by building additional models that are pre-trained on a scaled-down set of the MSA variant. We compare our different models to each other, as well as to eight publicly available models by fine-tuning them on five NLP tasks spanning 12 datasets. Our results suggest that the variant proximity of pre-training data to fine-tuning data is more important than the pre-training data size. We exploit this insight in defining an optimized system selection model for the studied tasks.",
}