bert-base-arabic-camelbert-mix-pos-glf: Open-Source Model for POS Tagging of Gulf Arabic

Bert Base Arabic Camelbert Mix Pos Glf

Developed by CAMeL-Lab

Gulf Arabic POS tagging model fine-tuned from CAMeLBERT-Mix, trained on Gumar dataset

Sequence Labeling

Transformers

Arabic

Open Source License: Apache-2.0

#Arabic POS tagging #Gulf dialect processing #Multi-dialect adaptation

Downloads 22

Release Time : 3/2/2022

Model Overview

This is a model fine-tuned for Gulf Arabic POS tagging. It assigns a part-of-speech category to each word in a text.

Model Features

Dialect adaptation capability

Fine-tuned specifically for the Gulf Arabic dialect to handle dialectal text

Mixed pre-training

Builds on a model pre-trained on a mix of Arabic variants: Modern Standard Arabic (MSA), dialectal Arabic, and Classical Arabic

Academic research support

Model development based on systematic academic research, validating the impact of different Arabic variants on NLP tasks

Model Capabilities

Arabic POS tagging

Gulf dialect processing

Natural language understanding

Use Cases

Language processing

Arabic text analysis

POS tagging analysis for Gulf Arabic texts

Accurately identifies noun, verb, preposition and other POS categories

Educational applications

Assisting Arabic learners in understanding grammatical functions of vocabulary

🚀 CAMeLBERT-Mix POS-GLF Model

CAMeLBERT-Mix POS-GLF Model is a Gulf Arabic POS tagging model. It provides part-of-speech tagging for Gulf Arabic text, supporting natural language processing tasks in this dialect.

🚀 Quick Start

You can use the CAMeLBERT-Mix POS-GLF model as part of the transformers pipeline. This model will also be available in CAMeL Tools soon.

✨ Features

Fine-tuned Model: Built by fine-tuning the [CAMeLBERT-Mix](https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-mix/) model.
Dedicated Dataset: Uses the [Gumar](https://camel.abudhabi.nyu.edu/annotated-gumar-corpus/) dataset for fine-tuning.
Research-Backed: The fine-tuning procedure and hyperparameters are detailed in the paper "The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models."

📦 Installation

Downloading the model through the library requires transformers>=3.5.0. Otherwise, you can download the model files manually.
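The version requirement above can be checked before loading the model. The following is a minimal sketch, not part of the original model card: the helper names `release_tuple`, `meets_minimum`, and `check_transformers` are hypothetical, and the comparison looks only at the numeric release components, so a pre-release such as `3.5.0.dev0` is treated as equal to `3.5.0`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = "3.5.0"  # minimum version stated in this model card


def release_tuple(v, width=3):
    """Return the leading numeric components of a version string.

    Hypothetical helper: '4.30.2' -> (4, 30, 2), '3.5' -> (3, 5, 0).
    Parsing stops at the first non-numeric component (e.g. 'dev0').
    """
    parts = []
    for piece in v.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    # Pad short versions with zeros so '3.5' compares equal to '3.5.0'.
    parts += [0] * (width - len(parts))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_TRANSFORMERS):
    """True if the installed version is at least the minimum."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_transformers():
    """True if transformers is installed and recent enough."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False
    return meets_minimum(installed)
```

If `check_transformers()` returns `False`, either upgrade the library or download the model files manually as noted above.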

💻 Usage Examples

Basic Usage

To use the model with a transformers pipeline:

>>> from transformers import pipeline
>>> pos = pipeline('token-classification', model='CAMeL-Lab/bert-base-arabic-camelbert-mix-pos-glf')
>>> text = 'شلونك ؟ شخبارك ؟'
>>> pos(text)
[{'entity': 'pron_interrog', 'score': 0.82657206, 'index': 1, 'word': 'شلون', 'start': 0, 'end': 4}, {'entity': 'prep', 'score': 0.9771731, 'index': 2, 'word': '##ك', 'start': 4, 'end': 5}, {'entity': 'punc', 'score': 0.9999568, 'index': 3, 'word': '؟', 'start': 6, 'end': 7}, {'entity': 'noun', 'score': 0.9977217, 'index': 4, 'word': 'ش', 'start': 8, 'end': 9}, {'entity': 'noun', 'score': 0.99993783, 'index': 5, 'word': '##خبار', 'start': 9, 'end': 13}, {'entity': 'prep', 'score': 0.5309442, 'index': 6, 'word': '##ك', 'start': 13, 'end': 14}, {'entity': 'punc', 'score': 0.9999575, 'index': 7, 'word': '؟', 'start': 15, 'end': 16}]
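The pipeline output above is per WordPiece, so one word can span several entries (pieces that continue a word are prefixed with `##`). The sketch below shows one way to collapse them into word-level tags. It is an illustration, not the authors' method: `merge_subwords` is a hypothetical helper, and keeping the tag of the first piece of each word is an assumed convention that may differ from how the model was evaluated. The `sample` list is copied from the example output above with the scores omitted.

```python
def merge_subwords(predictions):
    """Group WordPiece predictions into words.

    Assumption: each word keeps the tag predicted for its first piece;
    tags on continuation pieces ('##...') are discarded.
    """
    words = []
    for pred in predictions:
        piece = pred["word"]
        if piece.startswith("##") and words:
            # Continuation piece: extend the previous word's text and span.
            words[-1]["word"] += piece[2:]
            words[-1]["end"] = pred["end"]
        else:
            # First piece of a new word: start a new entry with its tag.
            words.append({
                "word": piece,
                "entity": pred["entity"],
                "start": pred["start"],
                "end": pred["end"],
            })
    return words


# Example output from above, scores omitted for brevity.
sample = [
    {"entity": "pron_interrog", "index": 1, "word": "شلون", "start": 0, "end": 4},
    {"entity": "prep", "index": 2, "word": "##ك", "start": 4, "end": 5},
    {"entity": "punc", "index": 3, "word": "؟", "start": 6, "end": 7},
    {"entity": "noun", "index": 4, "word": "ش", "start": 8, "end": 9},
    {"entity": "noun", "index": 5, "word": "##خبار", "start": 9, "end": 13},
    {"entity": "prep", "index": 6, "word": "##ك", "start": 13, "end": 14},
    {"entity": "punc", "index": 7, "word": "؟", "start": 15, "end": 16},
]

for word in merge_subwords(sample):
    print(word["word"], word["entity"], word["start"], word["end"])
```

With this convention the seven pieces collapse into four words. If the tags on continuation pieces matter for your application (for example, the `prep` tag on `##ك`), keep the per-piece output instead of merging.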

📚 Documentation

Model description

CAMeLBERT-Mix POS-GLF Model is a Gulf Arabic POS tagging model that was built by fine-tuning the [CAMeLBERT-Mix](https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-mix/) model. For the fine-tuning, we used the [Gumar](https://camel.abudhabi.nyu.edu/annotated-gumar-corpus/) dataset. Our fine-tuning procedure and the hyperparameters we used can be found in our paper "The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models." Our fine-tuning code can be found [here](https://github.com/CAMeL-Lab/CAMeLBERT).

📄 License

This project is licensed under the Apache-2.0 license.

📚 Citation

@inproceedings{inoue-etal-2021-interplay,
    title = "The Interplay of Variant, Size, and Task Type in {A}rabic Pre-trained Language Models",
    author = "Inoue, Go  and
      Alhafni, Bashar  and
      Baimukan, Nurpeiis  and
      Bouamor, Houda  and
      Habash, Nizar",
    booktitle = "Proceedings of the Sixth Arabic Natural Language Processing Workshop",
    month = apr,
    year = "2021",
    address = "Kyiv, Ukraine (Online)",
    publisher = "Association for Computational Linguistics",
    abstract = "In this paper, we explore the effects of language variants, data sizes, and fine-tuning task types in Arabic pre-trained language models. To do so, we build three pre-trained language models across three variants of Arabic: Modern Standard Arabic (MSA), dialectal Arabic, and classical Arabic, in addition to a fourth language model which is pre-trained on a mix of the three. We also examine the importance of pre-training data size by building additional models that are pre-trained on a scaled-down set of the MSA variant. We compare our different models to each other, as well as to eight publicly available models by fine-tuning them on five NLP tasks spanning 12 datasets. Our results suggest that the variant proximity of pre-training data to fine-tuning data is more important than the pre-training data size. We exploit this insight in defining an optimized system selection model for the studied tasks.",
}

Property	Details
Model Type	Gulf Arabic POS tagging model
Training Data	[Gumar](https://camel.abudhabi.nyu.edu/annotated-gumar-corpus/) dataset
