
TookaBERT Base

Developed by PartAI
TookaBERT is a family of encoder models trained on Persian, available in base and large sizes and suitable for a range of natural language processing tasks.
Released: April 29, 2024

Model Overview

TookaBERT is a family of encoder models pretrained on Persian with a masked language modeling objective. It supports downstream tasks such as sentiment analysis, text classification, multiple choice, question answering, and named entity recognition.

Model Features

Multi-topic Pretraining
Pretrained on over 500GB of Persian data, covering various topics such as news, blogs, forums, and books.
Masked Language Modeling
Pretrained using the Whole Word Masking (WWM) objective, supporting masked language modeling tasks.
Multi-task Support
Supports various downstream tasks, including sentiment analysis, text classification, multiple-choice, question answering, and named entity recognition.
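To illustrate the Whole Word Masking objective named above, here is a minimal sketch in plain Python. It is not PartAI's actual pretraining code, only an assumption-level illustration of the idea: when a word is selected for masking, all of its subword pieces are masked together, and the model must predict the original pieces.

```python
import random

def whole_word_masking(word_pieces, mask_prob=0.15, seed=0):
    """Simplified Whole Word Masking (WWM) sketch.

    `word_pieces` is a list of lists: each inner list holds the subword
    tokens of one whole word (continuation pieces marked with '##').
    When a word is chosen, ALL of its pieces become [MASK]; real WWM
    additionally uses random/keep substitutions, omitted here.
    """
    rng = random.Random(seed)
    tokens, labels = [], []
    for pieces in word_pieces:
        if rng.random() < mask_prob:
            tokens.extend(["[MASK]"] * len(pieces))
            labels.extend(pieces)            # model must recover these
        else:
            tokens.extend(pieces)
            labels.extend([None] * len(pieces))  # not scored
    return tokens, labels

# With mask_prob=1.0, every word is masked in full:
toks, labels = whole_word_masking([["to", "##ka"], ["bert"]], mask_prob=1.0)
# toks   == ["[MASK]", "[MASK]", "[MASK]"]
# labels == ["to", "##ka", "bert"]
```

The key contrast with plain token-level masking is that a multi-piece word is never partially masked, which makes the prediction task harder and tends to improve downstream quality.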

Model Capabilities

Masked Language Modeling
Sentiment Analysis
Text Classification
Multiple-Choice
Question Answering
Named Entity Recognition

Use Cases

Sentiment Analysis
DeepSentiPers
Used for Persian sentiment analysis tasks
F1/accuracy: 85.66/85.78 (TookaBERT-large)
Named Entity Recognition
MultiCoNER-v2
Used for Persian named entity recognition tasks
F1/accuracy: 69.69/94.07 (TookaBERT-large)
Question Answering
PQuAD
Used for Persian question answering tasks
best_exact/best_f1/HasAns_exact/HasAns_f1: 75.56/88.06/70.24/87.83 (TookaBERT-large)
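The QA numbers above use SQuAD-style exact-match and token-level F1 metrics. A minimal sketch of how such scores are computed for one prediction/gold pair (simplified: no answer-text normalization, which real PQuAD/SQuAD evaluation applies):

```python
def qa_scores(prediction, gold):
    """Exact match and token-level F1 for one QA answer pair,
    in the style of SQuAD/PQuAD evaluation (normalization omitted)."""
    pred_toks = prediction.split()
    gold_toks = gold.split()
    exact = float(pred_toks == gold_toks)

    # Count overlapping tokens (multiset intersection).
    gold_counts = {}
    for t in gold_toks:
        gold_counts[t] = gold_counts.get(t, 0) + 1
    common = 0
    for t in pred_toks:
        if gold_counts.get(t, 0) > 0:
            gold_counts[t] -= 1
            common += 1

    if common == 0:
        return exact, 0.0
    precision = common / len(pred_toks)
    recall = common / len(gold_toks)
    f1 = 2 * precision * recall / (precision + recall)
    return exact, f1
```

Corpus-level figures like those reported here are averages of these per-example scores; the `HasAns_*` variants average only over questions that have an answer.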