
TookaBERT Large

Developed by PartAI
TookaBERT is a family of encoder models trained on Persian, available in base and large versions. It is pre-trained on over 500GB of Persian data covering a wide range of topics.
Downloads: 271
Release date: 4/29/2024

Model Overview

TookaBERT is a pre-trained language model designed specifically for Persian. It is trained with a masked language modeling objective using whole-word masking (MLM with WWM) and supports a range of downstream NLP tasks. TookaBERT-Large is the first large encoder model pre-trained on Persian and leads on Persian benchmarks.
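As a quick illustration of the masked-language-modeling capability, the following sketch queries the model through the Hugging Face fill-mask pipeline. The hub identifier PartAI/TookaBERT-Large is assumed from the model's naming; verify it on the hub before use.

```python
# Minimal sketch: masked-token prediction with TookaBERT.
# Assumption: the model is published on the Hugging Face Hub as
# "PartAI/TookaBERT-Large"; check the exact identifier before running.
from transformers import pipeline

fill = pipeline("fill-mask", model="PartAI/TookaBERT-Large")

# Build a Persian sentence around the tokenizer's own mask token so the
# code stays correct even if the mask string is not "[MASK]".
sentence = f"تهران {fill.tokenizer.mask_token} ایران است."
for pred in fill(sentence, top_k=3):
    print(pred["token_str"], round(pred["score"], 3))
```

Each prediction dict carries the filled token and its probability, so the top candidates for the masked word can be inspected directly.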

Model Features

Large-scale Persian pre-training
Pre-trained on over 500GB of Persian data spanning topics such as news, blogs, forums, and books.
Two model sizes
Offered in base and large versions to suit different compute budgets and performance requirements.
Advanced training objective
Uses the MLM (WWM) objective and is pre-trained at two context lengths to strengthen the model's language understanding (see the sketch after this list).
Leading performance
TookaBERT-Large is the first large encoder model pre-trained on Persian and achieves the best results on multiple Persian NLP tasks.
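To make the whole-word-masking objective concrete, here is a small sketch using the DataCollatorForWholeWordMask utility from transformers. This illustrates the technique in general, not the authors' published training code, and it assumes a WordPiece-style tokenizer and the hub id PartAI/TookaBERT-Large.

```python
# Sketch of whole-word masking (WWM): all sub-tokens of a sampled word are
# masked together, so the model must predict the full word from context.
# Assumptions: hub id "PartAI/TookaBERT-Large", WordPiece tokenizer.
from transformers import AutoTokenizer, DataCollatorForWholeWordMask

tokenizer = AutoTokenizer.from_pretrained("PartAI/TookaBERT-Large")
collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)

encoding = tokenizer("تهران پایتخت ایران است.")
batch = collator([{"input_ids": encoding["input_ids"]}])

# input_ids now contain [MASK] spans covering whole words; labels hold the
# original ids at masked positions and -100 everywhere else.
print(batch["input_ids"][0])
print(batch["labels"][0])
```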

Model Capabilities

Masked language modeling
Text classification
Sentiment analysis
Named entity recognition
Question answering
Multiple-choice tasks
Reading comprehension
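For the downstream tasks above, the model is typically used as an encoder backbone that produces contextual embeddings, with a task-specific head trained on top. A minimal sketch of extracting those embeddings (same assumed hub id as above):

```python
# Sketch: using TookaBERT as an encoder backbone for downstream tasks.
# Assumption: hub id "PartAI/TookaBERT-Large".
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "PartAI/TookaBERT-Large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("تهران پایتخت ایران است.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Token-level contextual embeddings: (batch, sequence_length, hidden_size).
# A task head (classifier, span predictor, etc.) is trained on top of these.
print(outputs.last_hidden_state.shape)
```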

Use Cases

Sentiment analysis
DeepSentiPers sentiment analysis
Used for sentiment analysis of Persian text
F1 score: 85.66, Accuracy: 85.78
Named entity recognition
MultiCoNER-v2 entity recognition
Used for Persian named entity recognition tasks
F1 score: 69.69, Accuracy: 94.07
Question answering
PQuAD question-answering task
Used for Persian question-answering tasks
Best exact match: 75.56, Best F1 score: 88.06
Textual entailment
FarsTail textual entailment
Used for Persian natural language inference tasks
F1 score: 89.71, Accuracy: 89.72
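The results above come from fine-tuning the encoder on each task. As a hedged illustration of that workflow, the sketch below fine-tunes the model for sentiment classification with the transformers Trainer; the tiny in-memory dataset stands in for a real corpus such as DeepSentiPers, and the label count and hyperparameters are illustrative assumptions, not the authors' reported setup.

```python
# Sketch: fine-tuning TookaBERT for sentiment classification.
# Assumptions: hub id "PartAI/TookaBERT-Large"; 3 sentiment classes;
# the two-example dataset is a placeholder, not DeepSentiPers itself.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

model_name = "PartAI/TookaBERT-Large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Tiny in-memory stand-in for a real sentiment corpus.
train_ds = Dataset.from_dict({
    "text": ["فیلم فوق‌العاده بود", "اصلا راضی نبودم"],
    "label": [2, 0],
})
train_ds = train_ds.map(lambda b: tokenizer(b["text"], truncation=True), batched=True)

args = TrainingArguments(output_dir="tooka-senti",
                         per_device_train_batch_size=8,
                         num_train_epochs=1)
trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                  data_collator=DataCollatorWithPadding(tokenizer))
trainer.train()
```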