Tooka SBERT V2 Small
Tooka-SBERT-V2-Small is a sentence transformer model trained for semantic textual similarity and embedding tasks. It maps sentences and paragraphs to a dense vector space in which semantically similar texts lie close to each other.
Release Time: 5/13/2025
Model Overview
This model is designed for semantic similarity and embedding tasks on Persian text, and its performance is optimized through a two-stage training process (pretraining followed by fine-tuning).
Model Features
Two-stage training
The model is trained in two stages: it is first pretrained on the Targoman News dataset and then fine-tuned on multiple synthetic datasets.
Asymmetric input processing
It supports prepending task-specific prefixes to the input, such as 'سوال:' ("question:") for queries and 'متن:' ("text:") for passages, to distinguish different types of text and improve semantic matching.
Efficient performance
It performs strongly on the PTEB benchmark, with a higher average score than the mE5-Base model.
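The asymmetric prefix convention described above can be sketched as plain string handling before encoding. Whether a space follows the colon is an assumption here; the helper names are illustrative, not part of the model's API:

```python
# Task-specific prefixes for asymmetric input processing.
# 'سوال:' means "question:" and marks a search query;
# 'متن:' means "text:" and marks a document/passage.
QUERY_PREFIX = "سوال: "
PASSAGE_PREFIX = "متن: "

def with_query_prefix(query: str) -> str:
    """Prepend the query prefix before encoding a search query."""
    return QUERY_PREFIX + query

def with_passage_prefix(passage: str) -> str:
    """Prepend the passage prefix before encoding a document/passage."""
    return PASSAGE_PREFIX + passage
```

The prefixed strings would then be passed to the model's encoder, so that queries and passages are embedded with their roles made explicit.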
Model Capabilities
Semantic text similarity calculation
Text embedding generation
Persian text processing
Use Cases
Information retrieval
Document retrieval
Use the embeddings generated by the model for document similarity search
It performs well on retrieval datasets such as MIRACLRetrieval.
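A minimal sketch of embedding-based document search: rank documents by cosine similarity to a query embedding. The toy vectors in the usage example stand in for the model's outputs; in practice each vector would come from encoding a prefixed query or passage:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def search(query_vec, doc_vecs, top_k=3):
    """Return the top_k (doc_index, score) pairs, most similar first."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Usage with toy embeddings standing in for model outputs:
docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
results = search([1.0, 0.0], docs, top_k=2)
```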
Text classification
Sentiment analysis
Use text embeddings for sentiment classification
It is effective in tasks such as PersianFoodSentimentClassification
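One simple way to use text embeddings for sentiment classification is a nearest-centroid classifier: average the embeddings of labeled examples per class, then assign a new text to the closest centroid. This is a hedged sketch with toy vectors in place of real model embeddings, not the evaluation setup used on PersianFoodSentimentClassification:

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dim, n = len(vectors[0]), len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

def nearest_centroid(embedding, centroids):
    """Return the label whose class centroid is closest (Euclidean)."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return min(centroids, key=lambda label: dist(embedding, centroids[label]))

# Toy embeddings standing in for encoded review texts:
centroids = {
    "positive": centroid([[1.0, 0.0], [0.9, 0.1]]),
    "negative": centroid([[0.0, 1.0]]),
}
label = nearest_centroid([0.8, 0.2], centroids)
```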
Re-ranking
Search result optimization
Perform semantic re-ranking on the initial retrieval results
It performs strongly on tasks such as WikipediaRerankingMultilingual.