Ruri V3 Pt 30m
Ruri is a general-purpose Japanese text embedding model built on ModernBERT-Ja. It is available in several parameter sizes for a range of text-processing tasks.
Downloads: 250
Release date: 3/20/2025
Model Overview
Ruri is a general-purpose Japanese text embedding model used mainly for sentence similarity and feature extraction. It is built on the ModernBERT-Ja architecture and uses input prefixes to distinguish between different types of text.
Model Features
Multiple Parameter Scale Versions
Available in versions from 30M to 310M parameters to suit different compute budgets.
1+3 Prefix Scheme
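Each input is prepended with a prefix that marks its type: an empty string for general semantic encoding, 'トピック:' for classification and clustering, '検索クエリ:' for search queries, and '検索文書:' for documents to be retrieved. "1+3" refers to the one general-purpose prefix (the empty string) plus the three task-specific ones.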
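A minimal Python sketch of the prefix scheme, assuming the model takes plain strings with the prefix simply prepended to the text. The task names ("semantic", "topic", "query", "document") and the `add_prefix` helper are hypothetical; the prefix strings are copied from this page, and the exact spacing after the colon should be checked against the official model card before use.

```python
# Hypothetical task names mapped to the prefix strings listed on this page.
# Assumption: confirm the exact spacing after each colon against the
# official Ruri model card before relying on these strings.
PREFIXES = {
    "semantic": "",            # general semantic encoding (no prefix)
    "topic": "トピック:",       # classification / clustering
    "query": "検索クエリ:",     # search queries
    "document": "検索文書:",    # documents to be retrieved
}


def add_prefix(text: str, task: str) -> str:
    """Prepend the prefix for `task` to `text` (hypothetical helper)."""
    if task not in PREFIXES:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PREFIXES)}")
    return PREFIXES[task] + text


if __name__ == "__main__":
    print(add_prefix("瑠璃色はどんな色？", "query"))  # prints 検索クエリ:瑠璃色はどんな色？
```

The prefixed strings, not the raw text, are what would be passed to the model for encoding.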
High Performance
Average scores on the JMTEB benchmark range from 74.51 to 77.24, depending on the parameter-scale version.
Model Capabilities
Sentence Similarity Calculation
Text Feature Extraction
Semantic Encoding
Classification/Clustering Encoding
Search Query Encoding
Document Retrieval Encoding
Use Cases
Information Retrieval
Document Retrieval
Encode queries with the '検索クエリ:' prefix and candidate documents with the '検索文書:' prefix, then retrieve the documents whose embeddings are closest to the query embedding.
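One way to carry out this retrieval step, sketched in Python: documents are ranked by cosine similarity between the query embedding and each document embedding. The vectors are assumed to be already computed by the model from the prefixed texts (for example with the sentence-transformers `SentenceTransformer.encode` method, assuming the model is loaded through that library). `rank_documents` is a hypothetical helper, and the toy 3-dimensional vectors stand in for real Ruri outputs.

```python
import math
from typing import List, Optional, Sequence, Tuple


def _unit(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_documents(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[int, float]]:
    """Return (document index, cosine score) pairs, best match first."""
    q = _unit(query_vec)
    # Dot product of unit vectors equals their cosine similarity.
    scored = [(i, sum(a * b for a, b in zip(q, _unit(d)))) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


if __name__ == "__main__":
    # Toy vectors standing in for embeddings of a '検索クエリ:' query
    # and three '検索文書:' documents.
    query = [1.0, 0.0, 0.0]
    docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
    print([i for i, _ in rank_documents(query, docs)])  # [1, 2, 0]
```

This linear scan is only practical for small collections; a larger corpus would normally use a vector index instead.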
Text Analysis
Topic Classification
Use the 'トピック:' prefix to encode text for topic classification.
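Any standard classifier can be trained on the 'トピック:'-prefixed embeddings. The Python sketch below uses nearest-centroid classification, a simple choice made here for illustration rather than a method prescribed by Ruri: it builds one unit-length centroid per topic from labelled examples and assigns a new text to the topic whose centroid is most cosine-similar. The function names, labels, and toy 2-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _unit(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def fit_centroids(vectors: Sequence[Sequence[float]], labels: Sequence[str]) -> Dict[str, List[float]]:
    """Build one unit-length centroid per label from labelled embeddings."""
    sums: Dict[str, List[float]] = {}
    for vec, label in zip(vectors, labels):
        unit = _unit(vec)
        acc = sums.setdefault(label, [0.0] * len(unit))
        for i, x in enumerate(unit):
            acc[i] += x
    # The mean and the sum point in the same direction, so normalizing
    # the sum yields the unit-length centroid directly.
    return {label: _unit(acc) for label, acc in sums.items()}


def predict_topic(centroids: Dict[str, List[float]], vec: Sequence[float]) -> str:
    """Return the label whose centroid has the highest cosine similarity."""
    unit = _unit(vec)
    return max(centroids, key=lambda label: sum(a * b for a, b in zip(unit, centroids[label])))


if __name__ == "__main__":
    # Toy vectors standing in for embeddings of 'トピック:'-prefixed texts.
    train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labels = ["sports", "sports", "food", "food"]
    centroids = fit_centroids(train, labels)
    print(predict_topic(centroids, [0.8, 0.2]))  # sports
```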
Semantic Similarity Calculation
Compare the embedding vectors of different texts (for example, with cosine similarity) to measure how semantically similar they are.
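Cosine similarity is a common way to compare two embeddings: cos(u, v) = (u · v) / (‖u‖ ‖v‖). It ranges from -1 to 1 and reduces to the plain dot product when both vectors are already L2-normalized. A minimal Python sketch follows; `cosine_similarity` is a hypothetical helper name, and its inputs are assumed to be embeddings produced with the same prefix (the empty-string prefix in the scheme above is the one meant for general semantic encoding).

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(cosine_similarity([3.0, 4.0], [3.0, 4.0]))  # 1.0 (same direction)
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```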