Ruri V3 130m
Ruri v3 is a general-purpose Japanese text embedding model built on ModernBERT-Ja. It achieves state-of-the-art performance on Japanese text embedding tasks and supports sequences of up to 8192 tokens.
Release Date: 4/9/2025
Model Overview
Ruri v3 is a high-performance Japanese text embedding model designed for tasks such as Japanese text similarity calculation, retrieval, and classification.
Model Features
Ultra-Long Sequence Support
Supports sequences of up to 8192 tokens, up from 512 tokens in the previous version
Expanded Vocabulary
Vocabulary expanded to 100K tokens, up from 32K in the previous version, which yields shorter input sequences and improves efficiency
High-Performance Architecture
Integrates FlashAttention and adopts the ModernBERT architecture, enabling faster inference and fine-tuning
Simplified Tokenization
Uses only SentencePiece for tokenization, eliminating the need for external tokenization tools
Model Capabilities
Japanese text embedding
Sentence similarity calculation
Text retrieval
Text classification
Text clustering
Semantic analysis
Use Cases
Information Retrieval
Document Retrieval
Embeds both documents and queries with the model, then ranks documents by embedding similarity to achieve efficient semantic retrieval
Achieved a score of 81.89 in the JMTEB retrieval task
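As a rough illustration of embedding-based retrieval, the sketch below ranks documents by the dot product of L2-normalized vectors. It assumes the embeddings have already been produced by the model; the 3-dimensional toy vectors and the helper names `l2_normalize` and `top_k` are hypothetical and are not part of the Ruri v3 release.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k(query_vec: Vector, doc_vecs: Sequence[Vector], k: int = 3) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents."""
    q = l2_normalize(query_vec)
    scored = []
    for idx, doc in enumerate(doc_vecs):
        d = l2_normalize(doc)
        scored.append((idx, sum(a * b for a, b in zip(q, d))))
    # Highest score first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real model embeddings (assumption for illustration).
docs = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.1, 0.0]

ranking = top_k(query, docs, k=2)
print(ranking)
```

In practice the document vectors would be computed once and stored, so only the query needs to be embedded at search time.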
Text Analysis
Sentence Similarity Calculation
Calculates the semantic similarity between two Japanese sentences
Achieved a score of 79.25 in the JMTEB STS task
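Sentence similarity with an embedding model is commonly computed as the cosine of the angle between two sentence vectors. The following is a minimal sketch under that assumption; the vectors are made-up placeholders for embeddings of three sentences, and `cosine_similarity` is an illustrative helper rather than an API shipped with the model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Placeholder embeddings (assumption): a and b stand for paraphrases, c for an unrelated sentence.
emb_a = [0.2, 0.7, 0.1]
emb_b = [0.25, 0.65, 0.05]
emb_c = [0.9, 0.0, 0.4]

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

A higher value indicates closer meaning; what counts as "similar enough" depends on the application and is best chosen on held-out data.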
Text Classification
Classifies Japanese text
Achieved a score of 77.16 in the JMTEB classification task
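One simple way to classify text with embeddings is a nearest-centroid classifier: average the training vectors for each label, then assign a new vector to the label whose centroid is most similar. This is a swapped-in illustrative technique, not necessarily the procedure used in the JMTEB evaluation; the 2-dimensional vectors, the label names, and the helpers `fit_centroids` and `predict` are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def fit_centroids(vectors: Sequence[Vector], labels: Sequence[str]) -> Dict[str, List[float]]:
    """Average the vectors of each label into one centroid per label."""
    grouped: Dict[str, List[Vector]] = defaultdict(list)
    for vec, label in zip(vectors, labels):
        grouped[label].append(vec)
    return {
        label: [sum(column) / len(vecs) for column in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def predict(vec: Vector, centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: _cosine(vec, centroids[label]))


# Toy training embeddings and labels (assumptions for illustration only).
train_vecs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_labels = ["sports", "sports", "finance", "finance"]

centroids = fit_centroids(train_vecs, train_labels)
predicted = predict([0.7, 0.3], centroids)
print(predicted)
```

With real embeddings, a trained classifier such as logistic regression on top of the vectors is a common alternative when more labeled data is available.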