
Ruri V3 130m

Developed by cl-nagoya
Ruri v3 is a general-purpose Japanese text embedding model based on ModernBERT-Ja. It achieves state-of-the-art performance on Japanese text embedding tasks and supports sequences of up to 8192 tokens.
Downloads 597
Release Time: 4/9/2025

Model Overview

Ruri v3 is a high-performance Japanese text embedding model designed for tasks such as Japanese text similarity calculation, retrieval, and classification.

Model Features

Ultra-Long Sequence Support: Supports sequences of up to 8192 tokens, up from 512 tokens in the previous version.
Expanded Vocabulary: Vocabulary expanded to 100K tokens, up from 32K in the previous version, which yields shorter input sequences and improves efficiency.
High-Performance Architecture: Integrates FlashAttention and adopts the ModernBERT architecture, enabling faster inference and fine-tuning.
Simplified Tokenization: Uses only SentencePiece for tokenization, eliminating the need for external tokenization tools.
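
Inputs longer than the 8192-token limit still have to be split before embedding. The following is a minimal sketch of one way to do that, assuming the text has already been tokenized into a list of token IDs. The function name chunk_token_ids and the overlap parameter are hypothetical and are not part of the model or its tooling. Whether special tokens count toward the limit is not stated here, so a real pipeline may need to reserve a few positions.

```python
from typing import List, Sequence

# Maximum sequence length stated for Ruri v3.
MAX_TOKENS = 8192


def chunk_token_ids(
    token_ids: Sequence[int],
    max_len: int = MAX_TOKENS,
    overlap: int = 0,  # hypothetical knob: tokens shared by neighbouring chunks
) -> List[List[int]]:
    """Split a token ID sequence into windows of at most max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    step = max_len - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        chunks.append(list(token_ids[start:start + max_len]))
        # Stop once the current window already reaches the end of the input.
        if start + max_len >= len(token_ids):
            break
    return chunks


# Toy input: 20,000 placeholder token IDs, not real tokenizer output.
example_chunks = chunk_token_ids(list(range(20000)))
```

Each chunk can then be embedded separately. How the per-chunk embeddings are combined (averaging, keeping the best-matching chunk, and so on) is an application choice that this page does not prescribe.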

Model Capabilities

Japanese text embedding
Sentence similarity calculation
Text retrieval
Text classification
Text clustering
Semantic analysis
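
Most of the capabilities above reduce to comparing embedding vectors, typically with cosine similarity. The sketch below shows that computation on toy vectors. The helper embed_with_ruri is an assumption and not something this page documents: it presumes the model is published under the ID cl-nagoya/ruri-v3-130m and can be loaded with the sentence-transformers library. Check the upstream model card for the exact ID and for any required input prefixes before relying on it.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def embed_with_ruri(texts, model_id="cl-nagoya/ruri-v3-130m"):
    """ASSUMPTION: model ID and sentence-transformers loading are not confirmed here.

    Imported lazily so the rest of this sketch runs without the dependency
    or a model download.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(list(texts))


# Toy stand-in vectors; real embeddings would come from the model.
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 0.0, 1.0]
vec_c = [0.0, 1.0, 0.0]

same = cosine_similarity(vec_a, vec_b)       # identical direction
unrelated = cosine_similarity(vec_a, vec_c)  # orthogonal vectors
```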

Use Cases

Information Retrieval
Document Retrieval: Embeds documents and queries with the model for efficient semantic retrieval. Scored 81.89 on the JMTEB retrieval task.
Text Analysis
Sentence Similarity Calculation: Computes the semantic similarity between two Japanese sentences. Scored 79.25 on the JMTEB STS task.
Text Classification: Classifies Japanese text. Scored 77.16 on the JMTEB classification task.
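
The retrieval and classification use cases above can be sketched on top of precomputed embeddings. This is an illustration under stated assumptions, not the procedure used to obtain the JMTEB scores. The names rank_documents and classify_by_centroid are hypothetical, the vectors are toy placeholders, and nearest-centroid classification is just one simple way to classify with embeddings.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_documents(
    query_vec: Vector, doc_vecs: Sequence[Vector], top_k: int = 3
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    scored = [(i, _cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def classify_by_centroid(text_vec: Vector, labeled_vecs: Dict[str, List[Vector]]) -> str:
    """Pick the label whose mean embedding is closest to text_vec."""
    best_label, best_score = None, -math.inf
    for label, vecs in labeled_vecs.items():
        if not vecs:
            raise ValueError(f"no example vectors for label {label!r}")
        dim = len(vecs[0])
        centroid = [sum(v[j] for v in vecs) / len(vecs) for j in range(dim)]
        score = _cosine(text_vec, centroid)
        if score > best_score:
            best_label, best_score = label, score
    if best_label is None:
        raise ValueError("labeled_vecs must not be empty")
    return best_label


# Toy 2-dimensional placeholders standing in for model embeddings.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_vec = [0.9, 0.1]
top_two = rank_documents(query_vec, doc_vecs, top_k=2)

labeled = {
    "sports": [[1.0, 0.0], [0.8, 0.2]],
    "politics": [[0.0, 1.0], [0.2, 0.8]],
}
predicted = classify_by_centroid([0.9, 0.1], labeled)
```

For large document collections, the linear scan in rank_documents would normally be replaced by an approximate nearest-neighbour index. The scoring logic stays the same.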