
Ruri v3 310m

Developed by cl-nagoya
Ruri v3 is a general-purpose Japanese text embedding model built on ModernBERT-Ja. It achieves leading performance on Japanese text embedding tasks and supports sequences of up to 8192 tokens.
Downloads 3,395
Release Time: 4/9/2025

Model Overview

Ruri v3 is a high-performance embedding model designed for Japanese text. It is suitable for semantic encoding, classification, clustering, topic encoding, and retrieval.

Model Features

Long Sequence Support
Supports sequences of up to 8192 tokens, up from 512 tokens in the previous version.
Expanded Vocabulary
Vocabulary expanded to 100K tokens (previously 32K), which can shorten input sequences and improve efficiency.
FlashAttention Technology
Integrates FlashAttention for faster inference and fine-tuning.
Pure SentencePiece Tokenizer
Requires only SentencePiece for tokenization, with no need for external word segmentation tools.

Model Capabilities

Japanese Text Embedding
Sentence Similarity Calculation
Semantic Encoding
Topic Encoding
Retrieval Task Processing

Use Cases

Information Retrieval
Document Retrieval
Use '検索クエリ:' (search query) and '検索文書:' (search document) prefixes for retrieval tasks.
Scores 81.89 on the JMTEB retrieval tasks.
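A minimal Python sketch of the retrieval prefix scheme: the query and the documents receive different prefixes, are encoded, and the documents are ranked by cosine similarity to the query. The encoder is passed in as a function so the sketch does not depend on a specific library. `rank_documents`, `cosine`, and the prefix constants are hypothetical names, and the single space after each colon is an assumption based on the upstream model card's examples.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Retrieval prefixes; the trailing space is an assumption based on the
# upstream model card's examples.
QUERY_PREFIX = "検索クエリ: "
DOCUMENT_PREFIX = "検索文書: "

# Hypothetical encoder type: a list of strings in, one vector per string out.
Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = math.fsum(a * b for a, b in zip(u, v))
    norm = math.sqrt(math.fsum(a * a for a in u)) * math.sqrt(math.fsum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(
    query: str, documents: List[str], encode: Encoder
) -> List[Tuple[float, str]]:
    """Return (score, document) pairs, most similar first."""
    # The query and the documents are encoded with different prefixes.
    query_vec = encode([QUERY_PREFIX + query])[0]
    doc_vecs = encode([DOCUMENT_PREFIX + doc for doc in documents])
    scored = [(cosine(query_vec, vec), doc) for vec, doc in zip(doc_vecs, documents)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Assuming the `sentence-transformers` package and the Hugging Face model id `cl-nagoya/ruri-v3-310m`, one could pass `SentenceTransformer("cl-nagoya/ruri-v3-310m").encode` as the `encode` argument.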
Text Classification
Topic Classification
Use 'トピック:' (topic) prefix for classification/clustering/topic encoding.
Scores 78.66 on the JMTEB classification tasks.
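One simple way to use topic-prefixed embeddings for classification is a nearest-centroid classifier. This is a technique chosen here for illustration, not necessarily the JMTEB evaluation protocol. The sketch assumes an `encode` function that maps a list of strings to vectors; `fit_centroids`, `classify`, and `cosine` are hypothetical helpers, and the space after the prefix's colon is an assumption based on the upstream model card's examples.

```python
import math
from typing import Callable, Dict, List, Sequence

# Topic prefix; the trailing space is an assumption based on the upstream
# model card's examples.
TOPIC_PREFIX = "トピック: "

# Hypothetical encoder type: a list of strings in, one vector per string out.
Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = math.fsum(a * b for a, b in zip(u, v))
    norm = math.sqrt(math.fsum(a * a for a in u)) * math.sqrt(math.fsum(b * b for b in v))
    return dot / norm if norm else 0.0


def fit_centroids(
    texts: List[str], labels: List[str], encode: Encoder
) -> Dict[str, List[float]]:
    """Average the topic-prefixed embeddings of each label's examples."""
    vectors = encode([TOPIC_PREFIX + t for t in texts])
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(vectors, labels):
        total = sums.get(label, [0.0] * len(vec))
        sums[label] = [s + x for s, x in zip(total, vec)]
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(text: str, centroids: Dict[str, List[float]], encode: Encoder) -> str:
    """Assign the label whose centroid is most similar to the text's embedding."""
    vec = encode([TOPIC_PREFIX + text])[0]
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))
```

Nearest-centroid keeps the sketch dependency-free; a trained classifier such as logistic regression over the same embeddings is a common alternative.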
Semantic Analysis
Sentence Similarity Calculation
Use an empty string prefix for semantic encoding.
Scores 81.22 on the JMTEB STS tasks.
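For sentence similarity, the sketch below encodes sentences with the empty prefix and builds a pairwise cosine-similarity matrix. It assumes an `encode` callable that maps a list of strings to vectors; `similarity_matrix` and `cosine` are hypothetical helper names.

```python
import math
from typing import Callable, List, Sequence

# Semantic encoding uses an empty prefix, per the model description.
SEMANTIC_PREFIX = ""

# Hypothetical encoder type: a list of strings in, one vector per string out.
Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = math.fsum(a * b for a, b in zip(u, v))
    norm = math.sqrt(math.fsum(a * a for a in u)) * math.sqrt(math.fsum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(sentences: List[str], encode: Encoder) -> List[List[float]]:
    """Pairwise cosine similarities; entry [i][j] compares sentence i with sentence j."""
    vectors = encode([SEMANTIC_PREFIX + s for s in sentences])
    return [[cosine(u, v) for v in vectors] for u in vectors]
```

Because cosine similarity is symmetric, the matrix is symmetric too, so for large inputs one could compute only the upper triangle.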