Ruri V3 70m

Developed by cl-nagoya
Ruri v3 is a Japanese general-purpose text embedding model based on ModernBERT-Ja, supporting sequences up to 8192 tokens long and achieving state-of-the-art performance in Japanese text embedding tasks.
Downloads 865
Release Time: 4/9/2025

Model Overview

Ruri v3 is a high-performance Japanese text embedding model designed for tasks such as Japanese text similarity, retrieval, and classification. It employs a pure SentencePiece tokenizer, supports long sequence processing, and integrates FlashAttention technology for improved efficiency.

Model Features

Long Sequence Support
Supports sequences up to 8192 tokens, far exceeding the 512-token limit of earlier Ruri versions (v1 and v2).
Expanded Vocabulary
Vocabulary expanded from 32,000 to 100,000 tokens. The larger vocabulary produces shorter token sequences for the same text, which improves processing efficiency.
FlashAttention Integration
Incorporates FlashAttention technology from the ModernBERT architecture for faster inference and fine-tuning.
Pure SentencePiece Tokenization
Tokenizes with SentencePiece alone, so no external word-segmentation tool is needed and preprocessing is simpler.
Multi-Task Prefix Scheme
Uses a 1+3 prefix scheme to distinguish input types: one empty prefix for general semantic encoding, plus three explicit prefixes for topic, search query, and search document inputs.
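The prefix scheme can be sketched as a small lookup table. This is a minimal illustration, not the model's own code: the helper name `with_prefix` is hypothetical, and the Japanese prefix strings are assumptions that should be checked against the upstream model card before use.

```python
# Prefix strings are ASSUMPTIONS; verify them against the upstream model card.
PREFIXES = {
    "semantic": "",            # the "1": general semantic encoding, no prefix
    "topic": "トピック: ",      # topic encoding for classification or clustering
    "query": "検索クエリ: ",    # search queries
    "document": "検索文書: ",   # documents to be retrieved
}


def with_prefix(text: str, kind: str = "semantic") -> str:
    """Return `text` with the prefix for the given input type prepended."""
    try:
        return PREFIXES[kind] + text
    except KeyError:
        raise ValueError(f"unknown input type: {kind!r}") from None


if __name__ == "__main__":
    # Show the same sentence prepared for each of the four input types.
    for kind in PREFIXES:
        print(repr(with_prefix("瑠璃色はどんな色？", kind)))
```

The prefixed strings are what would be passed to the embedding model, so the same sentence can yield a different vector depending on the task it is prepared for.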

Model Capabilities

Japanese Text Embedding
Sentence Similarity Calculation
Text Retrieval
Text Classification
Clustering Analysis
Re-ranking Tasks

Use Cases

Information Retrieval
Document Retrieval
Build retrieval systems by encoding queries with the 'search query' prefix and documents with the 'search document' prefix.
Scores 79.96 on the JMTEB retrieval tasks.
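A retrieval pipeline built on these prefixes might look roughly like the sketch below. It assumes an `encode` callable that maps a list of strings to a list of vectors; the function name `search` and the two prefix constants are illustrative assumptions, not part of the model's API.

```python
import math
from typing import Callable, List, Sequence, Tuple

# ASSUMED prefix strings; verify them against the upstream model card.
QUERY_PREFIX = "検索クエリ: "
DOC_PREFIX = "検索文書: "

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query: str,
    documents: List[str],
    encode: Callable[[List[str]], Sequence[Vector]],
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Rank `documents` against `query`, returning (score, document) pairs."""
    # Queries and documents get different prefixes before encoding.
    query_vec = encode([QUERY_PREFIX + query])[0]
    doc_vecs = encode([DOC_PREFIX + doc for doc in documents])
    scored = [(cosine(query_vec, vec), doc) for doc, vec in zip(documents, doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

With the sentence-transformers library installed, `encode` could plausibly be the `encode` method of `SentenceTransformer("cl-nagoya/ruri-v3-70m")`; that model identifier is an assumption based on the developer and model name given above. For large collections, the document vectors would normally be computed once and stored in a vector index rather than re-encoded on every query.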
Text Analysis
Topic Classification
Use the 'topic' prefix to encode the topic of a text for classification.
Scores 76.97 on the JMTEB classification tasks.
Semantic Analysis
Sentence Similarity Calculation
Compute the semantic similarity between two Japanese sentences.
Scores 79.82 on the JMTEB STS tasks.
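Sentence similarity can be sketched as follows. This is an illustration under stated assumptions: the helper names `embed_sentences` and `pairwise_similarity` are hypothetical, the model identifier `cl-nagoya/ruri-v3-70m` is inferred from the developer and model name above, and sentences are encoded without a prefix on the assumption that the empty prefix is the semantic one. The sentence-transformers import is deferred so that the similarity helper works on any precomputed vectors.

```python
import math
from typing import List, Sequence


def embed_sentences(sentences: List[str], model_name: str = "cl-nagoya/ruri-v3-70m"):
    """Embed sentences with sentence-transformers (downloads the model on first use)."""
    from sentence_transformers import SentenceTransformer  # lazy, optional dependency

    model = SentenceTransformer(model_name)  # model id is an assumption
    # No prefix is added: the empty prefix is assumed to mean semantic encoding.
    return model.encode(sentences, normalize_embeddings=True)


def pairwise_similarity(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the matrix of cosine similarities between every pair of vectors."""
    unit = []
    for vec in vectors:
        norm = math.sqrt(sum(float(x) * float(x) for x in vec))
        # Scale each vector to unit length; a zero vector stays all zeros.
        unit.append([float(x) / norm if norm else 0.0 for x in vec])
    # The dot product of unit vectors is their cosine similarity.
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]
```

Calling `pairwise_similarity(embed_sentences([sentence_a, sentence_b]))` would then give a 2x2 matrix whose off-diagonal entries are the similarity of the two sentences, with values closer to 1 indicating closer meaning.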