Ruri V3 30m

Developed by cl-nagoya
Ruri v3 is a Japanese general-purpose text embedding model based on ModernBERT-Ja, supporting sequence processing of up to 8192 tokens and delivering top-tier performance in Japanese text embedding tasks.
Downloads 1,135
Release Time: 4/7/2025

Model Overview

Ruri v3 is a Japanese general-purpose text embedding model primarily used for sentence similarity computation and feature extraction, supporting encoding of various text types.

Model Features

Long Sequence Processing
Supports sequence processing of up to 8192 tokens, a significant improvement over the previous version (512 tokens).
Expanded Vocabulary
The vocabulary is expanded to 100K tokens (previously 32K), which shortens tokenized input sequences and improves efficiency.
FlashAttention Technology
Incorporates FlashAttention for faster inference and fine-tuning.
Pure SentencePiece Tokenizer
No external tokenization tools required; tokenization can be completed using only SentencePiece.

Model Capabilities

Japanese Text Embedding
Sentence Similarity Computation
Feature Extraction
Long Text Processing
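
Sentence similarity over embeddings is usually computed as cosine similarity. The following is a minimal sketch of that step only, using the Python standard library and toy 3-dimensional vectors in place of real Ruri v3 embeddings; the function names cosine_similarity and rank_by_similarity are illustrative and not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings produced by the model.
query = [1.0, 0.0, 0.0]
docs = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
ranking = rank_by_similarity(query, docs)
```

With real embeddings, the same ranking step would be applied to the vectors returned by the model for a query and for each candidate text.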

Use Cases

Text Retrieval
Document Retrieval
Use the '検索文書' ("search document") prefix to encode the documents to be retrieved.
Query Retrieval
Use the '検索クエリ' ("search query") prefix to encode search queries, improving retrieval accuracy.
Text Classification
Topic Classification
Use the 'トピック' prefix to encode text for topic classification and clustering.
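
The prefix scheme described above can be sketched as a small helper that prepends the task prefix before encoding. Several details here are assumptions rather than facts stated on this page: the document prefix is taken to be '検索文書' ("search document"), each prefix is assumed to be followed by ": ", and the model identifier "cl-nagoya/ruri-v3-30m" is inferred from the model name and developer. The names PREFIXES, add_prefix, and encode are hypothetical. The encode function assumes the sentence-transformers package is installed and is not called in this sketch.

```python
# Task prefixes. The trailing ": " separator and the exact document
# prefix are assumptions; verify them against the official model card.
PREFIXES = {
    "query": "検索クエリ: ",
    "document": "検索文書: ",
    "topic": "トピック: ",
}


def add_prefix(texts, task):
    """Prepend the prefix for the given task to every text."""
    if task not in PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    return [PREFIXES[task] + text for text in texts]


def encode(texts, task, model_name="cl-nagoya/ruri-v3-30m"):
    """Encode prefixed texts (model_name is an assumed identifier)."""
    # Imported lazily so add_prefix can be used without the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(add_prefix(texts, task))


queries = add_prefix(["瑠璃色はどんな色？"], "query")
documents = add_prefix(["瑠璃色は紫みを帯びた濃い青。"], "document")
```

Keeping the prefixes in one table makes it harder to encode queries and documents with mismatched prefixes, which would likely hurt retrieval quality.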