Ruri Large
Ruri-Large is a high-performance embedding model specialized for Japanese text similarity. It is built on the Transformer architecture and handles long inputs of up to 8,192 tokens.
Downloads: 6,784
Release date: 8/28/2024
Model Overview
The model is primarily used for semantic similarity calculation and feature extraction on Japanese text, and it performs strongly on the JMTEB benchmark. Because it is optimized to distinguish queries from passages, every input must be prefixed with either 'クエリ:' (query:) or '文章:' (passage:).
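The prefix requirement can be sketched as a small helper applied to texts before they are passed to the encoder. This is a minimal sketch, not the model's own API: the function name add_prefix and its "query"/"passage" keyword are hypothetical, and the single space after each colon is an assumption to check against the official model card.

```python
# Prefixes from the model description. The trailing space after each
# colon is an assumption; verify it against the official model card.
QUERY_PREFIX = "クエリ: "
PASSAGE_PREFIX = "文章: "


def add_prefix(texts: list[str], kind: str) -> list[str]:
    """Prepend the query or passage prefix to every text.

    `kind` is a hypothetical parameter: either "query" or "passage".
    """
    prefixes = {"query": QUERY_PREFIX, "passage": PASSAGE_PREFIX}
    if kind not in prefixes:
        raise ValueError(f"kind must be 'query' or 'passage', got {kind!r}")
    return [prefixes[kind] + text for text in texts]


queries = add_prefix(["瑠璃色はどんな色？"], "query")
passages = add_prefix(["瑠璃色は紫みを帯びた濃い青色です。"], "passage")
print(queries[0])  # クエリ: 瑠璃色はどんな色？
```

Keeping the prefixing in one place makes it harder to accidentally encode a query with the passage prefix, or the reverse.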
Model Features
High-performance Japanese Processing
Achieves an average score of 73.31 on the JMTEB benchmark, outperforming comparable Japanese embedding models
Long Text Support
Handles inputs of up to 8,192 tokens, making it suitable for long-document analysis
Query/Passage Distinction
Improves retrieval quality by marking query texts and passage texts with distinct prefixes
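Documents longer than the 8,192-token limit have to be split before encoding. The sketch below, a hypothetical helper named chunk_token_ids, cuts a list of token IDs into overlapping windows. It assumes the IDs come from the model's tokenizer (not shown), and the 256-token overlap is an arbitrary illustrative value; in practice the window should also leave room for the prefix and any special tokens the tokenizer adds.

```python
MAX_TOKENS = 8192  # maximum input length stated for Ruri-Large


def chunk_token_ids(token_ids: list[int], max_len: int = MAX_TOKENS,
                    overlap: int = 256) -> list[list[int]]:
    """Split token IDs into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that text near a
    boundary appears in both chunks.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    chunks: list[list[int]] = []
    step = max_len - overlap
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the document
        start += step
    return chunks


# Stand-in for a 20,000-token document produced by a tokenizer.
doc_ids = list(range(20_000))
windows = chunk_token_ids(doc_ids)
print(len(windows), [len(w) for w in windows])  # 3 [8192, 8192, 4128]
```

Each window can then be prefixed and embedded separately; how the per-window vectors are combined (for example, averaging or keeping them as separate passages) is a design choice left open here.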
Model Capabilities
Japanese Text Embedding
Semantic Similarity Calculation
Text Feature Extraction
Information Retrieval
Text Clustering
Use Cases
Information Retrieval
Q&A Systems
Supports question answering by computing the similarity between a query and knowledge-base passages
Scores 73.02 on the JMTEB retrieval tasks
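Retrieval of this kind reduces to ranking passages by their similarity to the query embedding. The following sketch uses cosine similarity, a common choice for embedding models but an assumption here, and toy 3-dimensional vectors in place of real Ruri-Large outputs; the helpers cosine and top_k are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], passage_vecs: list[list[float]],
          k: int = 3) -> list[tuple[int, float]]:
    """Return (passage index, score) pairs for the k most similar passages."""
    scored = [(i, cosine(query_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real model embeddings.
query = [1.0, 0.0, 1.0]
passages = [
    [1.0, 0.0, 0.9],  # close to the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.5, 0.5, 0.5],  # partially similar
]
print(top_k(query, passages, k=2))  # passage 0 ranks first, passage 2 second
```

With real embeddings, the query vector would come from a text carrying the query prefix and the passage vectors from texts carrying the passage prefix.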
Content Analysis
Text Clustering
Groups large volumes of Japanese text by semantic similarity
Scores 51.82 on the JMTEB clustering tasks
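Semantic clustering can be illustrated with plain k-means over unit-normalized embeddings; for unit vectors, squared Euclidean distance equals 2 minus twice the cosine similarity, so the clusters follow cosine similarity. The algorithm is our choice for this sketch, not necessarily the procedure JMTEB uses to compute its clustering score, and the toy 2-dimensional vectors and helper names are hypothetical.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def sq_dist(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: list[float], centroids: list[list[float]]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))


def kmeans(vectors: list[list[float]], k: int, iters: int = 20) -> list[int]:
    """Plain k-means on unit-normalized vectors; returns one label per vector.

    Centroids start at the first k vectors, which keeps the example
    deterministic but is not a recommended initialization in practice.
    """
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    points = [normalize(v) for v in vectors]
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy 2-D vectors standing in for real embeddings: two clear groups.
vectors = [
    [1.0, 0.1], [0.9, 0.0], [1.0, 0.2],
    [0.1, 1.0], [0.0, 0.8], [0.2, 1.0],
]
labels = kmeans(vectors, k=2)
print(labels)  # first three share one label, last three share the other
```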