RoSEtta Base Ja

Developed by pkshatech
RoSEtta is a general-purpose Japanese text embedding model that excels at retrieval tasks. It supports sequence lengths of up to 1024 tokens and is suited to sentence similarity calculation and paragraph retrieval.
Downloads 1,760
Release Time: 8/22/2024

Model Overview

A Japanese text embedding model based on the RoFormer architecture and trained with distillation and multi-stage contrastive learning. It is designed specifically for retrieval tasks, accepts long input text, and can run on a CPU.

Model Features

Long text processing capability
Supports sequence lengths of up to 1024 tokens, effectively handling long sentence input
Retrieval-optimized design
Performance for retrieval tasks is specifically optimized through multi-stage contrastive learning and distillation training
Efficient inference
Moderate model size (0.2B parameters) allows efficient operation on CPUs
Rotary position encoding
Uses RoPE (Rotary Position Embedding) to encode token position information
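
To illustrate what rotary position encoding does, here is a minimal sketch in plain Python. It follows the general RoPE formulation (rotating consecutive pairs of dimensions by a position-dependent angle) and is not taken from this model's code; the function names, the base of 10000, and the interleaved pairing are assumptions made for illustration.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle position * base**(-i/d).

    Illustrative only: base and the pairing scheme are assumed, not read
    from the RoSEtta implementation.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy query/key vectors (made-up values).
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]

# The attention score depends only on the relative offset (4 in both cases).
score_a = dot(rope(q, 3), rope(k, 7))
score_b = dot(rope(q, 10), rope(k, 14))
```

Because each pair is rotated, the dot product between a rotated query and a rotated key depends on the distance between their positions rather than on the absolute positions, which is the property that makes this encoding attractive for longer inputs.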

Model Capabilities

Calculate sentence semantic similarity
Text feature extraction
Query-based paragraph retrieval
Long text semantic understanding
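
As a rough sketch of how similarity calculation and query-based retrieval work once embeddings are available, the following ranks paragraphs by cosine similarity to a query. The vectors are made-up toy values standing in for model output, and the helper names are hypothetical; loading the model itself is not shown.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_passages(query_vec, passage_vecs, top_k=3):
    """Return (passage index, score) pairs, highest similarity first."""
    scored = [(cosine_similarity(query_vec, v), idx)
              for idx, v in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:top_k]]

# Toy embeddings (assumed values, far smaller than real model output).
query = [1.0, 0.0, 1.0]
passages = [
    [0.9, 0.1, 1.1],   # close to the query
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [1.0, 1.0, 1.0],   # partially related
]
ranking = rank_passages(query, passages, top_k=2)
```

For large collections, an approximate nearest-neighbor index would normally replace this exhaustive scan.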

Use Cases

Information retrieval
QA system retrieval
Retrieves the paragraphs most relevant to a question in a QA system
Achieves a recall@5 of 79.3 on the MIRACL-ja dataset
Document similarity analysis
Calculates semantic similarity between documents or sentences
Scores 81.39 on the STS task in the JMTEB evaluation
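
For readers unfamiliar with the recall@5 figure quoted above, here is one common way to compute recall@k. This is a sketch under the assumption that recall is the fraction of relevant documents found in the top k results, averaged over queries; the exact evaluation protocol behind the reported number is not described here, and the document IDs are invented.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        raise ValueError("each query needs at least one relevant document")
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

def mean_recall_at_k(runs, k):
    """Average recall@k over (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)

# Two hypothetical queries with their ranked results and relevance labels.
runs = [
    (["d3", "d1", "d7", "d2", "d9", "d4"], {"d1", "d4"}),  # d4 is ranked 6th
    (["d5", "d8", "d6"], {"d5"}),
]
score = mean_recall_at_k(runs, k=5)
```

In this toy run the first query recovers one of its two relevant documents within the top five and the second recovers its only one, so the mean is 0.75.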
Content management
Duplicate content detection
Identifies duplicate or highly similar content in websites or document collections
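
Duplicate detection with embeddings usually comes down to flagging pairs whose similarity exceeds a threshold. The sketch below assumes the embeddings have already been computed; the 0.95 threshold and the toy vectors are illustrative assumptions, not recommended settings for this model.

```python
import math
from itertools import combinations

def normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def find_near_duplicates(embeddings, threshold=0.95):
    """Return (i, j, similarity) for every pair at or above the threshold."""
    unit = [normalize(v) for v in embeddings]
    pairs = []
    for i, j in combinations(range(len(unit)), 2):
        # For unit vectors, the dot product equals cosine similarity.
        sim = sum(a * b for a, b in zip(unit[i], unit[j]))
        if sim >= threshold:
            pairs.append((i, j, sim))
    return pairs

# Toy document embeddings (assumed values): the first two are near-identical.
docs = [
    [1.0, 0.0, 0.2],
    [0.98, 0.02, 0.21],
    [0.0, 1.0, 0.0],
]
duplicates = find_near_duplicates(docs)
```

The pairwise loop grows quadratically with the number of documents, so larger collections would typically need blocking or an approximate nearest-neighbor search. A suitable threshold is best tuned on labeled examples from the target collection.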