Sarashina Embedding V1 1b

Developed by sbintuitions
A text embedding model built on a 1.2-billion-parameter Japanese large language model, with strong results on the JMTEB benchmark
Downloads 23.85k
Release date: 11/22/2024

Model Overview

The Sarashina Embedding Model v1-1B is a text embedding model based on a Japanese large language model. It maps sentences and paragraphs into a 1792-dimensional dense vector space, and the resulting vectors can be used for tasks such as semantic textual similarity and semantic search.
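To illustrate how two such vectors might be compared, here is a minimal sketch of cosine similarity over 1792-dimensional vectors. Cosine similarity is a common choice for embedding models, but the text above does not say which similarity function this model is tuned for, so treat that as an assumption. The vectors are hand-made stand-ins, not real model output, and `cosine_similarity` is a hypothetical helper name.

```python
import math

EMBEDDING_DIM = 1792  # output dimensionality stated in the overview


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in vectors (NOT real model output); the model would supply these.
vec_a = [1.0] * EMBEDDING_DIM
vec_b = [2.0] * EMBEDDING_DIM                # same direction as vec_a
vec_c = [1.0, -1.0] * (EMBEDDING_DIM // 2)   # orthogonal to vec_a

score_same = cosine_similarity(vec_a, vec_b)  # close to 1.0
score_orth = cosine_similarity(vec_a, vec_c)  # 0.0
```

Because the score depends only on direction, scaling a vector (as with `vec_b`) does not change it.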

Model Features

High-dimensional dense vectors
Outputs 1792-dimensional dense vectors intended to capture fine-grained semantic information
Long text support
Accepts input texts of up to 8192 tokens
Multi-stage training
Trained in two stages: weakly supervised learning followed by supervised fine-tuning
Japanese optimization
Optimized for Japanese text, with strong results on the JMTEB benchmark
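For inputs longer than the 8192-token limit, one possible approach is to split the token sequence into consecutive windows and embed each window separately. The sketch below shows only the splitting step, under several assumptions: the token ids are stand-ins rather than output of the model's tokenizer, `split_into_windows` is a hypothetical helper, and the page does not say whether the limit includes special tokens or how per-window embeddings should be combined.

```python
MAX_TOKENS = 8192  # limit stated in the feature list


def split_into_windows(token_ids, max_tokens=MAX_TOKENS):
    """Split a token sequence into consecutive windows of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [
        token_ids[start:start + max_tokens]
        for start in range(0, len(token_ids), max_tokens)
    ]


# Stand-in token ids; a real tokenizer would produce these.
token_ids = list(range(20000))
windows = split_into_windows(token_ids)
window_lengths = [len(w) for w in windows]
```

Non-overlapping windows are the simplest option. Overlapping windows or splitting on sentence boundaries may preserve more context, at the cost of extra encoding work.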

Model Capabilities

Semantic text similarity calculation
Semantic search
Paraphrase mining
Text classification
Clustering analysis
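As a rough illustration of how semantic search over embeddings might work, the sketch below ranks documents by cosine similarity to a query vector and returns the top matches. The 3-dimensional vectors are toy stand-ins for real 1792-dimensional embeddings, and `normalize` and `top_k` are hypothetical helper names, not part of any published API for this model.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query_vec, doc_vecs, k=2):
    """Return (doc_index, score) pairs for the k most similar documents."""
    q = normalize(query_vec)
    scored = []
    for idx, doc in enumerate(doc_vecs):
        d = normalize(doc)
        # The dot product of unit vectors equals their cosine similarity.
        scored.append((idx, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for document and query embeddings.
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_vec = [1.0, 0.0, 0.0]

results = top_k(query_vec, doc_vecs, k=2)
ranked_indices = [idx for idx, _ in results]
```

The same pairwise scoring, applied to every pair of sentences in a corpus rather than to one query, is the basis of paraphrase mining.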

Use Cases

Information retrieval
Document retrieval
Retrieves relevant documents based on the meaning of the query
Scored 77.61 in JMTEB retrieval tasks
Text analysis
Text similarity calculation
Calculates the semantic similarity between two texts
Scored 82.71 in JMTEB semantic similarity tasks
Text clustering
Automatically groups semantically similar texts
Scored 53.86 in JMTEB clustering tasks
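The clustering use case above can be sketched with a simple greedy grouping over embedding vectors. This is a deliberately simple stand-in, not k-means and not the procedure JMTEB uses for its clustering score. The 2-dimensional vectors, the 0.8 threshold, and the names `cosine` and `greedy_groups` are all illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def greedy_groups(vectors, threshold=0.8):
    """Put each vector in the first group whose seed is similar enough.

    The first member of a group acts as its seed. A vector that matches
    no existing seed starts a new group.
    """
    groups = []  # each group is a list of indices into `vectors`
    for idx, vec in enumerate(vectors):
        for group in groups:
            if cosine(vectors[group[0]], vec) >= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups


# Toy stand-ins for text embeddings: two near the x-axis, two near the y-axis.
vectors = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]
groups = greedy_groups(vectors)
```

The result depends on input order and on the threshold, so a production setup would more likely use an established clustering algorithm on the same vectors.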