E5-all-nli-triplet-Matryoshka Open Source Model - A Sentence Vector Mapping Tool for Semantic Similarity and Search

E5 All Nli Triplet Matryoshka

Developed by Omartificial-Intelligence-Space

This is a sentence-transformers model fine-tuned from intfloat/multilingual-e5-small. It maps sentences and paragraphs into a 384-dimensional dense vector space and supports tasks such as semantic textual similarity and semantic search.

Text Embedding

Safetensors

#Multilingual sentence embedding #Semantic similarity calculation #Arabic optimization

Downloads 14

Release Time: 7/15/2024

Model Overview

This model is specifically designed for semantic representation of sentences and paragraphs, capable of generating high-quality embedding vectors suitable for various natural language processing tasks.

Model Features

Multilingual support

Based on the multilingual-e5-small model, it supports text processing in multiple languages.

Efficient semantic representation

Converts text into 384-dimensional dense vectors, capturing deep semantic information.

MatryoshkaLoss training

Trained using MatryoshkaLoss and MultipleNegativesRankingLoss to optimize representation capabilities across different dimensions.

High performance

Reaches a Spearman correlation of 0.7972 (cosine similarity) on the sts-test-384 evaluation and 0.7837 on sts-test-64.
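To make the MatryoshkaLoss feature above more concrete, here is a minimal pure-Python sketch of the idea: an in-batch ranking loss (in the spirit of MultipleNegativesRankingLoss) is computed on several prefixes of each embedding and the results are summed. The function names, the toy 4-dimensional vectors, the prefix lengths (4, 2) and the scale of 20 are illustrative assumptions, not the training configuration of this model, and the sketch ignores the explicit negatives that triplet data provides.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (leave all-zero vectors unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities.

    The i-th positive is the correct match for the i-th anchor;
    every other positive in the batch acts as a negative.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)

def matryoshka_loss(anchors, positives, dims):
    """Sum the base loss over each prefix length in `dims`."""
    return sum(
        in_batch_ranking_loss([v[:d] for v in anchors], [v[:d] for v in positives])
        for d in dims
    )

# Toy batch: two anchor/positive pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.0, 0.2], [0.1, 0.8, 0.3, 0.0]]

loss = matryoshka_loss(anchors, positives, dims=(4, 2))
print(round(loss, 4))
```

Because every prefix contributes to the loss, the leading dimensions are pushed to carry most of the semantic signal, which is what allows shorter vectors to be used at inference time.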

Model Capabilities

Calculate sentence similarity

Semantic search

Text feature extraction

Text classification

Text clustering

Paraphrase mining

Use Cases

Information retrieval

Document retrieval

Quickly retrieve relevant documents based on query semantics

Achieved a score of 33.441 on the MTEB MIRACLRetrievalHardNegatives (ar) dataset

Question answering system

Match user questions with answers in the knowledge base

Achieved a score of 64.488 on the MTEB MLQARetrieval (ara-ara) dataset

Text analysis

Semantic similarity calculation

Compare the semantic similarity between two sentences or paragraphs

Spearman correlation of cosine similarity on the sts-test-384 evaluation is 0.7972

Text clustering

Automatically group semantically similar texts
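The retrieval use cases above come down to ranking stored vectors by their cosine similarity to a query vector. The following sketch shows that ranking step on hand-written 3-dimensional toy vectors; in practice the vectors would come from the model's encoder, and the helper names `cosine` and `top_k` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vec, doc_vecs, k=2):
    """Return (document index, score) pairs for the k most similar documents."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]

# Toy stand-ins for document and query embeddings.
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.6, 0.1],
]
query_vec = [1.0, 0.2, 0.0]

results = top_k(query_vec, doc_vecs, k=2)
for idx, score in results:
    print(idx, round(score, 3))
```

For large collections an approximate nearest-neighbour index would usually replace the exhaustive loop, but the scoring rule stays the same.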

🚀 SentenceTransformer based on intfloat/multilingual-e5-small

This is a Sentence Transformer model fine-tuned from intfloat/multilingual-e5-small. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

🚀 Quick Start

Once the Sentence Transformers library is installed, the model can be loaded and used for inference directly, with no further setup.

✨ Features

Semantic Mapping: Maps sentences and paragraphs to a 384-dimensional dense vector space.
Multiple Applications: Can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, etc.

📦 Installation

First install the Sentence Transformers library:

pip install -U sentence-transformers

💻 Usage Examples

Basic Usage

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Omartificial-Intelligence-Space/E5-Matro")
# Run inference
sentences = [
    'يجلس شاب ذو شعر أشقر على الحائط يقرأ جريدة بينما تمر امرأة وفتاة شابة.',
    'ذكر شاب ينظر إلى جريدة بينما تمر إمرأتان بجانبه',
    'الشاب نائم بينما الأم تقود ابنتها إلى الحديقة',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

📚 Documentation

Model Details

Model Description

Property	Details
Model Type	Sentence Transformer
Base model	intfloat/multilingual-e5-small
Maximum Sequence Length	512 tokens
Output Dimensionality	384 dimensions
Similarity Function	Cosine Similarity
Training Dataset	Omartificial-Intelligence-Space/arabic-n_li-triplet

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
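The Pooling and Normalize stages in the architecture above can be illustrated without any library. This sketch assumes per-token vectors are already available (here, made-up 2-dimensional values), averages the non-padding tokens as `pooling_mode_mean_tokens: True` indicates, and then scales the result to unit length. The function names are hypothetical.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose attention mask entry is 1."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    dim = len(token_vectors[0])
    return [sum(vec[j] for vec in kept) / len(kept) for j in range(dim)]

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Three token positions; the last one is padding and must be ignored.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]

pooled = mean_pool(token_vectors, attention_mask)
sentence_vec = l2_normalize(pooled)
print(pooled)
print(sentence_vec)
```

Because the final vector has unit length, the dot product of two full-size embeddings equals their cosine similarity.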

Evaluation

Metrics

Semantic Similarity (Dataset: `sts-test-384`)

Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.7883
spearman_cosine	0.7972
pearson_manhattan	0.7846
spearman_manhattan	0.794
pearson_euclidean	0.7883
spearman_euclidean	0.7972
pearson_dot	0.7883
spearman_dot	0.7972
pearson_max	0.7883
spearman_max	0.7972
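For readers unfamiliar with the metric names in the table above: the `spearman_*` rows report a rank correlation between the model's similarity scores and human-annotated gold scores, while the `pearson_*` rows report a linear correlation. The sketch below computes both from scratch on four invented sentence pairs; the numbers are illustrative only and are not taken from the evaluation data.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented gold similarity labels and model cosine scores for four pairs.
gold = [5.0, 3.2, 1.0, 4.1]
predicted_cosine = [0.91, 0.55, 0.60, 0.80]

rho = spearman(gold, predicted_cosine)
print(round(rho, 4))  # 0.8
```

Spearman only cares about ordering, so it is unaffected by any monotonic rescaling of the similarity scores.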

Semantic Similarity (Dataset: `sts-test-256`)

Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.7852
spearman_cosine	0.7968
pearson_manhattan	0.7853
spearman_manhattan	0.7936
pearson_euclidean	0.7882
spearman_euclidean	0.7963
pearson_dot	0.7786
spearman_dot	0.7868
pearson_max	0.7882
spearman_max	0.7968

Semantic Similarity (Dataset: `sts-test-128`)

Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.7755
spearman_cosine	0.7933
pearson_manhattan	0.7833
spearman_manhattan	0.7908
pearson_euclidean	0.7868
spearman_euclidean	0.7936
pearson_dot	0.7317
spearman_dot	0.7336
pearson_max	0.7868
spearman_max	0.7936

Semantic Similarity (Dataset: `sts-test-64`)

Evaluated with EmbeddingSimilarityEvaluator

Metric	Value
pearson_cosine	0.7625
spearman_cosine	0.7837
pearson_manhattan	0.7753
spearman_manhattan	0.7791
pearson_euclidean	0.778
spearman_euclidean	0.7816
pearson_dot	0.6685
spearman_dot	0.6621
pearson_max	0.778
spearman_max	0.7837
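Across the four tables, the dot-product scores fall faster than the cosine scores as the dimension shrinks (pearson_dot drops from 0.7883 to 0.6685, pearson_cosine only to 0.7625). A likely explanation, offered here as an assumption, is that the embeddings are normalized at full size, so a truncated prefix is no longer unit length and its dot product stops matching its cosine similarity. The toy 4-dimensional vectors below illustrate the effect and show that re-normalizing after truncation restores the match.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))

def l2_normalize(v):
    n = norm(v)
    return [x / n for x in v]

# Two unit-length toy embeddings (4 dimensions stand in for 384).
a = [0.5, 0.5, 0.5, 0.5]
b = [0.1, 0.7, 0.5, 0.5]

# Full size: dot product and cosine similarity agree.
full_dot, full_cos = dot(a, b), cosine(a, b)

# Truncate to the first 2 dimensions without re-normalizing.
a2, b2 = a[:2], b[:2]
trunc_dot, trunc_cos = dot(a2, b2), cosine(a2, b2)

# Re-normalize the prefixes: dot product matches cosine again.
renorm_dot = dot(l2_normalize(a2), l2_normalize(b2))

print(round(full_dot, 3), round(full_cos, 3))    # 0.9 0.9
print(round(trunc_dot, 3), round(trunc_cos, 3))  # 0.4 0.8
print(round(renorm_dot, 3))                      # 0.8
```

When using shortened embeddings, it is therefore safer to score with cosine similarity or to re-normalize the truncated vectors first.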
