🚀 SentenceTransformer based on dunzhang/stella_en_1.5B_v5
This is a sentence-transformers model fine-tuned from dunzhang/stella_en_1.5B_v5. It maps sentences and paragraphs to a 1024-dimensional dense vector space, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
✨ Features
- Maps sentences and paragraphs to a 1024-dimensional dense vector space.
- Applicable to various NLP tasks such as semantic textual similarity, semantic search, paraphrase mining, text classification, and clustering.
📦 Installation
First, you need to install the Sentence Transformers library:
pip install -U sentence-transformers
💻 Usage Examples
Basic Usage
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")

# Run inference: the first entry is a passage, the last two are instruct-prefixed queries
sentences = [
    'The Tchaikovsky Symphony Orchestra is a Russian classical music orchestra established in 1930. It was founded as the Moscow Radio Symphony Orchestra, and served as the official symphony for the Soviet All-Union Radio network. Following the dissolution of the Soviet Union in 1991, the orchestra was renamed in 1993 by the Russian Ministry of Culture in recognition of the central role the music of Tchaikovsky plays in its repertoire. The current music director is Vladimir Fedoseyev, who has been in that position since 1974.',
    'Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: Tchaikovsky Symphony Orchestra',
    'Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: Sierra del Lacandón',
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # 3 sentences × 1024 dimensions

# Pairwise similarity scores (cosine similarity by default)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # 3 × 3 similarity matrix
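`model.similarity` defaults to cosine similarity. As a minimal sketch of the same computation, here is the equivalent pairwise cosine-similarity matrix in plain numpy, using random stand-ins for the 1024-dimensional embeddings the real model would return:

```python
import numpy as np

def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity: normalize rows, then take dot products."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T

# Toy stand-ins for the embeddings produced by model.encode
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((3, 1024))

similarities = cosine_similarity_matrix(embeddings)
print(similarities.shape)  # (3, 3); diagonal entries are 1.0
```

Each diagonal entry is a vector's similarity with itself (1.0), and the matrix is symmetric.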
📚 Documentation
Model Details
Model Description
| Property | Details |
|----------|---------|
| Model Type | Sentence Transformer |
| Base Model | dunzhang/stella_en_1.5B_v5 |
| Maximum Sequence Length | 8096 tokens |
| Output Dimensionality | 1024 dimensions |
| Similarity Function | Cosine Similarity |
Model Sources
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 8096, 'do_lower_case': False}) with Transformer model: Qwen2Model
(1): Pooling({'word_embedding_dimension': 1536, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Dense({'in_features': 1536, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
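Per the architecture above, a sentence embedding is produced in two steps: mean pooling over the Qwen2 token embeddings (1536 dimensions), then a Dense projection down to 1024 dimensions with an identity activation. A minimal numpy sketch of that pipeline, using random stand-in weights (the real layer uses trained weights):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-ins for Qwen2 backbone outputs: batch=2, seq_len=5, hidden=1536
token_embeddings = rng.standard_normal((2, 5, 1536))
attention_mask = np.array([[1, 1, 1, 0, 0],
                           [1, 1, 1, 1, 1]])  # 1 = real token, 0 = padding

# (1) Mean pooling: average token embeddings over non-padding positions only
mask = attention_mask[:, :, None]
pooled = (token_embeddings * mask).sum(axis=1) / mask.sum(axis=1)  # (2, 1536)

# (2) Dense projection with identity activation: 1536 -> 1024
W = rng.standard_normal((1536, 1024)) * 0.02  # random stand-in for trained weights
b = np.zeros(1024)
sentence_embeddings = pooled @ W + b

print(sentence_embeddings.shape)  # (2, 1024)
```

This is why the backbone's hidden size (1536) differs from the card's output dimensionality (1024): the final Dense layer performs the projection.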
🔧 Technical Details
Evaluation
Metrics
Information Retrieval
Evaluated with InformationRetrievalEvaluator
| Metric | Value |
|--------|-------|
| cosine_accuracy@1 | 0.9448 |
| cosine_accuracy@3 | 0.9687 |
| cosine_accuracy@5 | 0.9764 |
| cosine_accuracy@10 | 0.9811 |
| cosine_precision@1 | 0.9448 |
| cosine_precision@3 | 0.3229 |
| cosine_precision@5 | 0.1953 |
| cosine_precision@10 | 0.0981 |
| cosine_recall@1 | 0.9448 |
| cosine_recall@3 | 0.9687 |
| cosine_recall@5 | 0.9764 |
| cosine_recall@10 | 0.9811 |
| cosine_ndcg@10 | 0.9637 |
| cosine_mrr@10 | 0.958 |
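For intuition, here is a sketch of how two of these metrics are defined. Since accuracy@k equals recall@k at every k in the table, each query appears to have exactly one relevant document; under that assumption, accuracy@k and MRR@10 reduce to:

```python
def accuracy_at_k(ranked_ids, relevant_id, k):
    """1 if the single relevant document appears in the top-k results, else 0."""
    return int(relevant_id in ranked_ids[:k])

def mrr_at_k(ranked_ids, relevant_id, k=10):
    """Reciprocal rank of the relevant document within the top-k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

# Toy queries: (ranked candidate IDs, the single relevant doc ID)
results = [
    (["d3", "d1", "d7"], "d3"),  # relevant at rank 1
    (["d5", "d2", "d9"], "d2"),  # relevant at rank 2
]
print(sum(accuracy_at_k(r, rel, 1) for r, rel in results) / len(results))  # 0.5
print(sum(mrr_at_k(r, rel) for r, rel in results) / len(results))          # 0.75
```

The reported scores average these per-query values over the full evaluation set, as computed by the InformationRetrievalEvaluator.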