🚀 SentenceTransformer based on BAAI/bge-small-en-v1.5
This is a sentence-transformers model finetuned from BAAI/bge-small-en-v1.5 on the baconnier/finance_dataset_small_private dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
✨ Features
- Maps sentences and paragraphs to a 384-dimensional dense vector space.
- Suitable for multiple NLP tasks such as semantic textual similarity, semantic search, paraphrase mining, text classification, and clustering.
📦 Installation
First install the Sentence Transformers library:
pip install -U sentence-transformers
💻 Usage Examples
Basic Usage
from sentence_transformers import SentenceTransformer

# Download the model from the 🤗 Hub
model = SentenceTransformer("baconnier/Finance2_embedding_small_en-V1.5")

# Run inference on a query and two candidate answers
sentences = [
    'What is industrial production, and how is it measured by the Federal Reserve Board?',
    'Industrial production is a statistic determined by the Federal Reserve Board that measures the total output of all US factories and mines on a monthly basis. The Fed collects data from various government agencies and trade associations to calculate the industrial production index, which serves as an important economic indicator, providing insight into the health of the manufacturing and mining sectors.\nIndustrial production is a monthly statistic calculated by the Federal Reserve Board, measuring the total output of US factories and mines using data from government agencies and trade associations, serving as a key economic indicator for the manufacturing and mining sectors.',
    'Industrial production is a statistic that measures the output of factories and mines in the US. It is released by the Federal Reserve Board every quarter.\nIndustrial production measures factory and mine output, released quarterly by the Fed.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Pairwise similarity scores between the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
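Semantic Search
The same embeddings also support a simple semantic-search setup: encode a corpus once, encode each query, and rank corpus entries by similarity. The snippet below is a minimal sketch; the corpus sentences and the query are made-up finance examples for illustration, not taken from the training data.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("baconnier/Finance2_embedding_small_en-V1.5")

# Hypothetical finance corpus and query, used only to illustrate the workflow.
corpus = [
    "Industrial production measures the monthly output of US factories and mines.",
    "The consumer price index tracks changes in prices paid by urban consumers.",
    "A bond's coupon rate is the annual interest paid relative to its face value.",
]
query = "How is factory and mine output tracked in the US?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode(query)

# Similarity scores between the query and every corpus sentence.
scores = model.similarity(query_embedding, corpus_embeddings)  # shape: [1, 3]
best_idx = int(scores.argmax())
print(corpus[best_idx], float(scores[0, best_idx]))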
📚 Documentation
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-small-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: baconnier/finance_dataset_small_private
Model Sources
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
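The final Normalize() module scales every embedding to unit L2 norm, so the dot product of two embeddings equals their cosine similarity. A quick sanity check, assuming the model loads as in the usage example above:
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("baconnier/Finance2_embedding_small_en-V1.5")

a, b = model.encode([
    "What is industrial production?",
    "How does the Fed measure factory output?",
])

# Normalize() gives unit-length vectors ...
print(np.linalg.norm(a), np.linalg.norm(b))  # both ~1.0

# ... so dot product and cosine similarity coincide.
dot = float(np.dot(a, b))
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))
print(dot, cosine)  # effectively identical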
🔧 Technical Details
Evaluation
Metrics
Triplet
| Metric             | Value  |
|:-------------------|:-------|
| cosine_accuracy    | 0.9791 |
| dot_accuracy       | 0.0209 |
| manhattan_accuracy | 0.978  |
| euclidean_accuracy | 0.9791 |
| max_accuracy       | 0.9791 |
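These numbers come from a triplet evaluation: for each (anchor, positive, negative) triple, the model counts as correct when the anchor embedding is closer to the positive than to the negative under the given distance function (cosine, dot, Manhattan, or Euclidean). A comparable evaluation can be run with TripletEvaluator from Sentence Transformers; the triplets below are hypothetical placeholders, not the model's actual evaluation split.
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("baconnier/Finance2_embedding_small_en-V1.5")

# Hypothetical triplets, shown only to illustrate the expected data layout.
anchors = ["What is industrial production, and how is it measured?"]
positives = ["Industrial production measures the monthly output of US factories and mines."]
negatives = ["A bond's coupon rate is the annual interest paid relative to its face value."]

evaluator = TripletEvaluator(
    anchors=anchors,
    positives=positives,
    negatives=negatives,
    name="finance-demo",
)
print(evaluator(model))  # triplet accuracy score(s)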
Training Details
Training Dataset
baconnier/finance_dataset_small_private
- Dataset: baconnier/finance_dataset_small_private at d7e6492
- Size: 15,525 training samples
- Columns: anchor, positive, and negative
- Approximate statistics based on the first 1000 samples:
| | anchor | positive | negative |
|:--------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
| type | string | string | string |