🚀 SentenceTransformer based on BAAI/bge-small-en-v1.5
This is a sentence-transformers model finetuned from BAAI/bge-small-en-v1.5 on the baconnier/finance_dataset_small_private dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
✨ Features
- Maps sentences and paragraphs to a 384-dimensional dense vector space.
- Suitable for multiple NLP tasks such as semantic textual similarity, semantic search, paraphrase mining, text classification, and clustering.
📦 Installation
First install the Sentence Transformers library:
pip install -U sentence-transformers
💻 Usage Examples
Basic Usage
from sentence_transformers import SentenceTransformer

# Download the model from the 🤗 Hub
model = SentenceTransformer("baconnier/Finance2_embedding_small_en-V1.5")

# Run inference on a query and two candidate answers
sentences = [
    'What is industrial production, and how is it measured by the Federal Reserve Board?',
    'Industrial production is a statistic determined by the Federal Reserve Board that measures the total output of all US factories and mines on a monthly basis. The Fed collects data from various government agencies and trade associations to calculate the industrial production index, which serves as an important economic indicator, providing insight into the health of the manufacturing and mining sectors.\nIndustrial production is a monthly statistic calculated by the Federal Reserve Board, measuring the total output of US factories and mines using data from government agencies and trade associations, serving as a key economic indicator for the manufacturing and mining sectors.',
    'Industrial production is a statistic that measures the output of factories and mines in the US. It is released by the Federal Reserve Board every quarter.\nIndustrial production measures factory and mine output, released quarterly by the Fed.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Pairwise similarity scores between the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
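Semantic Search
The same embeddings also support a simple semantic-search setup: encode a corpus once, encode each query, and rank corpus entries by similarity. The snippet below is a minimal sketch; the corpus sentences and the query are made-up finance examples for illustration, not taken from the training data.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("baconnier/Finance2_embedding_small_en-V1.5")

# Hypothetical finance corpus and query, used only to illustrate the workflow.
corpus = [
    "Industrial production measures the monthly output of US factories and mines.",
    "The consumer price index tracks changes in prices paid by urban consumers.",
    "A bond's coupon rate is the annual interest paid relative to its face value.",
]
query = "How is factory and mine output tracked in the US?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode(query)

# Similarity scores between the query and every corpus sentence.
scores = model.similarity(query_embedding, corpus_embeddings)  # shape: [1, 3]
best_idx = int(scores.argmax())
print(corpus[best_idx], float(scores[0, best_idx]))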
📚 Documentation
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-small-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: baconnier/finance_dataset_small_private
Model Sources
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
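The final Normalize() module scales every embedding to unit L2 norm, so the dot product of two embeddings equals their cosine similarity. A quick sanity check, assuming the model loads as in the usage example above:
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("baconnier/Finance2_embedding_small_en-V1.5")

a, b = model.encode([
    "What is industrial production?",
    "How does the Fed measure factory output?",
])

# Normalize() gives unit-length vectors ...
print(np.linalg.norm(a), np.linalg.norm(b))  # both ~1.0

# ... so dot product and cosine similarity coincide.
dot = float(np.dot(a, b))
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))
print(dot, cosine)  # effectively identical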
🔧 Technical Details
Evaluation
Metrics
Triplet
| Metric             | Value  |
|:-------------------|:-------|
| cosine_accuracy    | 0.9791 |
| dot_accuracy       | 0.0209 |
| manhattan_accuracy | 0.978  |
| euclidean_accuracy | 0.9791 |
| max_accuracy       | 0.9791 |
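These numbers come from a triplet evaluation: for each (anchor, positive, negative) triple, the model counts as correct when the anchor embedding is closer to the positive than to the negative under the given distance function (cosine, dot, Manhattan, or Euclidean). A comparable evaluation can be run with TripletEvaluator from Sentence Transformers; the triplets below are hypothetical placeholders, not the model's actual evaluation split.
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("baconnier/Finance2_embedding_small_en-V1.5")

# Hypothetical triplets, shown only to illustrate the expected data layout.
anchors = ["What is industrial production, and how is it measured?"]
positives = ["Industrial production measures the monthly output of US factories and mines."]
negatives = ["A bond's coupon rate is the annual interest paid relative to its face value."]

evaluator = TripletEvaluator(
    anchors=anchors,
    positives=positives,
    negatives=negatives,
    name="finance-demo",
)
print(evaluator(model))  # triplet accuracy score(s)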
Training Details
Training Dataset
baconnier/finance_dataset_small_private
- Dataset: baconnier/finance_dataset_small_private at d7e6492
- Size: 15,525 training samples
- Columns: anchor, positive, and negative
- Approximate statistics based on the first 1000 samples:
| | anchor | positive | negative |
|:--------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
| type | string | string | string |