SentenceTransformer based on BAAI/bge-base-en-v1.5
This model is a fine-tuned version of BAAI/bge-base-en-v1.5, trained with sentence-transformers. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.
Quick Start
First, install the Sentence Transformers library:
pip install -U sentence-transformers
Then load the model and run inference:
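from sentence_transformers import SentenceTransformer

# Download the model from the Hugging Face Hub
model = SentenceTransformer("datasocietyco/bge-base-en-v1.5-course-recommender-v5")

# Sentences to embed
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]

# One 768-dimensional embedding per sentence
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)

# Pairwise similarity scores between all embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # torch.Size([3, 3])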
⨠Features
- Semantic Mapping: Maps sentences and paragraphs to a 768 - dimensional dense vector space.
- Multiple Applications: Can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, etc.
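To make the semantic search use case concrete, here is a minimal sketch of ranking a small corpus against a query by cosine similarity. It is an illustration under stated assumptions, not the model's own code: the 3-dimensional vectors are invented stand-ins for the 768-dimensional output of `model.encode(...)`, and the helper names `cosine` and `rank` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank(query_vec, corpus_vecs):
    """Return corpus indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, vec) for vec in corpus_vecs]
    return sorted(range(len(corpus_vecs)), key=lambda i: scores[i], reverse=True)

# Invented 3-dimensional stand-ins for real embeddings (assumption, not model output).
corpus = [
    [0.9, 0.1, 0.0],  # stands in for "The weather is lovely today."
    [0.8, 0.3, 0.1],  # stands in for "It's so sunny outside!"
    [0.0, 0.2, 0.9],  # stands in for "He drove to the stadium."
]
query = [1.0, 0.2, 0.0]  # stands in for a weather-related query

order = rank(query, corpus)
print(order)  # [0, 1, 2]
```

With real embeddings, the same ranking step would apply to the rows returned by `model.encode`, and `model.similarity` can replace the hand-written `cosine` helper.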
Documentation
Model Details
Model Description
| Property | Details |
|:--|:--|
| Model Type | Sentence Transformer |
| Base model | BAAI/bge-base-en-v1.5 |
| Maximum Sequence Length | 512 tokens |
| Output Dimensionality | 768 dimensions |
| Similarity Function | Cosine Similarity |
Full Model Architecture
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
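The last two modules in the architecture above can be sketched in a few lines. This is a simplified illustration, assuming that CLS pooling keeps the first token's vector and that `Normalize()` rescales it to unit length. The 2-dimensional token vectors are invented, and the function names are hypothetical rather than library internals.

```python
import math

def cls_pool(token_embeddings):
    """CLS pooling: keep only the first token's vector."""
    return token_embeddings[0]

def l2_normalize(vec):
    """Scale a vector to unit length, as a normalization step would."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Invented per-token vectors for two inputs; the first row plays the [CLS] token.
tokens_a = [[3.0, 4.0], [1.0, 0.0], [0.5, 0.5]]
tokens_b = [[4.0, 3.0], [0.0, 2.0]]

emb_a = l2_normalize(cls_pool(tokens_a))  # [0.6, 0.8]
emb_b = l2_normalize(cls_pool(tokens_b))  # [0.8, 0.6]

# For unit-length vectors the dot product equals the cosine similarity.
similarity = dot(emb_a, emb_b)
print(round(similarity, 2))  # 0.96
```

Because the output is already normalized, cosine similarity and dot product give the same ranking for this kind of pipeline.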
Training Details
Training Dataset
Unnamed Dataset
- Size: 45 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 45 samples:
| | anchor | positive |
|:--|:--|:--|
| type | string | string |
| details | min: 143 tokens, mean: 178.76 tokens, max: 258 tokens | min: 141 tokens, mean: 176.76 tokens, max: 256 tokens |
- Samples:
| anchor | positive |
|:--|:--|
| Creating and Querying Data in SQL. This course builds on foundational data skills to teach learners how to effectively manipulate data in Structured Query Language (SQL). By the end of this course, learners will be able to describe database structures, import data into a database, and combine and manipulate data within a single table. Learners will also have gained exposure to working with data of different types, including string, numerical, and temporal.. tags: 'DML', 'analytics', 'ERD', 'SQL', 'functions', 'DQL', 'DDL'. Languages: Course language: TBD. Prerequisites: No prerequisite course required. Target audience: Professionals with limited or no experience in SQL or similar languages. Junior analysts or analysts familiar with other similar programming languages and frameworks. | Course Name:Creating and Querying Data in SQL |
| Clustering Categorical and Mixed Data in Python. In this course, learners will prepare data for, implement, and optimize three advanced clustering models in Python while comparing their different use cases. In particular, this course focuses on the suitability of different clustering methods for different kinds of data: numerical, categorical, and mixed. Learners will distinguish between k-modes, mean shift, and k-prototypes models, developing their understanding of when each model will best meet their needs.. tags: 'k-prototypes', 'mean-shift', 'clustering', 'k-modes'. Languages: Course language: Python. Prerequisites: No prerequisite course required. Target audience: This is an introductory level course for data scientists who want to learn to detect and visualize underlying patterns and groups in unlabelled data and how to handle different types of data.. | Course Name:Clustering Categorical and Mixed Data in Python |
| Hierarchical and Density-Based Clustering in Python. In this course, learners will encounter more sophisticated methods for generating clusters within unlabeled data using Python. The first method, hierarchical clustering, creates easy-to-read, tree branch-like clusters in order of increasing specificity. The second method, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), creates groups based on the concentration of data points within a region, facilitating analysis of irregularly shaped data. By the end of this course, learners will prepare data for, implement, and optimize these models.. tags: 'Hierarchical', 'clustering', 'DBSCAN'. Languages: Course language: Python. Prerequisites: No prerequisite course required. Target audience: This is an introductory level course for data scientists who... | ... |
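The section shown here does not name the training loss, so the following is only an assumption about how (anchor, positive) pairs like these are commonly used: an in-batch-negatives ranking objective, in which each anchor should score its own positive above the other positives in the batch (sentence-transformers provides `MultipleNegativesRankingLoss` for this setup). The similarity matrices and the `scale` value below are invented for illustration.

```python
import math

def in_batch_negatives_loss(sim, scale=20.0):
    """Mean cross-entropy in which row i should select column i.

    sim[i][j] is the cosine similarity between anchor i and positive j.
    The scale factor is an assumed value, not taken from this model card.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [scale * s for s in row]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(sim)

# Invented similarity matrices for a batch of three (anchor, positive) pairs.
well_separated = [[0.9, 0.1, 0.2], [0.1, 0.8, 0.3], [0.2, 0.1, 0.9]]
confused = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]

loss_good = in_batch_negatives_loss(well_separated)
loss_bad = in_batch_negatives_loss(confused)
print(loss_good < loss_bad)  # True
```

The loss is low when each course description is closest to its own course name and rises toward log(batch size) when the model cannot tell the pairs apart.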