bge-base-en-v1.5-course-recommender-v5 Open-source Model - Free Mapping of Sentences and Paragraphs to Vector Space

Bge Base En V1.5 Course Recommender V5

Developed by datasocietyco

This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5, which maps sentences and paragraphs to a 768-dimensional dense vector space.

Text Embedding

Safetensors

#Course Recommendation Semantic Matching #Few-shot Fine-tuning #Multiple Negatives Ranking

Downloads 15.87k

Release Time: 4/25/2025

Model Overview

This model is primarily used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.

Model Features

High-dimensional Vector Representation

Maps sentences and paragraphs to a 768-dimensional dense vector space

Multiple Negatives Ranking Loss

Trained with multiple negatives ranking loss, which pulls each anchor toward its own positive and treats the other positives in the batch as negatives

Long Text Processing Capability

Maximum sequence length of 512 tokens; longer inputs are truncated to this limit
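
The multiple negatives ranking objective named in the features above can be illustrated with a small pure-Python sketch. It is not the training code for this model: the function name and the toy similarity matrices are made up for illustration, and the scale of 20.0 only mirrors the default in Sentence Transformers, since the value used for this model is not stated here.

```python
import math


def multiple_negatives_ranking_loss(sim, scale=20.0):
    """Mean cross-entropy over the rows of a square similarity matrix.

    sim[i][j] is the cosine similarity between anchor i and positive j.
    The true pair sits on the diagonal; every other entry in row i
    acts as an in-batch negative for anchor i.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [scale * s for s in row]
        # Numerically stable log-sum-exp over the row.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the diagonal entry as the correct class.
        total += log_sum_exp - logits[i]
    return total / len(sim)


# Toy matrices (assumed values): in `good`, each anchor is closest to its own positive.
good = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.0, 0.1, 0.95]]
bad = [[0.1, 0.9, 0.0], [0.8, 0.2, 0.1], [0.0, 0.95, 0.1]]

print(multiple_negatives_ranking_loss(good))
print(multiple_negatives_ranking_loss(bad))
```

The loss is small when the diagonal dominates each row and grows when an anchor is more similar to another item's positive than to its own.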

Model Capabilities

Calculate sentence similarity

Semantic search

Text classification

Text clustering

Paraphrase mining

Use Cases

Education Domain

Course Recommendation

Course recommendations based on semantic similarity of course descriptions

Information Retrieval

Semantic Search

Search system based on semantics rather than keyword matching
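
Both use cases above amount to ranking candidate embeddings by cosine similarity to a query embedding. The following is a minimal sketch of that ranking step, assuming the embeddings have already been computed. The three-dimensional vectors are made-up stand-ins for real 768-dimensional embeddings, and `recommend` is a hypothetical helper, not part of any library.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recommend(query_vec, course_vecs, top_k=2):
    """Return the top_k (course name, score) pairs, highest similarity first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in course_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed values, for illustration only).
course_vecs = {
    "Creating and Querying Data in SQL": [0.9, 0.1, 0.0],
    "Clustering Categorical and Mixed Data in Python": [0.1, 0.9, 0.2],
    "Hierarchical and Density-Based Clustering in Python": [0.0, 0.8, 0.5],
}
query_vec = [0.2, 0.9, 0.3]  # stand-in for the embedding of a learner's query

top = recommend(query_vec, course_vecs)
for name, score in top:
    print(f"{score:.3f}  {name}")
```

With real data, the vectors would come from encoding the course descriptions and the query with this model. The ranking logic stays the same.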

🚀 SentenceTransformer based on BAAI/bge-base-en-v1.5

This model is a fine-tuned version of BAAI/bge-base-en-v1.5, trained with sentence-transformers. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and other tasks.

🚀 Quick Start

First, you need to install the Sentence Transformers library:

pip install -U sentence-transformers

Then, you can load this model and perform inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("datasocietyco/bge-base-en-v1.5-course-recommender-v5")
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])

✨ Features

Semantic Mapping: Maps sentences and paragraphs to a 768-dimensional dense vector space.
Multiple Applications: Can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, etc.

📚 Documentation

Model Details

Model Description

Property	Details
Model Type	Sentence Transformer
Base model	BAAI/bge-base-en-v1.5
Maximum Sequence Length	512 tokens
Output Dimensionality	768 dimensions
Similarity Function	Cosine Similarity
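
For reference, the cosine similarity listed in the table is the standard definition below. The symbols $u$ and $v$ are introduced here for two embedding vectors and do not appear in the original card.

```latex
\[
\operatorname{sim}(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
\]
```

Because the model ends with a Normalize() layer, its output vectors have unit length, so in practice the cosine similarity of two embeddings equals their dot product.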

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
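
The Pooling and Normalize() stages of the architecture above can be sketched in a few lines of pure Python. This is an illustration only, assuming the Transformer stage has already produced one vector per token with the [CLS] token at position 0. The function names and the tiny three-dimensional token vectors are made up for the example, whereas the real model uses 768 dimensions.

```python
import math


def cls_pool(token_embeddings):
    """CLS pooling: keep only the first token's vector ([CLS] sits at index 0)."""
    return token_embeddings[0]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Toy per-token outputs (assumed values): row 0 plays the role of [CLS].
token_embeddings = [
    [3.0, 4.0, 0.0],
    [1.0, 0.0, 2.0],
    [0.5, 0.5, 0.5],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

The other token vectors are ignored by this pooling mode, which matches the configuration shown: `pooling_mode_cls_token` is True and the mean, max, and last-token modes are False.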

Training Details

Training Dataset

Unnamed Dataset

Size: 45 training samples
Columns: anchor and positive
Approximate statistics based on the first 45 samples:
	anchor	positive
type	string	string
details	min: 143 tokens mean: 178.76 tokens max: 258 tokens	min: 141 tokens mean: 176.76 tokens max: 256 tokens

Samples:

anchor	positive
Creating and Querying Data in SQL. This course builds on foundational data skills to teach learners how to effectively manipulate data in Structured Query Language (SQL). By the end of this course, learners will be able to describe database structures, import data into a database, and combine and manipulate data within a single table. Learners will also have gained exposure to working with data of different types, including string, numerical, and temporal.. tags: 'DML', 'analytics', 'ERD', 'SQL', 'functions', 'DQL', 'DDL'. Languages: Course language: TBD. Prerequisites: No prerequisite course required. Target audience: Professionals with limited or no experience in SQL or similar languages. Junior analysts or analysts familiar with other similar programming languages and frameworks.	`Course Name:Creating and Querying Data in SQL`
Clustering Categorical and Mixed Data in Python. In this course, learners will prepare data for, implement, and optimize three advanced clustering models in Python while comparing their different use cases. In particular, this course focuses on the suitability of different clustering methods for different kinds of data: numerical, categorical, and mixed. Learners will distinguish between k-modes, mean shift, and k-prototypes models, developing their understanding of when each model will best meet their needs.. tags: 'k-prototypes', 'mean-shift', 'clustering', 'k-modes'. Languages: Course language: Python. Prerequisites: No prerequisite course required. Target audience: This is an introductory level course for data scientists who want to learn to detect and visualize underlying patterns and groups in unlabelled data and how to handle different types of data..	`Course Name:Clustering Categorical and Mixed Data in Python`
Hierarchical and Density-Based Clustering in Python. In this course, learners will encounter more sophisticated methods for generating clusters within unlabeled data using Python. The first method, hierarchical clustering, creates easy-to-read, tree branch-like clusters in order of increasing specificity. The second method, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), creates groups based on the concentration of data points within a region, facilitating analysis of irregularly shaped data. By the end of this course, learners will prepare data for, implement, and optimize these models.. tags: 'Hierarchical', 'clustering', 'DBSCAN'. Languages: Course language: Python. Prerequisites: No prerequisite course required. Target audience: This is an introductory level course for data scientists who...	...
