# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
This is a Sentence Transformer model fine-tuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Features
- Semantic Understanding: Effectively captures the semantic meaning of sentences and paragraphs, enabling accurate similarity comparisons.
- Versatile Applications: Can be used in various natural language processing tasks such as semantic search, text classification, and clustering.
- Fine-tuned Model: Built on the sentence-transformers/all-MiniLM-L6-v2 base model and fine-tuned for matching resumes to job descriptions.
## Installation
First, install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
## Usage Examples

### Basic Usage

```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("anass1209/resume-job-matcher-all-MiniLM-L6-v2")
sentences = [
'Developed and maintained core backend services using Python and Django, focusing on scalability and efficiency. Implemented RESTful APIs for data retrieval and manipulation. Worked extensively with PostgreSQL for data storage and retrieval. Responsible for optimizing database queries and improving API response times. Experience with model fine-tuning for semantic search and document retrieval using pre-trained embedding models like Sentence Transformers or similar libraries, specifically for improving the relevance of search results and document matching within the web application. Experience using vector databases (e.g., ChromaDB, Weaviate) preferred.',
'## Senior Backend Engineer\n\n* **ABC Corp** | 2020 - Present\n* Led development of a new REST API for user authentication and profile management using Python and Django.\n* Managed a PostgreSQL database, optimizing queries and schema design for improved performance, resulting in a 20% reduction in average API response time.\n* Improved system scalability through efficient code design and load balancing techniques.\n* Experience using pre-trained embedding models (BERT) for natural language processing tasks to improve search accuracy, with focus on keyphrase extraction and content similarity comparison for the recommendations engine. Proficient in Flask.',
"PhD in Computer Science, University of California, Berkeley (2018-2023). Dissertation: 'Adversarial Robustness in NLP for Cybersecurity Applications.' Focused on fine-tuning BERT for malware detection and social engineering attacks. Proficient in Python, TensorFlow, and AWS. Published in top-tier NLP and security conferences. Experienced with large datasets and model evaluation metrics.\n\nMaster of Science in Cybersecurity, Johns Hopkins University (2016-2018). Relevant coursework included Machine Learning, Data Mining, and Network Security. Developed a system for anomaly detection using a recurrent neural network (RNN). Familiar with Python and cloud computing platforms. Good understanding of NLP concepts, but limited experience fine-tuning transformer models. Strong understanding of Information Security Principles.\n\nBachelor of Science in Computer Engineering, Carnegie Mellon University (2012-2016). Relevant coursework: Artificial Intelligence, Database Management, and Software Engineering. Project experience: Developed a web application using Python. No direct experience with fine-tuning NLP models, but a strong foundation in programming and data structures. Familiar with cloud infrastructure concepts. Possess CISSP certification.",
]
# Encode the example texts
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 384)

# Pairwise cosine similarities between all encoded texts
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # (3, 3)
```
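For the resume/job-matching use case this model is named after, the similarity scores can be used directly to rank candidate resumes against a job description. The sketch below is illustrative: the variable names and example strings are not part of the model card.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("anass1209/resume-job-matcher-all-MiniLM-L6-v2")

# Illustrative inputs: one job description and two candidate resumes.
job_description = "Backend engineer with Python, Django, and PostgreSQL experience."
resumes = [
    "Senior Backend Engineer, 5 years of Python/Django and PostgreSQL.",
    "Frontend developer focused on React and TypeScript.",
]

# Encode everything in one batch.
embeddings = model.encode([job_description] + resumes)

# Cosine similarity between the job description and each resume: shape (1, len(resumes)).
scores = model.similarity(embeddings[:1], embeddings[1:])

# Rank resumes from best to worst match.
ranking = sorted(zip(resumes, scores[0].tolist()), key=lambda x: x[1], reverse=True)
for resume, score in ranking:
    print(f"{score:.3f}  {resume}")
```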
## Documentation

### Model Details

#### Model Description
| Property | Details |
|---|---|
| Model Type | Sentence Transformer |
| Base model | sentence-transformers/all-MiniLM-L6-v2 |
| Maximum Sequence Length | 256 tokens |
| Output Dimensionality | 384 dimensions |
| Similarity Function | Cosine Similarity |
#### Model Sources

#### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
## Technical Details
The model is built on the Sentence Transformers framework, which uses a pre-trained Transformer (here, a BertModel) to encode text. A mean-pooling layer aggregates the token embeddings into a single sentence embedding, and a normalization layer scales each embedding to unit length, so cosine similarity between sentences can be computed efficiently as a simple dot product.
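The effect of the pooling and normalization stages can be reproduced by hand. The sketch below uses plain PyTorch with random tensors standing in for the BertModel output; it mirrors the mean-pooling and normalization steps listed in the architecture above, but is not the library's internal implementation.

```python
import torch
import torch.nn.functional as F

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Average the token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).float()       # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)     # (batch, 384)
    counts = mask.sum(dim=1).clamp(min=1e-9)          # (batch, 1)
    return summed / counts

# Illustrative tensors standing in for the Transformer output (batch=2, seq_len=4, dim=384).
token_embeddings = torch.randn(2, 4, 384)
attention_mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])

sentence_embeddings = mean_pool(token_embeddings, attention_mask)
sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)  # unit length

# With unit-length vectors, cosine similarity reduces to a dot product.
cosine = sentence_embeddings @ sentence_embeddings.T
print(cosine.shape)  # (2, 2)
```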
## License
No license information provided in the original document.
## Evaluation

### Metrics

#### Semantic Similarity

- Datasets: `dev_evaluation` and `test_evaluation`
- Metrics:
  - Pearson Cosine: 0.5378933775375572
  - Spearman Cosine: 0.6213226022358173
These correlations measure how well the model's cosine similarities track the gold similarity labels on the held-out data; higher values indicate better performance on semantic similarity tasks.
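Metrics of this kind can be computed with the library's `EmbeddingSimilarityEvaluator`. The sketch below uses placeholder sentence pairs and gold scores, since the original `dev_evaluation` and `test_evaluation` datasets are not included in this card.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("anass1209/resume-job-matcher-all-MiniLM-L6-v2")

# Placeholder pairs and gold similarity scores in [0, 1]; the actual evaluation data is not published here.
sentences1 = ["Python backend engineer with Django experience.", "Data scientist with an NLP focus."]
sentences2 = ["Senior Django developer, PostgreSQL, REST APIs.", "Frontend developer, React and CSS."]
gold_scores = [0.9, 0.1]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores, name="dev_evaluation")
results = evaluator(model)
# On recent library versions this returns a dict including Pearson and Spearman cosine correlations.
print(results)
```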