Nomic V2 Tuned 1
🚀 Tuned Nomic V2
This is a sentence-transformers model fine-tuned from nomic-ai/nomic-embed-text-v2-moe on the json dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
✨ Features
- Semantic Understanding: Maps sentences and paragraphs to a 768-dimensional dense vector space, so semantic similarity can be computed with cosine similarity.
- Versatile Applications: Can be used for NLP tasks such as semantic search, paraphrase mining, text classification, and clustering.
- Fine-Tuned Performance: Fine-tuned on a json dataset of 13,186 anchor/positive pairs for better performance on retrieval-style tasks.
📦 Installation
First, install the Sentence Transformers library:
pip install -U sentence-transformers
💻 Usage Examples
Basic Usage
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'The company "Alpha" signed a contract with engineer Petrovsky to develop new software. During the work, Petrovsky created a patent-protected invention. The contract did not contain clauses regarding intellectual property rights. Who is the patent holder of the invention, and can the company "Alpha" use this invention without paying additional compensation?',
    '<p>1. The right to obtain a patent and the exclusive right to an invention, utility model, or industrial design created during the execution of a contract for work or a contract for scientific research, experimental design, or technological work that did not directly provide for their creation belong to the contractor (executor), unless otherwise provided by the contract between them and the customer. (As amended by Federal Law <a href="102171743">dated 12.03.2014 No. 35-FZ</a>)</p><p>In this case, the customer has the right, unless otherwise provided by the contract, to use the invention, utility model, or industrial design created in this way for the purposes for which the corresponding contract was concluded, on the terms of a simple (non-exclusive) license throughout the term of the patent without paying additional compensation for this use. When the contractor (executor) transfers the right to obtain a patent or alienates the patent itself to another person, the customer retains the right to use the invention, utility model, or industrial design on the specified terms.</p><p>2. In the case where, in accordance with the contract between the contractor (executor) and the customer, the right to obtain a patent or the exclusive right to an invention, utility model, or industrial design is transferred to the customer or a third party specified by the customer, the contractor (executor) has the right to use the created invention, utility model, or industrial design for their own needs on the terms of a free simple (non-exclusive) license throughout the term of the patent, unless otherwise provided by the contract.</p><p>3. The author of the invention, utility model, or industrial design specified in paragraph 1 of this article, who is not the patent holder, is paid compensation in accordance with paragraph 4 of Article 1370 of this Code.</p>',
    '<p>In the transfer in the manner established by Part 11 of Article 154 of Federal Law <a href="102088491">dated 22 August 2004 No. 122-FZ</a> "On Amending the Legislative Acts of the Russian Federation and Recognizing Certain Legislative Acts of the Russian Federation as Having Lost Force in Connection with the Adoption of Federal Laws <a href="102067003">"On Amending and Supplementing the Federal Law "On the General Principles of the Organization of the Legislative (Representative) and Executive Bodies of State Power of the Subjects of the Russian Federation"</a> and <a href="102083574">"On the General Principles of the Organization of Local Self-Government in the Russian Federation"</a>, shares from federal property to the property of a subject of the Russian Federation or municipal property, from the property of a subject of the Russian Federation to federal property or municipal property, from municipal property to federal property or the property of a subject of the Russian Federation. (Added by paragraph - Federal Law <a href="102157372">dated 14.06.2012 No. 77-FZ</a>)</p>'
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
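For semantic search, the similarity scores are typically used to rank candidate passages against a query. The following is a minimal pure-Python sketch of that ranking step. The 3-dimensional vectors are hypothetical stand-ins for the 768-dimensional embeddings returned by `model.encode`, and `rank_documents` is an illustrative helper, not part of the Sentence Transformers API.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy vectors standing in for real embeddings.
query = [0.9, 0.1, 0.0]
documents = [
    [0.8, 0.2, 0.1],  # points in nearly the same direction as the query
    [0.0, 0.1, 0.9],  # points in a very different direction
]

ranking = rank_documents(query, documents)
print(ranking[0][0])  # index of the best-matching document
```

With real embeddings, the same ranking can be obtained from one row of the matrix returned by `model.similarity`.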
📚 Documentation
Model Details
Model Description
Property | Details |
---|---|
Model Type | Sentence Transformer |
Base model | nomic-ai/nomic-embed-text-v2-moe |
Maximum Sequence Length | 512 tokens |
Output Dimensionality | 768 dimensions |
Similarity Function | Cosine Similarity |
Training Dataset | json |
Language | en |
License | apache-2.0 |
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NomicBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
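The last two modules above turn per-token vectors into one unit-length sentence vector: `Pooling` with `pooling_mode_mean_tokens` averages the token embeddings, and `Normalize` scales the result to unit L2 norm. The sketch below illustrates that idea in plain Python under simplifying assumptions (lists instead of tensors, a hypothetical 2-dimensional token space); it is not the library's implementation.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping positions whose mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    # Guard against an all-padding input to avoid division by zero.
    return [t / max(count, 1) for t in total]

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

# Hypothetical token vectors; the third position is padding and is ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_vec = l2_normalize(pooled)
```

Because the output is normalized, the dot product of two sentence vectors equals their cosine similarity.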
🔧 Technical Details
Evaluation
Metrics
Information Retrieval
- Datasets: dim_768, dim_512, dim_256, dim_128 and dim_64
- Evaluated with InformationRetrievalEvaluator
Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
---|---|---|---|---|---|
cosine_accuracy@1 | 0.0075 | 0.0068 | 0.0082 | 0.0082 | 0.0075 |
cosine_accuracy@3 | 0.06 | 0.0593 | 0.0621 | 0.0559 | 0.0628 |
cosine_accuracy@5 | 0.4304 | 0.4311 | 0.4168 | 0.384 | 0.3793 |
cosine_accuracy@10 | 0.7619 | 0.7449 | 0.7271 | 0.6951 | 0.6412 |
cosine_precision@1 | 0.0075 | 0.0068 | 0.0082 | 0.0082 | 0.0075 |
cosine_precision@3 | 0.02 | 0.0198 | 0.0207 | 0.0186 | 0.0209 |
cosine_precision@5 | 0.0861 | 0.0862 | 0.0834 | 0.0768 | 0.0759 |
cosine_precision@10 | 0.0762 | 0.0745 | 0.0727 | 0.0695 | 0.0641 |
cosine_recall@1 | 0.0075 | 0.0068 | 0.0082 | 0.0082 | 0.0075 |
cosine_recall@3 | 0.06 | 0.0593 | 0.0621 | 0.0559 | 0.0628 |
cosine_recall@5 | 0.4304 | 0.4311 | 0.4168 | 0.384 | 0.3793 |
cosine_recall@10 | 0.7619 | 0.7449 | 0.7271 | 0.6951 | 0.6412 |
cosine_ndcg@10 | 0.3 | 0.2938 | 0.2877 | 0.2752 | 0.2547 |
cosine_mrr@10 | 0.1604 | 0.1574 | 0.1548 | 0.1482 | 0.1378 |
cosine_map@100 | 0.1731 | 0.171 | 0.1693 | 0.1639 | 0.1541 |
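In the table above, cosine_recall@k equals cosine_accuracy@k and cosine_precision@k equals cosine_accuracy@k divided by k, which is what one would expect if each query has exactly one relevant passage. Under that assumption, the metrics reduce to simple functions of the rank at which the relevant passage appears. The sketch below shows those reduced formulas; `metrics_at_k` and the example ranks are hypothetical and are not taken from the evaluator's code or data.

```python
import math

def metrics_at_k(ranks, k):
    """Compute retrieval metrics for queries with a single relevant document.

    ranks: 1-based rank of the relevant document for each query,
           or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    # With one relevant document, DCG is 1 / log2(rank + 1) and the ideal DCG is 1.
    ndcg = sum(1.0 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n
    mrr = sum(1.0 / r for r, h in zip(ranks, hits) if h) / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # at most one of the k results is relevant
        "recall": accuracy,         # the single relevant document is found or not
        "mrr": mrr,
        "ndcg": ndcg,
    }

# Hypothetical ranks for four queries.
example_ranks = [1, 4, None, 10]
results = metrics_at_k(example_ranks, k=10)
print(results)
```

Read this way, the low accuracy@1 next to a much higher accuracy@10 indicates that the relevant passage is usually retrieved, but rarely in first place.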
Training Details
Training Dataset
json
- Dataset: json
- Size: 13,186 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:
Column | Type | Min tokens | Mean tokens | Max tokens |
---|---|---|---|---|
anchor | string | 18 | 61.09 | 162 |
positive | string | 40 | 258.71 | 512 |
- Samples:
anchor | positive |
---|---|
A student developed a database and filed an application for state registration, attaching a description of the database, an abstract, and a statement indicating himself as the right-holder and author. In the application, he indicated two different databases. Does the student's application meet the requirements of the law article? | 1. The right-holder can, at his discretion, register such a program or database with the federal executive body for intellectual property during the term of the exclusive right to a computer program or database. Computer programs and databases containing information constituting state secrets are not subject to state registration. The person filing the application for state registration (the applicant) is responsible for disclosing information about computer programs and databases containing information constituting state secrets in accordance with the legislation of the Russian Federation. 2. An application for state registration of a computer program or database (registration application) must relate to one computer program or one database. The registration application must contain: a statement for state registration of a computer program or database indicating... |
Suppose an inheritance agreement provides for the transfer of ownership of a house to A, but A refuses the inheritance. Can B, the other party to the agreement, demand the execution of the agreement regarding the house if the agreement did not contain conditions for transferring the house to B in case of A's refusal? | 1. The heir has the right to conclude an agreement with any of the persons who may be called to inherit (Article 1116), the terms of which determine the circle of heirs and the order of transfer of rights to the property of the testator after his death to the surviving parties to the agreement or to surviving third parties who may be called to inherit (inheritance agreement). The inheritance agreement may also contain a condition regarding the executor of the will and impose on the persons participating in the inheritance agreement who may be called to inherit the obligation to perform certain actions of a property or non-property nature that are not contrary to law, including to execute testamentary refusals or testamentary burdens. The consequences provided for by the inheritance agreement may be made dependent on the circumstances that have occurred by the day of the opening of the inheritance, regarding which it was unknown at the conclusion of the inheritance agreement whether they would occur or not, in particular... |
What is the difference in the procedure for challenging a patent depending on the nature of the violation specified in paragraph 1 of this article? | 2. A patent for an invention, utility model, or industrial design can be challenged during its term, established by paragraphs 1-3 of Article 1363 of this Code, by filing an objection with the federal executive body for intellectual property by any person who has learned of the violations provided for in sub-paragraphs 1-4 of paragraph 1 of this article. A patent for an invention, utility model, or industrial design can be challenged in court by any person who has learned of the violations provided for in sub-paragraph 5 of paragraph 1 of this article. A patent for an invention, utility model, or industrial design can be challenged by an interested person even after the expiration of its term on the grounds and in the manner established by the first and second paragraphs of this paragraph. |
- Loss: MatryoshkaLoss with these parameters:
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
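To make the loss configuration concrete: the general idea of Matryoshka training is to apply the base loss to several truncated prefixes of each embedding and add up the weighted results, so that the leading dimensions remain useful on their own. The sketch below illustrates this with an in-batch-negatives ranking loss in plain Python. It is a simplified illustration, not the library's code: the `scale=20.0` value, the helper names, and the 4-dimensional toy vectors are assumptions made for the example.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

def ranking_loss(anchors, positives, scale=20.0):
    """In-batch negatives: positive i is the correct match for anchor i,
    and every other positive in the batch acts as a negative."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, av in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(av, pv)) for pv in p]
        top = max(logits)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(a)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over truncated embedding prefixes."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_a = [v[:dim] for v in anchors]
        truncated_p = [v[:dim] for v in positives]
        total += weight * ranking_loss(truncated_a, truncated_p)
    return total

# Hypothetical 4-dimensional batch of two anchor/positive pairs.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]
dims = [4, 2]
weights = [1, 1]

loss = matryoshka_loss(anchors, positives, dims, weights)
print(loss)
```

At inference time, the same truncate-then-normalize step is what allows the shorter dim_512 to dim_64 embeddings to be used in place of the full 768-dimensional vector.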
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- gradient_accumulation_steps: 16
- learning_rate: 2e-05
- num_train_epochs: 4
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates
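These settings determine the effective batch size and the number of optimizer steps. The sketch below works through that arithmetic and a linear-warmup, cosine-decay learning-rate curve. It assumes a single training device and a conventional warmup-then-cosine formula; `NUM_DEVICES` and `learning_rate` are illustrative names, and the exact scheduler used by the trainer may differ in detail.

```python
import math

TRAIN_SAMPLES = 13_186
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 16
EPOCHS = 4
BASE_LR = 2e-5
WARMUP_RATIO = 0.1
NUM_DEVICES = 1  # assumption: the card does not state the device count

# Samples consumed per optimizer step.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES  # 32

# Forward/backward passes per epoch, then optimizer steps per epoch.
micro_batches_per_epoch = TRAIN_SAMPLES // (PER_DEVICE_BATCH * NUM_DEVICES)  # 6593
steps_per_epoch = micro_batches_per_epoch // GRAD_ACCUM  # 412
total_steps = steps_per_epoch * EPOCHS  # 1648
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)  # 165

def learning_rate(step):
    """Linear warmup to BASE_LR, then cosine decay to zero."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

Under the single-device assumption, the result agrees with the training logs, which record an evaluation at step 412 near the end of the first epoch and at step 1648 at the end of training.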
Training Logs
Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
---|---|---|---|---|---|---|---|
0.0243 | 10 | 0.2108 | - | - | - | - | - |
0.0485 | 20 | 0.1169 | - | - | - | - | - |
0.0728 | 30 | 0.1334 | - | - | - | - | - |
0.0971 | 40 | 0.0963 | - | - | - | - | - |
0.1213 | 50 | 0.127 | - | - | - | - | - |
0.1456 | 60 | 0.1254 | - | - | - | - | - |
0.1699 | 70 | 0.048 | - | - | - | - | - |
0.1941 | 80 | 0.0358 | - | - | - | - | - |
0.2184 | 90 | 0.0673 | - | - | - | - | - |
0.2427 | 100 | 0.049 | - | - | - | - | - |
0.2669 | 110 | 0.0222 | - | - | - | - | - |
0.2912 | 120 | 0.0657 | - | - | - | - | - |
0.3155 | 130 | 0.0878 | - | - | - | - | - |
0.3398 | 140 | 0.0396 | - | - | - | - | - |
0.3640 | 150 | 0.033 | - | - | - | - | - |
0.3883 | 160 | 0.0562 | - | - | - | - | - |
0.4126 | 170 | 0.0329 | - | - | - | - | - |
0.4368 | 180 | 0.0918 | - | - | - | - | - |
0.4611 | 190 | 0.0198 | - | - | - | - | - |
0.4854 | 200 | 0.0181 | - | - | - | - | - |
0.5096 | 210 | 0.0119 | - | - | - | - | - |
0.5339 | 220 | 0.0139 | - | - | - | - | - |
0.5582 | 230 | 0.057 | - | - | - | - | - |
0.5824 | 240 | 0.0293 | - | - | - | - | - |
0.6067 | 250 | 0.0482 | - | - | - | - | - |
0.6310 | 260 | 0.017 | - | - | - | - | - |
0.6552 | 270 | 0.0927 | - | - | - | - | - |
0.6795 | 280 | 0.0187 | - | - | - | - | - |
0.7038 | 290 | 0.0553 | - | - | - | - | - |
0.7280 | 300 | 0.015 | - | - | - | - | - |
0.7523 | 310 | 0.0438 | - | - | - | - | - |
0.7766 | 320 | 0.0087 | - | - | - | - | - |
0.8008 | 330 | 0.038 | - | - | - | - | - |
0.8251 | 340 | 0.0243 | - | - | - | - | - |
0.8494 | 350 | 0.015 | - | - | - | - | - |
0.8737 | 360 | 0.0199 | - | - | - | - | - |
0.8979 | 370 | 0.0516 | - | - | - | - | - |
0.9222 | 380 | 0.0561 | - | - | - | - | - |
0.9465 | 390 | 0.0851 | - | - | - | - | - |
0.9707 | 400 | 0.0394 | - | - | - | - | - |
0.9950 | 410 | 0.0114 | - | - | - | - | - |
0.9998 | 412 | - | 0.2806 | 0.2779 | 0.2742 | 0.2597 | 0.2253 |
1.0193 | 420 | 0.0136 | - | - | - | - | - |
1.0435 | 430 | 0.1219 | - | - | - | - | - |
1.0678 | 440 | 0.0164 | - | - | - | - | - |
1.0921 | 450 | 0.0927 | - | - | - | - | - |
1.1163 | 460 | 0.0268 | - | - | - | - | - |
1.1406 | 470 | 0.0384 | - | - | - | - | - |
1.1649 | 480 | 0.0034 | - | - | - | - | - |
1.1891 | 490 | 0.0183 | - | - | - | - | - |
1.2134 | 500 | 0.0594 | - | - | - | - | - |
1.2377 | 510 | 0.0145 | - | - | - | - | - |
1.2619 | 520 | 0.0768 | - | - | - | - | - |
1.2862 | 530 | 0.0084 | - | - | - | - | - |
1.3105 | 540 | 0.0528 | - | - | - | - | - |
1.3347 | 550 | 0.0619 | - | - | - | - | - |
1.3590 | 560 | 0.0326 | - | - | - | - | - |
1.3833 | 570 | 0.0135 | - | - | - | - | - |
1.4076 | 580 | 0.0143 | - | - | - | - | - |
1.4318 | 590 | 0.0952 | - | - | - | - | - |
1.4561 | 600 | 0.0188 | - | - | - | - | - |
1.4804 | 610 | 0.01 | - | - | - | - | - |
1.5046 | 620 | 0.091 | - | - | - | - | - |
1.5289 | 630 | 0.0205 | - | - | - | - | - |
1.5532 | 640 | 0.0156 | - | - | - | - | - |
1.5774 | 650 | 0.0101 | - | - | - | - | - |
1.6017 | 660 | 0.022 | - | - | - | - | - |
1.6260 | 670 | 0.0135 | - | - | - | - | - |
1.6502 | 680 | 0.0226 | - | - | - | - | - |
1.6745 | 690 | 0.0032 | - | - | - | - | - |
1.6988 | 700 | 0.0071 | - | - | - | - | - |
1.7230 | 710 | 0.028 | - | - | - | - | - |
1.7473 | 720 | 0.0351 | - | - | - | - | - |
1.7716 | 730 | 0.0021 | - | - | - | - | - |
1.7958 | 740 | 0.0073 | - | - | - | - | - |
1.8201 | 750 | 0.0103 | - | - | - | - | - |
1.8444 | 760 | 0.0219 | - | - | - | - | - |
1.8686 | 770 | 0.0035 | - | - | - | - | - |
1.8929 | 780 | 0.0579 | - | - | - | - | - |
1.9172 | 790 | 0.0298 | - | - | - | - | - |
1.9415 | 800 | 0.0076 | - | - | - | - | - |
1.9657 | 810 | 0.0038 | - | - | - | - | - |
1.9900 | 820 | 0.0016 | - | - | - | - | - |
1.9997 | 824 | - | 0.2945 | 0.2886 | 0.2856 | 0.2642 | 0.2411 |
2.0143 | 830 | 0.0715 | - | - | - | - | - |
2.0385 | 840 | 0.0021 | - | - | - | - | - |
2.0628 | 850 | 0.0065 | - | - | - | - | - |
2.0871 | 860 | 0.0105 | - | - | - | - | - |
2.1113 | 870 | 0.0024 | - | - | - | - | - |
2.1356 | 880 | 0.0025 | - | - | - | - | - |
2.1599 | 890 | 0.014 | - | - | - | - | - |
2.1841 | 900 | 0.0016 | - | - | - | - | - |
2.2084 | 910 | 0.008 | - | - | - | - | - |
2.2327 | 920 | 0.0041 | - | - | - | - | - |
2.2569 | 930 | 0.0308 | - | - | - | - | - |
2.2812 | 940 | 0.011 | - | - | - | - | - |
2.3055 | 950 | 0.0207 | - | - | - | - | - |
2.3297 | 960 | 0.0048 | - | - | - | - | - |
2.3540 | 970 | 0.0215 | - | - | - | - | - |
2.3783 | 980 | 0.0061 | - | - | - | - | - |
2.4025 | 990 | 0.0164 | - | - | - | - | - |
2.4268 | 1000 | 0.0255 | - | - | - | - | - |
2.4511 | 1010 | 0.0062 | - | - | - | - | - |
2.4754 | 1020 | 0.0079 | - | - | - | - | - |
2.4996 | 1030 | 0.005 | - | - | - | - | - |
2.5239 | 1040 | 0.042 | - | - | - | - | - |
2.5482 | 1050 | 0.0057 | - | - | - | - | - |
2.5724 | 1060 | 0.0384 | - | - | - | - | - |
2.5967 | 1070 | 0.009 | - | - | - | - | - |
2.6210 | 1080 | 0.0089 | - | - | - | - | - |
2.6452 | 1090 | 0.0034 | - | - | - | - | - |
2.6695 | 1100 | 0.026 | - | - | - | - | - |
2.6938 | 1110 | 0.0358 | - | - | - | - | - |
2.7180 | 1120 | 0.0033 | - | - | - | - | - |
2.7423 | 1130 | 0.0037 | - | - | - | - | - |
2.7666 | 1140 | 0.0195 | - | - | - | - | - |
2.7908 | 1150 | 0.0024 | - | - | - | - | - |
2.8151 | 1160 | 0.0533 | - | - | - | - | - |
2.8394 | 1170 | 0.0137 | - | - | - | - | - |
2.8636 | 1180 | 0.0125 | - | - | - | - | - |
2.8879 | 1190 | 0.0253 | - | - | - | - | - |
2.9122 | 1200 | 0.0068 | - | - | - | - | - |
2.9364 | 1210 | 0.0436 | - | - | - | - | - |
2.9607 | 1220 | 0.0021 | - | - | - | - | - |
2.9850 | 1230 | 0.0129 | - | - | - | - | - |
2.9995 | 1236 | - | 0.2986 | 0.2955 | 0.2842 | 0.2749 | 0.2512 |
3.0093 | 1240 | 0.0037 | - | - | - | - | - |
3.0335 | 1250 | 0.0161 | - | - | - | - | - |
3.0578 | 1260 | 0.0164 | - | - | - | - | - |
3.0821 | 1270 | 0.0007 | - | - | - | - | - |
3.1063 | 1280 | 0.0023 | - | - | - | - | - |
3.1306 | 1290 | 0.0073 | - | - | - | - | - |
3.1549 | 1300 | 0.0134 | - | - | - | - | - |
3.1791 | 1310 | 0.0021 | - | - | - | - | - |
3.2034 | 1320 | 0.0571 | - | - | - | - | - |
3.2277 | 1330 | 0.0376 | - | - | - | - | - |
3.2519 | 1340 | 0.0049 | - | - | - | - | - |
3.2762 | 1350 | 0.0151 | - | - | - | - | - |
3.3005 | 1360 | 0.002 | - | - | - | - | - |
3.3247 | 1370 | 0.0276 | - | - | - | - | - |
3.3490 | 1380 | 0.0007 | - | - | - | - | - |
3.3733 | 1390 | 0.0324 | - | - | - | - | - |
3.3975 | 1400 | 0.0043 | - | - | - | - | - |
3.4218 | 1410 | 0.0074 | - | - | - | - | - |
3.4461 | 1420 | 0.005 | - | - | - | - | - |
3.4703 | 1430 | 0.0066 | - | - | - | - | - |
3.4946 | 1440 | 0.0039 | - | - | - | - | - |
3.5189 | 1450 | 0.0056 | - | - | - | - | - |
3.5432 | 1460 | 0.0039 | - | - | - | - | - |
3.5674 | 1470 | 0.0148 | - | - | - | - | - |
3.5917 | 1480 | 0.0029 | - | - | - | - | - |
3.6160 | 1490 | 0.011 | - | - | - | - | - |
3.6402 | 1500 | 0.0029 | - | - | - | - | - |
3.6645 | 1510 | 0.0057 | - | - | - | - | - |
3.6888 | 1520 | 0.0013 | - | - | - | - | - |
3.7130 | 1530 | 0.0618 | - | - | - | - | - |
3.7373 | 1540 | 0.0102 | - | - | - | - | - |
3.7616 | 1550 | 0.0009 | - | - | - | - | - |
3.7858 | 1560 | 0.023 | - | - | - | - | - |
3.8101 | 1570 | 0.0067 | - | - | - | - | - |
3.8344 | 1580 | 0.011 | - | - | - | - | - |
3.8586 | 1590 | 0.0023 | - | - | - | - | - |
3.8829 | 1600 | 0.0154 | - | - | - | - | - |
3.9072 | 1610 | 0.0014 | - | - | - | - | - |
3.9314 | 1620 | 0.0024 | - | - | - | - | - |
3.9557 | 1630 | 0.0034 | - | - | - | - | - |
3.9800 | 1640 | 0.0022 | - | - | - | - | - |
3.9994 | 1648 | - | 0.3 | 0.2938 | 0.2877 | 0.2752 | 0.2547 |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.43.0
- PyTorch: 2.6.0+cu124
- Accelerate: 1.3.0
- Datasets: 3.3.2
- Tokenizers: 0.19.1
📄 License
This model is licensed under the apache-2.0 license.
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}





