Vikhr-Nemo-12B-Instruct-R-21-09-24
Vikhr-Nemo is our flagship unimodal Large Language Model (LLM), an improved version of mistralai/Mistral-Nemo-Instruct-2407 adapted by the VikhrModels team, primarily for Russian and English. It's optimized for various use cases, including reasoning, summarization, coding, role-playing, and dialogue maintenance. With multilingual generation capabilities and high-performance RAG features, we believe it can rival gpt-4o-mini from OpenAI in some tasks, such as RAG.
Features
- High-quality generation: Delivers excellent results in Russian, English, and some other languages, thanks to the Grandmaster-PRO-MAX dataset and the base model.
- System prompt support: Allows regulation of response styles through system prompts.
- Large context support: Supports up to 128k tokens of context, inherited from the base model.
- Grounded RAG mode: Comes with a special "documents" role and a dedicated mode for finding relevant document identifiers and using them to answer user questions, inspired by the Command-R model.
Usage Examples
Basic Usage
The "documents" role takes a list of dictionaries describing document content, serialized with json.dumps(array, ensure_ascii=False) (see the example below). Document content can be in three formats: Markdown, HTML, or plain text, with each document chunk up to 4k characters long.
[
    {
        "doc_id": (0..5),
        "title": "(null or str)",
        "content": "(html or markdown or plain text)"
    }
]
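The list above could be built and serialized in Python like this (a minimal sketch; the document values are purely illustrative):

```python
import json

# Documents follow the schema above: doc_id in 0..5, optional title,
# content in Markdown, HTML, or plain text (up to ~4k characters per chunk).
documents = [
    {
        "doc_id": 0,
        "title": "Example document",
        "content": "# Heading\nSome Markdown content."
    }
]

# ensure_ascii=False keeps non-ASCII (e.g. Cyrillic) characters readable.
serialized = json.dumps(documents, ensure_ascii=False)
print(serialized)
```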
Advanced Usage
Running the vLLM server: vllm serve --dtype half --max-model-len 32000 -tp 1 Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24 --api-key token-abc123
import json
from openai import OpenAI

# Client for the vLLM server started above. The base_url assumes vLLM's
# default port 8000; api_key must match the --api-key flag.
llm_client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")
llm_model = "Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24"

GROUNDED_SYSTEM_PROMPT = "Your task is to answer the user's questions using only the information from the provided documents. Give two answers to each question: one with a list of relevant document identifiers and the second with the answer to the question itself, using documents with these identifiers."
documents = [
    {
        "doc_id": 0,
        "title": "Global warming: Glaciers",
        "content": "In the last 50 years, the volume of glaciers worldwide has decreased by 30%"
    },
    {
        "doc_id": 1,
        "title": "Global warming: Sea level",
        "content": "The global sea level has risen by 20 cm since 1880 and continues to rise at a rate of 3.3 mm per year"
    }
]

sample_history = [
    {'role': 'system', 'content': GROUNDED_SYSTEM_PROMPT},
    {'role': 'documents', 'content': json.dumps(documents, ensure_ascii=False)},
    {'role': 'user', 'content': 'Global warming'}
]

# First request: the model returns the identifiers of relevant documents.
relevant_indexes = llm_client.chat.completions.create(
    model=llm_model,
    messages=sample_history,
    temperature=0.0,
    max_tokens=2048
).choices[0].message.content

print('Using documents: ' + relevant_indexes + '\n----')

# Second request: the model answers using the documents it selected,
# with its first response appended to the history as an assistant turn.
final_answer = llm_client.chat.completions.create(
    model=llm_model,
    messages=sample_history + [{'role': 'assistant', 'content': relevant_indexes}],
    temperature=0.3,
    max_tokens=2048
).choices[0].message.content

print(final_answer)
The output after running the code will look something like this:
Using documents: {"relevant_doc_ids": [0, 1]}
----
Global warming refers to the long-term increase in the average temperature of the Earth's atmosphere and oceans. This phenomenon has numerous consequences for our planet, and two key aspects can be highlighted based on the available data:
1. **Glacier volume reduction**: In the last 50 years, there has been a significant decrease in the volume of glaciers worldwide. According to the data, the volume of glaciers has decreased by 30%. This could be due to the melting of glaciers caused by rising temperatures, which is one of the signs of global warming.
2. **Sea level rise**: The global sea level is also rising, which is related to the melting of glaciers and ice sheets, as well as the expansion of water due to rising temperatures. Since 1880, the sea level has risen by 20 centimeters, and this process continues, with an annual increase of 3.3 millimeters.
These changes have serious consequences for ecosystems, climate, and human society. The melting of glaciers leads to a rise in sea level, which can cause flooding of coastal areas and islands, as well as changes in water resources and climate patterns.
From the model's first response, relevant_indexes (a JSON string), you can determine whether the model found any information in the documents. The model is trained to return an empty array when nothing relevant is found, and in that case it will state in its second response that it could not find the information in the knowledge base.
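Based on that behaviour, the first response can be checked before the second request is made. A small sketch (the relevant_doc_ids key matches the sample output above):

```python
import json

def has_relevant_docs(relevant_indexes: str) -> bool:
    """Return True if the model's first answer lists any relevant documents."""
    parsed = json.loads(relevant_indexes)
    return len(parsed.get("relevant_doc_ids", [])) > 0

print(has_relevant_docs('{"relevant_doc_ids": [0, 1]}'))  # True
print(has_relevant_docs('{"relevant_doc_ids": []}'))      # False
```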
Documentation
Model Creation Process
Instructional SFT Phase
For the SFT training phase, we prepared a large (150k instructions) synthetic instructional dataset Vikhrmodels/GrandMaster-PRO-MAX. It features an integrated Chain-Of-Thought (CoT), collected using a modified prompt for gpt-4-turbo. Details can be found in the dataset card.
In addition, to enable RAG Grounding, we prepared another synthetic dataset - Vikhrmodels/Grounded-RAG-RU-v2 (50k dialogues). Its collection pipeline is quite complex and is described in detail in its dataset card.
Alignment Phase with SMPO
To further improve the quality of responses, we used the following pipeline:
- Trained a custom Reward model (not publicly available for now).
- Deduplicated and filtered the original Vikhrmodels/GrandMaster-PRO-MAX dataset with the Reward model, yielding about 10k high-quality, diverse dialogues.
- Performed Rejection Sampling on the SFT checkpoint with the resulting dataset and the Reward model (generated 7 hypotheses per prompt and kept only the 2 worst as rejected).
- Fine-tuned the SFT checkpoint using our SMPO method with the dataset from step 3. SMPO was designed and chosen to improve the stability of preference training under Rejection Sampling and achieve the desired margin.
The implementation of SMPO, rejection sampling, etc., can be found in our library effective_llm_alignment on GitHub.
The idea of using SMPO instead of other PO methods came from extensive experiments with classical methods when better control of the convergence process was needed. With careful tuning of other methods (e.g., SimPO), similar results can be achieved. However, we aimed to stabilize the process and combine the best practices from other methods.
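The Rejection Sampling step described above can be sketched roughly as follows. This is an illustrative sketch only: reward_fn stands in for the non-public Reward model, and the hypothesis generation is a toy placeholder.

```python
def rejection_sample(hypotheses, reward_fn, n_rejected=2):
    """Pick the best-scoring hypothesis as 'chosen' and the n_rejected
    worst-scoring ones as 'rejected' preference examples.

    reward_fn is a stand-in for the (non-public) Reward model's scorer.
    """
    ranked = sorted(hypotheses, key=reward_fn, reverse=True)
    chosen = ranked[0]
    rejected = ranked[-n_rejected:]
    return chosen, rejected

# Toy usage: 7 hypotheses per prompt, scored here by length purely
# for illustration.
chosen, rejected = rejection_sample(
    [f"answer {i}" * (i + 1) for i in range(7)], reward_fn=len
)
```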
RAG Operation
The "documents" role is a list of dictionaries describing document content, serialized using json.dumps(array, ensure_ascii=False)
(see example above). Document content can be in Markdown, HTML, or Plain Text formats, with each document chunk up to 4k characters long.
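Since each chunk is limited to about 4k characters, longer content has to be split before it is placed in the "documents" list. A naive character-based splitter might look like this (a sketch; a real pipeline would split on paragraph or sentence boundaries instead):

```python
def chunk_content(text: str, max_chars: int = 4000) -> list[str]:
    """Split document content into chunks of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# Each resulting chunk becomes its own entry in the "documents" list.
chunks = chunk_content("x" * 9000)
```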
Technical Details
Model Evaluation
The model was evaluated on our open-source Russian side-by-side (SbS) benchmark ru-arena-general (50 topics with 10 questions each), where gpt-4-1106-preview served as the judge, and on a RAG benchmark based on the test split of Grounded-RAG-v2, with gpt-4o as the judge.
Results on Ru-Arena-General
The responses from gpt-3.5-turbo-0125 were used as reference answers, so it has a win rate of 50%.
Here is a partial leaderboard. For more details, see the benchmark repository.
180 samples from the arena leaked into the training set. Thanks to Ilya for the information!
| Model Name | Winrate | 95% CI | Average # Tokens |
|---|---|---|---|
| gpt-4-1106-preview | 90.9 | (-1.3, 1.0) | 541 |
| gpt-4o-mini | 83.9 | (-1.8, 1.1) | 448 |
| vikhr-nemo-12b-instruct-r-21-09-24 (180 leaked) | 79.8 | (-2.2, 1.9) | 627 |
| gemma-2-9b-it-sppo-iter3 | 73.6 | (-1.6, 2.2) | 509 |
| gemma-2-9b-it | 69.2 | (-2.5, 1.9) | 459 |
| t-lite-instruct-0.1 | 64.7 | (-2.1, 1.7) | 810 |
| vikhr-llama3.1-8b-instruct-r-21-09-24 | 63.4 | (-2.1, 2.5) | 618 |
| suzume-llama-3-8B-multilingual-orpo-borda-half | 57.1 | (-1.9, 2.2) | 682 |
| mistral-nemo-instruct-2407 | 50.5 | (-2.7, 2.6) | 403 |
| gpt-3.5-turbo-0125 | 50.0 | (0.0, 0.0) | 220 |
| c4ai-command-r-v01 | 49.0 | (-1.7, 2.2) | 529 |
| meta-llama-3.1-8b-instruct | 43.1 | (-2.8, 2.3) | 628 |
Results on the RAG Benchmark
The total size of the test set is 200 examples, 100 for in-domain questions and 100 for out-of-domain questions.
The judge model gpt-4o was instructed to consider the relevance and factual completeness of responses based on the documents and the reference answer from gpt-4-1106-preview.
For details of the prompts and evaluations, see the benchmark code on Colab.
- In-domain: Questions related to the content of the provided documents to some extent.
- Out-of-domain: Questions specifically unrelated to the content of the provided documents.
| Model | Question Type | Judge Correct Percent | Avg Answer Match RougeL | Avg Abs Indexes Diff |
|---|---|---|---|---|
| gpt-4o | In-domain | 73% | 0.34 | NaN |
| gpt-4o | Out-of-domain | 81% | 0.20 | NaN |
| Vikhr-Nemo-12B-Instruct-R-21-09-24 | In-domain | 68% | 0.41 | 0 |
| Vikhr-Nemo-12B-Instruct-R-21-09-24 | Out-of-domain | 92% | 0.52 | 0 |
| gpt-4o-mini | In-domain | 65% | 0.33 | NaN |
| gpt-4o-mini | Out-of-domain | 73% | 0.18 | NaN |
| gpt-3.5-turbo-0125 | In-domain | 49% | 0.28 | NaN |
| gpt-3.5-turbo-0125 | Out-of-domain | 76% | 0.20 | NaN |
License
The model is licensed under the Apache-2.0 license.

