Vikhr-Nemo-12B-Instruct-R-21-09-24
Vikhr-Nemo is our flagship unimodal Large Language Model (LLM), an improved version of mistralai/Mistral-Nemo-Instruct-2407 adapted by the VikhrModels team, primarily for Russian and English. It's optimized for various use cases, including reasoning, summarization, coding, role-playing, and dialogue maintenance. With multilingual generation capabilities and high-performance RAG features, we believe it can rival gpt-4o-mini from OpenAI in some tasks, such as RAG.
Features
- High-quality generation: Delivers excellent results in Russian, English, and some other languages, thanks to the Grandmaster-PRO-MAX dataset and the base model.
- System prompt support: Allows regulation of response styles through system prompts.
- Large context support: Supports up to 128k tokens of context, inherited from the base model.
- Grounded RAG mode: Comes with a special "documents" role and a dedicated mode for finding relevant document identifiers and using them to answer user questions, inspired by the Command-R model.
Usage Examples
Basic Usage
The "documents" role takes a list of dictionaries describing document content, serialized with json.dumps(array, ensure_ascii=False) (see the example below). Document content can be in three formats: Markdown, HTML, or plain text, with each document chunk up to 4k characters long.
[
    {
        "doc_id": (0..5),
        "title": "(null or str)",
        "content": "(html or markdown or plain text)"
    }
]
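The list above could be built and serialized in Python like this (a minimal sketch; the document values are purely illustrative):

```python
import json

# Documents follow the schema above: doc_id in 0..5, optional title,
# content in Markdown, HTML, or plain text (up to ~4k characters per chunk).
documents = [
    {
        "doc_id": 0,
        "title": "Example document",
        "content": "# Heading\nSome Markdown content."
    }
]

# ensure_ascii=False keeps non-ASCII (e.g. Cyrillic) characters readable.
serialized = json.dumps(documents, ensure_ascii=False)
print(serialized)
```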
Advanced Usage
Running the vLLM server: vllm serve --dtype half --max-model-len 32000 -tp 1 Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24 --api-key token-abc123
import json
from openai import OpenAI

# Client for the vLLM server started above. The base_url assumes vLLM's
# default port 8000; api_key must match the --api-key flag.
llm_client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")
llm_model = "Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24"

GROUNDED_SYSTEM_PROMPT = "Your task is to answer the user's questions using only the information from the provided documents. Give two answers to each question: one with a list of relevant document identifiers and the second with the answer to the question itself, using documents with these identifiers."
documents = [
    {
        "doc_id": 0,
        "title": "Global warming: Glaciers",
        "content": "In the last 50 years, the volume of glaciers worldwide has decreased by 30%"
    },
    {
        "doc_id": 1,
        "title": "Global warming: Sea level",
        "content": "The global sea level has risen by 20 cm since 1880 and continues to rise at a rate of 3.3 mm per year"
    }
]

sample_history = [
    {'role': 'system', 'content': GROUNDED_SYSTEM_PROMPT},
    {'role': 'documents', 'content': json.dumps(documents, ensure_ascii=False)},
    {'role': 'user', 'content': 'Global warming'}
]

# First request: the model returns the identifiers of relevant documents.
relevant_indexes = llm_client.chat.completions.create(
    model=llm_model,
    messages=sample_history,
    temperature=0.0,
    max_tokens=2048
).choices[0].message.content

print('Using documents: ' + relevant_indexes + '\n----')

# Second request: the model answers using the documents it selected,
# with its first response appended to the history as an assistant turn.
final_answer = llm_client.chat.completions.create(
    model=llm_model,
    messages=sample_history + [{'role': 'assistant', 'content': relevant_indexes}],
    temperature=0.3,
    max_tokens=2048
).choices[0].message.content

print(final_answer)
The output after running the code will look something like this:
Using documents: {"relevant_doc_ids": [0, 1]}
----
Global warming refers to the long-term increase in the average temperature of the Earth's atmosphere and oceans. This phenomenon has numerous consequences for our planet, and two key aspects can be highlighted based on the available data:
1. **Glacier volume reduction**: In the last 50 years, there has been a significant decrease in the volume of glaciers worldwide. According to the data, the volume of glaciers has decreased by 30%. This could be due to the melting of glaciers caused by rising temperatures, which is one of the signs of global warming.
2. **Sea level rise**: The global sea level is also rising, which is related to the melting of glaciers and ice sheets, as well as the expansion of water due to rising temperatures. Since 1880, the sea level has risen by 20 centimeters, and this process continues, with an annual increase of 3.3 millimeters.
These changes have serious consequences for ecosystems, climate, and human society. The melting of glaciers leads to a rise in sea level, which can cause flooding of coastal areas and islands, as well as changes in water resources and climate patterns.
From the model's first response, relevant_indexes (a JSON string), you can determine whether the model found any information in the documents. The model is trained to return an empty array when nothing relevant is found, and in that case it will state in its second response that it could not find the information in the knowledge base.
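Based on that behaviour, the first response can be checked before the second request is made. A small sketch (the relevant_doc_ids key matches the sample output above):

```python
import json

def has_relevant_docs(relevant_indexes: str) -> bool:
    """Return True if the model's first answer lists any relevant documents."""
    parsed = json.loads(relevant_indexes)
    return len(parsed.get("relevant_doc_ids", [])) > 0

print(has_relevant_docs('{"relevant_doc_ids": [0, 1]}'))  # True
print(has_relevant_docs('{"relevant_doc_ids": []}'))      # False
```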
Documentation
Model Creation Process
Instructional SFT Phase
For the SFT training phase, we prepared a large (150k instructions) synthetic instructional dataset Vikhrmodels/GrandMaster-PRO-MAX. It features an integrated Chain-Of-Thought (CoT), collected using a modified prompt for gpt-4-turbo. Details can be found in the dataset card.
In addition, to enable RAG Grounding, we prepared another synthetic dataset - Vikhrmodels/Grounded-RAG-RU-v2 (50k dialogues). Its collection pipeline is quite complex and is described in detail in its dataset card.
Alignment Phase with SMPO
To further improve the quality of responses, we used the following pipeline:
- Trained a custom Reward model (not publicly available for now).
- Deduplicated and filtered the original Vikhrmodels/GrandMaster-PRO-MAX dataset with the Reward model, yielding about 10k high-quality, diverse dialogues.
- Performed Rejection Sampling on the SFT checkpoint with the resulting dataset and the Reward model (generated 7 hypotheses per prompt and kept only the 2 worst as rejected).
- Fine-tuned the SFT checkpoint using our SMPO method with the dataset from step 3. SMPO was designed and chosen to improve the stability of preference training under Rejection Sampling and achieve the desired margin.
The implementation of SMPO, rejection sampling, etc., can be found in our library effective_llm_alignment on GitHub.
The idea of using SMPO instead of other PO methods came from extensive experiments with classical methods when better control of the convergence process was needed. With careful tuning of other methods (e.g., SimPO), similar results can be achieved. However, we aimed to stabilize the process and combine the best practices from other methods.
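The Rejection Sampling step described above can be sketched roughly as follows. This is an illustrative sketch only: reward_fn stands in for the non-public Reward model, and the hypothesis generation is a toy placeholder.

```python
def rejection_sample(hypotheses, reward_fn, n_rejected=2):
    """Pick the best-scoring hypothesis as 'chosen' and the n_rejected
    worst-scoring ones as 'rejected' preference examples.

    reward_fn is a stand-in for the (non-public) Reward model's scorer.
    """
    ranked = sorted(hypotheses, key=reward_fn, reverse=True)
    chosen = ranked[0]
    rejected = ranked[-n_rejected:]
    return chosen, rejected

# Toy usage: 7 hypotheses per prompt, scored here by length purely
# for illustration.
chosen, rejected = rejection_sample(
    [f"answer {i}" * (i + 1) for i in range(7)], reward_fn=len
)
```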
RAG Operation
The "documents" role is a list of dictionaries describing document content, serialized using json.dumps(array, ensure_ascii=False)
(see example above). Document content can be in Markdown, HTML, or Plain Text formats, with each document chunk up to 4k characters long.
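Since each chunk is limited to about 4k characters, longer content has to be split before it is placed in the "documents" list. A naive character-based splitter might look like this (a sketch; a real pipeline would split on paragraph or sentence boundaries instead):

```python
def chunk_content(text: str, max_chars: int = 4000) -> list[str]:
    """Split document content into chunks of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# Each resulting chunk becomes its own entry in the "documents" list.
chunks = chunk_content("x" * 9000)
```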
Technical Details
Model Evaluation
The model was evaluated on our open-source Russian side-by-side (SbS) benchmark ru-arena-general (50 topics with 10 questions each), where gpt-4-1106-preview served as the judge, and on a RAG benchmark based on the test split of Grounded-RAG-v2, with gpt-4o as the judge.
Results on Ru-Arena-General
The responses from gpt-3.5-turbo-0125 were used as reference answers, so it has a win rate of 50%.
Here is a partial leaderboard. For more details, see the benchmark repository.
180 samples from the arena leaked into the training set. Thanks to Ilya for the information!
| Model Name | Winrate | 95% CI | Average # Tokens |
|---|---|---|---|
| gpt-4-1106-preview | 90.9 | (-1.3, 1.0) | 541 |
| gpt-4o-mini | 83.9 | (-1.8, 1.1) | 448 |
| vikhr-nemo-12b-instruct-r-21-09-24 (180 leaked) | 79.8 | (-2.2, 1.9) | 627 |
| gemma-2-9b-it-sppo-iter3 | 73.6 | (-1.6, 2.2) | 509 |
| gemma-2-9b-it | 69.2 | (-2.5, 1.9) | 459 |
| t-lite-instruct-0.1 | 64.7 | (-2.1, 1.7) | 810 |
| vikhr-llama3.1-8b-instruct-r-21-09-24 | 63.4 | (-2.1, 2.5) | 618 |
| suzume-llama-3-8B-multilingual-orpo-borda-half | 57.1 | (-1.9, 2.2) | 682 |
| mistral-nemo-instruct-2407 | 50.5 | (-2.7, 2.6) | 403 |
| gpt-3.5-turbo-0125 | 50.0 | (0.0, 0.0) | 220 |
| c4ai-command-r-v01 | 49.0 | (-1.7, 2.2) | 529 |
| meta-llama-3.1-8b-instruct | 43.1 | (-2.8, 2.3) | 628 |
Results on the RAG Benchmark
The total size of the test set is 200 examples, 100 for in-domain questions and 100 for out-of-domain questions.
The judge model gpt-4o was instructed to consider the relevance and factual completeness of responses based on the documents and the reference answer from gpt-4-1106-preview.
For details of the prompts and evaluations, see the benchmark code on Colab.
- In-domain: Questions related to the content of the provided documents to some extent.
- Out-of-domain: Questions specifically unrelated to the content of the provided documents.
| Model | Question Type | Judge Correct Percent | Avg Answer Match RougeL | Avg Abs Indexes Diff |
|---|---|---|---|---|
| gpt-4o | In-domain | 73% | 0.34 | NaN |
| gpt-4o | Out-of-domain | 81% | 0.20 | NaN |
| Vikhr-Nemo-12B-Instruct-R-21-09-24 | In-domain | 68% | 0.41 | 0 |
| Vikhr-Nemo-12B-Instruct-R-21-09-24 | Out-of-domain | 92% | 0.52 | 0 |
| gpt-4o-mini | In-domain | 65% | 0.33 | NaN |
| gpt-4o-mini | Out-of-domain | 73% | 0.18 | NaN |
| gpt-3.5-turbo-0125 | In-domain | 49% | 0.28 | NaN |
| gpt-3.5-turbo-0125 | Out-of-domain | 76% | 0.20 | NaN |
License
The model is licensed under the Apache-2.0 license.

