🚀 Vikhr-YandexGPT-5-Lite-8B-it
An instructional model based on YandexGPT-5-Lite-8B-pretrain, trained on the Russian-language datasets GrandMaster-PRO-MAX and Grounded-RAG-RU-v2 using SFT.
Training
Vikhr-YandexGPT-5-Lite-8B-it was created using the SFT (Supervised Fine-Tuning) method.
Instructional SFT Part
For the SFT stage of training, we prepared a large (150k instructions) synthetic instruction dataset, Vikhrmodels/GrandMaster-PRO-MAX. Its distinguishing feature is built-in CoT (Chain-of-Thought), which we collected using a modified prompt for gpt-4-turbo. Details can be found in the dataset card.
In addition, to implement RAG grounding, we prepared another synthetic dataset, Vikhrmodels/Grounded-RAG-RU-v2 (50k dialogues). Its collection pipeline is too complex to describe briefly; you can read more about it in its dataset card. Both datasets are published on the Hugging Face Hub and can be inspected as shown below.
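As a quick orientation to the training data, both datasets can be loaded with the Hugging Face datasets library (a minimal sketch; the split name is an assumption and may differ from the actual dataset configuration):

```python
from datasets import load_dataset

# Instructional SFT data with built-in CoT (~150k instructions)
grandmaster = load_dataset("Vikhrmodels/GrandMaster-PRO-MAX", split="train")

# Synthetic RAG-grounding dialogues (~50k dialogues)
grounded_rag = load_dataset("Vikhrmodels/Grounded-RAG-RU-v2", split="train")

print(grandmaster[0])
```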
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Vikhrmodels/Vikhr-YandexGPT-5-Lite-8B-it"

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# "Write a brief description of the film Back to the Future."
input_text = "Напиши краткое описание фильма Назад в будущее."

messages = [
    {"role": "user", "content": input_text},
]

input_ids = tokenizer.apply_chat_template(
    messages, truncation=True, add_generation_prompt=True, return_tensors="pt"
)

output = model.generate(
    input_ids,
    max_length=1512,
    do_sample=True,  # sampling must be enabled for temperature to take effect
    temperature=0.7,
)

generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```
Model Response
"Back to the Future" is an American science fiction film released in 1985. The film was directed by Robert Zemeckis, and the screenplay was written by Bob Gale. The main roles were played by Michael J. Fox, Christopher Lloyd, and Lea Thompson.
The film tells the story of Marty McFly, an ordinary teenager from 1985, who accidentally travels back to 1955 thanks to the invention of his friend, the scientist Dr. Emmett Brown. Marty finds himself in the past, where he must help Dr. Brown, who was young and naive at the time, invent the time machine.
During his adventures, Marty meets the young Dr. Brown and his family, and he also falls in love with a girl who will become his mother in the future. Marty must not only correct the mistakes of the past but also prevent a catastrophe that could change the future.
The film won numerous awards and became a cult classic, spawning two sequels and many memes and quotes that are still popular today.
Advanced Usage - Working with RAG
The documents role takes a list of dictionaries describing the documents' content, serialized with json.dumps(array, ensure_ascii=False) (see the example below). Document content can be provided in 3 different formats: Markdown, HTML, or plain text. The content of each document can be a text chunk of up to 4k characters.
```
[
    {
        "doc_id": (0..5),
        "title": "(null or str)",
        "content": "(html or markdown or plain text)"
    }
]
```
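To make this contract concrete, here is a minimal sketch of preparing the documents payload; the serialize_documents helper is hypothetical (not part of the model's API) and simply enforces the documented constraints before serializing:

```python
import json

def serialize_documents(documents: list[dict]) -> str:
    """Hypothetical helper: validate the documented constraints, then serialize."""
    for doc in documents:
        assert 0 <= doc["doc_id"] <= 5, "doc_id is expected in the 0..5 range"
        assert doc["title"] is None or isinstance(doc["title"], str)
        # Each document's content is limited to a chunk of up to ~4k characters
        assert len(doc["content"]) <= 4000, "content chunk exceeds the 4k-character limit"
    return json.dumps(documents, ensure_ascii=False)
```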
Example of Correct Usage with OpenAI-like API
Start the vLLM server:

```bash
vllm serve --dtype half --max-model-len 32000 -tp 1 Vikhrmodels/Vikhr-YandexGPT-5-Lite-8B-it --api-key token-abc123
```
```python
import json
from openai import OpenAI

# Assumes the vLLM server started above is reachable on the default local port
llm_client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")
llm_model = "Vikhrmodels/Vikhr-YandexGPT-5-Lite-8B-it"

GROUNDED_SYSTEM_PROMPT = "Your task is to answer the user's questions using only the information from the provided documents. Give two answers to each question: one with a list of relevant document identifiers and the second with the answer to the question itself, using documents with these identifiers."

documents = [
    {
        "doc_id": 0,
        "title": "Global Warming: Glaciers",
        "content": "Over the past 50 years, the volume of glaciers worldwide has decreased by 30%"
    },
    {
        "doc_id": 1,
        "title": "Global Warming: Sea Level",
        "content": "The global sea level has risen by 20 cm since 1880 and continues to rise at a rate of 3.3 mm per year"
    }
]

sample_history = [
    {'role': 'system', 'content': GROUNDED_SYSTEM_PROMPT},
    {'role': 'documents', 'content': json.dumps(documents, ensure_ascii=False)},
    {'role': 'user', 'content': 'Global Warming'}
]

# First call: the model returns the IDs of the documents relevant to the question
relevant_indexes = llm_client.chat.completions.create(
    model=llm_model,
    messages=sample_history,
    temperature=0.0,
    max_tokens=2048
).choices[0].message.content

print('Using documents: ' + relevant_indexes + '\n----')

# Second call: append the first answer to the history to get the grounded answer
final_answer = llm_client.chat.completions.create(
    model=llm_model,
    messages=sample_history + [{'role': 'assistant', 'content': relevant_indexes}],
    temperature=0.3,
    max_tokens=2048
).choices[0].message.content

print(final_answer)
```
The response after running the code will look something like this:
Using documents: {"relevant_doc_ids": [0, 1]}
Global warming is the long-term increase in the average temperature of the Earth's atmosphere and oceans. This phenomenon has numerous consequences for our planet, and two key aspects can be highlighted based on the available data:
- Glacier Volume Reduction: Over the past 50 years, there has been a significant reduction in the volume of glaciers worldwide. According to the data, the volume of glaciers has decreased by 30%. This may be related to the melting of glaciers due to rising temperatures, which is one of the signs of global warming.
- Sea Level Rise: The global sea level is also rising, which is related to the melting of glaciers and ice sheets, as well as the expansion of water due to rising temperatures. Since 1880, the sea level has risen by 20 centimeters, and this process continues, with an annual increase of 3.3 millimeters.
These changes have serious consequences for ecosystems, the climate, and human society. The melting of glaciers leads to a rise in sea levels, which can cause flooding of coastal areas and islands, as well as changes in water resources and climate patterns.
Using the first model response, relevant_indexes (JSON), you can determine whether the model found any information in the documents. The model is trained to return an empty array when there is none; in that case, when generating the second response, it will answer that it could not find the information in the knowledge base.
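For example, a minimal sketch of this check (assuming relevant_indexes holds the raw JSON string returned by the first call above):

```python
import json

parsed = json.loads(relevant_indexes)  # e.g. {"relevant_doc_ids": [0, 1]}
if not parsed["relevant_doc_ids"]:
    # Empty array: skip the second call or expect a "not found" style answer
    print("The model found no relevant information in the documents.")
```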
📚 Documentation
Nuances and Limitations
⚠️ Important Note
- The model has a low level of response safety and is oriented toward executing instructions correctly and completely. Keep this in mind when using it, and test it yourself. This can be partially mitigated by system prompts and by additional instructions about the importance of safety in the user's prompt.
- System prompts are not intended for persona descriptions. We recommend using them to specify the response style (such as "answer only in JSON format"). In addition, it is advisable to write them in English, as that is how they appeared in the dataset; using English in system prompts does not affect the language of the response.
- The RAG mode requires the GROUNDED_SYSTEM_PROMPT system prompt described in the Working with RAG section. Sometimes the model may add general information from its own knowledge to the answer in addition to what is in the documents.
- It is better to use the model with a low temperature (0.1-0.5) together with top_k sampling (30-50), as in the sketch below; occasional generation defects were observed at a temperature of 1.0.
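A minimal sketch of the recommended sampling settings (the specific values are illustrative picks from the ranges above; model, tokenizer, and input_ids are set up as in the Basic Usage example):

```python
output = model.generate(
    input_ids,
    max_new_tokens=1024,  # illustrative output budget
    do_sample=True,
    temperature=0.3,      # within the recommended 0.1-0.5 range
    top_k=40,             # within the recommended 30-50 range
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```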
Authors
```bibtex
@inproceedings{nikolich2024vikhr,
  title={Vikhr: Advancing Open-Source Bilingual Instruction-Following Large Language Models for Russian and English},
  author={Aleksandr Nikolich and Konstantin Korolev and Sergei Bratchikov and Nikolay Kompanets and Igor Kiselev and Artem Shelmanov},
  booktitle={Proceedings of the 4th Workshop on Multilingual Representation Learning (MRL) @ EMNLP-2024},
  year={2024},
  publisher={Association for Computational Linguistics},
  url={https://arxiv.org/pdf/2405.13929}
}
```
📄 License
The model is distributed under the yandexgpt-5-lite-8b-pretrain license.
| Property | Details |
|----------|---------|
| Library Name | transformers |
| Model Name | Vikhrmodels/Vikhr-YandexGPT-5-Lite-8B-it |
| Datasets | Vikhrmodels/GrandMaster-PRO-MAX, Vikhrmodels/Grounded-RAG-RU-v2 |
| Base Model | yandex/YandexGPT-5-Lite-8B-pretrain |
| Language | ru, en |
| License | other |
| License Name | yandexgpt-5-lite-8b-pretrain |
| License Link | LICENSE |