🚀 RigoChat-7b-v2
RigoChat-7b-v2 is a Qwen-2.5-based model, fine-tuned for enhanced Spanish-language performance, that offers accurate responses to Spanish queries.
🚀 Quick Start
RigoChat-7b-v2 is a model based on Qwen/Qwen2.5-7B-Instruct and fine-tuned with Direct Preference Optimization (DPO) for better performance in Spanish.
To load the model and tokenizer:
```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

model_name = "IIC/RigoChat-7b-v2"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
```
Sample generation:
```python
messages = [
    {"role": "user", "content": "¿Cómo puedo transformar un diccionario de listas en una lista de diccionarios, y viceversa, en Python sin utilizar bucles for?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
)
# Keep only the newly generated tokens, dropping the prompt.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
For a better experience, we recommend using the following generation parameters.
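A typical sampling configuration for a chat model of this size looks like the block below; every value here is an illustrative assumption rather than the authors' tuned recommendation:

```python
# Illustrative generation parameters (assumed values, not the authors'
# published recommendation); they are forwarded directly to model.generate.
generation_kwargs = {
    "max_new_tokens": 1024,
    "do_sample": True,           # sample instead of greedy decoding
    "temperature": 0.7,          # moderate randomness
    "top_p": 0.9,                # nucleus sampling
    "repetition_penalty": 1.05,  # mildly discourage repeated text
}
generated_ids = model.generate(**model_inputs, **generation_kwargs)
```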
Tool Use
RigoChat-7b-v2 also supports tool use through its chat template. First, define the tool as a typed, documented Python function:
```python
def obtener_temperatura_actual(location: str) -> float:
    """
    Obtener la temperatura actual de una localización.

    Args:
        location: La localización, con el siguiente formato: "Ciudad, País."

    Returns:
        El tiempo en dicha localización, en grados Celsius.
    """
    return 22.0  # stub value for the example
```
```python
messages = [
    {"role": "user", "content": "¿Cuál es el tiempo en Madrid ahora mismo?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    tools=[obtener_temperatura_actual],
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
Check the tool use documentation from Hugging Face for more information.
If the model generates a tool call, you should add it to the chat like so:
```python
import re
import json

tools = {
    "obtener_temperatura_actual": obtener_temperatura_actual,
}

# Extract the JSON payload emitted between <tool_call> tags.
tool_call = re.search(
    r"<tool_call>\s*(\{.*?\})\s*</tool_call>",
    response,
    re.DOTALL,
)
tool_call = json.loads(tool_call.group(1))

# Add the tool call to the conversation
messages.append(
    {
        "role": "assistant",
        "tool_calls": [{"type": "function", "function": tool_call}],
    },
)

# Run the tool and add its result to the conversation. The content of a
# tool message must be a string, so the return value is cast with str().
messages.append(
    {
        "role": "tool",
        "name": tool_call["name"],
        "content": str(tools[tool_call["name"]](**tool_call["arguments"])),
    },
)
```
The code above covers the case where the model generates a single function call, but the same logic applies when several functions are called at once. After that, you can continue generating messages as normal:
```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    tools=[obtener_temperatura_actual],
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
✨ Features
- Improved performance on generalist tasks in Spanish.
- Enhanced safety and reduced hallucinations in RAG systems over Spanish texts.
- Usable under a range of hardware requirements, especially on machines with reduced computational capacity. For more information on running RigoChat-7b-v2 on such hardware, see IIC/RigoChat-7b-v2-GGUF and the sketch after this list.
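As a sketch of that low-resource path, the GGUF weights can be loaded with llama-cpp-python; the quantization filename below is a hypothetical pattern, since the exact files shipped in IIC/RigoChat-7b-v2-GGUF may differ:

```python
# Minimal sketch: run a quantized GGUF build of RigoChat-7b-v2 on modest
# hardware with llama-cpp-python. The filename glob is a hypothetical pattern.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="IIC/RigoChat-7b-v2-GGUF",
    filename="*q4_k_m.gguf",  # hypothetical 4-bit quantization file
    n_ctx=8192,               # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "¿Qué es el IIC?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```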
📦 Installation
The examples above only require the standard Hugging Face stack (transformers and torch); no model-specific installation steps are needed.
📚 Documentation
Model Details
Model Description
This model is the second version of RigoChat, a family of Large Language Models (LLMs) designed to solve typical NLP tasks given Spanish instructions, such as Tool Use, Summarization, Math, Code, and Abstractive-QA. Like Qwen/Qwen2.5-7B-Instruct, this model is not tied to a specific use case and can be applied to a wide range of tasks. Indeed, it offers a slight improvement on generalist tasks in Spanish, particularly in RAG (Retrieval-Augmented Generation) systems over Spanish databases, as its training focused on answering questions about contexts to prevent hallucinations and ensure safe responses.
| Property | Details |
|---|---|
| Developed by | Instituto de Ingeniería del Conocimiento (IIC). |
| Model Type | Generative Fine-tuned Transformer. |
| Language(s) (NLP) | Spanish (BCP-47 es). |
| License | RIGOCHAT NON-COMMERCIAL. |
| Architecture | We use Qwen's architecture without modifications. |
| Finetuned from model | Qwen/Qwen2.5-7B-Instruct. |
Model Sources
- Paper: https://arxiv.org/abs/2503.08188
Uses
Direct Use
You can use and deploy RigoChat-v2 for commercial purposes through a model package on AWS Marketplace; instructions are available in the accompanying notebook.
Out-of-Scope Use
This language model has been adapted for general natural language processing tasks in Spanish and specific use cases such as RAG. However, there are several cases where the model should not be used due to its technical and ethical limitations:
- Illegal Activities: The model should not be used to generate content related to illegal activities, such as creating malicious software, fraud, incitement to crime, or any illegal material.
- Harmful or Dangerous Content: It should not be used to generate hate speech, violence, harassment, or any content that promotes discrimination, violence, or abuse.
Bias, Risks, and Limitations
Although this model has been trained to understand and generate text in Spanish, there are several risks, biases, and limitations that users should be aware of:
- Biases: The model may reflect biases present in the training data. These biases could be related to gender, race, social class, sexual orientation, among others, and may generate responses that perpetuate stereotypes or discrimination.
- Accuracy and Reliability: While the model generates coherent and useful text in many contexts, it may not always be 100% accurate or reliable, especially in technical, scientific, or legal matters where high certainty is required.
- Limited or Outdated Knowledge: The model is not trained with information beyond its training cutoff date. Therefore, it may not reflect recent events, research, or advancements.
Recommendations
We recommend using this model as a general chatbot or within applications designed for specific tasks, such as SQL queries, RAG systems, or as an autonomous agent to facilitate the use of tools.
Training Details
Training Data
A combination of public datasets and private datasets designed at the IIC. The dataset consists of 21,975 conversations in Spanish in the `chatml` format and has the same structure as the Anthropic/hh-rlhf dataset. Each conversation has two variants, `chosen` and `rejected`, which differ only in the assistant's final answer: the answer in the `chosen` variant is considered better than the one in the `rejected` variant. Several techniques were used to generate the dataset; we explain them in depth in the paper (see Model Sources above).
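For illustration, a single record in that layout would look roughly like the following. The field names mirror Anthropic/hh-rlhf, while the conversations are invented for this example (the actual data is partly private):

```python
# Hypothetical record illustrating the chosen/rejected layout described above.
# Field names follow Anthropic/hh-rlhf; the conversation content is invented.
record = {
    "chosen": [  # chatml-style conversation whose final assistant turn is preferred
        {"role": "user", "content": "Resume en una frase qué es el Quijote."},
        {"role": "assistant", "content": "Una novela de Cervantes que parodia los libros de caballerías."},
    ],
    "rejected": [  # identical history, weaker final assistant answer
        {"role": "user", "content": "Resume en una frase qué es el Quijote."},
        {"role": "assistant", "content": "Un libro."},
    ],
}
```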
Training Procedure
We use the Transformer Reinforcement Learning (TRL) library. Specifically, we applied the example script they publish for DPO to the dataset we generated; see the sketch after the hyperparameters below for how these pieces fit together.
Training Hyperparameters
```python
LORA_CONFIG = {
    "r": 64,
    "lora_alpha": 16,
    "lora_dropout": 0.1,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "up_proj",
        "gate_proj",
        "down_proj",
    ],
    "use_rslora": True,
}

DPO_CONFIG = {
    "num_train_epochs": 2,
    "logging_steps": 25,
    "eval_steps": 500,
    "save_steps": 100,
    "save_total_limit": 5,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "gradient_accumulation_steps": 16,
    "learning_rate": 5e-6,
    "max_length": 8192,  # max length of the chat history + latest assistant response
    "max_prompt_length": 6656,  # max length of the chat history: user-assistant-...-assistant-user
    "gradient_checkpointing": True,
    "weight_decay": 0.001,
    "optim": "rmsprop",
    "evaluation_strategy": "steps",
    "lr_scheduler_type": "cosine",
    "bf16": True,
}
```
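These settings plug into TRL roughly as follows. This is a minimal sketch under stated assumptions (the dataset file, its column layout, and the output directory are hypothetical, and the eval-related settings are omitted because no evaluation split is shown here), not the exact training script:

```python
# Minimal sketch of the DPO run described above, reusing LORA_CONFIG and the
# core DPO_CONFIG values. Dataset path, column layout, and output_dir are
# hypothetical; TRL's published DPO example script is the reference.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Hypothetical preference dataset with "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

peft_config = LoraConfig(**LORA_CONFIG)  # the dict defined above

training_args = DPOConfig(
    output_dir="rigochat-7b-v2-dpo",  # hypothetical output path
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    weight_decay=0.001,
    optim="rmsprop",
    gradient_checkpointing=True,
    bf16=True,
    max_length=8192,
    max_prompt_length=6656,
    logging_steps=25,
    save_steps=100,
    save_total_limit=5,
)

trainer = DPOTrainer(
    model=model,                 # with a PEFT adapter, the reference model
    args=training_args,          # is derived automatically
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL versions
    peft_config=peft_config,
)
trainer.train()
```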
Speeds, Sizes, Times
```python
latest_logs = {
    'loss': 0.3716,
    'grad_norm': 4.989994049072266,
    'learning_rate': 1.0380020311950844e-10,
    'rewards/chosen': 0.534086287021637,
    'rewards/rejected': -0.6236276030540466,
    'rewards/accuracies': 0.8899999856948853,
    'rewards/margins': 1.1577140092849731,
    'logps/rejected': -218.88198852539062,
    'logps/chosen': -250.0700225830078,
    'logits/rejected': -1.6214849948883057,
    'logits/chosen': -1.9585875272750854,
    'epoch': 1.99,
}

final_training_results = {
    'train_runtime': 30825.7138,
    'train_samples_per_second': 1.432,
    'train_steps_per_second': 0.089,
    'train_loss': 0.483570138469306,
    'epoch': 2.0,
}
```
Evaluation
Testing Data, Factors & Metrics
Testing Data
To assess the performance of Large Language Models (LLMs), we have developed and utilized several high-quality corpora tailored to specific evaluation needs:
- IIC/AQuAS: A manually curated corpus created by two computational linguists to evaluate language models on the task of Abstractive Question Answering in Spanish. It includes examples from domains such as finance, insurance, healthcare, law, and music.
- IIC/RagQuAS: Another manually curated corpus developed by the same linguists to evaluate full RAG systems and language models on Abstractive Question Answering in Spanish.
🔧 Technical Details
The model was trained on a single A100 GPU with limited computational resources, yet it reached its current state in a relatively short time (8.5 hours). This was made possible by leveraging a high-quality dataset and techniques such as LoRA to optimize memory usage.
📄 License
This model is licensed for non-commercial use. If you want to use it commercially, please contact us or use it through the service we offer on AWS Marketplace. The license name is `rigochat-nc`, and you can find the license details at license_link.

