🚀 RigoChat-7b-v2
RigoChat-7b-v2 is a Qwen-2.5-based model, fine-tuned for enhanced Spanish-language performance, that offers accurate responses to Spanish queries.
🚀 Quick Start
RigoChat-7b-v2 is a model based on Qwen/Qwen2.5-7B-Instruct and fine-tuned with Direct Preference Optimization (DPO) for better performance in Spanish.
To load the model and tokenizer:
```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

model_name = "IIC/RigoChat-7b-v2"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
```
Sample generation:
```python
messages = [
    {"role": "user", "content": "¿Cómo puedo transformar un diccionario de listas en una lista de diccionarios, y viceversa, en Python sin utilizar bucles for?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
)
# Keep only the newly generated tokens, dropping the prompt.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
For a better experience, we recommend using the following generation parameters.
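A typical sampling configuration for a chat model of this size looks like the block below; every value here is an illustrative assumption rather than the authors' tuned recommendation:

```python
# Illustrative generation parameters (assumed values, not the authors'
# published recommendation); they are forwarded directly to model.generate.
generation_kwargs = {
    "max_new_tokens": 1024,
    "do_sample": True,           # sample instead of greedy decoding
    "temperature": 0.7,          # moderate randomness
    "top_p": 0.9,                # nucleus sampling
    "repetition_penalty": 1.05,  # mildly discourage repeated text
}
generated_ids = model.generate(**model_inputs, **generation_kwargs)
```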
Tool Use
RigoChat-7b-v2 also supports tool use through its chat template. First, define the tool as a typed, documented Python function:
```python
def obtener_temperatura_actual(location: str) -> float:
    """
    Obtener la temperatura actual de una localización.

    Args:
        location: La localización, con el siguiente formato: "Ciudad, País."

    Returns:
        El tiempo en dicha localización, en grados Celsius.
    """
    return 22.0  # stub value for the example
```
```python
messages = [
    {"role": "user", "content": "¿Cuál es el tiempo en Madrid ahora mismo?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    tools=[obtener_temperatura_actual],
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
Check the tool use documentation from Hugging Face for more information.
If the model generates a tool call, you should add it to the chat like so:
```python
import re
import json

tools = {
    "obtener_temperatura_actual": obtener_temperatura_actual,
}

# Extract the JSON payload emitted between <tool_call> tags.
tool_call = re.search(
    r"<tool_call>\s*(\{.*?\})\s*</tool_call>",
    response,
    re.DOTALL,
)
tool_call = json.loads(tool_call.group(1))

# Add the tool call to the conversation
messages.append(
    {
        "role": "assistant",
        "tool_calls": [{"type": "function", "function": tool_call}],
    },
)

# Run the tool and add its result to the conversation. The content of a
# tool message must be a string, so the return value is cast with str().
messages.append(
    {
        "role": "tool",
        "name": tool_call["name"],
        "content": str(tools[tool_call["name"]](**tool_call["arguments"])),
    },
)
```
The code above covers the case where the model generates a single function call, but the same logic applies when several functions are called at once. After that, you can continue generating messages as normal:
```python
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    tools=[obtener_temperatura_actual],
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
✨ Features
- Improved performance on generalist tasks in Spanish.
- Enhanced safety and reduced hallucinations in RAG systems over Spanish texts.
- Usable under a range of hardware requirements, especially on machines with reduced computational capacity. For more information on running RigoChat-7b-v2 on such hardware, see IIC/RigoChat-7b-v2-GGUF and the sketch after this list.
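As a sketch of that low-resource path, the GGUF weights can be loaded with llama-cpp-python; the quantization filename below is a hypothetical pattern, since the exact files shipped in IIC/RigoChat-7b-v2-GGUF may differ:

```python
# Minimal sketch: run a quantized GGUF build of RigoChat-7b-v2 on modest
# hardware with llama-cpp-python. The filename glob is a hypothetical pattern.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="IIC/RigoChat-7b-v2-GGUF",
    filename="*q4_k_m.gguf",  # hypothetical 4-bit quantization file
    n_ctx=8192,               # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "¿Qué es el IIC?"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```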
📦 Installation
The examples above only require the standard Hugging Face stack (transformers and torch); no model-specific installation steps are needed.
📚 Documentation
Model Details
Model Description
This model is the second version of RigoChat, a family of Large Language Models (LLMs) designed to solve typical NLP tasks given Spanish instructions, such as Tool Use, Summarization, Math, Code, and Abstractive-QA. Like Qwen/Qwen2.5-7B-Instruct, this model is not tied to a specific use case and can be applied to a wide range of tasks. Indeed, it offers a slight improvement on generalist tasks in Spanish, particularly in RAG (Retrieval-Augmented Generation) systems over Spanish databases, as its training focused on answering questions about contexts to prevent hallucinations and ensure safe responses.
| Property | Details |
|---|---|
| Developed by | Instituto de Ingeniería del Conocimiento (IIC). |
| Model Type | Generative Fine-tuned Transformer. |
| Language(s) (NLP) | Spanish (BCP-47 es). |
| License | RIGOCHAT NON-COMMERCIAL. |
| Architecture | We use Qwen's architecture without modifications. |
| Finetuned from model | Qwen/Qwen2.5-7B-Instruct. |
Model Sources
- Paper: https://arxiv.org/abs/2503.08188
Uses
Direct Use
You can use and deploy RigoChat-v2 for commercial purposes through a model package on AWS Marketplace; instructions are available in the accompanying notebook.
Out-of-Scope Use
This language model has been adapted for general natural language processing tasks in Spanish and specific use cases such as RAG. However, there are several cases where the model should not be used due to its technical and ethical limitations:
- Illegal Activities: The model should not be used to generate content related to illegal activities, such as creating malicious software, fraud, incitement to crime, or any illegal material.
- Harmful or Dangerous Content: It should not be used to generate hate speech, violence, harassment, or any content that promotes discrimination, violence, or abuse.
Bias, Risks, and Limitations
Although this model has been trained to understand and generate text in Spanish, there are several risks, biases, and limitations that users should be aware of:
- Biases: The model may reflect biases present in the training data. These biases could be related to gender, race, social class, sexual orientation, among others, and may generate responses that perpetuate stereotypes or discrimination.
- Accuracy and Reliability: While the model generates coherent and useful text in many contexts, it may not always be 100% accurate or reliable, especially in technical, scientific, or legal matters where high certainty is required.
- Limited or Outdated Knowledge: The model is not trained with information beyond its training cutoff date. Therefore, it may not reflect recent events, research, or advancements.
Recommendations
We recommend using this model as a general chatbot or within applications designed for specific tasks, such as SQL queries, RAG systems, or as an autonomous agent to facilitate the use of tools.
Training Details
Training Data
A combination of public datasets and private datasets designed at the IIC. The dataset consists of 21,975 conversations in Spanish in the `chatml` format and has the same structure as the Anthropic/hh-rlhf dataset. Each conversation has two variants, `chosen` and `rejected`, which differ only in the assistant's final answer: the answer in the `chosen` variant is considered better than the one in the `rejected` variant. Several techniques were used to generate the dataset; we explain them in depth in the paper (see Model Sources above).
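For illustration, a single record in that layout would look roughly like the following. The field names mirror Anthropic/hh-rlhf, while the conversations are invented for this example (the actual data is partly private):

```python
# Hypothetical record illustrating the chosen/rejected layout described above.
# Field names follow Anthropic/hh-rlhf; the conversation content is invented.
record = {
    "chosen": [  # chatml-style conversation whose final assistant turn is preferred
        {"role": "user", "content": "Resume en una frase qué es el Quijote."},
        {"role": "assistant", "content": "Una novela de Cervantes que parodia los libros de caballerías."},
    ],
    "rejected": [  # identical history, weaker final assistant answer
        {"role": "user", "content": "Resume en una frase qué es el Quijote."},
        {"role": "assistant", "content": "Un libro."},
    ],
}
```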
Training Procedure
We use the Transformer Reinforcement Learning (TRL) library. Specifically, we applied the example script they publish for DPO to the dataset we generated; see the sketch after the hyperparameters below for how these pieces fit together.
Training Hyperparameters
```python
LORA_CONFIG = {
    "r": 64,
    "lora_alpha": 16,
    "lora_dropout": 0.1,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "up_proj",
        "gate_proj",
        "down_proj",
    ],
    "use_rslora": True,
}

DPO_CONFIG = {
    "num_train_epochs": 2,
    "logging_steps": 25,
    "eval_steps": 500,
    "save_steps": 100,
    "save_total_limit": 5,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "gradient_accumulation_steps": 16,
    "learning_rate": 5e-6,
    "max_length": 8192,  # max length of the chat history + latest assistant response
    "max_prompt_length": 6656,  # max length of the chat history: user-assistant-...-assistant-user
    "gradient_checkpointing": True,
    "weight_decay": 0.001,
    "optim": "rmsprop",
    "evaluation_strategy": "steps",
    "lr_scheduler_type": "cosine",
    "bf16": True,
}
```
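These settings plug into TRL roughly as follows. This is a minimal sketch under stated assumptions (the dataset file, its column layout, and the output directory are hypothetical, and the eval-related settings are omitted because no evaluation split is shown here), not the exact training script:

```python
# Minimal sketch of the DPO run described above, reusing LORA_CONFIG and the
# core DPO_CONFIG values. Dataset path, column layout, and output_dir are
# hypothetical; TRL's published DPO example script is the reference.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Hypothetical preference dataset with "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

peft_config = LoraConfig(**LORA_CONFIG)  # the dict defined above

training_args = DPOConfig(
    output_dir="rigochat-7b-v2-dpo",  # hypothetical output path
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=5e-6,
    lr_scheduler_type="cosine",
    weight_decay=0.001,
    optim="rmsprop",
    gradient_checkpointing=True,
    bf16=True,
    max_length=8192,
    max_prompt_length=6656,
    logging_steps=25,
    save_steps=100,
    save_total_limit=5,
)

trainer = DPOTrainer(
    model=model,                 # with a PEFT adapter, the reference model
    args=training_args,          # is derived automatically
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL versions
    peft_config=peft_config,
)
trainer.train()
```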
Speeds, Sizes, Times
```python
latest_logs = {
    'loss': 0.3716,
    'grad_norm': 4.989994049072266,
    'learning_rate': 1.0380020311950844e-10,
    'rewards/chosen': 0.534086287021637,
    'rewards/rejected': -0.6236276030540466,
    'rewards/accuracies': 0.8899999856948853,
    'rewards/margins': 1.1577140092849731,
    'logps/rejected': -218.88198852539062,
    'logps/chosen': -250.0700225830078,
    'logits/rejected': -1.6214849948883057,
    'logits/chosen': -1.9585875272750854,
    'epoch': 1.99,
}

final_training_results = {
    'train_runtime': 30825.7138,
    'train_samples_per_second': 1.432,
    'train_steps_per_second': 0.089,
    'train_loss': 0.483570138469306,
    'epoch': 2.0,
}
```
Evaluation
Testing Data, Factors & Metrics
Testing Data
To assess the performance of Large Language Models (LLMs), we have developed and utilized several high-quality corpora tailored to specific evaluation needs:
- IIC/AQuAS: A manually curated corpus created by two computational linguists to evaluate language models on the task of Abstractive Question Answering in Spanish. It includes examples from domains such as finance, insurance, healthcare, law, and music.
- IIC/RagQuAS: Another manually curated corpus developed by the same linguists to evaluate full RAG systems and language models on Abstractive Question Answering in Spanish.
🔧 Technical Details
The model was trained on a single A100 GPU with limited computational resources, yet it reached its current state in a relatively short time (8.5 hours). This was made possible by leveraging a high-quality dataset and techniques such as LoRA to optimize memory usage.
📄 License
This model is licensed for non-commercial use. If you want to use it commercially, please contact us or use it through the service we offer on AWS Marketplace. The license name is `rigochat-nc`, and you can find the license details at license_link.

