🚀 EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval
This model is fine-tuned to evaluate whether the context retrieved for a question in RAG is sufficient to answer it, responding with a simple yes or no.
🚀 Quick Start
This model has been fine-tuned to evaluate whether the retrieved context for a question in RAG is sufficient, answering yes or no. The base model is yanolja/EEVE-Korean-Instruct-10.8B-v1.0.
✨ Features
- Fine-Tuned for RAG Evaluation: Judges whether the retrieved context for a question in RAG is sufficient to answer it, with a simple yes or no verdict.
- Based on a Strong Foundation: Built upon the yanolja/EEVE-Korean-Instruct-10.8B-v1.0 base model.
📦 Installation
Installation mainly involves setting up the necessary Python libraries. You need torch and transformers; the 4-bit quantized loading shown below additionally requires bitsandbytes and accelerate. Install them with:
pip install torch
pip install transformers
pip install bitsandbytes accelerate
💻 Usage Examples
Basic Usage
import torch
from transformers import (
    BitsAndBytesConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
)

model_path = "sinjy1203/EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval"

# Load the model with 4-bit NF4 quantization to reduce GPU memory usage.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, quantization_config=nf4_config, device_map={'': 'cuda:0'}
)

# Korean prompt; in English: "Given a question and some information, evaluate
# whether the information is sufficient to answer the question. Answer with
# '예' (yes) or '아니오' (no)."
prompt_template = '주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.\n정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.\n\n### 질문:\n{question}\n\n### 정보:\n{context}\n\n### 평가:\n'

# Question: "When is the club's end-of-semester general meeting?"
# Context: "The end-of-semester general meeting is on June 21."
query = {
    "question": "동아리 종강총회가 언제인가요?",
    "context": "종강총회 날짜는 6월 21일입니다."
}

# Inputs must live on the same device as the model.
model_inputs = tokenizer(prompt_template.format_map(query), return_tensors='pt').to('cuda:0')
# max_new_tokens alone bounds the generation length; passing max_length as
# well is ambiguous, so only max_new_tokens is used here.
output = model.generate(**model_inputs, max_new_tokens=100)
print(tokenizer.decode(output[0]))
Example Output
주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.
정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.

### 질문:
동아리 종강총회가 언제인가요?

### 정보:
종강총회 날짜는 6월 21일입니다.

### 평가:
예<|end_of_text|>

The final "예" ("yes") indicates the model judged the retrieved context sufficient to answer the question.
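Since the model emits free text, a RAG pipeline usually needs the verdict as a boolean. A minimal sketch of such a parser (the `parse_grade` helper is illustrative, not part of the model's API):

```python
def parse_grade(decoded: str) -> bool:
    """Extract the yes/no verdict from the decoded model output.

    The answer follows the final '### 평가:' ("Evaluation") marker:
    '예' (yes) means the context is sufficient, '아니오' (no) means it is not.
    """
    verdict = decoded.rsplit("### 평가:", 1)[-1]
    # Drop any end-of-text token and surrounding whitespace.
    verdict = verdict.replace("<|end_of_text|>", "").strip()
    return verdict.startswith("예")
```

Applied to the example output above, `parse_grade` returns `True`.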
📚 Documentation
Prompt Template
주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.
정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.

### 질문:
{question}

### 정보:
{context}

### 평가:

(In English: "Given a question and some information, evaluate whether the information is sufficient to answer the question. Answer with '예' (yes) or '아니오' (no)." The section markers are 질문 = Question, 정보 = Information, 평가 = Evaluation.)
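The template is a plain Python format string, so it can be filled with `str.format_map`; a minimal sketch using the Quick Start example values:

```python
# The model's prompt template as a Python format string.
PROMPT_TEMPLATE = (
    '주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.\n'
    '정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.\n\n'
    '### 질문:\n{question}\n\n'
    '### 정보:\n{context}\n\n'
    '### 평가:\n'
)

prompt = PROMPT_TEMPLATE.format_map({
    "question": "동아리 종강총회가 언제인가요?",   # "When is the club meeting?"
    "context": "종강총회 날짜는 6월 21일입니다.",  # "The meeting is on June 21."
})
print(prompt)
```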
Training Data
📊 Metrics
Korean LLM Benchmark
| Model | Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
|---|---|---|---|---|---|---|
| EEVE-Korean-Instruct-10.8B-v1.0 | 56.08 | 55.2 | 66.11 | 56.48 | 49.14 | 53.48 |
| EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 56.1 | 55.55 | 65.95 | 56.24 | 48.66 | 54.07 |
Generated Dataset
| Model | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| EEVE-Korean-Instruct-10.8B-v1.0 | 0.824 | 0.800 | 0.885 | 0.697 |
| EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 0.892 | 0.875 | 0.903 | 0.848 |
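The generated-dataset numbers above are standard binary-classification metrics over the model's yes/no verdicts. A minimal sketch of how such scores can be computed from boolean gold labels and predictions (the example lists are illustrative, not the actual evaluation data):

```python
def binary_metrics(gold, pred):
    """Accuracy, precision, recall, and F1 for boolean verdict lists."""
    tp = sum(g and p for g, p in zip(gold, pred))          # true positives
    fp = sum((not g) and p for g, p in zip(gold, pred))    # false positives
    fn = sum(g and (not p) for g, p in zip(gold, pred))    # false negatives
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Toy example: two of four verdicts match the gold labels.
acc, prec, rec, f1 = binary_metrics(
    gold=[True, True, False, False],
    pred=[True, False, False, True],
)
```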
📄 License
This project is licensed under the Apache-2.0 license.