🚀 EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval
This model is fine-tuned to evaluate whether the context retrieved for a question in RAG is sufficient to answer it, responding with a simple yes or no.
🚀 Quick Start
This model has been fine-tuned to evaluate whether the retrieved context for a question in RAG is sufficient, answering yes or no. The base model is yanolja/EEVE-Korean-Instruct-10.8B-v1.0.
✨ Features
- Fine-Tuned for RAG Evaluation: Judges whether the retrieved context for a question in RAG is sufficient to answer it, with a simple yes or no verdict.
- Based on a Strong Foundation: Built upon the yanolja/EEVE-Korean-Instruct-10.8B-v1.0 base model.
📦 Installation
Installation mainly involves setting up the necessary Python libraries. You need torch and transformers; the 4-bit quantized loading shown below additionally requires bitsandbytes and accelerate. Install them with:
pip install torch
pip install transformers
pip install bitsandbytes accelerate
💻 Usage Examples
Basic Usage
import torch
from transformers import (
    BitsAndBytesConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
)

model_path = "sinjy1203/EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval"

# Load the model with 4-bit NF4 quantization to reduce GPU memory usage.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, quantization_config=nf4_config, device_map={'': 'cuda:0'}
)

# Korean prompt; in English: "Given a question and some information, evaluate
# whether the information is sufficient to answer the question. Answer with
# '예' (yes) or '아니오' (no)."
prompt_template = '주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.\n정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.\n\n### 질문:\n{question}\n\n### 정보:\n{context}\n\n### 평가:\n'

# Question: "When is the club's end-of-semester general meeting?"
# Context: "The end-of-semester general meeting is on June 21."
query = {
    "question": "동아리 종강총회가 언제인가요?",
    "context": "종강총회 날짜는 6월 21일입니다."
}

# Inputs must live on the same device as the model.
model_inputs = tokenizer(prompt_template.format_map(query), return_tensors='pt').to('cuda:0')
# max_new_tokens alone bounds the generation length; passing max_length as
# well is ambiguous, so only max_new_tokens is used here.
output = model.generate(**model_inputs, max_new_tokens=100)
print(tokenizer.decode(output[0]))
Example Output
주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.
정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.

### 질문:
동아리 종강총회가 언제인가요?

### 정보:
종강총회 날짜는 6월 21일입니다.

### 평가:
예<|end_of_text|>

The final "예" ("yes") indicates the model judged the retrieved context sufficient to answer the question.
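Since the model emits free text, a RAG pipeline usually needs the verdict as a boolean. A minimal sketch of such a parser (the `parse_grade` helper is illustrative, not part of the model's API):

```python
def parse_grade(decoded: str) -> bool:
    """Extract the yes/no verdict from the decoded model output.

    The answer follows the final '### 평가:' ("Evaluation") marker:
    '예' (yes) means the context is sufficient, '아니오' (no) means it is not.
    """
    verdict = decoded.rsplit("### 평가:", 1)[-1]
    # Drop any end-of-text token and surrounding whitespace.
    verdict = verdict.replace("<|end_of_text|>", "").strip()
    return verdict.startswith("예")
```

Applied to the example output above, `parse_grade` returns `True`.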
📚 Documentation
Prompt Template
주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.
정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.

### 질문:
{question}

### 정보:
{context}

### 평가:

(In English: "Given a question and some information, evaluate whether the information is sufficient to answer the question. Answer with '예' (yes) or '아니오' (no)." The section markers are 질문 = Question, 정보 = Information, 평가 = Evaluation.)
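The template is a plain Python format string, so it can be filled with `str.format_map`; a minimal sketch using the Quick Start example values:

```python
# The model's prompt template as a Python format string.
PROMPT_TEMPLATE = (
    '주어진 질문과 정보가 주어졌을 때 질문에 답하기에 충분한 정보인지 평가해줘.\n'
    '정보가 충분한지를 평가하기 위해 "예" 또는 "아니오"로 답해줘.\n\n'
    '### 질문:\n{question}\n\n'
    '### 정보:\n{context}\n\n'
    '### 평가:\n'
)

prompt = PROMPT_TEMPLATE.format_map({
    "question": "동아리 종강총회가 언제인가요?",   # "When is the club meeting?"
    "context": "종강총회 날짜는 6월 21일입니다.",  # "The meeting is on June 21."
})
print(prompt)
```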
Training Data
📊 Metrics
Korean LLM Benchmark
| Model | Average | Ko-ARC | Ko-HellaSwag | Ko-MMLU | Ko-TruthfulQA | Ko-CommonGen V2 |
|---|---|---|---|---|---|---|
| EEVE-Korean-Instruct-10.8B-v1.0 | 56.08 | 55.2 | 66.11 | 56.48 | 49.14 | 53.48 |
| EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 56.1 | 55.55 | 65.95 | 56.24 | 48.66 | 54.07 |
Generated Dataset
| Model | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|
| EEVE-Korean-Instruct-10.8B-v1.0 | 0.824 | 0.800 | 0.885 | 0.697 |
| EEVE-Korean-Instruct-10.8B-v1.0-Grade-Retrieval | 0.892 | 0.875 | 0.903 | 0.848 |
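The generated-dataset numbers above are standard binary-classification metrics over the model's yes/no verdicts. A minimal sketch of how such scores can be computed from boolean gold labels and predictions (the example lists are illustrative, not the actual evaluation data):

```python
def binary_metrics(gold, pred):
    """Accuracy, precision, recall, and F1 for boolean verdict lists."""
    tp = sum(g and p for g, p in zip(gold, pred))          # true positives
    fp = sum((not g) and p for g, p in zip(gold, pred))    # false positives
    fn = sum(g and (not p) for g, p in zip(gold, pred))    # false negatives
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Toy example: two of four verdicts match the gold labels.
acc, prec, rec, f1 = binary_metrics(
    gold=[True, True, False, False],
    pred=[True, False, False, True],
)
```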
📄 License
This project is licensed under the Apache-2.0 license.