# 🚀 Korean Reranker Training on Amazon SageMaker
This guide provides fine-tuning instructions for developing a Korean reranker. The ko-reranker is a model fine-tuned on Korean data from BAAI/bge-reranker-large. For more details, see korean-reranker-git and the AWS blog post Boosting Retrieval Augmented Generation (RAG) Performance with Korean Reranker.
## ✨ Features
- Unlike embedding models, rerankers take a question and a document as input and directly output a similarity score instead of an embedding.
- Feeding a question and a passage to the reranker yields a relevance score for that pair.
- Because rerankers are optimized with a cross-entropy loss, the relevance scores are not bounded to a specific range.
## 💻 Usage Examples

### Basic Usage
```python
import numpy as np
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def exp_normalize(x):
    # Numerically stable softmax: subtract the max before exponentiating.
    b = x.max()
    y = np.exp(x - b)
    return y / y.sum()

model_path = 'Dongjin-kr/ko-reranker'
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)
model.eval()

# (question, passage) pairs -- "I hate you" / "I love you" and
# "I like you" / "My feelings for you might be love"
pairs = [["나는 너를 싫어해", "나는 너를 사랑해"],
         ["나는 너를 좋아해", "너에 대한 나의 감정은 사랑 일 수도 있어"]]

with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors='pt', max_length=512)
    scores = model(**inputs, return_dict=True).logits.view(-1, ).float()
    scores = exp_normalize(scores.numpy())
    print(f'first: {scores[0]}, second: {scores[1]}')
```
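Since `exp_normalize` applies a softmax over the two raw logits, the printed scores sum to 1. The second pair ("I like you" / "My feelings for you might be love") is semantically related while the first is contradictory, so the reranker should assign the second pair the clearly higher score.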
### Advanced Usage
```python
import json

import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# Resolve the SageMaker execution role (fall back to a named IAM role
# when running outside a SageMaker notebook).
try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hugging Face Hub configuration for the hosted model.
hub = {
    'HF_MODEL_ID': 'Dongjin-kr/ko-reranker',
    'HF_TASK': 'text-classification'
}

huggingface_model = HuggingFaceModel(
    transformers_version='4.28.1',
    pytorch_version='2.0.0',
    py_version='py310',
    env=hub,
    role=role,
)

# Deploy the model to a real-time SageMaker endpoint.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g5.large'
)

# Invoke the endpoint through the SageMaker runtime client.
runtime_client = boto3.Session().client('sagemaker-runtime')

# Same example pairs as in Basic Usage above.
payload = json.dumps(
    {
        "inputs": [
            {"text": "나는 너를 싫어해", "text_pair": "나는 너를 사랑해"},
            {"text": "나는 너를 좋아해", "text_pair": "너에 대한 나의 감정은 사랑 일 수도 있어"}
        ]
    }
)

response = runtime_client.invoke_endpoint(
    EndpointName="<endpoint-name>",
    ContentType="application/json",
    Accept="application/json",
    Body=payload
)

out = json.loads(response['Body'].read().decode())
print(f'Response: {out}')
```
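To avoid ongoing charges once you are done experimenting, tear the endpoint down. A minimal cleanup sketch using the predictor returned by `deploy()` above:

```python
# Delete the model and endpoint when finished to stop incurring charges.
predictor.delete_model()
predictor.delete_endpoint()
```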
## 📚 Documentation

### Background
- Context order affects accuracy (Lost in the Middle, Liu et al., 2023).
- Reasons for using a reranker:
  - Simply giving an LLM more context does not guarantee better answers; the relevant information has to be ranked near the top of the prompt (see the sketch after this list).
  - The similarity (relevance) scores used in semantic search are not precise: a top-ranked result does not always contain information more relevant to the question than lower-ranked ones.
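As an illustration, here is a minimal sketch of where the reranker sits in a RAG pipeline: the retriever's top-k candidates are rescored and reordered before being placed into the LLM prompt. The `rerank` helper and the candidate passages are hypothetical, not part of this repository.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Dongjin-kr/ko-reranker')
model = AutoModelForSequenceClassification.from_pretrained('Dongjin-kr/ko-reranker')
model.eval()

def rerank(query: str, passages: list[str]) -> list[str]:
    """Rescore retriever candidates with the reranker and return them best-first."""
    pairs = [[query, p] for p in passages]
    with torch.no_grad():
        inputs = tokenizer(pairs, padding=True, truncation=True,
                           return_tensors='pt', max_length=512)
        scores = model(**inputs, return_dict=True).logits.view(-1).float()
    order = scores.argsort(descending=True)
    return [passages[i] for i in order]

# Candidates would normally come from a vector-store retriever (hypothetical data).
candidates = ["passage A ...", "passage B ...", "passage C ..."]
top_passages = rerank("example query", candidates)  # feed these to the LLM prompt
```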
### Reranker models

### Dataset
- msmarco-triplets:
  - (Question, Answer, Negative)-triplets from the MS MARCO passage dataset, 499,184 samples.
  - The source data is in English and was translated into Korean with Amazon Translate.
- Format (see the loading sketch after this list):

```json
{"query": str, "pos": List[str], "neg": List[str]}
```

The query is the question, "pos" is a list of positive texts, and "neg" is a list of negative texts. If a query has no negative texts, some can be sampled at random from the entire corpus. Example:

```json
{"query": "대한민국의 수도는?", "pos": ["미국의 수도는 워싱턴이고, 일본은 도쿄이며 한국은 서울이다."], "neg": ["미국의 수도는 워싱턴이고, 일본은 도쿄이며 북한은 평양이다."]}
```

(The query asks "What is the capital of South Korea?"; the positive text says Korea's capital is Seoul, while the negative says North Korea's capital is Pyongyang.)
### Performance
| Model | has-right-in-contexts | MRR (mean reciprocal rank) |
|---|---|---|
| without-reranker (default) | 0.93 | 0.80 |
| with-reranker (bge-reranker-large) | 0.95 | 0.84 |
| with-reranker (fine-tuned using Korean) | 0.96 | 0.87 |
Evaluation dataset: `./dataset/evaluation/eval_dataset.csv`
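MRR averages the reciprocal rank of the first relevant document over all queries. A minimal sketch of the computation (the ranks below are hypothetical):

```python
def mean_reciprocal_rank(ranks):
    # ranks[i] is the 1-based position of the first relevant document
    # for query i, or None if no relevant document was retrieved.
    return sum(1.0 / r for r in ranks if r) / len(ranks)

print(mean_reciprocal_rank([1, 2, 1, None]))  # (1 + 0.5 + 1 + 0) / 4 = 0.625
```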
Training parameters:

```python
{
    "learning_rate": 5e-6,
    "fp16": True,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 32,
    "train_group_size": 3,
    "max_len": 512,
    "weight_decay": 0.01
}
```
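These hyperparameters would typically be passed to a SageMaker training job. A minimal sketch using the Hugging Face estimator; the `entry_point`, `source_dir`, and instance type are assumptions for illustration, not the repository's actual training setup:

```python
from sagemaker.huggingface import HuggingFace

estimator = HuggingFace(
    entry_point='run.py',        # hypothetical training script
    source_dir='./src',          # hypothetical source directory
    role=role,                   # SageMaker execution role, as resolved above
    instance_type='ml.p4d.24xlarge',  # assumed GPU training instance
    instance_count=1,
    transformers_version='4.28.1',
    pytorch_version='2.0.0',
    py_version='py310',
    hyperparameters={
        'learning_rate': 5e-6,
        'fp16': True,
        'num_train_epochs': 3,
        'per_device_train_batch_size': 1,
        'gradient_accumulation_steps': 32,
        'train_group_size': 3,
        'max_len': 512,
        'weight_decay': 0.01,
    },
)
estimator.fit()  # optionally pass S3 training data channels
```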
## 📄 License
FlagEmbedding is licensed under the MIT License.
## Acknowledgement

Parts of the code are based on FlagEmbedding and KoSimCSE-SageMaker.
## Citation

If you find this repository useful, please consider giving it a star ⭐ and a citation.
## Contributors

Dongjin Jang, Ph.D. (AWS AI/ML Specialist Solutions Architect) | Mail | Linkedin | Git