SlimPLM Open-source Retrieval Necessity Judgment Model - Empowering Large Models for Precise Retrieval Timing and Content

SlimPLM Retrieval Necessity Judgment

Developed by zstanjj

SlimPLM is a lightweight proxy model designed to determine when and what content to retrieve for large language models (LLMs).

Large Language Model

Transformers

#Retrieval Necessity Judgment #Query Rewriting Optimization #Lightweight Proxy Model

Release Time: 1/25/2024

Model Overview

This model performs retrieval necessity judgment: given a question and a heuristic answer, it helps decide whether a large language model needs information retrieval.

Model Features

Lightweight Design

As a lightweight proxy model, it requires fewer computational resources than the LLM it supports.

Retrieval Decision

Determines when an LLM needs information retrieval.

Chinese Optimization

Specially optimized for Chinese language scenarios.

Model Capabilities

Retrieval Necessity Judgment

Query Analysis

Structured Parsing

Use Cases

Information Retrieval Systems

Retrieval Decision Support

Determining whether external knowledge retrieval is needed in Q&A systems.

Improves system efficiency by reducing unnecessary retrieval overhead.
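
The use case above amounts to gating the retriever behind a judgment step. The following is a minimal sketch of that control flow, not the authors' pipeline: `answer_question`, `needs_retrieval`, `retrieve`, and `generate` are hypothetical names, and the three callables are placeholders for the judgment model, a retriever, and the LLM.

```python
from typing import Callable, List


def answer_question(
    question: str,
    heuristic_answer: str,
    needs_retrieval: Callable[[str, str], bool],  # hypothetical wrapper around the judgment model
    retrieve: Callable[[str], List[str]],         # hypothetical retriever
    generate: Callable[[str, List[str]], str],    # hypothetical LLM call
) -> str:
    """Call the retriever only when the judgment step asks for it."""
    if needs_retrieval(question, heuristic_answer):
        passages = retrieve(question)
        return generate(question, passages)
    # The heuristic answer was judged sufficient, so retrieval is skipped.
    return generate(question, [])
```

Passing the components as callables keeps the sketch independent of any particular retriever or model output format; how the judgment model's text output is mapped to a boolean is left to the `needs_retrieval` wrapper.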

🚀 SlimPLM

SlimPLM covers two tasks: retrieval necessity judgment and query rewriting. This model parses user input into a structured format, guided by a heuristic (coarse) answer.

🚀 Quick Start

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

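# Construct the prompt. The template wording (including the spellings "datatime" and
# "Course answer") is kept exactly as published, since the model may expect it verbatim.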
question = "Who voices Darth Vader in Star Wars Episodes III-VI, IX Rogue One, and Rebels?"
heuristic_answer = "The voice of Darth Vader in Star Wars is provided by British actor James Earl Jones. He first voiced the character in the 1977 film \"Star Wars: Episode IV - A New Hope\", and his performance has been used in all subsequent Star Wars films, including the prequels and sequels."
prompt = (f"<s>[INST] <<SYS>>\nYou are a helpful assistant. Your task is to parse user input into"
          f" structured formats according to the coarse answer. Current datatime is 2023-12-20 9:47:28"
          f" <</SYS>>\n Course answer: (({heuristic_answer}))\nQuestion: (({question})) [/INST]")
params_query_rewrite = {"repetition_penalty": 1.05, "temperature": 0.01, "top_k": 1, "top_p": 0.85,
                        "max_new_tokens": 512, "do_sample": False, "seed": 2023}

# load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("zstanjj/SlimPLM-Retrieval-Necessity-Judgment").eval()
if torch.cuda.is_available():
    model.cuda()
tokenizer = AutoTokenizer.from_pretrained("zstanjj/SlimPLM-Retrieval-Necessity-Judgment")

# run inference (the prompt is already an f-string, so no further .format() call is needed)
input_ids = tokenizer.encode(prompt, return_tensors="pt")
len_input_ids = len(input_ids[0])
if torch.cuda.is_available():
    input_ids = input_ids.cuda()
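# Apply the generation settings defined above. "seed" is not a generate() argument,
# so it is set through torch and excluded from the keyword arguments.
torch.manual_seed(params_query_rewrite["seed"])
gen_kwargs = {k: v for k, v in params_query_rewrite.items() if k != "seed"}
outputs = model.generate(input_ids, **gen_kwargs)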
res = tokenizer.decode(outputs[0][len_input_ids:], skip_special_tokens=True)
print(res)
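
The example above builds the prompt inline with a fixed timestamp. For repeated calls, the same template could be wrapped in a small helper. This is only a sketch: `build_prompt`, `EXAMPLE_TIME`, and the `current_time` parameter are hypothetical names, and whether the model benefits from a real current timestamp is an assumption not confirmed by the source.

```python
# Timestamp taken from the example prompt; pass a different string to override it.
EXAMPLE_TIME = "2023-12-20 9:47:28"


def build_prompt(question: str, heuristic_answer: str, current_time: str = EXAMPLE_TIME) -> str:
    """Fill the example's prompt template (hypothetical helper).

    The wording, including the spellings "datatime" and "Course answer",
    is copied verbatim from the example prompt.
    """
    return (
        "<s>[INST] <<SYS>>\nYou are a helpful assistant. Your task is to parse user input into"
        " structured formats according to the coarse answer."
        f" Current datatime is {current_time}"
        f" <</SYS>>\n Course answer: (({heuristic_answer}))\nQuestion: (({question})) [/INST]"
    )


prompt = build_prompt("Who wrote Hamlet?", "William Shakespeare wrote Hamlet.")
print(prompt)
```

Because the values are interpolated directly rather than through a later `.format()` call, questions or answers that contain curly braces pass through unchanged.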

✨ Features

Latest News

[1/25/2024]: Retrieval Necessity Judgment Model released on Hugging Face.
[2/20/2024]: Query Rewriting Model released on Hugging Face.
[5/19/2024]: Our new work, Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs, has been accepted to the ACL 2024 main conference.

📄 License

The model uses the Llama 2 license.

✏️ Citation

@inproceedings{Tan2024SmallMB,
  title={Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs},
  author={Jiejun Tan and Zhicheng Dou and Yutao Zhu and Peidong Guo and Kun Fang and Ji-Rong Wen},
  year={2024},
  url={https://arxiv.org/abs/2402.12052}
}