# Self-RAG 7B Model
This 7B Self-RAG model can generate outputs for a wide range of user queries. It uses reflection tokens to adaptively call the retrieval system, and it can critique its own outputs and retrieved passages.
## Quick Start
Self-RAG is trained on our instruction-following corpora with interleaved passages and reflection tokens, using the standard next-token prediction objective. This enables efficient and stable learning with fine-grained feedback. At inference time, we use reflection tokens covering various aspects of the generation to sample the best output that aligns with users' preferences. For full details, refer to our paper.
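To make the interleaving concrete, here is a minimal, hypothetical sketch of what one such training sequence can look like. The reflection tokens follow the inventory described in our paper; the text itself is invented for illustration and is not taken from the training data.

```python
# Illustrative sketch of an interleaved sequence with reflection tokens.
# The instruction/response framing and <paragraph> tags follow the input
# format documented below; the concrete text is made up for illustration.
example_sequence = (
    "### Instruction:\nWhat is the capital of France?\n\n### Response:\n"
    "[Retrieval]<paragraph>Paris is the capital and largest city of France."
    "</paragraph>[Relevant]The capital of France is Paris."
    "[Fully supported][Utility:5]</s>"
)
```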
## Features
- Generates diverse outputs for user queries.
- Uses reflection tokens for adaptive retrieval and self-critique.
- Learns efficiently and stably from fine-grained feedback during training.
- Samples the best output according to user preferences at inference.
## Installation
Make sure to install the dependencies listed in `self-rag/requirements.txt`.
## Usage Examples
### Basic Usage
Here is a quick way to download our model from HuggingFace and run it with `vllm`, using pre-given passages.
```python
from vllm import LLM, SamplingParams

# Download the model from HuggingFace and load it with vLLM in half precision.
model = LLM("selfrag/selfrag_llama2_7b", download_dir="/gscratch/h2lab/akari/model_cache", dtype="half")
# Keep special tokens so that reflection tokens appear in the output.
sampling_params = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=100, skip_special_tokens=False)

def format_prompt(input, paragraph=None):
    prompt = "### Instruction:\n{0}\n\n### Response:\n".format(input)
    if paragraph is not None:
        prompt += "[Retrieval]<paragraph>{0}</paragraph>".format(paragraph)
    return prompt

query_1 = "Leave odd one out: twitter, instagram, whatsapp."
query_2 = "Can you tell me the difference between llamas and alpacas?"
queries = [query_1, query_2]

# Queries without a passage; the model decides whether to retrieve.
preds = model.generate([format_prompt(query) for query in queries], sampling_params)
for pred in preds:
    print("Model prediction: {0}".format(pred.outputs[0].text))
```
To ground the generation in a specific passage, pass it via the `paragraph` argument:

```python
prompt = format_prompt("Can you tell me the difference between llamas and alpacas?", paragraph="The alpaca (Lama pacos) is a species of South American camelid mammal. It is similar to, and often confused with, the llama. Alpacas are considerably smaller than llamas, and unlike llamas, they were not bred to be working animals, but were bred specifically for their fiber.")
preds = model.generate([prompt], sampling_params)
print([pred.outputs[0].text for pred in preds])
```
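Because `skip_special_tokens=False`, the predictions above include reflection tokens (e.g., `[No Retrieval]`, `[Relevant]`, `[Utility:5]`) alongside the answer text. If you only want the plain answer, a small post-processing helper can strip them. The sketch below is not part of the released code and assumes the reflection-token inventory described in our paper:

```python
import re

# Reflection tokens used by Self-RAG, per the paper (assumed inventory).
REFLECTION_TOKEN_PATTERN = re.compile(
    r"\[No Retrieval\]|\[Retrieval\]|\[Continue to Use Evidence\]|"
    r"\[Relevant\]|\[Irrelevant\]|\[Fully supported\]|\[Partially supported\]|"
    r"\[No support / Contradictory\]|\[Utility:[1-5]\]"
)

def strip_reflection_tokens(text):
    """Remove reflection tokens and the end-of-sequence marker from an output."""
    return REFLECTION_TOKEN_PATTERN.sub("", text).replace("</s>", "").strip()

# Example: strip_reflection_tokens(pred.outputs[0].text)
```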
### Advanced Usage
To run our full inference pipeline with a retrieval system and fine-grained tree decoding, please use our code.
## Documentation
### Input Format
As described in the `format_prompt` function, your input should follow one of these formats:

```
### Instruction:\n{instruction}\n\n### Response:\n
```

or, if you have an additional input:

```
### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n
```

You can insert paragraphs anywhere after `### Response:\n`, but make sure to mark them with paragraph tokens (i.e., `<paragraph>{0}</paragraph>`).
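For reference, a hypothetical helper (not part of the released code) covering both formats, with an optional retrieved paragraph, might look like this:

```python
def build_prompt(instruction, input=None, paragraph=None):
    # Hypothetical helper covering both documented prompt formats.
    if input is not None:
        prompt = "### Instruction:\n{0}\n\n### Input:\n{1}\n\n### Response:\n".format(instruction, input)
    else:
        prompt = "### Instruction:\n{0}\n\n### Response:\n".format(instruction)
    if paragraph is not None:
        # Retrieved passages go after "### Response:\n", wrapped in paragraph tokens.
        prompt += "[Retrieval]<paragraph>{0}</paragraph>".format(paragraph)
    return prompt
```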
### Training details
Our training data is available as the HuggingFace dataset `selfrag_train_data`. For detailed training information, refer to our official repository. We used 8 A100 40GB GPUs for training on the Stability HPC server.
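To inspect the data, you can load it with the `datasets` library. A minimal sketch, assuming the dataset's Hub ID is `selfrag/selfrag_train_data`:

```python
from datasets import load_dataset

# Hub ID assumed to be "selfrag/selfrag_train_data"; adjust if it differs.
train_data = load_dataset("selfrag/selfrag_train_data", split="train")
print(train_data[0])  # one instruction-following example with reflection tokens
```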
## License
This project is licensed under the MIT License.
## Citation and contact
If you use this model, please cite our work:
```bibtex
@article{asai2023selfrag,
  author  = {Asai, Akari and Wu, Zeqiu and Wang, Yizhong and Sil, Avirup and Hajishirzi, Hannaneh},
  title   = {{Self-RAG}: Learning to Retrieve, Generate, and Critique through Self-Reflection},
  year    = {2023},
  journal = {arXiv preprint arXiv:2310.11511},
  url     = {https://arxiv.org/abs/2310.11511}
}
```