Starling-LM-7B-alpha Open-Source Large Language Model - Trained with RLAIF, Performs Well on MT Bench

Starling LM 7B Alpha

Developed by berkeley-nest

The first open-source large language model trained with Reinforcement Learning from AI Feedback (RLAIF), demonstrating excellent performance on MT Bench

Large Language Model

Transformers

English | Open Source License: Apache-2.0 | #RLHF optimization #GPT4-level conversation #Multi-turn interaction

Downloads 9,765

Release Time : 11/25/2023

Model Overview

A language model fine-tuned from Openchat 3.5 that achieves high-performance dialogue capabilities through the Nectar ranking dataset and reward-model training

Model Features

RLAIF Training

The first open-source large language model trained with Reinforcement Learning from AI Feedback (RLAIF)

High-performance Dialogue

Scores 8.09 on MT Bench with GPT-4 as the judge, outperforming every model evaluated to date except GPT-4 and GPT-4 Turbo

Multi-turn Dialogue Support

Supports complex multi-turn dialogue scenarios

Programming Assistance

Capable of code generation and solving programming problems

Model Capabilities

Text generation

Multi-turn dialogue

Code generation

Question answering

Use Cases

Intelligent Assistant

Daily Conversation

Engages in natural and fluent daily conversations

Scores 91.99 on AlpacaEval

Programming Assistance

Code Generation

Generates code in various programming languages based on requirements

Supports code implementation in multiple languages including C++

🚀 Starling-LM-7B-alpha

Starling-LM-7B-alpha is an open large language model trained by Reinforcement Learning from AI Feedback (RLAIF), leveraging a new ranking dataset and a novel training pipeline.

🚀 Quick Start

Starling-7B is an open large language model (LLM) trained via Reinforcement Learning from AI Feedback (RLAIF). It utilizes our new GPT-4 labeled ranking dataset, berkeley-nest/Nectar, and a new reward training and policy tuning pipeline. Starling-7B-alpha scores 8.09 on MT-Bench with GPT-4 as a judge, outperforming every model to date on MT-Bench except for OpenAI's GPT-4 and GPT-4 Turbo.
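The paragraph above mentions reward training on a ranking dataset without spelling out the objective. As a minimal sketch of one common way to use ranked data (an assumption here, not necessarily Starling's exact objective, which the upcoming paper is meant to describe), a best-to-worst ranking of K responses can be expanded into K(K-1)/2 preference pairs and each pair scored with a Bradley-Terry loss, -log sigmoid(r_chosen - r_rejected). The function names are hypothetical, and the rewards are plain floats standing in for reward-model outputs.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking of K items into (preferred, rejected) pairs."""
    # combinations() preserves input order, so the first item of each pair is ranked higher
    return list(combinations(ranked_responses, 2))


def pairwise_loss(reward_chosen, reward_rejected):
    """Bradley-Terry negative log-likelihood: -log(sigmoid(r_chosen - r_rejected))."""
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(d)) = log(1 + exp(-d)); for d < 0 rewrite it as -d + log(1 + exp(d))
    # so that exp() never receives a large positive argument and overflows
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def ranking_loss(rewards):
    """Average pairwise loss for rewards listed in ranked (best-first) order."""
    pairs = ranking_to_pairs(rewards)
    return sum(pairwise_loss(a, b) for a, b in pairs) / len(pairs)


# Hypothetical reward-model scores for three responses, best response first
ranked_rewards = [2.0, 1.0, -0.5]
print(ranking_loss(ranked_rewards))
print(ranking_loss(list(reversed(ranked_rewards))))  # larger: the ranking is violated
```

A reward model trained this way is pushed to score higher-ranked responses above lower-ranked ones; the policy-tuning stage then optimizes the language model against those scores.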

We've released the ranking dataset Nectar, the reward model Starling-RM-7B-alpha, and the language model Starling-LM-7B-alpha on HuggingFace, along with an online demo in LMSYS Chatbot Arena. Stay tuned for our upcoming code and paper, which will offer more details on the entire process.

✨ Features

Model Type: Language Model finetuned with RLHF / RLAIF
Finetuned from: Openchat 3.5 (based on Mistral-7B-v0.1)
License: Apache-2.0, under the condition that the model is not used to compete with OpenAI

Model Evaluation

Property	Details
Model Type	Language Model finetuned with RLHF / RLAIF
Training Data	berkeley-nest/Nectar

Model	Tuning Method	MT Bench	AlpacaEval	MMLU
GPT-4-Turbo	?	9.32	97.70	-
GPT-4	SFT + PPO	8.99	95.28	86.4
Starling-7B	C-RLFT + APA	8.09	91.99	63.9
Claude-2	?	8.06	91.36	78.5
GPT-3.5-Turbo	?	7.94	89.37	70
Claude-1	?	7.9	88.39	77
Tulu-2-dpo-70b	SFT + DPO	7.89	95.1	-
Openchat-3.5	C-RLFT	7.81	88.51	64.3
Zephyr-7B-beta	SFT + DPO	7.34	90.60	61.4
Llama-2-70b-chat-hf	SFT + PPO	6.86	92.66	63
Neural-chat-7b-v3-1	SFT + DPO	6.84	84.53	62.4
Tulu-2-dpo-7b	SFT + DPO	6.29	85.1	-
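The MT Bench column can be used to check the ranking claim made earlier, at least for the models listed here. The snippet below copies those scores into a dictionary (the helper name `models_ahead_of` is hypothetical), lists the models that score strictly higher than Starling-7B, and computes the gain over its base model, Openchat 3.5.

```python
# MT Bench scores copied from the table above (higher is better)
MT_BENCH = {
    "GPT-4-Turbo": 9.32,
    "GPT-4": 8.99,
    "Starling-7B": 8.09,
    "Claude-2": 8.06,
    "GPT-3.5-Turbo": 7.94,
    "Claude-1": 7.90,
    "Tulu-2-dpo-70b": 7.89,
    "Openchat-3.5": 7.81,
    "Zephyr-7B-beta": 7.34,
    "Llama-2-70b-chat-hf": 6.86,
    "Neural-chat-7b-v3-1": 6.84,
    "Tulu-2-dpo-7b": 6.29,
}


def models_ahead_of(name, scores):
    """Return the models that score strictly higher than `name`, best first."""
    target = scores[name]
    ahead = [model for model, score in scores.items() if score > target]
    return sorted(ahead, key=scores.get, reverse=True)


print(models_ahead_of("Starling-7B", MT_BENCH))  # ['GPT-4-Turbo', 'GPT-4']
# Gain over the base model it was fine-tuned from
print(round(MT_BENCH["Starling-7B"] - MT_BENCH["Openchat-3.5"], 2))  # 0.28
```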

💻 Usage Examples

Basic Usage

import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("berkeley-nest/Starling-LM-7B-alpha")
model = transformers.AutoModelForCausalLM.from_pretrained("berkeley-nest/Starling-LM-7B-alpha")

def generate_response(prompt):
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    outputs = model.generate(
        input_ids,
        max_new_tokens=256,  # cap on generated tokens, independent of the prompt length
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    # generate() returns the prompt followed by the continuation; keep only the new tokens
    response_ids = outputs[0][input_ids.shape[-1]:]
    response_text = tokenizer.decode(response_ids, skip_special_tokens=True)
    return response_text.strip()

# Single-turn conversation
prompt = "Hello, how are you?"
single_turn_prompt = f"GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:"
response_text = generate_response(single_turn_prompt)
print("Response:", response_text)

# Multi-turn conversation: earlier assistant replies are inserted back into the template
prompt = "Hello"
follow_up_question = "How are you today?"
response = generate_response(f"GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:")
multi_turn_prompt = f"GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: {response}<|end_of_turn|>GPT4 Correct User: {follow_up_question}<|end_of_turn|>GPT4 Correct Assistant:"
response_text = generate_response(multi_turn_prompt)
print("Multi-turn conversation response:", response_text)

# Coding conversation: uses the "Code User" / "Code Assistant" template
prompt = "Implement quicksort using C++"
coding_prompt = f"Code User: {prompt}<|end_of_turn|>Code Assistant:"
response = generate_response(coding_prompt)
print("Coding conversation response:", response)
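The prompt strings in the usage example follow a simple pattern: each turn is `<mode> User:` or `<mode> Assistant:` followed by the text and `<|end_of_turn|>`, and the prompt ends with an open assistant tag for the model to complete. A small helper can build these strings from a list of turns. The name `build_prompt` and its `mode` parameter are hypothetical conveniences, not part of the model's or the transformers API; the output is checked against the exact strings used in this card.

```python
def build_prompt(turns, mode="GPT4 Correct"):
    """Build a prompt in the Openchat 3.5 / Starling chat template.

    turns: alternating user/assistant messages, starting and ending with the user.
    mode:  "GPT4 Correct" for general chat, "Code" for coding mode.
    """
    if not turns or len(turns) % 2 == 0:
        raise ValueError("turns must start and end with a user message")
    parts = []
    for i, text in enumerate(turns):
        role = "User" if i % 2 == 0 else "Assistant"
        parts.append(f"{mode} {role}: {text}<|end_of_turn|>")
    # Leave the assistant tag open so the model generates the next reply
    return "".join(parts) + f"{mode} Assistant:"


print(build_prompt(["Hello", "Hi", "How are you today?"]))
print(build_prompt(["Implement quicksort using C++"], mode="Code"))
```

Because the helper only concatenates strings, it can be checked without downloading the model, and its output can be passed directly to `generate_response`.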

Advanced Usage

# The conversation template is the same as Openchat 3.5
import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("openchat/openchat_3.5")

# Single-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]

# Multi-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]

# Coding Mode
tokens = tokenizer("Code User: Implement quicksort using C++<|end_of_turn|>Code Assistant:").input_ids
assert tokens == [1, 7596, 1247, 28747, 26256, 2936, 7653, 1413, 334, 1680, 32000, 7596, 21631, 28747]
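The asserted token lists also show how the template is segmented. In every list, ID 32000 sits exactly where `<|end_of_turn|>` appears in the text, so it can be treated as the end-of-turn token (inferred from these lists, not read from the tokenizer files). The sketch below splits a sequence at that ID and checks that the multi-turn prompt starts with the same tokens as the single-turn one, i.e. earlier turns tokenize identically when more turns are appended. `split_turns` is a hypothetical helper.

```python
# Token IDs copied from the assertions above (Openchat 3.5 tokenizer)
single_turn_ids = [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000,
                   420, 6316, 28781, 3198, 3123, 21631, 28747]
multi_turn_ids = [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000,
                  420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000,
                  420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000,
                  420, 6316, 28781, 3198, 3123, 21631, 28747]

# Inferred from the lists above: 32000 appears exactly where "<|end_of_turn|>" is in the text
END_OF_TURN_ID = 32000


def split_turns(token_ids, eot_id=END_OF_TURN_ID):
    """Split a token sequence into segments, each ending at an end-of-turn token."""
    segments, current = [], []
    for tok in token_ids:
        current.append(tok)
        if tok == eot_id:
            segments.append(current)
            current = []
    if current:
        # Trailing open assistant tag: no end-of-turn token yet
        segments.append(current)
    return segments


# Earlier turns tokenize the same way when more turns are appended
assert multi_turn_ids[:len(single_turn_ids)] == single_turn_ids

for segment in split_turns(multi_turn_ids):
    print(segment)
```

The last segment printed is the open "GPT4 Correct Assistant:" tag that the model is expected to continue.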

📄 License

The dataset, model, and online demo are a research preview intended for non-commercial use only, subject to the data distillation license of LLaMA, the Terms of Use of the data generated by OpenAI, and the Privacy Practices of ShareGPT. Please contact us if you find any potential violation.

👏 Acknowledgment

We would like to thank Wei-Lin Chiang from Berkeley for detailed feedback on the blog and the projects. We also thank the LMSYS Organization for their support of the lmsys-chat-1M dataset, evaluation, and online demo. Additionally, we're grateful to the open-source community for providing the datasets and base models we used to develop the project, including but not limited to Anthropic, Llama, Mistral, Hugging Face H4, LMSYS, OpenChat, OpenBMB, Flan, and ShareGPT.

📚 Citation

@misc{starling2023,
    title = {Starling-7B: Improving LLM Helpfulness & Harmlessness with RLAIF},
    url = {},
    author = {Zhu, Banghua and Frick, Evan and Wu, Tianhao and Zhu, Hanlin and Jiao, Jiantao},
    month = {November},
    year = {2023}
}

⚠️ Important Note

Please use the exact chat template provided for the model; otherwise, performance will degrade. In rare cases the model output can be verbose; setting temperature = 0 can make this less frequent.
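Temperature divides the logits before the softmax, p_i = exp(z_i / T) / sum_j exp(z_j / T), so a lower T concentrates probability on the highest-scoring token, and the limit T → 0 is greedy decoding. The pure-Python sketch below illustrates this with made-up logits. In Hugging Face `transformers`, greedy decoding is normally requested with `do_sample=False` in `generate` rather than by passing `temperature=0` literally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into sampling probabilities at a given temperature.

    temperature == 0 is treated as the limit T -> 0, i.e. greedy decoding:
    all probability mass goes to the highest-scoring token (first one on ties).
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (1.0, 0.5, 0.1, 0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As T shrinks, the first token's probability approaches 1, which is why a low temperature makes long, wandering continuations less likely.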

💡 Usage Tip

Our model uses exactly the same chat template and usage as Openchat 3.5; please refer to their model card for more details. The model is also hosted on LMSYS Chatbot Arena, where it can be tested for free.
