Starling LM 11B Alpha

CallComplyによって開発

Starling-7Bは、AIフィードバック強化学習（RLAIF）でトレーニングされたオープンソースの大規模言語モデルで、Openchat 3.5をベースにファインチューニングされており、MT Benchで優れたパフォーマンスを発揮します。

大規模言語モデル

Transformers

英語#RLHFファインチューニング #マルチタスクテキスト生成 #GPT-4レベルの性能

ダウンロード数 103

リリース時間 : 12/3/2023

モデル概要

Starling-7Bは、RLHF/RLAIFでファインチューニングされた言語モデルで、主にテキスト生成タスクに使用され、高い対話と推論能力を持っています。

モデル特徴

RLAIFトレーニング

AIフィードバック強化学習（RLAIF）を使用してファインチューニングされ、モデルの対話と推論能力が向上しました。

高性能

MT BenchでGPT-4を審査員として8.09点を獲得し、OpenAIのGPT-4とGPT-4 Turboを除くすべての既存モデルを凌駕しました。

オープンソース

モデル、ランキングデータセット、報酬モデルはすべてオープンソース化されており、研究や応用が容易です。

モデル能力

テキスト生成

対話システム

推論タスク

使用事例

対話システム

インテリジェントカスタマーサポート

高性能なインテリジェントカスタマーサポートシステムの構築に使用され、自然で流暢な対話体験を提供します。

教育

学習アシスタント

学生の質問に答え、学習アドバイスやリソースの推奨を提供します。

language:

en license: cc-by-nc-4.0 library_name: transformers tags:
reward model
RLHF
RLAIF datasets:
berkeley-nest/Nectar model-index:
name: Starling-LM-11B-alpha results:
- task: type: text-generation name: Text Generation dataset: name: AI2 Reasoning Challenge (25-Shot) type: ai2_arc config: ARC-Challenge split: test args: num_few_shot: 25 metrics:
  - type: acc_norm value: 61.26 name: normalized accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CallComply/Starling-LM-11B-alpha name: Open LLM Leaderboard
- task: type: text-generation name: Text Generation dataset: name: HellaSwag (10-Shot) type: hellaswag split: validation args: num_few_shot: 10 metrics:
  - type: acc_norm value: 81.99 name: normalized accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CallComply/Starling-LM-11B-alpha name: Open LLM Leaderboard
- task: type: text-generation name: Text Generation dataset: name: MMLU (5-Shot) type: cais/mmlu config: all split: test args: num_few_shot: 5 metrics:
  - type: acc value: 61.5 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CallComply/Starling-LM-11B-alpha name: Open LLM Leaderboard
- task: type: text-generation name: Text Generation dataset: name: TruthfulQA (0-shot) type: truthful_qa config: multiple_choice split: validation args: num_few_shot: 0 metrics:
  - type: mc2 value: 41.53 source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CallComply/Starling-LM-11B-alpha name: Open LLM Leaderboard
- task: type: text-generation name: Text Generation dataset: name: Winogrande (5-shot) type: winogrande config: winogrande_xl split: validation args: num_few_shot: 5 metrics:
  - type: acc value: 78.06 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CallComply/Starling-LM-11B-alpha name: Open LLM Leaderboard
- task: type: text-generation name: Text Generation dataset: name: GSM8k (5-shot) type: gsm8k config: main split: test args: num_few_shot: 5 metrics:
  - type: acc value: 35.18 name: accuracy source: url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=CallComply/Starling-LM-11B-alpha name: Open LLM Leaderboard

Starling-LM-7B-alpha

Developed by: Banghua Zhu * , Evan Frick * , Tianhao Wu * , Hanlin Zhu and Jiantao Jiao.
Model type: Language Model finetuned with RLHF / RLAIF
License: Non commercial license
Finetuned from model: Openchat 3.5 (based on Mistral-7B-v0.1)

We introduce Starling-7B, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). The model harnesses the power of our new GPT-4 labeled ranking dataset, berkeley-nest/Nectar, and our new reward training and policy tuning pipeline. Starling-7B-alpha scores 8.09 in MT Bench with GPT-4 as a judge, outperforming every model to date on MT-Bench except for OpenAI's GPT-4 and GPT-4 Turbo. We release the ranking dataset Nectar, the reward model Starling-RM-7B-alpha and the language model Starling-LM-7B-alpha on HuggingFace, and an online demo in LMSYS Chatbot Arena. Stay tuned for our forthcoming code and paper, which will provide more details on the whole process.

Starling-LM-7B-alpha is a language model trained from Openchat 3.5 with reward model berkeley-nest/Starling-RM-7B-alpha and policy optimization method advantage-induced policy alignment (APA). The evaluation results are listed below.

Model	Tuning Method	MT Bench	AlpacaEval	MMLU
GPT-4-Turbo	?	9.32	97.70
GPT-4	SFT + PPO	8.99	95.28	86.4
Starling-7B	C-RLFT + APA	8.09	91.99	63.9
Claude-2	?	8.06	91.36	78.5
GPT-3.5-Turbo	?	7.94	89.37	70
Claude-1	?	7.9	88.39	77
Tulu-2-dpo-70b	SFT + DPO	7.89	95.1
Openchat-3.5	C-RLFT	7.81	88.51	64.3
Zephyr-7B-beta	SFT + DPO	7.34	90.60	61.4
Llama-2-70b-chat-hf	SFT + PPO	6.86	92.66	63
Neural-chat-7b-v3-1	SFT + DPO	6.84	84.53	62.4
Tulu-2-dpo-7b	SFT + DPO	6.29	85.1

For more detailed discussions, please check out our blog post, and stay tuned for our upcoming code and paper!

Blog: https://starling.cs.berkeley.edu/
Paper: Coming soon!
Code: Coming soon!

Uses

Important: Please use the exact chat template provided below for the model. Otherwise there will be a degrade in the performance. The model output can be verbose in rare cases. Please consider setting temperature = 0 to make this happen less.

Our model follows the exact chat template and usage as Openchat 3.5. Please refer to their model card for more details. In addition, our model is hosted on LMSYS Chatbot Arena for free test.

The conversation template is the same as Openchat 3.5:

import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("openchat/openchat_3.5")

# Single-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]

# Multi-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]

# Coding Mode
tokens = tokenizer("Code User: Implement quicksort using C++<|end_of_turn|>Code Assistant:").input_ids
assert tokens == [1, 7596, 1247, 28747, 26256, 2936, 7653, 1413, 334, 1680, 32000, 7596, 21631, 28747]

Code Examples

import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained("berkeley-nest/Starling-LM-7B-alpha")
model = transformers.AutoModelForCausalLM.from_pretrained("berkeley-nest/Starling-LM-7B-alpha")

def generate_response(prompt):
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    outputs = model.generate(
        input_ids,
        max_length=256,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    response_ids = outputs[0]
    response_text = tokenizer.decode(response_ids, skip_special_tokens=True)
    return response_text

# Single-turn conversation
prompt = "Hello, how are you?"
single_turn_prompt = f"GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant:"
response_text = generate_response(single_turn_prompt)
print("Response:", response_text)

## Multi-turn conversation
prompt = "Hello"
follow_up_question =  "How are you today?"
response = ""
multi_turn_prompt = f"GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: {response}<|end_of_turn|>GPT4 Correct User: {follow_up_question}<|end_of_turn|>GPT4 Correct Assistant:"
response_text = generate_response(multi_turn_prompt)
print("Multi-turn conversation response:", response_text)

### Coding conversation
prompt = "Implement quicksort using C++"
coding_prompt = f"Code User: {prompt}<|end_of_turn|>Code Assistant:"
response = generate_response(coding_prompt)
print("Coding conversation response:", response)

License

The dataset, model and online demo is a research preview intended for non-commercial use only, subject to the data distillation License of LLaMA, Terms of Use of the data generated by OpenAI, and Privacy Practices of ShareGPT. Please contact us if you find any potential violation.

Acknowledgment

We would like to thank Wei-Lin Chiang from Berkeley for detailed feedback of the blog and the projects. We would like to thank the LMSYS Organization for their support of lmsys-chat-1M dataset, evaluation and online demo. We would like to thank the open source community for their efforts in providing the datasets and base models we used to develope the project, including but not limited to Anthropic, Llama, Mistral, Hugging Face H4, LMSYS, OpenChat, OpenBMB, Flan and ShareGPT.

Citation

@misc{starling2023,
    title = {Starling-7B: Improving LLM Helpfulness & Harmlessness with RLAIF},
    url = {},
    author = {Zhu, Banghua and Frick, Evan and Wu, Tianhao and Zhu, Hanlin and Jiao, Jiantao},
    month = {November},
    year = {2023}
}

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	59.92
AI2 Reasoning Challenge (25-Shot)	61.26
HellaSwag (10-Shot)	81.99
MMLU (5-Shot)	61.50
TruthfulQA (0-shot)	41.53
Winogrande (5-shot)	78.06
GSM8k (5-shot)	35.18