Piccolo-math-2x7b開源大語言模型 - 助力數學與代碼生成推理任務

首頁

Piccolo Math 2x7b

由macadeliccc開發

Piccolo-math-2x7b 是一個專注於數學和邏輯推理的大語言模型，以紀念作者的寵物狗克勞斯命名。該模型在多個基準測試中表現出色，尤其在數學和代碼生成任務上。

大型語言模型

Transformers

開源協議:MIT #數學推理 #邏輯分析 #多任務評估

下載量 87

發布時間 : 1/16/2024

模型概述

Piccolo-math-2x7b 是一個基於 Transformer 架構的大語言模型，專注於數學、代碼生成和邏輯推理任務。它支持高質量的文本生成，並在多個標準評估數據集上取得了優異成績。

模型特點

數學推理能力

在GSM8k數學推理基準測試中達到70.13%準確率，顯著優於同類基礎模型

多任務處理

在文本生成、邏輯推理和代碼生成等多種任務上表現均衡

高效推理

支持4-bit量化加載，降低硬件需求同時保持較好性能

模型能力

數學問題求解

代碼生成

邏輯推理

常識問答

文本生成

使用案例

教育

數學輔導

幫助學生解決數學問題並解釋解題步驟

在GSM8k測試集上達到70.13%準確率

開發輔助

代碼生成

根據自然語言描述生成代碼片段

示例顯示可處理高質量代碼生成

🚀 Piccolo-math-2x7b

Piccolo-math-2x7b 是一款具備高質量代碼、數學和邏輯推理能力的模型。本項目以紀念作者的愛犬 Klaus（暱稱為 Piccolo）為初衷，為用戶提供強大的文本生成服務。

🚀 快速開始

你可以通過以下 Colab 鏈接進行推理和評估：點擊此處

💻 使用示例

基礎用法

from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_response(prompt):
    """
    Generate a response from the model based on the input prompt.
    Args:
    prompt (str): Prompt for the model.

    Returns:
    str: The generated response from the model.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id)

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return response

model_id = "macadeliccc/piccolo-math-2x7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id,load_in_4bit=True)

prompt = "What is the best way to train Cane Corsos?"

print("Response:")
print(generate_response(prompt), "\n")

該模型能夠進行高質量的代碼、數學和邏輯推理。你可以嘗試提出任何你想到的問題。

📚 詳細文檔

評估結果

模型	AGIEval	GPT4All	TruthfulQA	Bigbench	平均得分
piccolo-math-2x7b	43.89	74.98	63.96	44.99	56.96

EQ Bench

基準測試完成時間：2024-01-24 00:00:40
耗時：183.3 分鐘
提示格式：Mistral
模型：macadeliccc/piccolo-math-2x7b
得分 (v2)：70.74
可解析性：167.0

AGIEval

任務	版本	指標	數值		標準誤差
agieval_aqua_rat	0	準確率	24.41	±	2.70
		歸一化準確率	24.80	±	2.72
agieval_logiqa_en	0	準確率	35.79	±	1.88
		歸一化準確率	36.71	±	1.89
agieval_lsat_ar	0	準確率	23.48	±	2.80
		歸一化準確率	23.91	±	2.82
agieval_lsat_lr	0	準確率	49.22	±	2.22
		歸一化準確率	50.00	±	2.22
agieval_lsat_rc	0	準確率	63.94	±	2.93
		歸一化準確率	64.31	±	2.93
agieval_sat_en	0	準確率	77.18	±	2.93
		歸一化準確率	76.70	±	2.95
agieval_sat_en_without_passage	0	準確率	45.15	±	3.48
		歸一化準確率	44.66	±	3.47
agieval_sat_math	0	準確率	33.64	±	3.19
		歸一化準確率	30.00	±	3.10

平均得分：43.89%

GPT4All

任務	版本	指標	數值		標準誤差
arc_challenge	0	準確率	61.86	±	1.42
		歸一化準確率	62.88	±	1.41
arc_easy	0	準確率	84.34	±	0.75
		歸一化準確率	80.47	±	0.81
boolq	1	準確率	86.88	±	0.59
hellaswag	0	準確率	68.56	±	0.46
		歸一化準確率	85.16	±	0.35
openbookqa	0	準確率	37.00	±	2.16
		歸一化準確率	47.80	±	2.24
piqa	0	準確率	82.21	±	0.89
		歸一化準確率	83.68	±	0.86
winogrande	0	準確率	77.98	±	1.16

平均得分：74.98%

TruthfulQA

任務	版本	指標	數值		標準誤差
truthfulqa_mc	1	單項選擇題準確率	47.37	±	1.75
		多項選擇題準確率	63.96	±	1.57

平均得分：63.96%

Bigbench

任務	版本	指標	數值		標準誤差
bigbench_causal_judgement	0	多項選擇題得分	55.26	±	3.62
bigbench_date_understanding	0	多項選擇題得分	63.14	±	2.51
bigbench_disambiguation_qa	0	多項選擇題得分	42.64	±	3.08
bigbench_geometric_shapes	0	多項選擇題得分	22.84	±	2.22
		精確字符串匹配	3.34	±	0.95
bigbench_logical_deduction_five_objects	0	多項選擇題得分	36.60	±	2.16
bigbench_logical_deduction_seven_objects	0	多項選擇題得分	25.57	±	1.65
bigbench_logical_deduction_three_objects	0	多項選擇題得分	56.00	±	2.87
bigbench_movie_recommendation	0	多項選擇題得分	42.40	±	2.21
bigbench_navigate	0	多項選擇題得分	54.70	±	1.57
bigbench_reasoning_about_colored_objects	0	多項選擇題得分	62.90	±	1.08
bigbench_ruin_names	0	多項選擇題得分	53.35	±	2.36
bigbench_salient_translation_error_detection	0	多項選擇題得分	24.35	±	1.36
bigbench_snarks	0	多項選擇題得分	62.43	±	3.61
bigbench_sports_understanding	0	多項選擇題得分	70.28	±	1.46
bigbench_temporal_sequences	0	多項選擇題得分	41.30	±	1.56
bigbench_tracking_shuffled_objects_five_objects	0	多項選擇題得分	22.32	±	1.18
bigbench_tracking_shuffled_objects_seven_objects	0	多項選擇題得分	17.77	±	0.91
bigbench_tracking_shuffled_objects_three_objects	0	多項選擇題得分	56.00	±	2.87

平均得分：44.99%

總體平均得分：56.96%

總耗時：01:51:53

Open LLM Leaderboard 評估結果

詳細結果可查看此處

指標	數值
平均值	72.32
AI2 推理挑戰 (25 次少樣本學習)	69.11
HellaSwag (10 次少樣本學習)	87.27
MMLU (5 次少樣本學習)	63.69
TruthfulQA (0 次少樣本學習)	63.86
Winogrande (5 次少樣本學習)	79.87
GSM8k (5 次少樣本學習)	70.13