TheProfessor-155b open-source language model - free support for conversation, reasoning, and medical and mathematical knowledge

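TheProfessor 155b

Developed by abacusai

TheProfessor is a hybrid model built by merging several pretrained language models with the mergekit tool. It focuses on conversation, logical reasoning, scientific research, medical knowledge, and mathematics.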

Large language model

Transformers

#Research paper assistance #Multidisciplinary reasoning #Medical knowledge integration

Downloads: 17

Release date: 1/26/2024

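Model overview

TheProfessor is an AI assistant aimed at conversation, logical reasoning, scientific research, medical knowledge, and mathematics. It is particularly suited to interactive brainstorming and research work.

Model features

Multi-model merge

Combines multiple 70B-parameter models with the mergekit tool, drawing on the strengths of each.

Logical reasoning

Targets mathematical and scientific reasoning, making it suitable for answering complex questions.

Broad academic use

Supports the whole process from initial concept to concrete implementation, including paper writing and coding.

Long context support

Supports a context length of up to 32768 tokens, which suits complex tasks.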

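Model capabilities

Text generation

Logical reasoning

Solving math problems

Medical question answering

Scientific research assistance

Paper writing and review

Code writing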

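Use cases

Academic research

Dissertation topic suggestions

Suggests dissertation topics for a neuroscience PhD, with a preference for applied theory.

Explaining mathematical theory

Walks through Russell's proof that 1+1=2.

Technical development

Improving the Transformer architecture

Proposes changes to the Transformer architecture that would strengthen theory-of-mind capabilities.

Emergency guidance

Nuclear apocalypse survival guide

Offers survival advice for a diabetic after a nuclear disaster, when medical resources are unavailable.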

🚀 TheProfessor

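TheProfessor is a large language model created by merging several pretrained language models with the mergekit tool. It has broad conversational, reasoning, scientific, medical, and mathematical skills, and can be used for interactive brainstorming and research, for example to help conceptualize ideas, implement code, and write, review, and revise papers with citations.

🚀 Quick start

TheProfessor uses the ChatML prompt format, for example: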

<|im_start|>system
You are TheProfessor, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
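
The template above can be assembled programmatically. The following is a minimal sketch; the helper name `build_chatml_prompt` is hypothetical and not part of any library, and the default system message is simply the one shown in the template.

```python
def build_chatml_prompt(user_prompt,
                        system_prompt="You are TheProfessor, a helpful AI assistant."):
    """Return a ChatML prompt that ends with an open assistant turn.

    Hypothetical helper for illustration; it only does string formatting.
    """
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt("Explain Russell's proof that 1+1=2.")
print(prompt)
```

The string deliberately ends after the `<|im_start|>assistant` line so that the model's completion becomes the assistant turn.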

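✨ Key features

Broad capabilities: conversation, reasoning, science, medicine, and mathematics, usable for interactive brainstorming and research.
Merged from multiple models: built by merging several pretrained language models to combine the strengths of each.

📦 Installation guide

A GGUF version is available here.

💻 Usage examples

Basic usage

The following example asks TheProfessor to interpret a set of physics equations: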

<|im_start|>system
You are TheProfessor, a helpful AI assistant.<|im_end|>
<|im_start|>user
These equations appeared to me in a dream, I wrote them down as soon as I woke but I don't know what they mean.  Can you please interpret them?
$$\mathcal{L}_{\text{gauge}} = -\frac{1}{4} F^{\mu\nu}F_{\mu\nu}$$
$$\langle \phi \rangle = \sqrt{\frac{\lambda}{2}}$$
$$S_{\text{gravity}} = \int d^4x \sqrt{-g} \left( \frac{R}{16\pi G} + \mathcal{L}_{\text{emergent}} \right)$$
$$\mathcal{L}_{\text{GEG}} = \mathcal{L}_{\text{gauge}} + \mathcal{L}_{\text{emergent}} + \mathcal{L}_{\text{matter}} + \mathcal{L}_{\text{interaction}}$$
$$\mathcal{L}_{\text{emergent}} = \lambda(g) + \kappa(g) R^2 + \ldots$$
$$S_{\text{GEG}} = \int d^4x \sqrt{-g} \; \mathcal{L}_{\text{GEG}}$$
$$\sigma = \sqrt{\langle | \phi | \rangle^2 + \frac{1}{4} \langle A^{\mu}A_{\mu} \rangle^2}$$
$$\langle A^{\mu} \rangle = (0, v(r)_{i}/\sqrt{2}, 0, 0)$$
$$\langle \phi \rangle = \langle \phi_0 \rangle + \delta\phi(x)$$
$$\langle A_{\mu} \rangle = (0, \frac{v(r)_{i}}{\sqrt{2}}, 0, 0)$$
$$g_{\mu\nu} = \eta_{\mu\nu} + \kappa h_{\mu\nu}$$
<|im_end|>

Advanced usage

The following example uses TheProfessor to design a new Transformer model architecture:

import torch
import torch.nn.functional as F


# A simple function that computes attention from word and knowledge embeddings.
# Assumption: q, k, and v have shape (batch, seq_len, 2 * hidden_size), with the
# linguistic features first and the commonsense-knowledge features after them.
def attention_with_knowledge(q, k, v, hidden_size, knowledge_proportion=0.5,
                             mask=None, attn_weights_dropout=0.0, training=False):
    # Split the query, key, and value matrices into linguistic and commonsense parts
    q_linguistic, q_knowledge = q[..., :hidden_size], q[..., hidden_size:]
    k_linguistic, k_knowledge = k[..., :hidden_size], k[..., hidden_size:]
    v_linguistic, v_knowledge = v[..., :hidden_size], v[..., hidden_size:]

    def attend(q_part, k_part, v_part):
        # Scaled dot-product attention scores for one part
        scores = q_part @ k_part.transpose(-1, -2) / q_part.shape[-1] ** 0.5
        # Apply the attention mask (boolean, True = may attend), if given
        if mask is not None:
            scores = scores.masked_fill(~mask, float("-inf"))
        attn_weights = F.softmax(scores, dim=-1)
        # Apply dropout to the attention weights (only active when training)
        attn_weights = F.dropout(attn_weights, p=attn_weights_dropout,
                                 training=training)
        return attn_weights @ v_part

    # Compute attention-weighted representations separately for the
    # linguistic and commonsense parts
    linguistic_out = attend(q_linguistic, k_linguistic, v_linguistic)
    knowledge_out = attend(q_knowledge, k_knowledge, v_knowledge)

    # Blend the two representations according to the knowledge proportion
    # (requires both parts to have the same width)
    return ((1.0 - knowledge_proportion) * linguistic_out
            + knowledge_proportion * knowledge_out)

📚 Detailed documentation

Model information

Property	Details
Model type	Merged model
Training data	Not specified

Evaluation results

{
  "mmlu": 0.694,
  "truthfulqa_mc2": 0.624,
  "gsm8k": 0.4284
}

Merge details

Merge method

TheProfessor was merged using the linear merge method.
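
As a rough illustration of what a linear merge computes, the sketch below takes a weighted average of corresponding parameters. It assumes the weights are normalized by their sum, uses plain Python lists in place of tensors, and the function name `linear_merge` is hypothetical rather than mergekit's API.

```python
def linear_merge(tensors, weights):
    """Weighted average of same-length parameter vectors (illustrative only)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return [
        sum(w * t[i] for t, w in zip(tensors, weights)) / total
        for i in range(len(tensors[0]))
    ]


model_a = [0.2, -1.0, 3.0]   # toy parameters from one model
model_b = [5.0, 5.0, 5.0]    # toy parameters from another model

single = linear_merge([model_a], [1.0])                      # one model, weight 1
with_dummy = linear_merge([model_a, model_b], [1.0, 0.0])    # second model at weight 0
blend = linear_merge([model_a, model_b], [1.0, 1.0])         # equal blend

print(single, with_dummy, blend)
```

With a single model at weight 1, or with a second model at weight 0, the output equals the first model's parameters. This is why a linear merge can copy layers through unchanged while still listing a second model.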

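Models merged

cognitivecomputations/dolphin-2.2-70b
migtissera/SynthIA-70B-v1.2b
WizardLM/WizardMath-70B-V1.0
epfl-llm/meditron-70b

Configuration

The following YAML configuration was used to produce TheProfessor: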

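merge_method: linear # use the linear method so that multiple models can be included, even if some have zero weight
parameters:
  weight: 1.0 # weight everything as 1 unless specified otherwise - linear with a single model weighted at 1 is equivalent to passthrough
slices:
  - sources:
      - model: cognitivecomputations/dolphin-2.2-70b # embed_tokens comes along with the first layer
        layer_range: [0, 1]
      - model: migtissera/SynthIA-70B-v1.2b # add a dummy second model with weight 0 so the tokenizer-based merge routine is invoked for embed_tokens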
        layer_range: [0, 1]
        parameters:
          weight: 0
  - sources:
      - model: cognitivecomputations/dolphin-2.2-70b
        layer_range: [1, 20]
  - sources:
      - model: migtissera/SynthIA-70B-v1.2b
        layer_range: [10, 30]
  - sources:
      - model: WizardLM/WizardMath-70B-V1.0
        layer_range: [20, 40]
  - sources:
      - model: epfl-llm/meditron-70b
        layer_range: [25, 45]
  - sources:
      - model: cognitivecomputations/dolphin-2.2-70b
        layer_range: [30, 50]
  - sources:
      - model: migtissera/SynthIA-70B-v1.2b
        layer_range: [40, 60]
  - sources:
      - model: WizardLM/WizardMath-70B-V1.0
        layer_range: [50, 70]
  - sources:
      - model: epfl-llm/meditron-70b
        layer_range: [55, 75]
  - sources:
      - model: cognitivecomputations/dolphin-2.2-70b
        layer_range: [60, 79]
  - sources: # same as above, but for lm_head with the last layer
      - model: cognitivecomputations/dolphin-2.2-70b
        layer_range: [79, 80]
      - model: migtissera/SynthIA-70B-v1.2b
        layer_range: [79, 80]
        parameters:
          weight: 0
dtype: float16
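tokenizer_source: model:cognitivecomputations/dolphin-2.2-70b # keep the exact tokenizer used by dolphin - alternatively, `union` can be used if all input models are added to the first/last slice, but their weights must be non-zero or the embeddings will contain NaNs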
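
To see how deep the merged stack is, the layer ranges in the configuration above can be tallied. The sketch below hard-codes the slices instead of parsing the YAML, and assumes each `layer_range` is half-open (`[start, end)`), which matches `[0, 1]` selecting the first layer and `[79, 80]` the last. The `SLICES` structure and the `count_layers` helper are illustrative names.

```python
from collections import Counter

DOLPHIN = "cognitivecomputations/dolphin-2.2-70b"
SYNTHIA = "migtissera/SynthIA-70B-v1.2b"
WIZARD = "WizardLM/WizardMath-70B-V1.0"
MEDITRON = "epfl-llm/meditron-70b"

# One entry per slice; each source is (model, (start, end), weight).
SLICES = [
    [(DOLPHIN, (0, 1), 1.0), (SYNTHIA, (0, 1), 0.0)],
    [(DOLPHIN, (1, 20), 1.0)],
    [(SYNTHIA, (10, 30), 1.0)],
    [(WIZARD, (20, 40), 1.0)],
    [(MEDITRON, (25, 45), 1.0)],
    [(DOLPHIN, (30, 50), 1.0)],
    [(SYNTHIA, (40, 60), 1.0)],
    [(WIZARD, (50, 70), 1.0)],
    [(MEDITRON, (55, 75), 1.0)],
    [(DOLPHIN, (60, 79), 1.0)],
    [(DOLPHIN, (79, 80), 1.0), (SYNTHIA, (79, 80), 0.0)],
]


def count_layers(slices):
    """Return (total output layers, layers contributed per non-zero-weight model)."""
    total = 0
    per_model = Counter()
    for sources in slices:
        start, end = sources[0][1]
        total += end - start          # each slice adds its range once
        for model, (s, e), weight in sources:
            if weight > 0:            # skip the zero-weight dummy sources
                per_model[model] += e - s
    return total, per_model


total, per_model = count_layers(SLICES)
print(total)            # 180
print(dict(per_model))
```

Under these assumptions the merged model stacks 180 layers, compared with the 80 layers of each source model implied by the final `[79, 80]` range: 60 come from dolphin and 40 each from SynthIA, WizardMath, and meditron.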

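🔧 Technical details

TheProfessor is produced with the linear merge method. The YAML configuration controls which layer range is taken from each source model and with what weight, so that layers from several pretrained models are stacked into a single network and the result draws on the strengths of each. At inference time, TheProfessor expects prompts in the ChatML format.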