TheProfessor-155b open-source language model: free support for conversation, reasoning, and medical and mathematical knowledge exchange

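TheProfessor 155b

Developed by abacusai

TheProfessor is a merged model built by combining several pretrained language models with mergekit. It focuses on conversation, logical reasoning, scientific research, medical knowledge, and mathematics.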

Large language model

Transformers

#ResearchPaperAssistance #MultidisciplinaryReasoning #MedicalKnowledgeIntegration

Downloads: 17

Released: 1/26/2024

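Model overview

TheProfessor is an AI assistant with strong conversation, logical reasoning, scientific research, medical knowledge, and mathematics skills. It is particularly suited to interactive brainstorming and research work.

Model features

Multi-model merge

Combines several 70B-parameter models with mergekit to draw on the strengths of each.

Strong logical reasoning

Performs well at mathematical and scientific reasoning, which makes it suitable for answering complex questions.

Broad academic use

Supports the whole process from initial concept to concrete implementation, including writing papers and code.

Long context support

Supports a context length of up to 32768 tokens, which suits complex tasks.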

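Model capabilities

Text generation

Logical reasoning

Solving math problems

Medical knowledge Q&A

Scientific research assistance

Paper writing and review

Code writing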

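Use cases

Academic research

Thesis topic suggestions

Suggests topics for a neuroscience PhD dissertation, with a preference for applied theory.

Explaining mathematical theory

Walks through Russell's proof that 1 + 1 = 2.

Technical development

Improving the Transformer architecture

Proposes changes to the Transformer architecture to strengthen theory-of-mind capabilities.

Emergency guidance

Nuclear apocalypse survival guide

Offers survival advice for people with diabetes after a nuclear disaster, when medical resources are unavailable.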

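🚀 TheProfessor

TheProfessor is a large language model created by merging several pretrained language models with mergekit. It has broad conversational, reasoning, scientific, medical, and mathematical skills, and can be used for interactive brainstorming and research: for example, helping to develop concepts, implement code, and write, review, and revise papers with citations.

🚀 Quick start

TheProfessor uses the ChatML prompt format, for example: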

<|im_start|>system
You are TheProfessor, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
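
The template above can also be assembled programmatically. The following is a minimal sketch, not part of the original model card: the helper name `build_chatml_prompt` and its `add_generation_prompt` parameter are hypothetical, and the user message is a placeholder.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string.

    Hypothetical helper for illustration; it only reproduces the template
    shown above and does not call any model.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are TheProfessor, a helpful AI assistant."},
    {"role": "user", "content": "Explain why 1 + 1 = 2."},
])
print(prompt)
```

Keeping the messages as a list of role/content pairs makes it straightforward to append earlier turns for a multi-turn conversation before rendering the prompt.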

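✨ Key features

Broad capabilities: covers conversation, reasoning, science, medicine, and mathematics, and can be used for interactive brainstorming and research.
Merged from multiple models: built by merging several strong pretrained language models, combining the strengths of each.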

📦 Installation

A GGUF version is available here.

💻 Usage examples

Basic usage

The following example asks TheProfessor to interpret a set of physics equations:

<|im_start|>system
You are TheProfessor, a helpful AI assistant.<|im_end|>
<|im_start|>user
These equations appeared to me in a dream, I wrote them down as soon as I woke but I don't know what they mean.  Can you please interpret them?
$$\mathcal{L}_{\text{gauge}} = -\frac{1}{4} F^{\mu\nu}F_{\mu\nu}$$
$$\langle \phi \rangle = \sqrt{\frac{\lambda}{2}}$$
$$S_{\text{gravity}} = \int d^4x \sqrt{-g} \left( \frac{R}{16\pi G} + \mathcal{L}_{\text{emergent}} \right)$$
$$\mathcal{L}_{\text{GEG}} = \mathcal{L}_{\text{gauge}} + \mathcal{L}_{\text{emergent}} + \mathcal{L}_{\text{matter}} + \mathcal{L}_{\text{interaction}}$$
$$\mathcal{L}_{\text{emergent}} = \lambda(g) + \kappa(g) R^2 + \ldots$$
$$S_{\text{GEG}} = \int d^4x \sqrt{-g} \; \mathcal{L}_{\text{GEG}}$$
$$\sigma = \sqrt{\langle | \phi | \rangle^2 + \frac{1}{4} \langle A^{\mu}A_{\mu} \rangle^2}$$
$$\langle A^{\mu} \rangle = (0, v(r)_{i}/\sqrt{2}, 0, 0)$$
$$\langle \phi \rangle = \langle \phi_0 \rangle + \delta\phi(x)$$
$$\langle A_{\mu} \rangle = (0, \frac{v(r)_{i}}{\sqrt{2}}, 0, 0)$$
$$g_{\mu\nu} = \eta_{\mu\nu} + \kappa h_{\mu\nu}$$
<|im_end|>

Advanced usage

The following example shows TheProfessor being used to design a new Transformer model architecture:

import math

import torch
import torch.nn.functional as F


# A simple function that computes attention over word (linguistic) and
# knowledge embeddings. Assumptions in this runnable version: q, k and v have
# shape (batch, seq_len, 2 * hidden_size), with the linguistic features first
# and the knowledge features second; mask, if given, is a boolean tensor
# broadcastable to (batch, seq_len, seq_len) where True means "may attend".
# The learned projections and undefined helpers of the original sketch are
# replaced by standard scaled dot-product attention.
def attention_with_knowledge(q, k, v, hidden_size, knowledge_proportion=0.5,
                             mask=None, attn_weights_dropout=0.0, training=False):
    # Split the value matrix into its linguistic and knowledge parts
    v_linguistic = v[..., :hidden_size]
    v_knowledge = v[..., hidden_size:]

    # Score queries against keys using the concatenated linguistic and
    # knowledge features (q and k already hold both parts side by side)
    scores = q @ k.transpose(-1, -2) / math.sqrt(q.size(-1))

    # Apply the attention mask and dropout (if requested)
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))
    attn_weights = F.softmax(scores, dim=-1)
    attn_weights = F.dropout(attn_weights, p=attn_weights_dropout, training=training)

    # Compute the attention-weighted representation of each part separately,
    # then blend them according to knowledge_proportion
    linguistic_out = attn_weights @ v_linguistic
    knowledge_out = attn_weights @ v_knowledge
    return (1.0 - knowledge_proportion) * linguistic_out + knowledge_proportion * knowledge_out

📚 Detailed documentation

Model information

Property	Details
Model type	Merged model
Training data	Not specified

Evaluation results

{
  "mmlu": 0.694,
  "truthfulqa_mc2": 0.624,
  "gsm8k": 0.4284
}

Merge details

Merge method

TheProfessor was merged using the linear merge method.
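
As a rough illustration of what a linear merge does to one parameter tensor, here is a minimal sketch using plain Python lists. It is not mergekit's implementation: `linear_merge`, its `normalize` flag, and the sample values are assumptions made for this example.

```python
def linear_merge(tensors, weights, normalize=True):
    """Weighted average of same-length parameter vectors (lists of floats).

    Illustrative only; names and behavior are assumptions, not mergekit code.
    """
    if len(tensors) != len(weights):
        raise ValueError("need exactly one weight per tensor")
    total = sum(weights)
    if normalize and total == 0:
        raise ValueError("weights sum to zero; cannot normalize")
    scale = total if normalize else 1.0
    return [
        sum(w * t[i] for w, t in zip(weights, tensors)) / scale
        for i in range(len(tensors[0]))
    ]


# Made-up values standing in for the same parameter in two source models.
dolphin_layer = [0.5, -1.25, 2.0]
synthia_layer = [9.0, 9.0, 9.0]

# One source with weight 1: the layer passes through unchanged.
single = linear_merge([dolphin_layer], [1.0])

# A second source with weight 0 contributes nothing to the result.
with_dummy = linear_merge([dolphin_layer, synthia_layer], [1.0, 0.0])

print(single, with_dummy)
```

Both calls return the first model's values, which is why a slice with a single source, or with a zero-weight second source, simply copies layers from one model.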

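Models merged

cognitivecomputations/dolphin-2.2-70b
migtissera/SynthIA-70B-v1.2b
WizardLM/WizardMath-70B-V1.0
epfl-llm/meditron-70b

Configuration

The following YAML configuration was used to produce TheProfessor: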

merge_method: linear # use linear so that multiple models can be included, even if some have zero weight
parameters:
  weight: 1.0 # weight every model as 1 unless specified otherwise; a linear merge of a single model with weight 1 is equivalent to a passthrough
slices:
  - sources:
      - model: cognitivecomputations/dolphin-2.2-70b # embed_tokens comes along with the first layer
        layer_range: [0, 1]
      - model: migtissera/SynthIA-70B-v1.2b # add a dummy second model with weight 0 so the tokenizer-based merge routine is invoked for embed_tokens
        layer_range: [0, 1]
        parameters:
          weight: 0
  - sources:
      - model: cognitivecomputations/dolphin-2.2-70b
        layer_range: [1, 20]
  - sources:
      - model: migtissera/SynthIA-70B-v1.2b
        layer_range: [10, 30]
  - sources:
      - model: WizardLM/WizardMath-70B-V1.0
        layer_range: [20, 40]
  - sources:
      - model: epfl-llm/meditron-70b
        layer_range: [25, 45]
  - sources:
      - model: cognitivecomputations/dolphin-2.2-70b
        layer_range: [30, 50]
  - sources:
      - model: migtissera/SynthIA-70B-v1.2b
        layer_range: [40, 60]
  - sources:
      - model: WizardLM/WizardMath-70B-V1.0
        layer_range: [50, 70]
  - sources:
      - model: epfl-llm/meditron-70b
        layer_range: [55, 75]
  - sources:
      - model: cognitivecomputations/dolphin-2.2-70b
        layer_range: [60, 79]
  - sources: # same as above, but for lm_head with the last layer
      - model: cognitivecomputations/dolphin-2.2-70b
        layer_range: [79, 80]
      - model: migtissera/SynthIA-70B-v1.2b
        layer_range: [79, 80]
        parameters:
          weight: 0
dtype: float16
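tokenizer_source: model:cognitivecomputations/dolphin-2.2-70b # keep the exact tokenizer used by dolphin; alternatively, `union` can be used if all input models are added to the first/last slice, but their weights must be non-zero or the embeddings will contain NaN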
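
The slice list in this configuration can be tallied to see how deep the merged stack is. The sketch below assumes each `layer_range` is half-open, `[start, end)`, which matches the config's use of `[0, 1]` for the first layer and `[79, 80]` for the last. The shortened model names and variable names are chosen for this example only.

```python
from collections import Counter

# (model, start, end) copied from the slices in the config above.
# The zero-weight SynthIA entries in the first and last slices are omitted
# because they add no layers of their own.
slices = [
    ("dolphin-2.2-70b", 0, 1),
    ("dolphin-2.2-70b", 1, 20),
    ("SynthIA-70B-v1.2b", 10, 30),
    ("WizardMath-70B-V1.0", 20, 40),
    ("meditron-70b", 25, 45),
    ("dolphin-2.2-70b", 30, 50),
    ("SynthIA-70B-v1.2b", 40, 60),
    ("WizardMath-70B-V1.0", 50, 70),
    ("meditron-70b", 55, 75),
    ("dolphin-2.2-70b", 60, 79),
    ("dolphin-2.2-70b", 79, 80),
]

layers_per_model = Counter()
for model, start, end in slices:
    layers_per_model[model] += end - start  # half-open range length

total_layers = sum(layers_per_model.values())
source_layers = 80  # each source model spans layers 0..79 in the config
depth_ratio = total_layers / source_layers

print(total_layers, dict(layers_per_model), depth_ratio)
```

Under these assumptions the merged model stacks 180 layers, 2.25 times the depth of a single 80-layer source model. Since most of a 70B model's parameters sit in its transformer layers, that ratio is roughly consistent with the 155b in the model's name.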

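🔧 Technical details

TheProfessor is built with the linear merge method, driven by a YAML configuration that specifies which layer ranges are taken from which model and with what weight. This lets the merged model combine the strengths of several source models and cover a broader range of capabilities. At inference time, TheProfessor takes user input in the ChatML prompt format and aims to give accurate, useful answers.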