MiniMax-Text-01 Open-Source Text Generation Model - Generate Coherent Text from Prompts for Free

Minimax Text 01

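Developed by MiniMaxAI

A text generation model that produces coherent text from an input prompt.

Text Generation

Safetensors

#multilingual-text-generation #high-precision-semantic-understanding #zero-shot-learning

Downloads: 8,231

Released: 1/12/2025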

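Model Overview

This is a general-purpose text generation model suited to a range of natural language processing tasks, such as text completion and dialogue generation.

Model Features

General-purpose text generation

Generates coherent, context-relevant text from an input prompt

Multi-task adaptability

Applicable to many text generation tasks, including but not limited to question answering, summarization, and dialogue

Model Capabilities

Text generation

Text completion

Dialogue generation

Content creation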

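Use Cases

Content creation

Writing assistance

Helps authors draft passages or find writing inspiration

Improves writing efficiency and quality

Marketing copy generation

Automatically generates product descriptions, ad copy, and other marketing content

Quickly produces varied marketing material

Customer service

Automated customer service dialogue

Used to build intelligent customer service systems that answer common questions automatically

Lowers customer service costs and improves response time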

🚀 MiniMax-Text-01

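MiniMax-Text-01 is a large language model with 456 billion total parameters, of which 45.9 billion are activated per token. It uses a hybrid architecture that combines several attention mechanisms with Mixture of Experts (MoE). Its training context length is extended to 1 million tokens, it can handle contexts of up to 4 million tokens at inference, and it performs strongly across academic benchmarks.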
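
The parameter counts above translate into a rough sizing estimate. The sketch below is back-of-the-envelope arithmetic only: it counts weight storage and ignores activations, the KV cache, and modules kept at higher precision, so real memory use will be higher. The helper name `weight_gib` is illustrative, not part of any library.

```python
TOTAL_PARAMS = 456e9    # total parameters, from the model card
ACTIVE_PARAMS = 45.9e9  # parameters activated per token, from the model card


def weight_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB (weights only, nothing else)."""
    return num_params * bytes_per_param / 2**30


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
bf16_gib = weight_gib(TOTAL_PARAMS, 2)  # bfloat16 stores 2 bytes per parameter
int8_gib = weight_gib(TOTAL_PARAMS, 1)  # int8 stores 1 byte per parameter

print(f"active fraction per token: {active_fraction:.1%}")
print(f"bf16 weights: ~{bf16_gib:.0f} GiB, int8 weights: ~{int8_gib:.0f} GiB")
print(f"int8 weights per GPU across 8 GPUs: ~{int8_gib / 8:.0f} GiB")
```

Roughly one tenth of the parameters take part in each token's forward pass, but all of them must still be resident in memory, which is why the quick start spreads the model over several GPUs.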

🚀 Quick Start

Below is a simple example that loads the tokenizer and model and generates content.

from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, QuantoConfig, GenerationConfig

# load hf config
hf_config = AutoConfig.from_pretrained("MiniMaxAI/MiniMax-Text-01", trust_remote_code=True)

# quantization config, int8 is recommended
quantization_config = QuantoConfig(
    weights="int8",
    modules_to_not_convert=[
        "lm_head",
        "embed_tokens",
    ]
    + [f"model.layers.{i}.coefficient" for i in range(hf_config.num_hidden_layers)]
    + [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.num_hidden_layers)],
)

# assume 8 GPUs
world_size = 8
layers_per_device = hf_config.num_hidden_layers // world_size
# set device map
device_map = {
    'model.embed_tokens': 'cuda:0',
    'model.norm': f'cuda:{world_size - 1}',
    'lm_head': f'cuda:{world_size - 1}'
}
for i in range(world_size):
    for j in range(layers_per_device):
        device_map[f'model.layers.{i * layers_per_device + j}'] = f'cuda:{i}'

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-Text-01")
prompt = "Hello!"
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by MiniMax based on MiniMax-Text-01 model."}]},
    {"role": "user", "content": [{"type": "text", "text": prompt}]},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# tokenize and move to device
model_inputs = tokenizer(text, return_tensors="pt").to("cuda")

# load bfloat16 model, move to device, and apply quantization
quantized_model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-Text-01",
    torch_dtype="bfloat16",
    device_map=device_map,
    quantization_config=quantization_config,
    trust_remote_code=True,
    offload_buffers=True,
)

# generate response
generation_config = GenerationConfig(
    max_new_tokens=20,
    eos_token_id=200020,
    use_cache=True,
)
generated_ids = quantized_model.generate(**model_inputs, generation_config=generation_config)
print(f"generated_ids: {generated_ids}")
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
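
The device map in the example relies on the layer count dividing evenly by the number of GPUs (80 layers over 8 GPUs). As a minimal sketch, assuming you want the same contiguous layout on a different GPU count, the hypothetical helper `build_device_map` below also places the leftover layers when the division is not exact. It is not part of Transformers; it only builds the dictionary passed as `device_map`.

```python
def build_device_map(num_layers: int, world_size: int) -> dict:
    """Assign decoder layers to GPUs in contiguous chunks.

    When num_layers is not a multiple of world_size, the first
    (num_layers % world_size) GPUs each receive one extra layer.
    """
    device_map = {
        "model.embed_tokens": "cuda:0",
        "model.norm": f"cuda:{world_size - 1}",
        "lm_head": f"cuda:{world_size - 1}",
    }
    base, extra = divmod(num_layers, world_size)
    layer = 0
    for rank in range(world_size):
        for _ in range(base + (1 if rank < extra else 0)):
            device_map[f"model.layers.{layer}"] = f"cuda:{rank}"
            layer += 1
    return device_map


device_map = build_device_map(num_layers=80, world_size=8)
print(device_map["model.layers.0"], device_map["model.layers.79"])
```

For 80 layers on 8 GPUs this reproduces the mapping from the example above, with ten layers per device.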

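✨ Key Features

Large parameter scale: 456 billion total parameters, with 45.9 billion activated per token.
Hybrid architecture: combines Lightning Attention, Softmax Attention, and Mixture of Experts (MoE) to better unlock the model's long-context capability.
Long-context handling: the training context length is extended to 1 million tokens, and inference can handle contexts of up to 4 million tokens.
Strong performance: achieves top-tier-model performance across a range of academic benchmarks.
Function calling: supports function calling, recognizing when an external function needs to be called and emitting the parameters as structured JSON.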

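📦 Installation Guide

For production deployment, we recommend serving MiniMax-Text-01 with vLLM. vLLM offers the following, which make it a strong choice for serving large language models:

🔥 Excellent serving throughput
⚡ Efficient and intelligent memory management
📦 Robust batched request handling
⚙️ Deeply optimized low-level performance

For detailed deployment instructions, see our vLLM deployment guide.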
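
vLLM can expose an OpenAI-compatible HTTP API, so a deployed model can be queried with a plain JSON request. The sketch below uses only the Python standard library. The server address and the served model name are assumptions for illustration; use the values from your own deployment, and follow the deployment guide for how to start the server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"     # assumption: address of a local server
MODEL_NAME = "MiniMaxAI/MiniMax-Text-01"  # assumption: model served under its repo id


def build_chat_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the first choice's text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("Hello!")
print(request.full_url)
```

Building the request separately from sending it keeps the payload easy to inspect; call `chat("Hello!")` once a server is actually running.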

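📚 Detailed Documentation

Model Architecture

The architecture of MiniMax-Text-01 is summarized below:

| Property | Details |
| --- | --- |
| Total parameters | 456B |
| Activated parameters per token | 45.9B |
| Number of layers | 80 |
| Hybrid attention | One softmax attention layer after every 7 lightning attention layers; 64 attention heads; attention head dimension 128 |
| Mixture of Experts (MoE) | 32 experts; expert hidden dimension 9216; top-2 routing strategy |
| Positional encoding | Rotary Position Embedding (RoPE) applied to half of the attention head dimension, with a base frequency of 10,000,000 |
| Hidden size | 6144 |
| Vocabulary size | 200,064 |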
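Evaluation

Core Academic Benchmarks

| Task | GPT-4o (11-20) | Claude-3.5-Sonnet (10-22) | Gemini-1.5-Pro (002) | Gemini-2.0-Flash (exp) | Qwen2.5-72B-Inst. | DeepSeek-V3 | Llama-3.1-405B-Inst. | MiniMax-Text-01 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| General | | | | | | | | |
| MMLU* | 85.7 | 88.3 | 86.8 | 86.5 | 86.1 | 88.5 | 88.6 | 88.5 |
| MMLU-Pro* | 74.4 | 78.0 | 75.8 | 76.4 | 71.1 | 75.9 | 73.3 | 75.7 |
| SimpleQA | 39.0 | 28.1 | 23.4 | 26.6 | 10.3 | 24.9 | 23.2 | 23.7 |
| C-SimpleQA | 64.6 | 56.8 | 59.4 | 63.3 | 52.2 | 64.8 | 54.7 | 67.4 |
| IFEval (avg) | 84.1 | 90.1 | 89.4 | 88.4 | 87.2 | 87.3 | 86.4 | 89.1 |
| Arena-Hard | 92.4 | 87.6 | 85.3 | 72.7 | 81.2 | 91.4 | 63.5 | 89.1 |
| Reasoning | | | | | | | | |
| GPQA* (diamond) | 46.0 | 65.0 | 59.1 | 62.1 | 49.0 | 59.1 | 50.7 | 54.4 |
| DROP* (F1) | 89.2 | 88.8 | 89.2 | 89.3 | 85.0 | 91.0 | 92.5 | 87.8 |
| Mathematics | | | | | | | | |
| GSM8k* | 95.6 | 96.9 | 95.2 | 95.4 | 95.8 | 96.7 | 96.7 | 94.8 |
| MATH* | 76.6 | 74.1 | 84.6 | 83.9 | 81.8 | 84.6 | 73.8 | 77.4 |
| Coding | | | | | | | | |
| MBPP+ | 76.2 | 75.1 | 75.4 | 75.9 | 77.0 | 78.8 | 73.0 | 71.7 |
| HumanEval | 90.2 | 93.7 | 86.6 | 89.6 | 86.6 | 92.1 | 89.0 | 86.9 |

* Evaluated following a 0-shot CoT setting.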

Long Context Benchmarks

4M Needle In A Haystack Test

Ruler

| Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-4o (11-20) | 0.970 | 0.921 | 0.890 | 0.888 | 0.884 | - | - | - | - |
| Claude-3.5-Sonnet (10-22) | 0.965 | 0.960 | 0.957 | 0.950 | 0.952 | 0.938 | - | - | - |
| Gemini-1.5-Pro (002) | 0.962 | 0.960 | 0.960 | 0.958 | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 |
| Gemini-2.0-Flash (exp) | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | - |
| MiniMax-Text-01 | 0.963 | 0.961 | 0.953 | 0.954 | 0.943 | 0.947 | 0.945 | 0.928 | 0.910 |

LongBench v2

| Model | Overall | Easy | Hard | Short context | Medium context | Long context |
| --- | --- | --- | --- | --- | --- | --- |
| Human | 53.7 | 100.0 | 25.1 | 47.2 | 59.1 | 53.7 |
| w/ CoT | | | | | | |
| GPT-4o (11-20) | 51.4 | 54.2 | 49.7 | 59.6 | 48.6 | 43.5 |
| Claude-3.5-Sonnet (10-22) | 46.7 | 55.2 | 41.5 | 53.9 | 41.9 | 44.4 |
| DeepSeek-V3 | - | - | - | - | - | - |
| Qwen2.5-72B-Inst. | 43.5 | 47.9 | 40.8 | 48.9 | 40.9 | 39.8 |
| MiniMax-Text-01 | 56.5 | 66.1 | 50.5 | 61.7 | 56.7 | 47.2 |
| w/o CoT | | | | | | |
| GPT-4o (11-20) | 50.1 | 57.4 | 45.6 | 53.3 | 52.4 | 40.2 |
| Claude-3.5-Sonnet (10-22) | 41.0 | 46.9 | 37.3 | 46.1 | 38.6 | 37.0 |
| DeepSeek-V3 | 48.7 | - | - | - | - | - |
| Qwen2.5-72B-Inst. | 42.1 | 42.7 | 41.8 | 45.6 | 38.1 | 44.4 |
| MiniMax-Text-01 | 52.9 | 60.9 | 47.9 | 58.9 | 52.6 | 43.5 |

MTOB

| Context type | No context | Half book | Full book | Δ half book | Δ full book |
| --- | --- | --- | --- | --- | --- |
| eng → kalam (ChrF) | | | | | |
| GPT-4o (11-20) | 9.90 | 54.30 | - | 44.40 | - |
| Claude-3.5-Sonnet (10-22) | 20.22 | 53.62 | 55.65 | 33.39 | 35.42 |
| Gemini-1.5-Pro (002) | 16.79 | 53.68 | 57.90 | 36.89 | 41.11 |
| Gemini-2.0-Flash (exp) | 12.20 | 49.50 | 53.30 | 37.30 | 41.10 |
| Qwen-Long | 16.55 | 48.48 | 45.94 | 31.92 | 29.39 |
| MiniMax-Text-01 | 6.0 | 51.74 | 51.60 | 45.7 | 45.6 |
| kalam → eng (BLEURT) | | | | | |
| GPT-4o (11-20) | 33.20 | 58.30 | - | 25.10 | - |
| Claude-3.5-Sonnet (10-22) | 31.42 | 59.70 | 62.30 | 28.28 | 30.88 |
| Gemini-1.5-Pro (002) | 32.02 | 61.52 | 63.09 | 29.50 | 31.07 |
| Gemini-2.0-Flash (exp) | 33.80 | 57.50 | 57.00 | 23.70 | 23.20 |
| Qwen-Long | 30.13 | 53.14 | 32.15 | 23.01 | 2.02 |
| MiniMax-Text-01 | 33.65 | 57.10 | 58.00 | 23.45 | 24.35 |

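Function Calling

MiniMax-Text-01 supports function calling: the model can recognize when an external function needs to be called and emit the parameters as structured JSON. With function calling, you can:

Let the model identify function-call needs that are implicit in a user request
Receive structured parameter output for seamless integration into applications
Use a variety of complex parameter types, including nested objects and arrays

Function calling supports function definitions in the standard OpenAI-compatible format and integrates seamlessly with the Transformers library. For detailed usage instructions, see our function calling guide or the Chinese-language guide.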
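
As an illustration of the OpenAI-compatible definition format, the sketch below declares one tool and dispatches a structured call to a local Python function. The function `get_weather`, its canned return value, and the exact shape of `model_output` are hypothetical; the function calling guide describes how the model actually emits calls.

```python
import json

# OpenAI-style tool definition; get_weather is a hypothetical example function.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}]


def get_weather(city: str, units: str = "celsius") -> dict:
    """Stand-in implementation that returns canned data."""
    return {"city": city, "units": units, "temperature": 21}


REGISTRY = {"get_weather": get_weather}


def dispatch(call_json: str) -> dict:
    """Parse a call of the assumed form {"name": ..., "arguments": {...}} and run it."""
    call = json.loads(call_json)
    func = REGISTRY[call["name"]]
    return func(**call["arguments"])


# Assumed shape of the model's structured output, for illustration only.
model_output = '{"name": "get_weather", "arguments": {"city": "Shanghai"}}'
print(dispatch(model_output))
```

Keeping a registry of allowed functions means the application only ever runs functions it has explicitly exposed, whatever name the model returns.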

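🔧 Technical Details

To better unlock the model's long-context capability, MiniMax-Text-01 uses a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture of Experts (MoE). Using advanced parallel strategies and innovative compute-communication overlap methods, such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, and Expert Tensor Parallel (ETP), the training context length is extended to 1 million tokens, and the model can handle contexts of up to 4 million tokens at inference.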
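
For readers unfamiliar with MoE, the sketch below shows generic top-2 routing on a toy scale: a gate scores every expert, the two best are selected, and their outputs are mixed with renormalized weights. This is a textbook formulation under stated assumptions, not the model's actual routing code; the scalar "experts" and the gate logits are made up for illustration.

```python
import math


def top2_route(gate_logits):
    """Pick the two highest-scoring experts and renormalize their weights.

    Generic top-2 routing: softmax over all experts, keep the best two,
    then rescale so the two weights sum to 1.
    """
    peak = max(gate_logits)
    exps = [math.exp(x - peak) for x in gate_logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = probs[top2[0]] + probs[top2[1]]
    return [(i, probs[i] / norm) for i in top2]


def moe_forward(x, gate_logits, experts):
    """Weighted sum of the two selected experts' outputs for a scalar input."""
    return sum(weight * experts[i](x) for i, weight in top2_route(gate_logits))


# Toy setup: 32 experts, each a simple scalar function (illustrative only).
experts = [lambda x, k=k: x * (k + 1) for k in range(32)]
gate_logits = [0.0] * 32
gate_logits[3], gate_logits[17] = 2.0, 1.0

routes = top2_route(gate_logits)
print([i for i, _ in routes])
# prints: [3, 17]
```

Only the two selected experts run for a given token, which is how a model can hold many more parameters than it activates per token.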

📄 License

Model license: see the Model License Agreement.
Code license: released under the MIT License.

📖 Citation

@misc{minimax2025minimax01scalingfoundationmodels,
      title={MiniMax-01: Scaling Foundation Models with Lightning Attention}, 
      author={MiniMax and Aonian Li and Bangwei Gong and Bo Yang and Boji Shan and Chang Liu and Cheng Zhu and Chunhao Zhang and Congchao Guo and Da Chen and Dong Li and Enwei Jiao and Gengxin Li and Guojun Zhang and Haohai Sun and Houze Dong and Jiadai Zhu and Jiaqi Zhuang and Jiayuan Song and Jin Zhu and Jingtao Han and Jingyang Li and Junbin Xie and Junhao Xu and Junjie Yan and Kaishun Zhang and Kecheng Xiao and Kexi Kang and Le Han and Leyang Wang and Lianfei Yu and Liheng Feng and Lin Zheng and Linbo Chai and Long Xing and Meizhi Ju and Mingyuan Chi and Mozhi Zhang and Peikai Huang and Pengcheng Niu and Pengfei Li and Pengyu Zhao and Qi Yang and Qidi Xu and Qiexiang Wang and Qin Wang and Qiuhui Li and Ruitao Leng and Shengmin Shi and Shuqi Yu and Sichen Li and Songquan Zhu and Tao Huang and Tianrun Liang and Weigao Sun and Weixuan Sun and Weiyu Cheng and Wenkai Li and Xiangjun Song and Xiao Su and Xiaodong Han and Xinjie Zhang and Xinzhu Hou and Xu Min and Xun Zou and Xuyang Shen and Yan Gong and Yingjie Zhu and Yipeng Zhou and Yiran Zhong and Yongyi Hu and Yuanxiang Fan and Yue Yu and Yufeng Yang and Yuhao Li and Yunan Huang and Yunji Li and Yunpeng Huang and Yunzhi Xu and Yuxin Mao and Zehan Li and Zekang Li and Zewei Tao and Zewen Ying and Zhaoyang Cong and Zhen Qin and Zhenhua Fan and Zhihang Yu and Zhuo Jiang and Zijia Wu},
      year={2025},
      eprint={2501.08313},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.08313}, 
}