🚀 MiniMax-Text-01
MiniMax-Text-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. It adopts a hybrid architecture combining multiple attention mechanisms with Mixture of Experts (MoE). Its training context length extends to 1 million tokens, it can handle contexts of up to 4 million tokens at inference, and it delivers strong performance across a wide range of academic benchmarks.
🚀 Quick Start

Here we provide a simple example of loading the tokenizer and the model to generate content.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, QuantoConfig, GenerationConfig

# load the HF config
hf_config = AutoConfig.from_pretrained("MiniMaxAI/MiniMax-Text-01", trust_remote_code=True)

# quantization config; int8 is recommended
quantization_config = QuantoConfig(
    weights="int8",
    modules_to_not_convert=[
        "lm_head",
        "embed_tokens",
    ] + [f"model.layers.{i}.coefficient" for i in range(hf_config.num_hidden_layers)]
    + [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.num_hidden_layers)]
)

# assume 8 GPUs
world_size = 8
layers_per_device = hf_config.num_hidden_layers // world_size

# set the device map: embeddings on the first device, final norm and lm_head
# on the last device, and the transformer layers spread evenly across all devices
device_map = {
    'model.embed_tokens': 'cuda:0',
    'model.norm': f'cuda:{world_size - 1}',
    'lm_head': f'cuda:{world_size - 1}'
}
for i in range(world_size):
    for j in range(layers_per_device):
        device_map[f'model.layers.{i * layers_per_device + j}'] = f'cuda:{i}'

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-Text-01")
prompt = "Hello!"
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by MiniMax based on MiniMax-Text-01 model."}]},
    {"role": "user", "content": [{"type": "text", "text": prompt}]},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# tokenize and move to device
model_inputs = tokenizer(text, return_tensors="pt").to("cuda")

# load the bfloat16 model, distribute it with the device map, and apply quantization
quantized_model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-Text-01",
    torch_dtype="bfloat16",
    device_map=device_map,
    quantization_config=quantization_config,
    trust_remote_code=True,
    offload_buffers=True,
)

# generate response
generation_config = GenerationConfig(
    max_new_tokens=20,
    eos_token_id=200020,
    use_cache=True,
)
generated_ids = quantized_model.generate(**model_inputs, generation_config=generation_config)
print(f"generated_ids: {generated_ids}")

# strip the prompt tokens and decode only the newly generated part
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
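As a rough sanity check on this configuration: 456B parameters at int8 (one byte per weight) amount to roughly 456 GB of weights, or about 57 GB per device when split across 8 GPUs, whereas the unquantized bfloat16 checkpoint would need about twice that. This is why int8 quantization is recommended for a single 8-GPU node.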
✨ Key Features

- Massive parameter scale: 456 billion total parameters, with 45.9 billion activated per token.
- Hybrid architecture: combines Lightning Attention, Softmax Attention, and Mixture of Experts (MoE) to better unlock the model's long-context capabilities.
- Long-context processing: training context length extended to 1 million tokens, with contexts of up to 4 million tokens supported at inference.
- Strong performance: top-tier results across a wide range of academic benchmarks.
- Function calling: intelligently recognizes when external functions need to be invoked and outputs their arguments in structured JSON format.
📦 Installation Guide

For production deployment, we recommend using vLLM to serve MiniMax-Text-01. vLLM delivers excellent performance for serving large language models, including:

- 🔥 Outstanding serving throughput
- ⚡ Efficient and intelligent memory management
- 📦 Powerful handling of batched requests
- ⚙️ Deeply optimized low-level performance

For detailed deployment instructions, please refer to our vLLM Deployment Guide. A client-side sketch of querying a running server is shown below.
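Once a server is up per that guide, vLLM exposes an OpenAI-compatible endpoint. The sketch below is a minimal client example, assuming vLLM's default local address `http://localhost:8000/v1` and the standard `openai` Python client; adjust the URL and served model name to your deployment.

```python
# Minimal client sketch, assuming a vLLM server is already running and
# serving MiniMax-Text-01 at vLLM's default OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # default vLLM address; adjust as needed
    api_key="EMPTY",                      # vLLM does not require a real key by default
)

completion = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-Text-01",
    messages=[
        {"role": "system", "content": "You are a helpful assistant created by MiniMax."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```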
📚 Documentation

Model Architecture

The architecture of MiniMax-Text-01 is briefly described below:
| Attribute | Details |
|---|---|
| Total Parameters | 456B |
| Activated Parameters per Token | 45.9B |
| Number of Layers | 80 |
| Hybrid Attention | A Softmax Attention layer after every 7 Lightning Attention layers<br>- Number of attention heads: 64<br>- Attention head dimension: 128 |
| Mixture of Experts (MoE) | - Number of experts: 32<br>- Expert hidden dimension: 9216<br>- Top-2 routing strategy |
| Positional Encoding | Rotary Position Embedding (RoPE) applied to half of the attention head dimensions, with a base frequency of 10,000,000 |
| Hidden Size | 6144 |
| Vocabulary Size | 200,064 |
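As an illustration of the hybrid pattern in this table (a sketch, not code from the MiniMax repository), the 80 layers interleave so that every eighth layer uses Softmax Attention and the rest use Lightning Attention:

```python
# Illustrative sketch of the hybrid layer pattern implied by the table above:
# a softmax attention layer after every 7 lightning attention layers.
num_hidden_layers = 80

layer_types = [
    "softmax" if (i + 1) % 8 == 0 else "lightning"
    for i in range(num_hidden_layers)
]

assert layer_types.count("softmax") == 10    # layers 8, 16, ..., 80
assert layer_types.count("lightning") == 70  # the remaining layers
```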
Evaluation

Core Academic Benchmarks
| Task | GPT-4o (11-20) | Claude-3.5-Sonnet (10-22) | Gemini-1.5-Pro (002) | Gemini-2.0-Flash (exp) | Qwen2.5-72B-Inst. | DeepSeek-V3 | Llama-3.1-405B-Inst. | MiniMax-Text-01 |
|---|---|---|---|---|---|---|---|---|
| **General** | | | | | | | | |
| MMLU* | 85.7 | 88.3 | 86.8 | 86.5 | 86.1 | 88.5 | 88.6 | 88.5 |
| MMLU-Pro* | 74.4 | 78.0 | 75.8 | 76.4 | 71.1 | 75.9 | 73.3 | 75.7 |
| SimpleQA | 39.0 | 28.1 | 23.4 | 26.6 | 10.3 | 24.9 | 23.2 | 23.7 |
| C-SimpleQA | 64.6 | 56.8 | 59.4 | 63.3 | 52.2 | 64.8 | 54.7 | 67.4 |
| IFEval (avg) | 84.1 | 90.1 | 89.4 | 88.4 | 87.2 | 87.3 | 86.4 | 89.1 |
| Arena-Hard | 92.4 | 87.6 | 85.3 | 72.7 | 81.2 | 91.4 | 63.5 | 89.1 |
| **Reasoning** | | | | | | | | |
| GPQA* (diamond) | 46.0 | 65.0 | 59.1 | 62.1 | 49.0 | 59.1 | 50.7 | 54.4 |
| DROP* (F1) | 89.2 | 88.8 | 89.2 | 89.3 | 85.0 | 91.0 | 92.5 | 87.8 |
| **Mathematics** | | | | | | | | |
| GSM8k* | 95.6 | 96.9 | 95.2 | 95.4 | 95.8 | 96.7 | 96.7 | 94.8 |
| MATH* | 76.6 | 74.1 | 84.6 | 83.9 | 81.8 | 84.6 | 73.8 | 77.4 |
| **Coding** | | | | | | | | |
| MBPP+ | 76.2 | 75.1 | 75.4 | 75.9 | 77.0 | 78.8 | 73.0 | 71.7 |
| HumanEval | 90.2 | 93.7 | 86.6 | 89.6 | 86.6 | 92.1 | 89.0 | 86.9 |
\* Evaluated following the 0-shot CoT setting.
Long-Context Benchmarks

- 4M Needle In A Haystack Test

- Ruler

| Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4o (11-20) | 0.970 | 0.921 | 0.890 | 0.888 | 0.884 | - | - | - | - |
| Claude-3.5-Sonnet (10-22) | 0.965 | 0.960 | 0.957 | 0.950 | 0.952 | 0.938 | - | - | - |
| Gemini-1.5-Pro (002) | 0.962 | 0.960 | 0.960 | 0.958 | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 |
| Gemini-2.0-Flash (exp) | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | - |
| MiniMax-Text-01 | 0.963 | 0.961 | 0.953 | 0.954 | 0.943 | 0.947 | 0.945 | 0.928 | 0.910 |

- LongBench v2

| Model | Overall | Easy | Hard | Short | Medium | Long |
|---|---|---|---|---|---|---|
| Human | 53.7 | 100.0 | 25.1 | 47.2 | 59.1 | 53.7 |
| **w/ CoT** | | | | | | |
| GPT-4o (11-20) | 51.4 | 54.2 | 49.7 | 59.6 | 48.6 | 43.5 |
| Claude-3.5-Sonnet (10-22) | 46.7 | 55.2 | 41.5 | 53.9 | 41.9 | 44.4 |
| DeepSeek-V3 | - | - | - | - | - | - |
| Qwen2.5-72B-Inst. | 43.5 | 47.9 | 40.8 | 48.9 | 40.9 | 39.8 |
| MiniMax-Text-01 | 56.5 | 66.1 | 50.5 | 61.7 | 56.7 | 47.2 |
| **w/o CoT** | | | | | | |
| GPT-4o (11-20) | 50.1 | 57.4 | 45.6 | 53.3 | 52.4 | 40.2 |
| Claude-3.5-Sonnet (10-22) | 41.0 | 46.9 | 37.3 | 46.1 | 38.6 | 37.0 |
| DeepSeek-V3 | 48.7 | - | - | - | - | - |
| Qwen2.5-72B-Inst. | 42.1 | 42.7 | 41.8 | 45.6 | 38.1 | 44.4 |
| MiniMax-Text-01 | 52.9 | 60.9 | 47.9 | 58.9 | 52.6 | 43.5 |

- MTOB

| Context Type | No Context | Half Book | Full Book | Δ Half Book | Δ Full Book |
|---|---|---|---|---|---|
| **eng → kalam (ChrF)** | | | | | |
| GPT-4o (11-20) | 9.90 | 54.30 | - | 44.40 | - |
| Claude-3.5-Sonnet (10-22) | 20.22 | 53.62 | 55.65 | 33.39 | 35.42 |
| Gemini-1.5-Pro (002) | 16.79 | 53.68 | 57.90 | 36.89 | 41.11 |
| Gemini-2.0-Flash (exp) | 12.20 | 49.50 | 53.30 | 37.30 | 41.10 |
| Qwen-Long | 16.55 | 48.48 | 45.94 | 31.92 | 29.39 |
| MiniMax-Text-01 | 6.0 | 51.74 | 51.60 | 45.7 | 45.6 |
| **kalam → eng (BLEURT)** | | | | | |
| GPT-4o (11-20) | 33.20 | 58.30 | - | 25.10 | - |
| Claude-3.5-Sonnet (10-22) | 31.42 | 59.70 | 62.30 | 28.28 | 30.88 |
| Gemini-1.5-Pro (002) | 32.02 | 61.52 | 63.09 | 29.50 | 31.07 |
| Gemini-2.0-Flash (exp) | 33.80 | 57.50 | 57.00 | 23.70 | 23.20 |
| Qwen-Long | 30.13 | 53.14 | 32.15 | 23.01 | 2.02 |
| MiniMax-Text-01 | 33.65 | 57.10 | 58.00 | 23.45 | 24.35 |
Function Calling

MiniMax-Text-01 supports function calling, which enables the model to intelligently recognize when an external function needs to be invoked and to output its arguments in structured JSON format. With function calling you can:

- Let the model recognize function-call needs implicit in user requests
- Receive structured argument output for seamless integration into your application
- Handle a variety of complex argument types, including nested objects and arrays

Function calling supports standard OpenAI-compatible format definitions and integrates seamlessly with the Transformers library. For detailed usage instructions, please refer to our Function Call Guide or its Chinese version. A minimal sketch of defining a tool is shown below.
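As a minimal sketch of the OpenAI-compatible format (the `get_weather` tool below is hypothetical, and we assume the model's chat template accepts the standard `tools` argument supported by recent transformers releases):

```python
from transformers import AutoTokenizer

# A hypothetical tool definition in the OpenAI-compatible schema;
# any schema-conformant tool works the same way.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-Text-01")
messages = [
    {"role": "user", "content": [{"type": "text", "text": "What's the weather in Shanghai?"}]},
]

# Render the prompt with the tool definitions included, assuming the chat
# template accepts the standard `tools` keyword.
text = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)
```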
🔧 Technical Details

To better unlock the model's long-context capabilities, MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, Softmax Attention, and Mixture of Experts (MoE). By leveraging advanced parallel strategies and innovative compute-communication overlap methods, such as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, and Expert Tensor Parallelism (ETP), the training context length is extended to 1 million tokens, and contexts of up to 4 million tokens can be handled at inference.
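To see why the linear-attention layers matter at these lengths, here is a toy sketch (not MiniMax's implementation; feature maps and normalization are omitted) of the causal linear-attention recurrence: the per-head state stays a fixed d×d matrix no matter how long the sequence grows, whereas softmax attention's KV cache grows linearly with context length.

```python
# Toy illustration of the causal linear attention recurrence that lets
# lightning-style layers process very long sequences with a fixed-size
# state instead of a growing KV cache.
import torch

def causal_linear_attention(q, k, v):
    # q, k, v: (seq_len, head_dim)
    head_dim = q.shape[-1]
    state = torch.zeros(head_dim, head_dim)  # fixed size, independent of seq_len
    outputs = []
    for t in range(q.shape[0]):
        state = state + torch.outer(k[t], v[t])  # accumulate k_t v_t^T
        outputs.append(q[t] @ state)             # read out with the query
    return torch.stack(outputs)

q, k, v = (torch.randn(16, 8) for _ in range(3))
print(causal_linear_attention(q, k, v).shape)  # torch.Size([16, 8])
```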
📄 License
📖 Citation
```bibtex
@misc{minimax2025minimax01scalingfoundationmodels,
title={MiniMax-01: Scaling Foundation Models with Lightning Attention},
author={MiniMax and Aonian Li and Bangwei Gong and Bo Yang and Boji Shan and Chang Liu and Cheng Zhu and Chunhao Zhang and Congchao Guo and Da Chen and Dong Li and Enwei Jiao and Gengxin Li and Guojun Zhang and Haohai Sun and Houze Dong and Jiadai Zhu and Jiaqi Zhuang and Jiayuan Song and Jin Zhu and Jingtao Han and Jingyang Li and Junbin Xie and Junhao Xu and Junjie Yan and Kaishun Zhang and Kecheng Xiao and Kexi Kang and Le Han and Leyang Wang and Lianfei Yu and Liheng Feng and Lin Zheng and Linbo Chai and Long Xing and Meizhi Ju and Mingyuan Chi and Mozhi Zhang and Peikai Huang and Pengcheng Niu and Pengfei Li and Pengyu Zhao and Qi Yang and Qidi Xu and Qiexiang Wang and Qin Wang and Qiuhui Li and Ruitao Leng and Shengmin Shi and Shuqi Yu and Sichen Li and Songquan Zhu and Tao Huang and Tianrun Liang and Weigao Sun and Weixuan Sun and Weiyu Cheng and Wenkai Li and Xiangjun Song and Xiao Su and Xiaodong Han and Xinjie Zhang and Xinzhu Hou and Xu Min and Xun Zou and Xuyang Shen and Yan Gong and Yingjie Zhu and Yipeng Zhou and Yiran Zhong and Yongyi Hu and Yuanxiang Fan and Yue Yu and Yufeng Yang and Yuhao Li and Yunan Huang and Yunji Li and Yunpeng Huang and Yunzhi Xu and Yuxin Mao and Zehan Li and Zekang Li and Zewei Tao and Zewen Ying and Zhaoyang Cong and Zhen Qin and Zhenhua Fan and Zhihang Yu and Zhuo Jiang and Zijia Wu},
year={2025},
eprint={2501.08313},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.08313},
}
```
💻 Chatbot & API

For general use and evaluation, we provide a Chatbot with online search capabilities and an online API for developers. We also offer developers the MiniMax MCP Server, with video generation, image generation, speech synthesis, and voice cloning capabilities.
📞 Contact Us

Please reach out to us at model@minimaxi.com with any questions.