360Zhinao3-7B-O1.5 open-source model: free to deploy, long chain-of-thought for complex reasoning tasks

360zhinao3 7B O1.5

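Developed by qihoo360

360Zhinao3-7B-O1.5 is a long chain-of-thought model open-sourced by Qihoo 360. It is fine-tuned from 360Zhinao3-7B-Instruct and supports complex reasoning tasks.

Large Language Model

Transformers

Supports multiple languages
License: Apache-2.0
#MultilingualLLM #LongTextReasoning #OpenSourceCommercialUse

Downloads: 35

Released: 4/23/2025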

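Model Overview

The 360Zhinao3 series consists of 7B-parameter large language models open-sourced by Qihoo 360, available as a base, an instruct, and a long chain-of-thought version. The O1.5 version is optimized for complex reasoning tasks and supports long chain-of-thought reasoning.

Model Features

Long chain-of-thought reasoning

Optimized specifically for complex reasoning tasks; supports long chain-of-thought reasoning

Multilingual support

Handles Chinese and English

Open source, commercial use

Released under the Apache 2.0 license; free for commercial use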

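Model Capabilities

Text generation

Complex reasoning

Question answering

Mathematical computation

Code generation

Use Cases

Education

Math problem solving

Solves complex math word problems

Scores 54.2 on AIME24

Research

Scientific reasoning

Handles scientific questions that require multi-step reasoning

Scores 40 on GPQA Diamond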

🚀 360Zhinao3 (360智腦)

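360Zhinao3 is a model open-sourced and upgraded by Qihoo 360. It covers several capabilities and is free for commercial use. It performs strongly across multiple benchmarks, which are reported in the model evaluation section of this page.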

🤗 HuggingFace | 💬 WeChat (微信)

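Visit the official 360Zhinao website at https://ai.360.com to try it out.

🚀 Quick Start

Using the models with 🤗Transformers

Below are simple examples of running 360Zhinao3-7B, 360Zhinao3-7B-Instruct, and 360Zhinao3-7B-O1.5 with 🤗Transformers.

Base model inference example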

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH, 
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 1024

inputs = tokenizer('中國二十四節氣\n1. 立春\n2. 雨水\n3. 驚蟄\n4. 春分\n5. 清明\n', return_tensors='pt')
inputs = inputs.to(model.device)

pred = model.generate(input_ids=inputs["input_ids"], generation_config=generation_config)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Instruct model inference example

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 2048

messages = []

#round-1
print(f"user: 簡單介紹一下劉德華")
messages.append({"role": "user", "content": "簡單介紹一下劉德華"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")


#round-2
print(f"user: 他有什麼代表作?")
messages.append({"role": "user", "content": "他有什麼代表作?"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")
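
The two rounds above repeat the same append, generate, append sequence. A minimal sketch of factoring that into one helper follows. `chat_round` and `fake_generate` are hypothetical names introduced for this illustration. `fake_generate` is only a stand-in for the `apply_chat_template`, `model.generate`, and `tokenizer.decode` calls shown above, so the snippet runs without loading the model.

```python
def chat_round(messages, user_text, generate_fn):
    """Append a user turn, call generate_fn on the history, append the reply."""
    messages.append({"role": "user", "content": user_text})
    response = generate_fn(messages)
    messages.append({"role": "assistant", "content": response})
    return response


def fake_generate(messages):
    # Stand-in for tokenizer.apply_chat_template + model.generate + decode.
    return f"(reply to turn {len(messages) // 2 + 1})"


history = []
chat_round(history, "簡單介紹一下劉德華", fake_generate)
chat_round(history, "他有什麼代表作?", fake_generate)
print([m["role"] for m in history])
```

Passing the generation step in as a function keeps the conversation bookkeeping separate from the model call, so each additional round is a single line.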

Long chain-of-thought model inference example

import re
import json
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-O1.5"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 2048


def extract_thinking_and_answer(input_string):
    thinking, answer = "", ""
    # Extract the answer (everything after the last </think>)
    pattern_answer = r'.*</think>(.*)$'
    match_answer = re.search(pattern_answer, input_string, re.S)
    if match_answer:
        answer = match_answer.group(1)
    else:
        return thinking, input_string

    # Extract the reasoning (the first <think>...</think> span)
    pattern_thinking = r'<think>(.*?)</think>'
    match_thinking = re.search(pattern_thinking, input_string, re.S)
    if match_thinking:
        thinking = match_thinking.group(1)

    return thinking, answer


messages = []
messages.append({"role": "user", "content": "現有一籠子，裡面有雞和兔子若干只，數一數，共有頭14個，腿38條，求雞和兔子各有多少隻？"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
thinking, answer = extract_thinking_and_answer(response)
messages.append({"role": "assistant", "content": answer, "reasoning_content": thinking})
print(json.dumps(messages, ensure_ascii=False, indent=4))
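
The parsing helper above can be exercised without loading the model. The sketch below applies the same two regular expressions to hand-written strings. The sample responses are made up for illustration and are not actual model output.

```python
import re


def extract_thinking_and_answer(input_string):
    """Split a response into (thinking, answer) around the </think> tag."""
    # Everything after the last </think> is treated as the answer.
    match_answer = re.search(r'.*</think>(.*)$', input_string, re.S)
    if not match_answer:
        # No closing tag: the whole string is returned as the answer.
        return "", input_string
    # The first <think>...</think> span, if present, is the reasoning.
    match_thinking = re.search(r'<think>(.*?)</think>', input_string, re.S)
    thinking = match_thinking.group(1) if match_thinking else ""
    return thinking, match_answer.group(1)


# Hypothetical responses, written by hand for this example.
with_tags = "<think>2c + 4r = 38 and c + r = 14, so r = 5.</think>9 chickens, 5 rabbits."
without_tags = "9 chickens, 5 rabbits."

print(extract_thinking_and_answer(with_tags))
print(extract_thinking_and_answer(without_tags))
```

If a response contains `</think>` but no opening `<think>` tag, the answer is still extracted and the reasoning comes back as an empty string.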

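✨ Key Features

Open-source upgrade: Qihoo 360 has open-sourced an upgraded version of its in-house 7B-parameter model, 360Zhinao3-7B. It is released in the GitHub repository 360zhinao3 and is free for commercial use.
Performance gains: 360Zhinao3-7B is continually pre-trained from 360Zhinao2-7B on 700B high-quality tokens. The improvement comes mainly from higher-quality training data.
Multiple variants: base, instruct, and long chain-of-thought models are provided to cover different scenarios.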

📦 Download

Size	Model	BF16
7B	360Zhinao3-7B	🤗
7B	360Zhinao3-7B-Instruct	🤗
7B	360Zhinao3-7B-O1.5	🤗

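📚 Documentation

Model Evaluation

Base Model

We evaluated the model across multiple dimensions with the open-source tool opencompass. 360Zhinao3-7B ranks first in average benchmark score among models with fewer than 10B parameters and is competitive with models of the same scale.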

Type	Dataset	Language	glm4-9b	Qwen2.5-7B	internlm2.5-7b	Yi1.5-9B	gemma2-9b	Llama3.1-8B	360Zhinao2-7B	360Zhinao3-7B
Exam	ceval	zh	75.83	81.41	77.71	73.51	56.36	51.67	83.04	84.7
Exam	mmlu	en	75.5	75.5	71.55	71.43	72.22	66.75	67.84	75.42
Exam	cmmlu	zh	74.24	81.79	78.77	74.2	58.89	52.49	73.8	82.17
Exam	ARC-c	en	94.92	80	85.08	87.46	77.63	80.68	87.12	88.14
Exam	ARC-e	en	98.41	84.83	95.24	94.53	78.84	89.77	92.77	94
Language	WiC	en	51.57	52.82	50.78	50.63	50.47	50	49.84	50.31
Language	WSC	en	68.27	68.27	69.23	66.35	68.27	67.31	65.38	71.15
Knowledge	BoolQ	en	81.8	83.88	89.51	84.46	85.6	82.2	88.29	88.38
Knowledge	commonsense_qa	en	71.17	73.22	68.55	71.58	68.47	71.25	69.78	71.33
Understanding	C3	zh	91.51	92	93.04	85.86	81.64	83.51	93.26	92.77
Understanding	race-middle	en	91.99	91.02	92.06	91.16	88.09	81.69	90.46	90.04
Understanding	race-high	en	90.71	87.91	90.08	88.34	82.08	78.73	86.74	85.96
Understanding	lcsts	zh	18.29	15.82	15.96	16.49	10.62	17.29	18.61	18.85
Understanding	eprstmt-dev	zh	91.88	86.88	91.25	91.88	48.12	83.12	90	92.50
Understanding	lambada	en	71.67	71.14	69.98	70.64	75.43	74.23	72.56	68.17
Reasoning	hellaswag	en	70.25	72.76	70.38	71.55	66.83	74.65	71.49	73.61
Reasoning	siqa	en	81.73	72.52	78.97	76.2	58.96	64.18	77.12	79.02
Reasoning	bbh	en	73.68	54.63	59.43	67.86	68.45	59.9	46.54	73.74
Code	humaneval	en	69.51	75	60.37	26.22	5.49	27.44	60.98	64.63
Code	mbpp	en	60	60	43.6	56.8	51.2	42.6	54	67.80
Math	math	en	26.86	38	27.14	27.06	28.52	15.32	38.34	37.60
Math	gsm8k	en	78.54	79.76	52.54	71.11	73.09	56.25	75.51	78.77
Overall	avg_zh		70.35	71.58	71.35	68.39	51.13	57.62	71.74	74.20
Overall	avg_all		73.11	71.78	69.60	68.88	61.60	62.32	70.61	74.83
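
The page does not say how the overall rows are computed. As a worked check, the sketch below takes the unweighted mean of the five zh rows (ceval, cmmlu, C3, lcsts, eprstmt-dev) for two of the columns and compares it with the avg_zh row. That avg_zh is a plain mean is an assumption that happens to agree with the table for these columns. No such claim is made here about avg_all.

```python
# zh rows of the table above, in order: ceval, cmmlu, C3, lcsts, eprstmt-dev
zh_scores = {
    "360Zhinao3-7B": [84.7, 82.17, 92.77, 18.85, 92.50],
    "glm4-9b": [75.83, 74.24, 91.51, 18.29, 91.88],
}


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


avg_zh = {name: round(mean(scores), 2) for name, scores in zh_scores.items()}
print(avg_zh)
```

Because lcsts scores sit far below the other zh datasets for every model, it pulls each avg_zh down by a similar amount and has little effect on the ranking.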

Instruct Model

We evaluated and compared 360Zhinao3-7B-Instruct on three popular benchmarks: IFEval, MT-bench, and CFBench. On MT-bench and CFBench it ranks first among open-source models of the same class. On IFEval (strict prompt) it is second only to GLM4-9B-Chat and scores highest among 7B-scale models.

Model	MT-bench	IFEval (strict prompt)	CFBench CSR	CFBench ISR	CFBench PSR
Qwen2.5-7B-Instruct	8.07	0.556	0.81	0.46	0.57
Yi-9B-16k-Chat	7.44	0.455	0.75	0.4	0.52
GLM4-9B-Chat	8.08	0.634	0.82	0.48	0.61
InternLM2.5-7B-Chat	7.39	0.540	0.78	0.4	0.54
360Zhinao2-7B-Chat-4k	7.86	0.577	0.8	0.44	0.57
360Zhinao3-7B-Instruct	8.17	0.626	0.83	0.52	0.64

Long Chain-of-Thought Model

Using 360Zhinao's previously open-sourced [Light-R1](https://github.com/Qihoo360/Light-R1) method, we continued fine-tuning 360Zhinao3-7B-Instruct for long chain-of-thought, followed by RFT and GRPO. There is still a gap to the latest OpenThinker2-7B, but the model surpasses all earlier models based on the general-purpose Qwen2.5-7B-Instruct.

Model	Date	Base model	AIME24	AIME25	GPQA Diamond
OpenThinker2-7B	25.4.3	Qwen2.5-7B-Instruct	50	33.3	49.3
OpenThinker-7B	25.1.28	Qwen2.5-7B-Instruct	31.3	23.3	42.4
360Zhinao3-7B-O1.5	25.4.14	360Zhinao3-7B-Instruct	54.2	36.3	40.0
OpenR1-Qwen-7B	25.2.11	Qwen2.5-Math-7B-Instruct	48.7	34.7	21.2
DeepSeek-R1-Distill-Qwen-7B	25.1.20	Qwen2.5-Math-7B-Instruct	57.3	33.3	47.3
Light-R1-7B-DS	25.3.12	DeepSeek-R1-Distill-Qwen-7B	59.1	44.3	49.4
Areal-boba-RL-7B	25.3.31	DeepSeek-R1-Distill-Qwen-7B	61.9	48.3	47.6

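Model Inference

Deployment

vLLM installation

We recommend vllm==0.6.0.

If you are using CUDA 12.1 and PyTorch 2.1, you can install vLLM directly with:

pip install vllm==0.6.0

Otherwise, refer to the official vLLM installation instructions.

After installation, do the following:

Copy vllm/zhinao.py into vllm/model_executor/models under your vLLM installation directory (in your python/conda environment).
Then add the following line to vllm/model_executor/models/__init__.py: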

"ZhinaoForCausalLM": ("zhinao", "ZhinaoForCausalLM"),

Starting the vLLM service

Start the service:

python -m vllm.entrypoints.openai.api_server \
    --model qihoo360/360Zhinao3-7B-O1.5 \
    --served-model-name 360Zhinao3-7B-O1.5 \
    --port 8360 \
    --host 0.0.0.0 \
    --dtype bfloat16 \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.8 \
    --trust-remote-code

Send a request with curl:

curl http://localhost:8360/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "360Zhinao3-7B-O1.5",
    "max_tokens": 200,
    "top_k": -1,
    "top_p": 0.8,
    "temperature": 1.0,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"}
    ],
    "stop": [
        "<eod>",
        "<|im_end|>",
        "<|im_start|>"
    ]
}'

Send a request with Python:

from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8360/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="360Zhinao3-7B-O1.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"},
    ],
    stop=[
        "<eod>",
        "<|im_end|>",
        "<|im_start|>"
    ],
    presence_penalty=0.0,
    frequency_penalty=0.0
)
print("Chat response:", chat_response)

⚠️ Important

To apply a repetition penalty, set presence_penalty and frequency_penalty rather than repetition_penalty.
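
As a minimal sketch of that recommendation, the helper below builds a chat completion request body that sets only the two recommended penalties. `build_chat_payload` is a hypothetical name introduced for this example, and the penalty values 0.5 and 0.3 are arbitrary illustrations rather than tuned settings. The model name, system prompt, and stop strings are copied from the requests above.

```python
import json


def build_chat_payload(user_text, presence_penalty=0.0, frequency_penalty=0.0):
    """Build a /v1/chat/completions body that uses the two recommended penalties."""
    return {
        "model": "360Zhinao3-7B-O1.5",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_text},
        ],
        "stop": ["<eod>", "<|im_end|>", "<|im_start|>"],
        "presence_penalty": presence_penalty,
        "frequency_penalty": frequency_penalty,
        # repetition_penalty is intentionally left out, per the note above.
    }


# Illustrative values only; they are not recommendations from the model authors.
payload = build_chat_payload("你好", presence_penalty=0.5, frequency_penalty=0.3)
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

The resulting dictionary can be sent as the JSON body of the curl request above, or unpacked into `client.chat.completions.create(**payload)`.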

Model Fine-tuning

Training data

Training data: data/training_data_sample.json. This sample consists of 10,000 rows drawn from multiturn_chat_0.8M and converted to the format below.

Data format:

[
  {
    "id": 1,
    "conversations": [
        {
            "from": "system",
            "value": "You are a helpful assistant."
        },
        {
            "from": "user",
            "value": "您好啊"
        },
        {
            "from": "assistant",
            "value": "你好！我今天能為您做些什麼？有什麼問題或需要幫助嗎? 我在這裡為您提供服務。"
        }
    ]
  }
]
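
To show how this record layout relates to the `role`/`content` messages used in the inference examples, here is a small conversion sketch. `to_messages` and `ALLOWED_SPEAKERS` are hypothetical names, and the mapping is an illustration only: how `finetune.py` actually reads the file is not shown on this page. The speaker set is assumed from the sample above, and the assistant text is shortened.

```python
ALLOWED_SPEAKERS = {"system", "user", "assistant"}  # assumed from the sample record


def to_messages(record):
    """Convert one training record into a list of role/content dicts."""
    messages = []
    for turn in record["conversations"]:
        if turn["from"] not in ALLOWED_SPEAKERS:
            raise ValueError(f"unexpected speaker: {turn['from']!r}")
        messages.append({"role": turn["from"], "content": turn["value"]})
    return messages


record = {
    "id": 1,
    "conversations": [
        {"from": "system", "value": "You are a helpful assistant."},
        {"from": "user", "value": "您好啊"},
        {"from": "assistant", "value": "你好！"},
    ],
}
print(to_messages(record))
```

Running a check like this over a custom dataset before training catches misspelled speaker names early.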

Fine-tuning script

set -x

HOSTFILE=hostfile
DS_CONFIG=./finetune/ds_config_zero2.json

# PARAMS
LR=5e-6
EPOCHS=3
MAX_LEN=32768
BATCH_SIZE=4
NUM_NODES=1
NUM_GPUS=8
MASTER_PORT=29500

IS_CONCAT=False # Whether to concatenate to maximum length (MAX_LEN)

DATA_PATH="./data/training_data_sample.json"
MODEL_PATH="qihoo360/360Zhinao3-7B-Instruct"
OUTPUT_DIR="./outputs/"

deepspeed --hostfile ${HOSTFILE} \
        --master_port ${MASTER_PORT} \
        --num_nodes ${NUM_NODES} \
        --num_gpus ${NUM_GPUS} \
        finetune.py \
        --report_to "tensorboard" \
        --data_path ${DATA_PATH} \
        --model_name_or_path ${MODEL_PATH} \
        --output_dir ${OUTPUT_DIR} \
        --model_max_length ${MAX_LEN} \
        --num_train_epochs ${EPOCHS} \
        --per_device_train_batch_size ${BATCH_SIZE} \
        --gradient_accumulation_steps 1 \
        --save_strategy steps \
        --save_steps 200 \
        --learning_rate ${LR} \
        --lr_scheduler_type cosine \
        --adam_beta1 0.9 \
        --adam_beta2 0.95 \
        --adam_epsilon 1e-8 \
        --max_grad_norm 1.0 \
        --weight_decay 0.1 \
        --warmup_ratio 0.01 \
        --gradient_checkpointing True \
        --bf16 True \
        --tf32 True \
        --deepspeed ${DS_CONFIG} \
        --is_concat ${IS_CONCAT} \
        --logging_steps 1 \
        --log_on_each_node False
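
As a worked example of what these settings imply, the sketch below computes the number of sequences per optimizer step and the upper bound on tokens per step. It assumes plain data parallelism across all GPUs, with `--num_gpus` counted per node. Since `IS_CONCAT=False`, samples are not packed to `MAX_LEN`, so the token figure is a ceiling, not an expected value.

```python
# Values copied from the script above.
BATCH_SIZE = 4          # --per_device_train_batch_size
NUM_NODES = 1
NUM_GPUS = 8            # GPUs per node
GRAD_ACCUM_STEPS = 1    # --gradient_accumulation_steps
MAX_LEN = 32768         # --model_max_length

# Sequences consumed per optimizer step across all devices.
sequences_per_step = BATCH_SIZE * NUM_GPUS * NUM_NODES * GRAD_ACCUM_STEPS
# Upper bound on tokens per step, reached only if every sequence hits MAX_LEN.
max_tokens_per_step = sequences_per_step * MAX_LEN

print(sequences_per_step)   # 32
print(max_tokens_per_step)  # 1048576
```

When running on fewer GPUs, raising `--gradient_accumulation_steps` by the same factor keeps the number of sequences per step unchanged.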