360Zhinao3-7B-O1.5: open-source model, free to deploy, supporting complex reasoning tasks and long chains of thought


360Zhinao3-7B-O1.5

Developed by qihoo360

360Zhinao3-7B-O1.5 is a long chain-of-thought model open-sourced by Qihoo 360. It is fine-tuned from 360Zhinao3-7B-Instruct and supports complex reasoning tasks.

Large language model

Transformers

Multilingual | Open-source license: Apache-2.0 | #MultilingualLLM #LongFormReasoning #OpenSourceCommercialUse

Downloads: 35

Release date: 4/23/2025

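Model Overview

The 360Zhinao3 series consists of 7B-parameter large language models open-sourced by Qihoo 360: a base model, an instruction-tuned model, and a long chain-of-thought model. The O1.5 version is optimized for complex reasoning tasks and supports long chain-of-thought reasoning.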

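Model Features

Long chain-of-thought reasoning

Specifically optimized for complex reasoning tasks, with support for long chain-of-thought reasoning

Multilingual support

Handles both Chinese and English

Free for commercial use

Released under the Apache 2.0 license, which permits free commercial use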

Model Capabilities

Text generation

Complex reasoning

Question answering

Mathematical computation

Code generation

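Use Cases

Education

Math problem solving

Solves complex math word problems

Scores 54.2 on AIME24

Research

Scientific reasoning

Handles science questions that require multi-step reasoning

Scores 40.0 on GPQA Diamond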

🚀 360Zhinao3 (360智脑)

360Zhinao3 is an AI model family developed by Qihoo 360. It performs strongly across a wide range of tasks, is open-sourced on GitHub and Hugging Face, and is free for commercial use.


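Visit the official 360Zhinao website at https://ai.360.com for a fuller hands-on experience.

✨ Key Features

360Zhinao3-7B is continually pretrained from 360Zhinao2-7B on 700B tokens of high-quality data.
The performance gains come mainly from improvements in training-data quality.
Models in the 360Zhinao3 series achieve strong results on many benchmarks.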

📦 News and Updates

[2025.04.14] 🔥🔥🔥 Released the 360Zhinao3 series: 360Zhinao3-7B, 360Zhinao3-7B-Instruct, and the long chain-of-thought model 360Zhinao3-7B-O1.5.
[2024.11.18] Released 360Zhinao2-7B, including the Base model and Chat models with 4K, 32K, and 360K context lengths.
[2024.05.23] Released two models, 360Zhinao-search and 360Zhinao-1.8B-Reranking, which ranked first on the Retrieval and Reranking tasks of the C-MTEB leaderboard, respectively.
[2024.05.20] Extended Llama 3 to a 360K context and released llama3-8B-360Zhinao-360k-Instruct🤗.
[2024.04.12] Released 360Zhinao-7B v1.0, including a base model and three chat models with 4K, 32K, and 360K context lengths. The technical report is available on arXiv.

📦 Download URLs

Size	Model	BF16
7B	360Zhinao3-7B	🤗
7B	360Zhinao3-7B-Instruct	🤗
7B	360Zhinao3-7B-O1.5	🤗

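📊 Model Evaluation

Base Model

We evaluated the model across multiple dimensions with the open-source tool OpenCompass. Among the sub-10B-parameter models compared below, it ranks first on both the Chinese average (avg_zh) and the overall average (avg_all), making it highly competitive with models of the same size.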

Type	Dataset	Language	glm4-9b	Qwen2.5-7B	internlm2.5-7b	Yi1.5-9B	gemma2-9b	Llama3.1-8B	360Zhinao2-7B	360Zhinao3-7B
Exam	ceval	zh	75.83	81.41	77.71	73.51	56.36	51.67	83.04	84.7
Exam	mmlu	en	75.5	75.5	71.55	71.43	72.22	66.75	67.84	75.42
Exam	cmmlu	zh	74.24	81.79	78.77	74.2	58.89	52.49	73.8	82.17
Exam	ARC-c	en	94.92	80	85.08	87.46	77.63	80.68	87.12	88.14
Exam	ARC-e	en	98.41	84.83	95.24	94.53	78.84	89.77	92.77	94
Language	WiC	en	51.57	52.82	50.78	50.63	50.47	50	49.84	50.31
Language	WSC	en	68.27	68.27	69.23	66.35	68.27	67.31	65.38	71.15
Knowledge	BoolQ	en	81.8	83.88	89.51	84.46	85.6	82.2	88.29	88.38
Knowledge	commonsense_qa	en	71.17	73.22	68.55	71.58	68.47	71.25	69.78	71.33
Understanding	C3	zh	91.51	92	93.04	85.86	81.64	83.51	93.26	92.77
Understanding	race-middle	en	91.99	91.02	92.06	91.16	88.09	81.69	90.46	90.04
Understanding	race-high	en	90.71	87.91	90.08	88.34	82.08	78.73	86.74	85.96
Understanding	lcsts	zh	18.29	15.82	15.96	16.49	10.62	17.29	18.61	18.85
Understanding	eprstmt-dev	zh	91.88	86.88	91.25	91.88	48.12	83.12	90	92.50
Understanding	lambada	en	71.67	71.14	69.98	70.64	75.43	74.23	72.56	68.17
Reasoning	hellaswag	en	70.25	72.76	70.38	71.55	66.83	74.65	71.49	73.61
Reasoning	siqa	en	81.73	72.52	78.97	76.2	58.96	64.18	77.12	79.02
Reasoning	bbh	en	73.68	54.63	59.43	67.86	68.45	59.9	46.54	73.74
Code	humaneval	en	69.51	75	60.37	26.22	5.49	27.44	60.98	64.63
Code	mbpp	en	60	60	43.6	56.8	51.2	42.6	54	67.80
Math	math	en	26.86	38	27.14	27.06	28.52	15.32	38.34	37.60
Math	gsm8k	en	78.54	79.76	52.54	71.11	73.09	56.25	75.51	78.77
Overall	avg_zh		70.35	71.58	71.35	68.39	51.13	57.62	71.74	74.20
Overall	avg_all		73.11	71.78	69.60	68.88	61.60	62.32	70.61	74.83
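The avg_zh row appears consistent with an unweighted mean of the five Chinese-language (zh) datasets in the table. The sketch below recomputes it for two columns under that assumed averaging rule. The avg_all row does not equal a plain mean of the 22 listed datasets, so it is not reproduced here.

```python
from statistics import mean

# zh-language rows of the base-model table: dataset -> (glm4-9b, 360Zhinao3-7B)
ZH_ROWS = {
    "ceval": (75.83, 84.7),
    "cmmlu": (74.24, 82.17),
    "C3": (91.51, 92.77),
    "lcsts": (18.29, 18.85),
    "eprstmt-dev": (91.88, 92.50),
}


def zh_average(column: int) -> float:
    """Unweighted mean of the zh rows for one model column (assumed rule), rounded to 2 decimals."""
    return round(mean(scores[column] for scores in ZH_ROWS.values()), 2)


print("glm4-9b avg_zh:", zh_average(0))
print("360Zhinao3-7B avg_zh:", zh_average(1))
```

Both values match the published avg_zh figures (70.35 and 74.20), which supports the unweighted-mean reading for that row.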

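Instruction Model

We evaluated 360Zhinao3-7B-Instruct on three popular benchmarks: IFEval, MT-bench, and CFBench. It ranks first among open-source models of comparable size on both MT-bench and CFBench. On IFEval (prompt strict) it ranks second, behind GLM4-9B-Chat, and achieves the highest score among 7B models.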

Model	MT-bench	IFEval (prompt strict)	CFBench CSR	CFBench ISR	CFBench PSR
Qwen2.5-7B-Instruct	8.07	0.556	0.81	0.46	0.57
Yi-9B-16k-Chat	7.44	0.455	0.75	0.4	0.52
GLM4-9B-Chat	8.08	0.634	0.82	0.48	0.61
InternLM2.5-7B-Chat	7.39	0.540	0.78	0.4	0.54
360Zhinao2-7B-Chat-4k	7.86	0.577	0.8	0.44	0.57
360Zhinao3-7B-Instruct	8.17	0.626	0.83	0.52	0.64

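Long Chain-of-Thought Model

Using the Light-R1 method that 360Zhinao previously open-sourced, we continued training 360Zhinao3-7B-Instruct with long chain-of-thought fine-tuning, followed by RFT and GRPO. Compared with the latest OpenThinker2-7B there is still a gap on GPQA Diamond (40.0 vs. 49.3), although 360Zhinao3-7B-O1.5 scores higher on AIME24 and AIME25; on those two benchmarks it also surpasses the other models built on the general-purpose Qwen2.5-7B-Instruct.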

Model	Release date	Base model	AIME24	AIME25	GPQA Diamond
OpenThinker2-7B	2025.04.03	Qwen2.5-7B-Instruct	50	33.3	49.3
OpenThinker-7B	2025.01.28	Qwen2.5-7B-Instruct	31.3	23.3	42.4
360Zhinao3-7B-O1.5	2025.04.14	360Zhinao3-7B-Instruct	54.2	36.3	40.0
OpenR1-Qwen-7B	2025.02.11	Qwen2.5-Math-7B-Instruct	48.7	34.7	21.2
DeepSeek-R1-Distill-Qwen-7B	2025.01.20	Qwen2.5-Math-7B-Instruct	57.3	33.3	47.3
Light-R1-7B-DS	2025.03.12	DeepSeek-R1-Distill-Qwen-7B	59.1	44.3	49.4
Areal-boba-RL-7B	2025.03.31	DeepSeek-R1-Distill-Qwen-7B	61.9	48.3	47.6
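For readers unfamiliar with the GRPO stage mentioned above: GRPO (Group Relative Policy Optimization) drops the learned value function and instead scores each sampled response against the other responses sampled for the same prompt, normalizing its reward by the group mean and standard deviation. The sketch below illustrates only that advantage computation as a general example; it is not the authors' training code, and the binary correctness rewards are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantage: (r_i - mean(group)) / (std(group) + eps).

    Uses the population standard deviation; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for 4 sampled answers to one math prompt (1 = correct, 0 = wrong).
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
```

Correct answers get a positive advantage and wrong ones a negative advantage of the same size; if every sample in a group gets the same reward, all advantages are zero and that group contributes no policy-gradient signal.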

💻 Usage Examples

Quick Start

The following simple examples show how to get started quickly with 360Zhinao3-7B, 360Zhinao3-7B-Instruct, and 360Zhinao3-7B-O1.5 using 🤗 Transformers.

Basic usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH, 
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 1024

inputs = tokenizer('中国二十四节气\n1. 立春\n2. 雨水\n3. 惊蛰\n4. 春分\n5. 清明\n', return_tensors='pt')
inputs = inputs.to(model.device)

pred = model.generate(input_ids=inputs["input_ids"], generation_config=generation_config)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Instruction model inference demo

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 2048

messages = []

#round-1
print(f"user: 简单介绍一下刘德华")
messages.append({"role": "user", "content": "简单介绍一下刘德华"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")


#round-2
print(f"user: 他有什么代表作?")
messages.append({"role": "user", "content": "他有什么代表作?"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")
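The demo above repeats the same steps for every round: append the user message, generate, decode only the newly generated tokens, and append the reply so the next round sees the full history. A minimal sketch of factoring that into a helper follows. `chat_turn` and the `generate_reply` callable are hypothetical names, and the stub generator stands in for the tokenizer and model calls so the history handling can be checked without a GPU.

```python
from typing import Callable

Message = dict[str, str]


def chat_turn(messages: list[Message], user_text: str,
              generate_reply: Callable[[list[Message]], str]) -> str:
    """Run one conversation round and record both sides in `messages`."""
    messages.append({"role": "user", "content": user_text})
    # With the real model this would wrap: apply_chat_template -> model.generate -> decode new tokens.
    reply = generate_reply(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply


def fake_generate(history: list[Message]) -> str:
    """Stub for illustration only: reports how many user turns it has seen."""
    return f"reply #{sum(m['role'] == 'user' for m in history)}"


messages: list[Message] = []
print(chat_turn(messages, "简单介绍一下刘德华", fake_generate))
print(chat_turn(messages, "他有什么代表作?", fake_generate))
```

With the real model, `generate_reply` would contain the `apply_chat_template`, `generate`, and `decode` lines from the demo, so each round becomes a single `chat_turn` call.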

Long chain-of-thought model inference demo

import re
import json
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-O1.5"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 2048


def extract_thinking_and_answer(input_string):
    thinking, answer = "", ""
    # Extract the answer (text after the last </think>)
    pattern_answer = r'.*</think>(.*)$'
    match_answer = re.search(pattern_answer, input_string, re.S)
    if match_answer:
        answer = match_answer.group(1)
    else:
        return thinking, input_string

    # Extract the thinking process (text between <think> and </think>)
    pattern_thinking = r'<think>(.*?)</think>'
    match_thinking = re.search(pattern_thinking, input_string, re.S)
    if match_thinking:
        thinking = match_thinking.group(1)

    return thinking, answer


messages = []
messages.append({"role": "user", "content": "现有一笼子，里面有鸡和兔子若干只，数一数，共有头14个，腿38条，求鸡和兔子各有多少只？"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
thinking, answer = extract_thinking_and_answer(response)
messages.append({"role": "assistant", "content": answer, "reasoning_content": thinking})
print(json.dumps(messages, ensure_ascii=False, indent=4))
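With max_new_tokens set to 2048, a long chain of thought can be cut off before `</think>` is emitted; extract_thinking_and_answer then returns the whole response as the answer with empty thinking. The sketch below is a variant that splits on the last `</think>` (matching the greedy regex in the demo), tolerates a missing opening `<think>` tag (whether it appears in the decoded text may depend on the chat template, which is an assumption here), and flags likely truncation. The function name and the `truncated` flag are illustrative, not part of the original demo. Unlike the original, it also strips surrounding whitespace.

```python
def split_thinking(response: str) -> tuple[str, str, bool]:
    """Return (thinking, answer, truncated) for a long chain-of-thought response."""
    head, sep, tail = response.rpartition("</think>")
    if sep:
        # Drop an opening "<think>" tag (and anything before it) if present.
        thinking = head.split("<think>", 1)[-1]
        return thinking.strip(), tail.strip(), False
    if "<think>" in response:
        # Opening tag but no closing tag: generation probably hit max_new_tokens mid-thought.
        return response.split("<think>", 1)[1].strip(), "", True
    # No tags at all: treat the whole text as the answer.
    # (If the template injects "<think>" into the prompt, a truncated thought also lands here.)
    return "", response.strip(), False


print(split_thinking("<think>设兔子x只，鸡14-x只……</think>鸡9只，兔子5只。"))
```

If `truncated` is true, raising max_new_tokens and regenerating is usually more useful than showing a partial thought as the answer.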

🔧 Model Inference

Deployment

Installing vLLM

We recommend vllm==0.6.0.

If you are using CUDA 12.1, you can install vLLM directly with the following command; pip also installs the PyTorch build that vLLM 0.6.0 requires.

pip install vllm==0.6.0

Otherwise, follow the official vLLM installation instructions.

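After installation, complete the following steps:

Copy vllm/zhinao.py from this repository into the vllm/model_executor/models directory of your vLLM installation (inside your python/conda environment).
Then register the model by adding the following entry to the model registry dictionary in vllm/model_executor/models/__init__.py: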

"ZhinaoForCausalLM": ("zhinao", "ZhinaoForCausalLM"),
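The copy step can also be scripted. The sketch below locates the installed vllm package with importlib instead of guessing the python/conda path, then copies zhinao.py into model_executor/models. It assumes it is run from the repository root, where vllm/zhinao.py lives. Both function names are hypothetical, and the registry entry in __init__.py still has to be added by hand.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import Optional


def vllm_models_dir() -> Optional[Path]:
    """Return <site-packages>/vllm/model_executor/models, or None if vLLM is not installed."""
    spec = importlib.util.find_spec("vllm")
    # A bare "vllm/" folder (e.g. the repo's own) resolves as a namespace package with no origin.
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent / "model_executor" / "models"


def install_zhinao_model(src: Path, models_dir: Path) -> Path:
    """Copy the Zhinao model file into vLLM's models directory and return the new path."""
    if not models_dir.is_dir():
        raise FileNotFoundError(f"vLLM models directory not found: {models_dir}")
    return Path(shutil.copy(src, models_dir / "zhinao.py"))
```

Usage, from the repository root: `target = vllm_models_dir()`, then, if it is not None, `install_zhinao_model(Path("vllm/zhinao.py"), target)`.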

Starting the vLLM service

Start the service:

python -m vllm.entrypoints.openai.api_server \
    --model qihoo360/360Zhinao3-7B-O1.5 \
    --served-model-name 360Zhinao3-7B-O1.5 \
    --port 8360 \
    --host 0.0.0.0 \
    --dtype bfloat16 \
    --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.8 \
    --trust-remote-code

Send a request to the service with curl:

curl http://localhost:8360/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "360Zhinao3-7B-O1.5",
    "max_tokens": 200,
    "top_k": -1,
    "top_p": 0.8,
    "temperature": 1.0,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"}
    ],
    "stop": [
        "<eod>",
        "<|im_end|>",
        "<|im_start|>"
    ]
}'

Send a request to the service with Python:

from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8360/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

chat_response = client.chat.completions.create(
    model="360Zhinao3-7B-O1.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"},
    ],
    stop=[
        "<eod>",
        "<|im_end|>",
        "<|im_start|>"
    ],
    presence_penalty=0.0,
    frequency_penalty=0.0
)
print("Chat response:", chat_response)
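If the openai package is not available, the same OpenAI-compatible endpoint can be called with the Python standard library. The sketch below builds the request with urllib, reusing the URL, model name, and stop strings from the curl example. `build_chat_request` and `send` are hypothetical helpers, and the response is assumed to follow the OpenAI chat-completions schema (`choices[0].message.content`).

```python
import json
import urllib.request

API_BASE = "http://localhost:8360/v1"
STOP = ["<eod>", "<|im_end|>", "<|im_start|>"]


def build_chat_request(user_text: str, max_tokens: int = 200) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": "360Zhinao3-7B-O1.5",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_text},
        ],
        "stop": STOP,
        "presence_penalty": 0.0,
        "frequency_penalty": 0.0,
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant text (requires the running server)."""
    with urllib.request.urlopen(request, timeout=300) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Once the server is up, `print(send(build_chat_request("你好")))` performs the same call as the OpenAI client example.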

If you need to penalize repetition, we recommend setting presence_penalty and frequency_penalty rather than repetition_penalty.
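For reference, the OpenAI API documentation describes these two parameters as additive logit adjustments: each token's logit is reduced by frequency_penalty times the number of times that token has already been generated, plus presence_penalty once if it has appeared at all. The toy sketch below applies that rule to a small logit table; the tokens and values are made up, and real servers operate on token IDs and full vocabularies.

```python
from collections import Counter


def apply_penalties(logits: dict[str, float], generated: list[str],
                    presence_penalty: float = 0.0,
                    frequency_penalty: float = 0.0) -> dict[str, float]:
    """logit - count * frequency_penalty - (count > 0) * presence_penalty, per token."""
    counts = Counter(generated)  # missing tokens count as 0
    return {
        tok: logit
        - counts[tok] * frequency_penalty
        - (1.0 if counts[tok] > 0 else 0.0) * presence_penalty
        for tok, logit in logits.items()
    }


# Toy example: "好" was generated twice, "你" once, "吗" never.
logits = {"你": 2.0, "好": 3.0, "吗": 1.0}
print(apply_penalties(logits, ["你", "好", "好"], presence_penalty=0.5, frequency_penalty=0.25))
```

Frequency penalty grows with each repeat, while presence penalty is a flat, one-time cost for any token already used; both leave unseen tokens untouched.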

🔧 Model Fine-tuning

Training data

Training data: data/training_data_sample.json. This sample contains 10,000 records drawn from multiturn_chat_0.8M and converted to the format shown below.

Data format:

[
  {
    "id": 1,
    "conversations": [
        {
            "from": "system",
            "value": "You are a helpful assistant."
        },
        {
            "from": "user",
            "value": "您好啊"
        },
        {
            "from": "assistant",
            "value": "你好！我今天能为您做些什么？有什么问题或需要帮助吗? 我在这里为您提供服务。"
        }
    ]
  }
]
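The training file uses `from`/`value` keys, while the inference examples use `role`/`content` messages, so a quick validation and conversion pass can catch malformed records before training. The sketch below is a minimal example; the function name is hypothetical, and both the allowed roles and the "last turn is from the assistant" check are assumptions based on the sample above, so finetune.py may be more permissive.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def to_messages(record: dict) -> list[dict[str, str]]:
    """Convert one {"id", "conversations": [{"from", "value"}, ...]} record to role/content messages."""
    messages = []
    for turn in record["conversations"]:
        role = turn["from"]
        if role not in ALLOWED_ROLES:
            raise ValueError(f"record {record.get('id')}: unexpected role {role!r}")
        messages.append({"role": role, "content": turn["value"]})
    if not messages or messages[-1]["role"] != "assistant":
        raise ValueError(f"record {record.get('id')}: last turn should come from the assistant")
    return messages


sample = {
    "id": 1,
    "conversations": [
        {"from": "system", "value": "You are a helpful assistant."},
        {"from": "user", "value": "您好啊"},
        {"from": "assistant", "value": "你好！我今天能为您做些什么？"},
    ],
}
print(to_messages(sample))
```

To check the whole file, load it with `json.load` (using `encoding="utf-8"`) and call `to_messages` on each record.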

Fine-tuning script

set -x

HOSTFILE=hostfile
DS_CONFIG=./finetune/ds_config_zero2.json

# PARAMS
LR=5e-6
EPOCHS=3
MAX_LEN=32768
BATCH_SIZE=4
NUM_NODES=1
NUM_GPUS=8
MASTER_PORT=29500

IS_CONCAT=False # Whether to concatenate to maximum length (MAX_LEN)

DATA_PATH="./data/training_data_sample.json"
MODEL_PATH="qihoo360/360Zhinao3-7B-Instruct"
OUTPUT_DIR="./outputs/"

deepspeed --hostfile ${HOSTFILE} \
        --master_port ${MASTER_PORT} \
        --num_nodes ${NUM_NODES} \
        --num_gpus ${NUM_GPUS} \
        finetune.py \
        --report_to "tensorboard" \
        --data_path ${DATA_PATH} \
        --model_name_or_path ${MODEL_PATH} \
        --output_dir ${OUTPUT_DIR} \
        --model_max_length ${MAX_LEN} \
        --num_train_epochs ${EPOCHS} \
        --per_device_train_batch_size ${BATCH_SIZE} \
        --gradient_accumulation_steps 1 \
        --save_strategy steps \
        --save_steps 200 \
        --learning_rate ${LR} \
        --lr_scheduler_type cosine \
        --adam_beta1 0.9 \
        --adam_beta2 0.95 \
        --adam_epsilon 1e-8 \
        --max_grad_norm 1.0 \
        --weight_decay 0.1 \
        --warmup_ratio 0.01 \
        --gradient_checkpointing True \
        --bf16 True \
        --tf32 True \
        --deepspeed ${DS_CONFIG} \
        --is_concat ${IS_CONCAT} \
        --logging_steps 1 \
        --log_on_each_node False
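With the settings above, the number of optimizer steps can be estimated ahead of time. The sketch below rests on several assumptions not confirmed by the script: finetune.py follows the Hugging Face Trainer conventions its flags suggest (a distributed sampler, steps per epoch computed as batches divided by gradient_accumulation_steps, warmup rounded up from warmup_ratio), IS_CONCAT=False means one sample per sequence, and the 10,000-record training_data_sample.json is the dataset. `training_schedule` is a hypothetical helper.

```python
import math


def training_schedule(num_samples: int, per_device_batch: int, grad_accum: int,
                      gpus_per_node: int, num_nodes: int, epochs: int,
                      warmup_ratio: float) -> dict[str, int]:
    """Estimate the data-parallel training schedule under the assumptions stated above."""
    world_size = gpus_per_node * num_nodes
    samples_per_rank = math.ceil(num_samples / world_size)       # equal shard per rank
    batches_per_rank = math.ceil(samples_per_rank / per_device_batch)
    steps_per_epoch = max(batches_per_rank // grad_accum, 1)
    total_steps = steps_per_epoch * epochs
    return {
        "global_batch": per_device_batch * grad_accum * world_size,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": math.ceil(total_steps * warmup_ratio),
    }


# BATCH_SIZE=4, gradient_accumulation_steps=1, NUM_GPUS=8, NUM_NODES=1, EPOCHS=3, warmup_ratio=0.01
print(training_schedule(10_000, per_device_batch=4, grad_accum=1,
                        gpus_per_node=8, num_nodes=1, epochs=3, warmup_ratio=0.01))
```

Under these assumptions, the global batch is 32 sequences, training runs for 939 steps with 10 warmup steps, and with save_steps 200 intermediate checkpoints land at steps 200, 400, 600, and 800.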