360 Zhinao 3-7B-O1.5 Open-source Model - Free Deployment, Supports Complex Reasoning Tasks and Long Thought Chains

360zhinao3 7B O1.5

Developed by qihoo360

360 Zhinao 3-7B-O1.5 is a long chain-of-thought model open-sourced by Qihoo 360. It is fine-tuned from 360 Zhinao 3-7B-Instruct and targets complex reasoning tasks.

Large Language Model

Transformers

Supports Multiple Languages

Open Source License: Apache-2.0

#Multilingual Large Model #Long-Text Reasoning #Open Source Commercial Use

Release Time: 4/23/2025

Model Overview

The 360 Zhinao 3 series models are open-source 7B-parameter large language models from Qihoo 360, including the base version, instruction version, and long chain-of-thought version. The O1.5 version is optimized for complex reasoning tasks and supports long chain-of-thought reasoning.

Model Features

Long Chain-of-Thought Reasoning

Specially optimized for complex reasoning tasks, supporting long chain-of-thought reasoning processes.

Multilingual Support

Supports processing in both Chinese and English.

Open Source Commercial Use

Licensed under Apache 2.0, allowing free commercial use.

Model Capabilities

Text Generation

Complex Reasoning

Question Answering

Mathematical Calculation

Code Generation

Use Cases

Education

Math Problem Solving

Solving complex mathematical word problems

Achieved a score of 54.2 on the AIME24 test

Research

Scientific Problem Reasoning

Handling scientific problems requiring multi-step reasoning

Achieved a score of 40.0 on the GPQA Diamond test

🚀 360Zhinao3 (360 Zhinao)

360Zhinao3 is a series of models open-sourced by Qihoo 360. It comes in several versions with enhanced capabilities and is free for commercial use. You can find more information and try the models on the official website.

🚀 Quick Start

Here is a simple example to illustrate how to quickly use 360Zhinao3-7B, 360Zhinao3-7B-Instruct, and 360Zhinao3-7B-O1.5 with 🤗Transformers.

💻 Usage Examples

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH, 
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 1024

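# Prompt: the title "The 24 Chinese solar terms" plus the first five list items, for the base model to continue.
inputs = tokenizer('中国二十四节气\n1. 立春\n2. 雨水\n3. 惊蛰\n4. 春分\n5. 清明\n', return_tensors='pt')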
inputs = inputs.to(model.device)

pred = model.generate(input_ids=inputs["input_ids"], generation_config=generation_config)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Advanced Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 2048

messages = []

# Round 1 (prompt: "Briefly introduce Andy Lau")
print(f"user: 简单介绍一下刘德华")
messages.append({"role": "user", "content": "简单介绍一下刘德华"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")


# Round 2 (prompt: "What are his representative works?")
print(f"user: 他有什么代表作?")
messages.append({"role": "user", "content": "他有什么代表作?"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")
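
The two rounds above repeat the same pattern: append the user turn, generate, then append the assistant reply so the next round sees the full history. A minimal sketch of that pattern as a helper follows. The names `chat_turn` and `echo_generate` are hypothetical and not part of the model's API, and `echo_generate` is only a stand-in so the sketch runs without loading a model.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(
    messages: List[Message],
    user_text: str,
    generate_fn: Callable[[List[Message]], str],
) -> str:
    """Append the user turn, generate a reply, and record it in the history."""
    messages.append({"role": "user", "content": user_text})
    response = generate_fn(messages)
    messages.append({"role": "assistant", "content": response})
    return response


def echo_generate(messages: List[Message]) -> str:
    # Hypothetical stand-in for the real model call (assumption for illustration).
    return f"(reply to: {messages[-1]['content']})"


history: List[Message] = []
chat_turn(history, "简单介绍一下刘德华", echo_generate)
chat_turn(history, "他有什么代表作?", echo_generate)
print([m["role"] for m in history])
```

In real use, `generate_fn` would wrap the `apply_chat_template`, `model.generate`, and `tokenizer.decode` calls shown in the example above.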

# Long chain-of-thought example with 360Zhinao3-7B-O1.5
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig

MODEL_NAME_OR_PATH = "qihoo360/360Zhinao3-7B-O1.5"

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True).cuda()

generation_config = GenerationConfig.from_pretrained(
    MODEL_NAME_OR_PATH,
    trust_remote_code=True)
generation_config.max_new_tokens = 2048

messages = []

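# Round 1 (prompt: "Please solve this math problem in detail: [specific math problem]"; replace the bracketed placeholder with a real problem)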
print(f"user: 请详细解答这道数学题：[具体数学题内容]")
messages.append({"role": "user", "content": "请详细解答这道数学题：[具体数学题内容]"})
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)
pred = model.generate(input_ids=input_ids, generation_config=generation_config)
response = tokenizer.decode(pred.cpu()[0][len(input_ids[0]):], skip_special_tokens=True)
messages.append({"role": "assistant", "content": response})
print(f"gpt: {response}")
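
A long chain-of-thought response usually mixes the reasoning trace with the final answer, and it can be convenient to separate the two. The source does not document the output format of 360Zhinao3-7B-O1.5, so the sketch below assumes the reasoning is wrapped in `<think>…</think>` tags, as several long-CoT models do. Check the model's chat template before relying on this. The function name `split_reasoning` is hypothetical.

```python
import re
from typing import Tuple

# Assumed delimiter for the reasoning trace; not confirmed by the model card.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(response: str) -> Tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no tags are found."""
    match = THINK_PATTERN.search(response)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer


sample = "<think>2 + 2 = 4</think>\nThe answer is 4."
reasoning, answer = split_reasoning(sample)
print(reasoning)
print(answer)
```

If the tags are registered as special tokens, decoding with `skip_special_tokens=True` may strip them, in which case the function simply returns the whole text as the answer.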

✨ Features

Model Versions

360Zhinao3-7B: Continually pre-trained on 700B high-quality tokens, starting from 360Zhinao2-7B.
360Zhinao3-7B-Instruct: Performs well across multiple evaluations, ranking first among open-source models of similar size on some of them.
360Zhinao3-7B-O1.5: Fine-tuned from 360Zhinao3-7B-Instruct, with good performance on long chain-of-thought reasoning tasks.

Model Performance

The models achieve strong results on multiple benchmarks. For example, in the Base Model evaluation, the benchmark average score of 360Zhinao3-7B ranks first among models with fewer than 10B parameters.

📦 Download URL

Size	Model	BF16
7B	360Zhinao3-7B	🤗
7B	360Zhinao3-7B-Instruct	🤗
7B	360Zhinao3-7B-O1.5	🤗

📚 Documentation

Model Evaluation

Base Model

We used the open-source tool opencompass to run a multi-dimensional evaluation of the model. Its benchmark average score ranks first among models with fewer than 10B parameters, making it competitive with models of the same size.

Type	Datasets	Language	glm4-9b	Qwen2.5-7B	internlm2.5-7b	Yi1.5-9B	gemma2-9b	Llama3.1-8B	360Zhinao2-7B	360Zhinao3-7B
Exam	ceval	zh	75.83	81.41	77.71	73.51	56.36	51.67	83.04	84.7
Exam	mmlu	en	75.5	75.5	71.55	71.43	72.22	66.75	67.84	75.42
Exam	cmmlu	zh	74.24	81.79	78.77	74.2	58.89	52.49	73.8	82.17
Exam	ARC-c	en	94.92	80	85.08	87.46	77.63	80.68	87.12	88.14
Exam	ARC-e	en	98.41	84.83	95.24	94.53	78.84	89.77	92.77	94
Language	WiC	en	51.57	52.82	50.78	50.63	50.47	50	49.84	50.31
Language	WSC	en	68.27	68.27	69.23	66.35	68.27	67.31	65.38	71.15
Knowledge	BoolQ	en	81.8	83.88	89.51	84.46	85.6	82.2	88.29	88.38
Knowledge	commonsense_qa	en	71.17	73.22	68.55	71.58	68.47	71.25	69.78	71.33
Understanding	C3	zh	91.51	92	93.04	85.86	81.64	83.51	93.26	92.77
Understanding	race-middle	en	91.99	91.02	92.06	91.16	88.09	81.69	90.46	90.04
Understanding	race-high	en	90.71	87.91	90.08	88.34	82.08	78.73	86.74	85.96
Understanding	lcsts	zh	18.29	15.82	15.96	16.49	10.62	17.29	18.61	18.85
Understanding	eprstmt-dev	zh	91.88	86.88	91.25	91.88	48.12	83.12	90	92.50
Understanding	lambada	en	71.67	71.14	69.98	70.64	75.43	74.23	72.56	68.17
Reasoning	hellaswag	en	70.25	72.76	70.38	71.55	66.83	74.65	71.49	73.61
Reasoning	siqa	en	81.73	72.52	78.97	76.2	58.96	64.18	77.12	79.02
Reasoning	bbh	en	73.68	54.63	59.43	67.86	68.45	59.9	46.54	73.74
Code	humaneval	en	69.51	75	60.37	26.22	5.49	27.44	60.98	64.63
Code	mbpp	en	60	60	43.6	56.8	51.2	42.6	54	67.80
Math	math	en	26.86	38	27.14	27.06	28.52	15.32	38.34	37.60
Math	gsm8k	en	78.54	79.76	52.54	71.11	73.09	56.25	75.51	78.77
Overall	avg_zh		70.35	71.58	71.35	68.39	51.13	57.62	71.74	74.20
Overall	avg_all		73.11	71.78	69.60	68.88	61.60	62.32	70.61	74.83
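
As a quick sanity check on how the Overall rows relate to the per-dataset rows, the sketch below averages the five zh datasets for 360Zhinao3-7B. The result agrees with the reported avg_zh of 74.20, which suggests (the source does not state this) that avg_zh is a plain unweighted mean of the zh rows. The variable names are illustrative.

```python
from statistics import mean

# 360Zhinao3-7B scores copied from the zh rows of the table above.
zh_scores = {
    "ceval": 84.7,
    "cmmlu": 82.17,
    "C3": 92.77,
    "lcsts": 18.85,
    "eprstmt-dev": 92.50,
}

# Unweighted mean across the five Chinese-language datasets.
avg_zh = mean(list(zh_scores.values()))
print(f"avg_zh = {avg_zh:.2f}")  # prints "avg_zh = 74.20"
```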

Instruct Model

We evaluated and compared the 360Zhinao3-7B-Instruct model on three popular benchmarks: IFEval, MT-bench, and CFBench. On MT-bench and CFBench it ranks first among open-source models of similar size. On IFEval (prompt strict) it is second only to glm4-9b and has the highest score among 7B models.

Model	MT-bench	IFEval (prompt strict)	CFBench CSR	CFBench ISR	CFBench PSR
Qwen2.5-7B-Instruct	8.07	0.556	0.81	0.46	0.57
Yi-9B-16k-Chat	7.44	0.455	0.75	0.4	0.52
GLM4-9B-Chat	8.08	0.634	0.82	0.48	0.61
InternLM2.5-7B-Chat	7.39	0.540	0.78	0.4	0.54
360Zhinao2-7B-Chat-4k	7.86	0.577	0.8	0.44	0.57
360Zhinao3-7B-Instruct	8.17	0.626	0.83	0.52	0.64
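
The ranking claims for the Instruct model can be checked directly against the table. The sketch below copies the MT-bench, IFEval, and CFBench PSR columns and computes the rank of 360Zhinao3-7B-Instruct on each. The helper `rank_of` is a hypothetical name used only for illustration.

```python
# (MT-bench, IFEval prompt strict, CFBench PSR) copied from the table above.
scores = {
    "Qwen2.5-7B-Instruct": (8.07, 0.556, 0.57),
    "Yi-9B-16k-Chat": (7.44, 0.455, 0.52),
    "GLM4-9B-Chat": (8.08, 0.634, 0.61),
    "InternLM2.5-7B-Chat": (7.39, 0.540, 0.54),
    "360Zhinao2-7B-Chat-4k": (7.86, 0.577, 0.57),
    "360Zhinao3-7B-Instruct": (8.17, 0.626, 0.64),
}


def rank_of(model: str, column: int) -> int:
    """Return the 1-based rank of `model` on the given column (higher is better)."""
    ordered = sorted(scores, key=lambda name: scores[name][column], reverse=True)
    return ordered.index(model) + 1


for column, metric in enumerate(["MT-bench", "IFEval", "CFBench PSR"]):
    print(metric, rank_of("360Zhinao3-7B-Instruct", column))
```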

Long COT Model

We applied Zhinao's previously open-sourced [Light-R1](https://github.com/Qihoo360/Light-R1) method to 360Zhinao3-7B-Instruct, continuing with long-CoT fine-tuning followed by RFT and GRPO. A gap remains compared with the latest OpenThinker2-7B, but the model surpasses all previous models based on the general-purpose Qwen2.5-7B-Instruct.

Model	Date	Base Model	AIME24	AIME25	GPQA Diamond
OpenThinker2-7B	25.4.3	Qwen2.5-7B-Instruct	50	33.3	49.3
OpenThinker-7B	25.1.28	Qwen2.5-7B-Instruct	31.3	23.3	42.4
360Zhinao3-7B-O1.5	25.4.14	360Zhinao3-7B-Instruct	54.2	36.3	40.0
OpenR1-Qwen-7B	25.2.11	Qwen2.5-Math-7B-Instruct	48.7	34.7	21.2
DeepSeek-R1-Distill-Qwen-7B	25.1.20	Qwen2.5-Math-7B-Instruct	57.3	33.3	47.3
Light-R1-7B-DS	25.3.12	DeepSeek-R1-Distill-Qwen-7B	59.1	44.3	49.4
Areal-boba-RL-7B	25.3.31	DeepSeek-R1-Distill-Qwen-7B	61.9	48.3	47.6

📄 License

This project is licensed under the Apache-2.0 license.

Visit 360Zhinao's official website at https://ai.360.com to try the models.

News and Updates

[2025.04.14] 🔥🔥🔥 We released the 360Zhinao3 series and open-sourced 360Zhinao3-7B, 360Zhinao3-7B-Instruct, and the long chain-of-thought model 360Zhinao3-7B-O1.5.
[2024.11.18] We released 360Zhinao2-7B, providing access to both the Base model and Chat models with context lengths of 4K, 32K, and 360K.
[2024.05.23] We released two models, 360Zhinao-search and 360Zhinao-1.8B-Reranking, which ranked first in the Retrieval and Reranking tasks of the C-MTEB Leaderboard, respectively.
[2024.05.20] We extended llama3 and released llama3-8B-360Zhinao-360k-Instruct 🤗
[2024.04.12] We released 360Zhinao-7B v1.0, including the base model and three chat models with context lengths of 4K, 32K, and 360K. The technical report is on arXiv.
