---
license: apache-2.0
language:
- zh
- en
pipeline_tag: text-generation
library_name: transformers
---
[MiniCPM Repo](https://github.com/OpenBMB/MiniCPM) |
[MiniCPM Paper](https://arxiv.org/abs/2404.06395) |
[MiniCPM-V Repo](https://github.com/OpenBMB/MiniCPM-V) |
Join us in Discord and WeChat
## Introduction
MiniCPM3-4B is the third generation of the MiniCPM series. Its overall performance surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, and it is comparable with many recent 7B~9B models.
Compared to MiniCPM1.0/MiniCPM2.0, MiniCPM3-4B has a more powerful and versatile skill set that enables more general usage. MiniCPM3-4B supports function calling and a code interpreter; please refer to Advanced Features for usage guidelines. A hedged sketch of function calling follows below.
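As an illustrative sketch of function calling, the example below passes a tool through the `tools` argument of `apply_chat_template`, which is supported in recent Transformers releases. The `get_weather` function is a hypothetical stub, and whether the shipped chat template consumes tool schemas exactly this way should be verified against Advanced Features:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

def get_weather(city: str):
    """Get the current weather for a city.

    Args:
        city: Name of the city to query.
    """
    return f"Sunny in {city}"  # hypothetical stub for illustration

path = "openbmb/MiniCPM3-4B"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.bfloat16, device_map="cuda", trust_remote_code=True
)

messages = [{"role": "user", "content": "What is the weather in Beijing?"}]
# Transformers derives a JSON schema for each tool from its signature and docstring.
inputs = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

out = model.generate(inputs, max_new_tokens=256)
# The model is expected to emit a tool call for the application to execute.
print(tokenizer.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
```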
MiniCPM3-4B has a 32k context window. Equipped with LLMxMapReduce, MiniCPM3-4B can theoretically handle infinite context without requiring a huge amount of memory; a conceptual sketch of the idea follows.
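The sketch below only illustrates the general map-reduce idea (split a long input into windows that fit the context, query each window, then merge the partial answers). It is not the actual LLMxMapReduce implementation, and `chat` stands for a hypothetical single-call wrapper around the model:

```python
def map_reduce_answer(chat, document: str, question: str, window: int = 24000) -> str:
    """Answer a question over a document longer than the context window.

    `chat` is a hypothetical callable that sends one prompt to the model and
    returns its reply; this is a conceptual sketch, not LLMxMapReduce itself.
    """
    chunks = [document[i:i + window] for i in range(0, len(document), window)]
    # Map: answer the question against each window independently.
    partials = [chat(f"Context:\n{c}\n\nQuestion: {question}") for c in chunks]
    # Reduce: merge the partial answers into a single final answer.
    merged = "\n".join(f"- {p}" for p in partials)
    return chat(f"Given these partial answers:\n{merged}\n\nAnswer the question: {question}")
```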
## Usage

### Inference with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = "openbmb/MiniCPM3-4B"
device = "cuda"

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True
)

messages = [
    {"role": "user", "content": "推荐5个北京的景点。"},  # "Recommend 5 attractions in Beijing."
]
model_inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True
).to(device)

model_outputs = model.generate(
    model_inputs,
    max_new_tokens=1024,
    top_p=0.7,
    temperature=0.7
)

# Strip the prompt tokens so only the newly generated tokens are decoded.
output_token_ids = [
    model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
]

responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
print(responses)
```
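To print tokens as they are produced instead of waiting for the full completion, Transformers' `TextStreamer` can be attached to `generate`. This continues the example above, reusing `tokenizer`, `model`, and `model_inputs`:

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    model_inputs,
    max_new_tokens=1024,
    top_p=0.7,
    temperature=0.7,
    streamer=streamer,
)
```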
### Inference with vLLM

For now, you need to install our forked version of vLLM:

```bash
pip install git+https://github.com/OpenBMB/vllm.git@minicpm3
```
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "openbmb/MiniCPM3-4B"
prompt = [{"role": "user", "content": "推荐5个北京的景点。"}]  # "Recommend 5 attractions in Beijing."

# Render the chat template to plain text, since vLLM takes a string prompt.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)

llm = LLM(
    model=model_name,
    trust_remote_code=True,
    tensor_parallel_size=1
)
sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)

outputs = llm.generate(prompts=input_text, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```
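Since the fork tracks upstream vLLM, it should also provide the standard OpenAI-compatible server. The command below assumes the fork keeps the upstream entrypoint unchanged:

```bash
python -m vllm.entrypoints.openai.api_server \
    --model openbmb/MiniCPM3-4B \
    --trust-remote-code
```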
## Evaluation Results
| Benchmark | Qwen2-7B-Instruct | GLM-4-9B-Chat | Gemma2-9B-it | Llama3.1-8B-Instruct | GPT-3.5-Turbo-0125 | Phi-3.5-mini-Instruct(3.8B) | MiniCPM3-4B |
|---|---|---|---|---|---|---|---|
| **English** | | | | | | | |
| MMLU | 70.5 | 72.4 | 72.6 | 69.4 | 69.2 | 68.4 | 67.2 |
| BBH | 64.9 | 76.3 | 65.2 | 67.8 | 70.3 | 68.6 | 70.2 |
| MT-Bench | 8.41 | 8.35 | 7.88 | 8.28 | 8.17 | 8.60 | 8.41 |
| IFEVAL (Prompt Strict-Acc.) | 51.0 | 64.5 | 71.9 | 71.5 | 58.8 | 49.4 | 68.4 |
| **Chinese** | | | | | | | |
| CMMLU | 80.9 | 71.5 | 59.5 | 55.8 | 54.5 | 46.9 | 73.3 |
| CEVAL | 77.2 | 75.6 | 56.7 | 55.2 | 52.8 | 46.1 | 73.6 |
| AlignBench v1.1 | 7.10 | 6.61 | 7.10 | 5.68 | 5.82 | 5.73 | 6.74 |
| FollowBench-zh (SSR) | 63.0 | 56.4 | 57.0 | 50.6 | 64.6 | 58.1 | 66.8 |
| **Math** | | | | | | | |
| MATH | 49.6 | 50.6 | 46.0 | 51.9 | 41.8 | 46.4 | 46.6 |
| GSM8K | 82.3 | 79.6 | 79.7 | 84.5 | 76.4 | 82.7 | 81.1 |
| MathBench | 63.4 | 59.4 | 45.8 | 54.3 | 48.9 | 54.9 | 65.6 |
| **Code** | | | | | | | |
| HumanEval+ | 70.1 | 67.1 | 61.6 | 62.8 | 66.5 | 68.9 | 68.3 |
| MBPP+ | 57.1 | 62.2 | 64.3 | 55.3 | 71.4 | 55.8 | 63.2 |
| LiveCodeBench v3 | 22.2 | 20.2 | 19.2 | 20.4 | 24.0 | 19.6 | 22.6 |
| **Function Call** | | | | | | | |
| BFCL v2 | 71.6 | 70.1 | 19.2 | 73.3 | 75.4 | 48.4 | 76.0 |
| **Overall** | | | | | | | |
| Average | 65.3 | 65.0 | 57.9 | 60.8 | 61.0 | 57.2 | 66.3 |
## Statement
- As a language model, MiniCPM3-4B generates content by learning from a vast amount of text; however, it does not possess the ability to comprehend or express personal opinions or value judgments.
- Any content generated by MiniCPM3-4B does not represent the viewpoints or positions of the model developers.
- Therefore, when using content generated by MiniCPM3-4B, users should take full responsibility for evaluating and verifying it on their own.
## LICENSE
- This repository is released under the Apache-2.0 License.
- The usage of MiniCPM3-4B model weights must strictly follow MiniCPM Model License.md.
- The models and weights of MiniCPM3-4B are completely free for academic research. After filling out a "questionnaire" for registration, they are also available for free commercial use.
## Citation
```bibtex
@article{hu2024minicpm,
  title={MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies},
  author={Hu, Shengding and Tu, Yuge and Han, Xu and He, Chaoqun and Cui, Ganqu and Long, Xiang and Zheng, Zhi and Fang, Yewei and Huang, Yuxiang and Zhao, Weilin and others},
  journal={arXiv preprint arXiv:2404.06395},
  year={2024}
}
```