🚀 MiniCPM
MiniCPM is a series of end-side large language models jointly open-sourced by ModelBest and the Natural Language Processing Laboratory of Tsinghua University. The main language model, MiniCPM-2B, has only 2.4 billion non-embedding parameters (i.e., excluding vocabulary embeddings). It delivers performance comparable to much larger models, can be deployed on smartphones, and has low development costs.
🚀 Quick Start
To start using MiniCPM, install `transformers>=4.36.0` and `accelerate`, then run the following code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch.manual_seed(0)

path = 'openbmb/MiniCPM-2B-sft-fp32'
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float32, device_map='cuda', trust_remote_code=True)

responds, history = model.chat(tokenizer, "山东省最高的山是哪座山, 它比黄山高还是矮?差距多少?", temperature=0.8, top_p=0.8)
print(responds)
```
Expected Output:
```
山东省最高的山是泰山,海拔1545米。
相对于黄山(海拔1864米),泰山海拔较低,相差约319米。
```
(Translation: The highest mountain in Shandong Province is Mount Tai, at 1,545 m above sea level. Compared with Huangshan (1,864 m), Mount Tai is lower, by about 319 m.)
Warning: You must explicitly specify the model's data type in `from_pretrained` (e.g. `torch_dtype=torch.float32`), otherwise large numerical errors may occur.
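For example, to load a lower-precision checkpoint you could pass `torch_dtype=torch.bfloat16` explicitly. This is a minimal sketch; the `openbmb/MiniCPM-2B-sft-bf16` checkpoint name and bf16 support on your GPU are assumptions.

```python
# A minimal sketch: always request the weight dtype explicitly.
# Assumes the bf16 checkpoint name below and a GPU with bf16 support.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = 'openbmb/MiniCPM-2B-sft-bf16'
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,  # explicit dtype avoids large numerical errors
    device_map='cuda',
    trust_remote_code=True,
)
```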
✨ Features
- High Performance: After SFT, MiniCPM performs comparably to Mistral-7B on public comprehensive benchmarks and is stronger in Chinese, mathematics, and coding. Its overall performance surpasses models such as Llama2-13B, MPT-30B, and Falcon-40B. After DPO, it outperforms many representative open-source large models, including Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1, and Zephyr-7B-alpha, on MT-Bench.
- Multimodal Capability: MiniCPM-V, built on MiniCPM-2B, achieves the best overall performance among multimodal models of the same scale, surpassing existing multimodal large models built on Phi-2 and matching or even exceeding the 9.6B Qwen-VL-Chat on some benchmarks.
- Mobile Deployment: After Int4 quantization, MiniCPM can be deployed and run on mobile phones, with a streaming output speed slightly higher than human speaking speed. MiniCPM-V is the first multimodal large model deployed on mobile phones.
- Low Development Cost: Parameter-efficient fine-tuning can be done on a single 1080/2080 GPU, and full-parameter fine-tuning on a 3090/4090 GPU (for one way to reduce GPU memory, see the 4-bit loading sketch after this list).
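As a rough, non-official illustration of fitting the model onto a small GPU, the sketch below loads MiniCPM with 4-bit (NF4) weights via the `bitsandbytes` integration in transformers. This is not the on-device (phone) deployment path mentioned above; the checkpoint name and `bitsandbytes` availability are assumptions.

```python
# A minimal sketch of 4-bit (NF4) loading via bitsandbytes; this is not the
# official mobile deployment path, just one way to fit the model on a small GPU.
# Assumes `pip install bitsandbytes` and the checkpoint name below.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

path = 'openbmb/MiniCPM-2B-sft-bf16'  # assumed checkpoint name
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(
    path,
    quantization_config=quant_config,
    device_map='auto',
    trust_remote_code=True,
)
```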
📦 Installation
Run the following command to install the necessary libraries:
```bash
pip install "transformers>=4.36.0" accelerate
```
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Fix the random seed so sampled outputs are reproducible.
torch.manual_seed(0)

path = 'openbmb/MiniCPM-2B-sft-fp32'
tokenizer = AutoTokenizer.from_pretrained(path)

# Specify torch_dtype explicitly (see the warning above); trust_remote_code is
# required because the chat() helper is defined in the model repository.
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float32, device_map='cuda', trust_remote_code=True)

# Single-turn chat; `history` collects the conversation for follow-up turns.
responds, history = model.chat(tokenizer, "山东省最高的山是哪座山, 它比黄山高还是矮?差距多少?", temperature=0.8, top_p=0.8)
print(responds)
```
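If you prefer the generic transformers generation API over the repository's custom `chat` helper, a hedged alternative is sketched below. It reuses `model` and `tokenizer` from the snippet above and assumes the tokenizer ships a chat template; if `apply_chat_template` is unavailable for this checkpoint, fall back to `model.chat`.

```python
# A minimal sketch using the standard generate() API instead of model.chat.
# Continues from the snippet above (reuses `model` and `tokenizer`); assumes
# the tokenizer provides a chat template.
messages = [{"role": "user", "content": "Which is the highest mountain in Shandong Province?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors='pt'
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,
    top_p=0.8,
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```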
📚 Documentation
Evaluation Results
Detailed evaluation results are available in the GitHub repo.
Notice: We found that generation quality with the Hugging Face implementation is slightly lower than with vLLM, so we recommend benchmarking with vLLM. We are investigating the cause.
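For reference, a minimal vLLM generation sketch is shown below. Whether your installed vLLM release supports the MiniCPM architecture out of the box is an assumption; check the GitHub repo for the recommended setup.

```python
# A minimal vLLM sketch for benchmarking-style generation.
# Assumes a vLLM build that supports the MiniCPM architecture.
from vllm import LLM, SamplingParams

llm = LLM(model='openbmb/MiniCPM-2B-sft-fp32', trust_remote_code=True)
sampling_params = SamplingParams(temperature=0.8, top_p=0.8, max_tokens=256)

outputs = llm.generate(["Which is the highest mountain in Shandong Province?"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```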
Limitations
- Hallucination Issues: Due to the limited model scale, the model may hallucinate. Because the DPO model generates longer responses, it is more prone to hallucination. We will continue to iterate on and improve MiniCPM.
- Identity Information: To keep the model general-purpose for academic research, we did not perform any identity training. Since part of the training data comes from the open-source ShareGPT corpus, the model may output identity information similar to that of the GPT series models.
- Prompt Sensitivity: Due to the limited model scale, the output is strongly influenced by the prompt, which may lead to inconsistent results across attempts.
- Knowledge Memory: Due to the limited model capacity, the model's knowledge recall is not accurate. In the future, we will combine RAG methods to enhance the model's knowledge memory ability.
📄 License
Model LICENSE
- This repository is released under the Apache-2.0 License.
- The usage of MiniCPM model weights must strictly follow the General Model License (GML).
- The models and weights of MiniCPM are completely free for academic research.
- If you intend to utilize the model for commercial purposes, please reach out to cpm@modelbest.cn to obtain the certificate of authorization.
Statement
- As a language model, MiniCPM generates content by learning from a vast amount of text. However, it does not possess the ability to comprehend or express personal opinions or value judgments. Any content generated by MiniCPM does not represent the viewpoints or positions of the model developers.
- Therefore, when using content generated by MiniCPM, users should take full responsibility for evaluating and verifying it on their own.
📚 Citation
If you find MiniCPM useful for your work, please cite our technical report:
```bibtex
@inproceedings{minicpm2024,
  title={MiniCPM: Unveiling the Potential of End-side Large Language Models},
  booktitle={OpenBMB Blog},
  year={2024}
}
```
📦 Model Download