🚀 MiniCPM
MiniCPM is a series of end-side large language models jointly open-sourced by ModelBest and the Natural Language Processing Laboratory of Tsinghua University. The main language model, MiniCPM-2B, has only 2.4 billion non-embedding parameters (i.e., excluding vocabulary embeddings). It delivers performance comparable to much larger models, can be deployed on smartphones, and has low development costs.
🚀 Quick Start
To start using MiniCPM, install `transformers>=4.36.0` and `accelerate`, then run the following code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch.manual_seed(0)

path = 'openbmb/MiniCPM-2B-sft-fp32'
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float32, device_map='cuda', trust_remote_code=True)

responds, history = model.chat(tokenizer, "山东省最高的山是哪座山, 它比黄山高还是矮?差距多少?", temperature=0.8, top_p=0.8)
print(responds)
```
Expected Output:
```
山东省最高的山是泰山,海拔1545米。
相对于黄山(海拔1864米),泰山海拔较低,相差约319米。
```
(Translation: The highest mountain in Shandong Province is Mount Tai, at 1,545 m above sea level. Compared with Huangshan (1,864 m), Mount Tai is lower, by about 319 m.)
Warning: You must explicitly specify the model's data type in `from_pretrained` (e.g. `torch_dtype=torch.float32`), otherwise large numerical errors may occur.
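For example, to load a lower-precision checkpoint you could pass `torch_dtype=torch.bfloat16` explicitly. This is a minimal sketch; the `openbmb/MiniCPM-2B-sft-bf16` checkpoint name and bf16 support on your GPU are assumptions.

```python
# A minimal sketch: always request the weight dtype explicitly.
# Assumes the bf16 checkpoint name below and a GPU with bf16 support.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = 'openbmb/MiniCPM-2B-sft-bf16'
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,  # explicit dtype avoids large numerical errors
    device_map='cuda',
    trust_remote_code=True,
)
```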
✨ Features
- High Performance: After SFT, MiniCPM performs comparably to Mistral-7B on public comprehensive benchmarks and is stronger in Chinese, mathematics, and coding. Its overall performance surpasses models such as Llama2-13B, MPT-30B, and Falcon-40B. After DPO, it outperforms many representative open-source large models, including Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1, and Zephyr-7B-alpha, on MT-Bench.
- Multimodal Capability: MiniCPM-V, built on MiniCPM-2B, achieves the best overall performance among multimodal models of the same scale, surpassing existing multimodal large models built on Phi-2 and matching or even exceeding the 9.6B Qwen-VL-Chat on some benchmarks.
- Mobile Deployment: After Int4 quantization, MiniCPM can be deployed and run on mobile phones, with a streaming output speed slightly higher than human speaking speed. MiniCPM-V is the first multimodal large model deployed on mobile phones.
- Low Development Cost: Parameter-efficient fine-tuning can be done on a single 1080/2080 GPU, and full-parameter fine-tuning on a 3090/4090 GPU (for one way to reduce GPU memory, see the 4-bit loading sketch after this list).
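As a rough, non-official illustration of fitting the model onto a small GPU, the sketch below loads MiniCPM with 4-bit (NF4) weights via the `bitsandbytes` integration in transformers. This is not the on-device (phone) deployment path mentioned above; the checkpoint name and `bitsandbytes` availability are assumptions.

```python
# A minimal sketch of 4-bit (NF4) loading via bitsandbytes; this is not the
# official mobile deployment path, just one way to fit the model on a small GPU.
# Assumes `pip install bitsandbytes` and the checkpoint name below.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

path = 'openbmb/MiniCPM-2B-sft-bf16'  # assumed checkpoint name
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(
    path,
    quantization_config=quant_config,
    device_map='auto',
    trust_remote_code=True,
)
```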
📦 Installation
Run the following command to install the necessary libraries:
```bash
pip install "transformers>=4.36.0" accelerate
```
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Fix the random seed so sampled outputs are reproducible.
torch.manual_seed(0)

path = 'openbmb/MiniCPM-2B-sft-fp32'
tokenizer = AutoTokenizer.from_pretrained(path)

# Specify torch_dtype explicitly (see the warning above); trust_remote_code is
# required because the chat() helper is defined in the model repository.
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float32, device_map='cuda', trust_remote_code=True)

# Single-turn chat; `history` collects the conversation for follow-up turns.
responds, history = model.chat(tokenizer, "山东省最高的山是哪座山, 它比黄山高还是矮?差距多少?", temperature=0.8, top_p=0.8)
print(responds)
```
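If you prefer the generic transformers generation API over the repository's custom `chat` helper, a hedged alternative is sketched below. It reuses `model` and `tokenizer` from the snippet above and assumes the tokenizer ships a chat template; if `apply_chat_template` is unavailable for this checkpoint, fall back to `model.chat`.

```python
# A minimal sketch using the standard generate() API instead of model.chat.
# Continues from the snippet above (reuses `model` and `tokenizer`); assumes
# the tokenizer provides a chat template.
messages = [{"role": "user", "content": "Which is the highest mountain in Shandong Province?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors='pt'
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,
    top_p=0.8,
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```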
📚 Documentation
Evaluation Results
Detailed evaluation results are available in the GitHub repo.
Notice: We found that generation quality with the Hugging Face implementation is slightly lower than with vLLM, so we recommend benchmarking with vLLM. We are investigating the cause.
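For reference, a minimal vLLM generation sketch is shown below. Whether your installed vLLM release supports the MiniCPM architecture out of the box is an assumption; check the GitHub repo for the recommended setup.

```python
# A minimal vLLM sketch for benchmarking-style generation.
# Assumes a vLLM build that supports the MiniCPM architecture.
from vllm import LLM, SamplingParams

llm = LLM(model='openbmb/MiniCPM-2B-sft-fp32', trust_remote_code=True)
sampling_params = SamplingParams(temperature=0.8, top_p=0.8, max_tokens=256)

outputs = llm.generate(["Which is the highest mountain in Shandong Province?"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```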
Limitations
- Hallucination Issues: Due to the limited model scale, the model may hallucinate. Because the DPO model generates longer responses, it is more prone to hallucination. We will continue to iterate on and improve MiniCPM.
- Identity Information: To keep the model general-purpose for academic research, we did not perform any identity training. Since part of the training data comes from the open-source ShareGPT corpus, the model may output identity information similar to that of the GPT series models.
- Prompt Sensitivity: Due to the limited model scale, the output is strongly influenced by the prompt, which may lead to inconsistent results across attempts.
- Knowledge Memory: Due to the limited model capacity, the model's knowledge recall is not accurate. In the future, we will combine RAG methods to enhance the model's knowledge memory ability.
📄 License
Model LICENSE
- This repository is released under the Apache-2.0 License.
- The usage of MiniCPM model weights must strictly follow the General Model License (GML).
- The models and weights of MiniCPM are completely free for academic research.
- If you intend to utilize the model for commercial purposes, please reach out to cpm@modelbest.cn to obtain the certificate of authorization.
Statement
- As a language model, MiniCPM generates content by learning from a vast amount of text. However, it does not possess the ability to comprehend or express personal opinions or value judgments. Any content generated by MiniCPM does not represent the viewpoints or positions of the model developers.
- Therefore, when using content generated by MiniCPM, users should take full responsibility for evaluating and verifying it on their own.
📚 Citation
If you find MiniCPM useful for your work, please cite our technical report:
```bibtex
@inproceedings{minicpm2024,
  title={MiniCPM: Unveiling the Potential of End-side Large Language Models},
  booktitle={OpenBMB Blog},
  year={2024}
}
```
📦 Model Download