🚀 MiniCPM-2B-128k
MiniCPM-2B-128k is a long-context extension trial of MiniCPM-2B, developed by ModelBest Inc. and TsinghuaNLP. It supports a 128k context and introduces improvements to the prompt format and vocabulary.
🚀 Quick Start
MiniCPM is an end-side LLM jointly developed by ModelBest Inc. and TsinghuaNLP. The main language model, MiniCPM-2B, has only 2.4 billion non-embedding parameters. MiniCPM-2B-128k is an attempt to extend the context length of MiniCPM-2B and is the first long-context model under 3B parameters. Compared with the previously released version, the improvements are as follows:
- It supports a 128k context and achieves the best score among models under 7B on the comprehensive long-text benchmark InfiniteBench, although its performance degrades within a 4k context.
- To make deployment easier for community developers, the instruction template used during alignment has been updated to the ChatML format (`<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n`). This also helps users deploy and use the vLLM OpenAI-compatible server mode (see the example below).
- To accommodate parallel-processing requirements, tie_embedding has been removed, and the vocabulary has been expanded to 127,660.
For more details, please refer to the GitHub repo and Blog.
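Because the model now uses a ChatML-style template, it can be served through vLLM's OpenAI-compatible server and queried with the standard `openai` Python client. The following is a minimal sketch, assuming a locally started server; the launch command in the comment and the sampling values are illustrative, not part of this card.

```python
# Sketch: query MiniCPM-2B-128k through a vLLM OpenAI-compatible server.
# Assumes a server started roughly as:
#   python -m vllm.entrypoints.openai.api_server --model openbmb/MiniCPM-2B-128k --trust-remote-code
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM does not check the key

completion = client.chat.completions.create(
    model="openbmb/MiniCPM-2B-128k",
    messages=[{"role": "user", "content": "Which is the highest mountain in Shandong Province?"}],
    temperature=0.8,
    top_p=0.8,
)
print(completion.choices[0].message.content)
```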
✨ Features
Evaluation Results
| Model | avg | avg w/o code&math | passkey | number_string | kv_retrieval | longbook_choice_eng | longbook_qa_chn | longbook_qa_eng | longbook_sum_eng | longdialogue_qa_eng | math_calc | math_find | code_debug | code_run |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LWM-Text-128k | 24.45 | 33.62 | 100 | 97.8 | 0.6 | 28.82 | 15.93 | 14.31 | 9.99 | 1.5 | 0 | 3.43 | 20.05 | 1 |
| Yarn-Mistral-7b-128k | 19.84 | 27.36 | 92.71 | | 0 | 27.95 | 15.49 | 9.55 | 9.06 | 7.5 | 0 | 17.14 | 0.76 | 1.25 |
| Mistral-7B-Instruct-v0.2(ABF 1000w) | 27.75 | 36.9 | 100 | 78.98 | 3.6 | 37.12 | 11.74 | 17.37 | 21.12 | 9.5 | 0 | 29.43 | 17.51 | 0 |
| Yi-6B-200k | 22.15 | 32.54 | 100 | 94.92 | 0 | 36.68 | 15.07 | 9.2 | 0.92 | 3.5 | 0 | 4.29 | 0.51 | 0.75 |
| chatglm3-6b-128k | 25.58 | 36.57 | 89.93 | 99.66 | 5.2 | 46.29 | 10.7 | 8.38 | 25.91 | 6.5 | 0 | 8 | 5.33 | 1 |
| MiniCPM-2.4B-128k | 27.32 | 37.68 | 98.31 | 99.83 | 9 | 29.69 | 23.06 | 16.33 | 15.73 | 9.5 | 0 | 4.29 | 22.08 | 0 |
⚠️ Important Note
We found that generation with the Hugging Face `transformers` backend is slightly lower in quality and significantly slower than with vLLM, so benchmarking with vLLM is recommended.
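For reference, here is a minimal offline-generation sketch using vLLM's standard `LLM`/`SamplingParams` API; the `max_model_len` and sampling values are illustrative assumptions, not settings prescribed by this card.

```python
# Hedged sketch of offline generation with vLLM, as recommended above for benchmarking.
from vllm import LLM, SamplingParams

# max_model_len is an assumed value; reduce it if GPU memory is limited.
llm = LLM(model="openbmb/MiniCPM-2B-128k", trust_remote_code=True, max_model_len=131072)
params = SamplingParams(temperature=0.8, top_p=0.8, max_tokens=256)

outputs = llm.generate(["Which is the highest mountain in Shandong Province?"], params)
print(outputs[0].outputs[0].text)
```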
Limitations
- Due to limitations in model size, the model may suffer from hallucination. Since DPO models tend to generate longer responses, hallucinations are more likely to occur. We will continue to iterate on and improve the MiniCPM models.
- To keep the model general-purpose for academic research, we did not conduct any identity training on it. Meanwhile, as we use the ShareGPT open-source corpus as part of the training data, the model may output identity information similar to that of the GPT series models.
- Due to the limitation of model size, the model's output is strongly influenced by the prompt, which may lead to inconsistent results across multiple attempts.
- Due to limited model capacity, the model's factual recall is not always accurate. In the future, we will combine retrieval-augmented generation (RAG) to enhance the model's knowledge recall; a minimal illustrative sketch follows this list.
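As a rough illustration of the RAG direction mentioned in the last point, the sketch below prepends retrieved reference passages to the question so the model answers from supplied context rather than from its parametric memory. `retrieve_passages` is a hypothetical placeholder for any retrieval backend (BM25, a vector store, etc.) and is not part of MiniCPM.

```python
# Hypothetical RAG-style prompt construction; retrieve_passages is a placeholder retriever.
def build_rag_prompt(question, retrieve_passages):
    passages = retrieve_passages(question)  # e.g. BM25 or vector-store lookup
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only the references below.\n\n{context}\n\nQuestion: {question}"

# Usage with the chat helper shown in the example below:
# prompt = build_rag_prompt("Which is the highest mountain in Shandong Province?", my_retriever)
# response, _ = model.chat(tokenizer, prompt, temperature=0.8, top_p=0.8)
```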
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch.manual_seed(0)

path = 'openbmb/MiniCPM-2B-128k'
tokenizer = AutoTokenizer.from_pretrained(path)
# Load bfloat16 weights on a single CUDA device; trust_remote_code is required for the custom model code.
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

# model.chat is the chat helper shipped with the model's custom code.
response, history = model.chat(tokenizer, "Which is the highest mountain in Shandong Province? Is it taller or shorter than Huangshan? What's the height difference?", temperature=0.8, top_p=0.8)
print(response)
```
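For long inputs that exercise the 128k context, a hedged follow-up sketch using the standard `generate` API is shown below. It reuses `tokenizer` and `model` from the example above, assumes the released tokenizer ships a ChatML chat template (as described earlier), and reads a hypothetical local file `report.txt`; the sampling values are illustrative.

```python
# Sketch: feed a long document through the 128k context window.
long_document = open("report.txt", encoding="utf-8").read()  # hypothetical long file
messages = [{"role": "user", "content": f"Summarize the following document:\n\n{long_document}"}]

# apply_chat_template assumes the tokenizer defines a (ChatML) chat template.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.8, top_p=0.8)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```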
📚 Documentation
- Library Name: transformers
- Pipeline Tag: text-generation
- Tags: MiniCPM, ModelBest, THUNLP, conversational, custom_code