🚀 MiniCPM-2B-128k
MiniCPM-2B-128k is a long-context extension trial of MiniCPM-2B, developed by ModelBest Inc. and TsinghuaNLP. It supports a 128k context and introduces improvements to the prompt format and vocabulary.
🚀 Quick Start
MiniCPM is an end-side LLM jointly developed by ModelBest Inc. and TsinghuaNLP. The main language model, MiniCPM-2B, has only 2.4 billion non-embedding parameters. MiniCPM-2B-128k is an attempt to extend the context length of MiniCPM-2B and is the first long-context model under 3B parameters. Compared with the previously released version, the improvements are as follows:
- It supports a 128k context and achieves the best score among models under 7B on the comprehensive long-text benchmark InfiniteBench, although its performance degrades within a 4k context.
- To make deployment easier for community developers, the instruction template used during alignment has been updated to the ChatML format (`<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n`). This also helps users deploy and use the vLLM OpenAI-compatible server mode (see the example below).
- To accommodate parallel-processing requirements, tie_embedding has been removed, and the vocabulary has been expanded to 127,660.
For more details, please refer to the GitHub repo and Blog.
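Because the model now uses a ChatML-style template, it can be served through vLLM's OpenAI-compatible server and queried with the standard `openai` Python client. The following is a minimal sketch, assuming a locally started server; the launch command in the comment and the sampling values are illustrative, not part of this card.

```python
# Sketch: query MiniCPM-2B-128k through a vLLM OpenAI-compatible server.
# Assumes a server started roughly as:
#   python -m vllm.entrypoints.openai.api_server --model openbmb/MiniCPM-2B-128k --trust-remote-code
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM does not check the key

completion = client.chat.completions.create(
    model="openbmb/MiniCPM-2B-128k",
    messages=[{"role": "user", "content": "Which is the highest mountain in Shandong Province?"}],
    temperature=0.8,
    top_p=0.8,
)
print(completion.choices[0].message.content)
```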
✨ Features
Evaluation Results
| Model | avg | avg w/o code&math | passkey | number_string | kv_retrieval | longbook_choice_eng | longbook_qa_chn | longbook_qa_eng | longbook_sum_eng | longdialogue_qa_eng | math_calc | math_find | code_debug | code_run |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LWM-Text-128k | 24.45 | 33.62 | 100 | 97.8 | 0.6 | 28.82 | 15.93 | 14.31 | 9.99 | 1.5 | 0 | 3.43 | 20.05 | 1 |
| Yarn-Mistral-7b-128k | 19.84 | 27.36 | 92.71 | | 0 | 27.95 | 15.49 | 9.55 | 9.06 | 7.5 | 0 | 17.14 | 0.76 | 1.25 |
| Mistral-7B-Instruct-v0.2(ABF 1000w) | 27.75 | 36.9 | 100 | 78.98 | 3.6 | 37.12 | 11.74 | 17.37 | 21.12 | 9.5 | 0 | 29.43 | 17.51 | 0 |
| Yi-6B-200k | 22.15 | 32.54 | 100 | 94.92 | 0 | 36.68 | 15.07 | 9.2 | 0.92 | 3.5 | 0 | 4.29 | 0.51 | 0.75 |
| chatglm3-6b-128k | 25.58 | 36.57 | 89.93 | 99.66 | 5.2 | 46.29 | 10.7 | 8.38 | 25.91 | 6.5 | 0 | 8 | 5.33 | 1 |
| MiniCPM-2.4B-128k | 27.32 | 37.68 | 98.31 | 99.83 | 9 | 29.69 | 23.06 | 16.33 | 15.73 | 9.5 | 0 | 4.29 | 22.08 | 0 |
⚠️ Important Note
We found that generation with the Hugging Face `transformers` backend is slightly lower in quality and significantly slower than with vLLM, so benchmarking with vLLM is recommended.
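For reference, here is a minimal offline-generation sketch using vLLM's standard `LLM`/`SamplingParams` API; the `max_model_len` and sampling values are illustrative assumptions, not settings prescribed by this card.

```python
# Hedged sketch of offline generation with vLLM, as recommended above for benchmarking.
from vllm import LLM, SamplingParams

# max_model_len is an assumed value; reduce it if GPU memory is limited.
llm = LLM(model="openbmb/MiniCPM-2B-128k", trust_remote_code=True, max_model_len=131072)
params = SamplingParams(temperature=0.8, top_p=0.8, max_tokens=256)

outputs = llm.generate(["Which is the highest mountain in Shandong Province?"], params)
print(outputs[0].outputs[0].text)
```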
Limitations
- Due to limitations in model size, the model may suffer from hallucination. Since DPO models tend to generate longer responses, hallucinations are more likely to occur. We will continue to iterate on and improve the MiniCPM models.
- To keep the model general-purpose for academic research, we did not conduct any identity training on it. Meanwhile, as we use the ShareGPT open-source corpus as part of the training data, the model may output identity information similar to that of the GPT series models.
- Due to the limitation of model size, the model's output is strongly influenced by the prompt, which may lead to inconsistent results across multiple attempts.
- Due to limited model capacity, the model's factual recall is not always accurate. In the future, we will combine retrieval-augmented generation (RAG) to enhance the model's knowledge recall; a minimal illustrative sketch follows this list.
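As a rough illustration of the RAG direction mentioned in the last point, the sketch below prepends retrieved reference passages to the question so the model answers from supplied context rather than from its parametric memory. `retrieve_passages` is a hypothetical placeholder for any retrieval backend (BM25, a vector store, etc.) and is not part of MiniCPM.

```python
# Hypothetical RAG-style prompt construction; retrieve_passages is a placeholder retriever.
def build_rag_prompt(question, retrieve_passages):
    passages = retrieve_passages(question)  # e.g. BM25 or vector-store lookup
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only the references below.\n\n{context}\n\nQuestion: {question}"

# Usage with the chat helper shown in the example below:
# prompt = build_rag_prompt("Which is the highest mountain in Shandong Province?", my_retriever)
# response, _ = model.chat(tokenizer, prompt, temperature=0.8, top_p=0.8)
```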
💻 Usage Examples
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch.manual_seed(0)

path = 'openbmb/MiniCPM-2B-128k'
tokenizer = AutoTokenizer.from_pretrained(path)
# Load bfloat16 weights on a single CUDA device; trust_remote_code is required for the custom model code.
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

# model.chat is the chat helper shipped with the model's custom code.
response, history = model.chat(tokenizer, "Which is the highest mountain in Shandong Province? Is it taller or shorter than Huangshan? What's the height difference?", temperature=0.8, top_p=0.8)
print(response)
```
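For long inputs that exercise the 128k context, a hedged follow-up sketch using the standard `generate` API is shown below. It reuses `tokenizer` and `model` from the example above, assumes the released tokenizer ships a ChatML chat template (as described earlier), and reads a hypothetical local file `report.txt`; the sampling values are illustrative.

```python
# Sketch: feed a long document through the 128k context window.
long_document = open("report.txt", encoding="utf-8").read()  # hypothetical long file
messages = [{"role": "user", "content": f"Summarize the following document:\n\n{long_document}"}]

# apply_chat_template assumes the tokenizer defines a (ChatML) chat template.
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.8, top_p=0.8)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```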
📚 Documentation
- Library Name: transformers
- Pipeline Tag: text-generation
- Tags: MiniCPM, ModelBest, THUNLP, conversational, custom_code