
MiniCPM4 8B

Developed by OpenBMB
MiniCPM4 is an efficient large language model designed specifically for edge devices. Through systematic innovation in four dimensions (model architecture, training data, training algorithms, and inference systems), it achieves extreme efficiency gains, delivering over 5x faster generation speed on edge chips.
Downloads: 643
Release Time: 6/5/2025

Model Overview

The MiniCPM4 series includes multiple models at different scales, all focused on efficient large language model inference on edge devices, with support for long-text processing and multi-task workloads.
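
As a rough illustration of how the 8B model can be used, the sketch below loads it with the Hugging Face transformers library. The repository ID openbmb/MiniCPM4-8B, the chat-template usage, and the generation settings are assumptions for illustration, not an official quickstart.

```python
# Minimal sketch: loading the 8B model with Hugging Face transformers.
# The repo ID and chat usage below are assumptions, not a vendor quickstart.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM4-8B"  # assumed Hugging Face repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # bf16 keeps the 8B model within a single modern GPU
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Summarize the key ideas of sparse attention."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```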

Model Features

Efficient model architecture
Uses the InfLLM v2 trainable sparse attention mechanism: when processing 128K-long text, each token only needs to compute relevance against fewer than 5% of the tokens, sharply reducing the computational overhead of long contexts (a toy sketch of the block-selection idea follows this list).
Efficient learning algorithm
Introduces Model Wind Tunnel 2.0 for efficient, predictable scaling, and applies BitCPM extreme ternary quantization, which compresses model parameters to three values and achieves an extreme 90% reduction in model bit-width (illustrated in a sketch after this list).
High - quality training data
Builds the UltraClean strategy for filtering and generating high-quality pre-training data, and open-sources UltraFineWeb, a high-quality Chinese and English pre-training dataset.
Efficient inference system
Integrates CPM.cu, a lightweight and efficient CUDA inference framework that combines sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding (the accept/reject rule behind speculative sampling is sketched after this list).
Long text processing ability
Natively supports a context length of up to 32,768 tokens, which can be extended to 131,072 tokens through RoPE scaling (see the sketch after this list).
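
The toy sketch below illustrates the core idea behind the sparse attention described above: each query scores coarse blocks of the key/value cache and attends only to the top-ranked blocks, so it touches roughly 5% of a 128K-token context. It is a readable approximation of the block-selection idea, not the actual InfLLM v2 kernel; the shapes and the block-representative choice are illustrative assumptions.

```python
# Toy illustration of block-sparse attention in the spirit of InfLLM v2:
# each query scores coarse blocks of the KV cache and attends only to the
# top-k blocks (a small fraction of all tokens). NOT the real InfLLM v2 kernel.
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=64, keep_ratio=0.05):
    # q: (1, d) single query; k, v: (n, d) long key/value cache
    n, d = k.shape
    n_blocks = (n + block_size - 1) // block_size

    # Coarse block representatives: mean of the keys in each block.
    pad = n_blocks * block_size - n
    k_pad = F.pad(k, (0, 0, 0, pad))
    block_repr = k_pad.view(n_blocks, block_size, d).mean(dim=1)   # (n_blocks, d)

    # Score blocks against the query and keep only the top few percent.
    block_scores = block_repr @ q.squeeze(0)                        # (n_blocks,)
    top_k = max(1, int(n_blocks * keep_ratio))
    chosen = torch.topk(block_scores, top_k).indices

    # Gather the token indices belonging to the selected blocks.
    idx = (chosen[:, None] * block_size + torch.arange(block_size)).flatten()
    idx = idx[idx < n]

    # Exact attention, but only over the selected ~5% of tokens.
    attn = torch.softmax((q @ k[idx].T) / d**0.5, dim=-1)
    return attn @ v[idx]

q = torch.randn(1, 128)
k = torch.randn(131072, 128)
v = torch.randn(131072, 128)
out = block_sparse_attention(q, k, v)   # attends to ~5% of a 128K cache
print(out.shape)                        # torch.Size([1, 128])
```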
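
To make the ternary-quantization claim concrete, the sketch below maps weights to {-1, 0, +1} with a per-tensor scale, a generic absmean-style ternarization. It is not OpenBMB's BitCPM training recipe, only an illustration of what a three-valued weight representation means and why it amounts to roughly a 90% bit-width reduction versus 16-bit weights.

```python
# Sketch of ternary weight quantization in the spirit of BitCPM: weights are
# mapped to {-1, 0, +1} times a scale. Generic absmean-style ternarization,
# not OpenBMB's exact training recipe.
import torch

def ternarize(w: torch.Tensor, eps: float = 1e-8):
    # Per-tensor scale: mean absolute value of the weights.
    scale = w.abs().mean().clamp(min=eps)
    # Round w/scale to the nearest value in {-1, 0, +1}.
    w_ternary = (w / scale).round().clamp(-1, 1)
    return w_ternary, scale

def dequantize(w_ternary: torch.Tensor, scale: torch.Tensor):
    return w_ternary * scale

w = torch.randn(4096, 4096)
w_t, s = ternarize(w)
print(w_t.unique())                            # tensor([-1., 0., 1.])
print((dequantize(w_t, s) - w).abs().mean())   # mean quantization error

# Storage back-of-the-envelope: log2(3) ≈ 1.58 bits per weight versus 16 bits
# for bf16, i.e. roughly a 90% reduction in weight bit-width.
```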
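
The following sketch shows the standard accept/reject rule behind speculative sampling, which inference frameworks such as CPM.cu build on: a small draft model proposes a token, and the target model either keeps it or resamples from the residual distribution. This is the generic algorithm in Python, not CPM.cu's CUDA implementation; the toy distributions are placeholders.

```python
# Sketch of the speculative-sampling accept/reject rule: a draft model proposes
# a token from q(x); the target model verifies it against p(x). Standard
# algorithm only, not CPM.cu's CUDA implementation.
import torch

def speculative_step(p: torch.Tensor, q: torch.Tensor) -> int:
    """p, q: target and draft probability vectors over the vocabulary."""
    draft_token = torch.multinomial(q, 1).item()
    accept_prob = min(1.0, (p[draft_token] / q[draft_token]).item())
    if torch.rand(()) < accept_prob:
        return draft_token                      # verified draft token
    # Rejected: resample from the residual distribution max(p - q, 0).
    residual = torch.clamp(p - q, min=0.0)
    residual = residual / residual.sum()
    return torch.multinomial(residual, 1).item()

vocab = 32000
p = torch.softmax(torch.randn(vocab), dim=-1)   # toy target-model distribution
q = torch.softmax(torch.randn(vocab), dim=-1)   # toy draft-model distribution
print(speculative_step(p, q))
```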
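
The sketch below illustrates one generic way RoPE scaling extends context: rotary angles are computed from rescaled positions so that 131,072 positions fall within the angle range seen during 32,768-token training (position interpolation). MiniCPM4's actual extension method may differ in detail; this only conveys the idea.

```python
# Generic sketch of RoPE scaling for context extension: a model trained with
# rotary embeddings up to 32,768 positions can address 131,072 positions by
# rescaling the rotary angles. Illustrative only; not MiniCPM4's exact method.
import torch

def rope_angles(positions, dim=128, base=10000.0, scale=1.0):
    # Standard RoPE frequencies; `scale` > 1 compresses positions so that
    # 131,072 raw positions map into the 32,768-position range seen in training.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float() / scale, inv_freq)   # (seq, dim/2)

native = rope_angles(torch.arange(32768))                 # trained range
extended = rope_angles(torch.arange(131072), scale=4.0)   # 4x extension

# With scale = 131072 / 32768 = 4, the largest extended angle roughly matches
# the largest angle the model saw during training.
print(native[-1, 0].item(), extended[-1, 0].item())
```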

Model Capabilities

Text generation
Dialogue system
Long text understanding
Tool invocation
Survey paper generation
Multi-round dialogue
Knowledge-intensive task processing
Reasoning-intensive task processing

Use Cases

Content generation
Article writing
Generates articles on specific topics from user prompts
Can produce long articles with a complete structure and relevant content
Survey paper generation
Autonomously generates credible long-form survey papers from user queries
The MiniCPM4-Survey variant is specifically optimized for this task
Intelligent assistant
Multi - round dialogue
Conducts natural, fluent multi-round dialogues with users
Supports context understanding and a coherent dialogue flow (a minimal chat-loop sketch follows this section)
Tool invocation
Autonomously invokes relevant tools based on user needs
The MiniCPM4-MCP variant is specifically optimized for this task
Information retrieval and processing
Long document analysis
Processes and analyzes long documents of up to 128K tokens
Performs well on the needle-in-a-haystack test
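
The sketch below turns the multi-round dialogue use case into code: the full message history is kept and the chat template is re-applied each turn so the model sees prior context. It reuses the model and tokenizer objects from the loading sketch earlier on this page; the helper function name and settings are illustrative.

```python
# Sketch of a multi-round dialogue loop: keep the full message history and
# re-apply the chat template each turn so the model sees prior context.
# Reuses the `model` and `tokenizer` loaded in the earlier sketch.
def chat_turn(messages, user_input, max_new_tokens=256):
    messages.append({"role": "user", "content": user_input})
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    reply = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
    messages.append({"role": "assistant", "content": reply})
    return reply

history = []
print(chat_turn(history, "What is sparse attention?"))
print(chat_turn(history, "How does that help on edge devices?"))  # follow-up uses context
```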