ERNIE Speed 8K

An inference-optimized model developed by Baidu. It is a lightweight refinement of the ERNIE 4.0 architecture, supports an 8K context window, runs inference 5 times faster than the base model, and cuts input costs by 80%.

Intelligence (Weak)

Speed (Relatively Fast)

Supported Input Modalities

Text

Is Reasoning Model

Context Window

8,192 tokens

Maximum Output Tokens

8,192 tokens

Knowledge Cutoff

2024-10-31
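The listing gives the same 8,192-token figure for both the context window and the maximum output. It does not say whether generated tokens count against the context window; the sketch below assumes they do (a common convention) and shows how a client might cap the requested output length before sending a prompt. The function name `output_budget` is hypothetical.

```python
CONTEXT_WINDOW = 8_192     # tokens, from the listing above
MAX_OUTPUT_TOKENS = 8_192  # tokens, from the listing above


def output_budget(prompt_tokens: int, requested_output: int) -> int:
    """Largest output length that still fits.

    Assumption: prompt and completion share one CONTEXT_WINDOW-sized budget.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills or exceeds the context window")
    # Respect the caller's request, the output cap, and the space left.
    return min(requested_output, MAX_OUTPUT_TOKENS, remaining)


# A 6,000-token prompt leaves room for at most 2,192 generated tokens.
print(output_budget(6_000, 4_000))  # 2192
```

If the provider instead applies the output cap independently of the prompt length, only the `remaining` check would change.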


Pricing

Input

￥0.8 per million tokens

Output

￥3.2 per million tokens

Blended Price

￥1.6 per million tokens
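The page does not state how the ￥1.6 blended price is computed. A weighted average of the input and output rates reproduces it exactly with a 2:1 input-to-output token ratio, while the 3:1 ratio used by many comparison sites would give ￥1.4, so the sketch below treats the ratio as an assumed parameter. It also estimates the cost of a single request at the listed rates. Both function names are hypothetical.

```python
INPUT_PRICE = 0.8   # CNY per million input tokens, from the listing
OUTPUT_PRICE = 3.2  # CNY per million output tokens, from the listing


def blended_price(input_weight: float, output_weight: float) -> float:
    """Weighted average price per million tokens for an assumed
    input:output token ratio (the page does not state its ratio)."""
    total = input_weight + output_weight
    return (INPUT_PRICE * input_weight + OUTPUT_PRICE * output_weight) / total


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in CNY of one request at the listed per-million-token rates."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


print(round(blended_price(2, 1), 4))         # 1.6, matches the listed blend
print(round(blended_price(3, 1), 4))         # 1.4, the common 3:1 convention
print(round(request_cost(6_000, 2_000), 6))  # 0.0112
```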

Quick Comparison

Input

Output

ERNIE-4.5-Turbo-128K

￥0.56

ERNIE-4.5-Turbo

￥0.56

ERNIE-X1-Turbo-32K

￥0.28

Basic Parameters

ERNIE-Speed-8K Technical Parameters

Parameter Count

Not Announced

Context Length

8,192 tokens

Training Data Cutoff

2024-10-31

Open Source Category

Proprietary

Multimodal Support

Text Only

Throughput

Release Date

2025-05-01

Response Speed

180 tokens/s
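Assuming the 180 tokens/s figure is a steady decoding rate (the page does not say how it was measured or whether it includes time to first token), output length converts to approximate generation time by simple division. `generation_seconds` is a hypothetical helper.

```python
RESPONSE_SPEED = 180  # output tokens per second, from the listing


def generation_seconds(output_tokens: int,
                       tokens_per_second: float = RESPONSE_SPEED) -> float:
    """Approximate decode time; ignores queueing and time to first token."""
    return output_tokens / tokens_per_second


print(round(generation_seconds(1_000), 1))  # 5.6
print(round(generation_seconds(8_192), 1))  # 45.5, a maximum-length response
```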

Benchmark Scores

The following shows ERNIE-Speed-8K's performance on standard benchmarks, which evaluate the model's capabilities across different tasks and domains.

Intelligence Index

Overall intelligence level of the large language model

Coding Index

Indicator of AI model performance on coding tasks

Math Index

Capability in solving mathematical problems, mathematical reasoning, and other math-related tasks

MMLU Pro

Massive Multitask Language Understanding, Pro edition: a harder, reasoning-focused, text-only extension of MMLU with ten-option multiple-choice questions across 14 subject areas

GPQA

Graduate-Level Google-Proof Q&A: expert-written multiple-choice questions in biology, physics, and chemistry that are hard to answer by web search; the Diamond subset contains the highest-quality questions

HLE

Humanity's Last Exam: a benchmark of difficult, expert-written questions spanning mathematics, the sciences, and the humanities, designed to test the limits of frontier models