
PARD Llama 3.2 1B

Developed by AMD
PARD is a high-performance speculative decoding method that can convert autoregressive draft models into parallel draft models at low cost, significantly accelerating the inference of large language models.
Downloads: 2,219
Released: 5/17/2025

Model Overview

PARD adaptively accelerates the inference of large language models through low-cost parallel draft models, reducing training and deployment costs while maintaining high performance.
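To make the speculative-decoding idea concrete, here is a minimal toy sketch of the generic draft-then-verify loop: a cheap draft model proposes several tokens, the target model checks them and keeps the longest agreeing run, then contributes one token of its own. The "models" below are hypothetical stand-ins (a simple counting rule), not PARD's parallel draft model or AMD's implementation.

```python
# Toy sketch of speculative decoding (illustrative only; the draft/target
# "models" here are hypothetical counting rules, not PARD itself).

def draft_propose(prefix, k):
    """Hypothetical cheap draft model: greedily proposes the next k tokens."""
    out, last = [], prefix[-1]
    for _ in range(k):
        last = (last + 1) % 50  # toy next-token rule
        out.append(last)
    return out

def target_next(prefix):
    """Hypothetical expensive target model: its greedy next token."""
    return (prefix[-1] + 1) % 50

def speculative_step(prefix, k=4):
    """One step: the draft proposes k tokens, the target verifies them and
    keeps the longest agreeing prefix, then adds one token of its own."""
    accepted = []
    for tok in draft_propose(prefix, k):
        if tok == target_next(prefix + accepted):
            accepted.append(tok)
        else:
            break
    # The target always contributes one token past the accepted run, so each
    # step emits at least one token even if every proposal is rejected.
    accepted.append(target_next(prefix + accepted))
    return accepted

seq = [0]
for _ in range(3):
    seq += speculative_step(seq, k=4)
print(seq)
```

Because the toy draft and target agree perfectly, every step accepts all four proposals plus the target's token (five tokens per target pass); real speedup depends on how often the draft's proposals match the target, which is what PARD's parallel draft model optimizes.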

Model Features

Low-cost training
PARD can convert autoregressive draft models into parallel draft models with minimal overhead, achieving an average inference speedup of 1.78×.
Strong generalization
A single PARD draft model can accelerate an entire target model family, significantly reducing deployment complexity and adaptation costs.
High performance
When integrated into an optimized inference framework, PARD achieves a speedup of up to 4.08×, reaching a state-of-the-art throughput of 311.5 tokens per second.

Model Capabilities

Text generation
Acceleration of large language model inference

Use Cases

Natural language processing
Acceleration of large language model inference
Use PARD to accelerate large language model inference and improve generation efficiency, with a speedup of up to 4.08× and throughput of 311.5 tokens per second.