
PARD Llama 3.2 1B

Developed by AMD
PARD is a high-performance speculative decoding method that converts autoregressive draft models into parallel draft models at low cost, significantly accelerating the inference of large language models.
Downloads 2,219
Release Time: 5/17/2025

Model Overview

PARD adaptively accelerates the inference of large language models through low-cost parallel draft models, reducing training and deployment costs while maintaining high performance.
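To make the idea concrete, below is a minimal toy sketch of the speculative decoding loop that methods like PARD build on: a draft model proposes several tokens in one call (the "parallel draft" idea), and the target model verifies them, accepting the longest matching prefix plus one corrected token. This is an illustrative sketch of generic greedy speculative decoding, not PARD's actual algorithm; `target_next` and `draft_propose` are hypothetical stand-ins for real model calls.

```python
def speculative_step(target_next, draft_propose, prefix, k):
    """One speculative decoding step under greedy (deterministic) decoding.

    draft_propose(prefix, k) returns k draft tokens in a single call
    (a parallel draft); target_next(ctx) returns the target model's
    greedy next token for a context. Returns the tokens accepted this step.
    """
    proposal = draft_propose(prefix, k)
    accepted = []
    ctx = list(prefix)
    for tok in proposal:
        t = target_next(ctx)
        if t == tok:
            # Draft token matches the target's choice: accept it for free.
            accepted.append(tok)
            ctx.append(tok)
        else:
            # Mismatch: take the target's token and end the step.
            accepted.append(t)
            return accepted
    # All k draft tokens accepted; the verifying pass also yields one bonus token.
    accepted.append(target_next(ctx))
    return accepted
```

Because verification of all k draft tokens can be batched into a single target forward pass, each step produces between 1 and k+1 tokens for roughly the cost of one target pass, which is where the speedup comes from.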

Model Features

Low-cost training
PARD can convert autoregressive draft models into parallel draft models with minimal overhead, increasing the average inference speed by 1.78 times.
Strong generalization
A single PARD draft model can accelerate an entire target model family, significantly reducing deployment complexity and adaptation costs.
High performance
When integrated into an optimized inference framework, PARD achieves a speedup ratio of up to 4.08 times, reaching a state-of-the-art speed of 311.5 tokens per second.

Model Capabilities

Text generation
Acceleration of large language model inference

Use Cases

Natural language processing
Acceleration of large language model inference
Use PARD to accelerate the inference process of large language models and improve generation efficiency.
Achieves a speedup of up to 4.08 times, generating 311.5 tokens per second.