
Meta Llama 3 8B Instruct FP8 KV

Developed by RedHatAI
Meta-Llama-3-8B-Instruct with weights and activations quantized to FP8 using per-tensor scales, ready for inference with vLLM >= 0.5.0. The checkpoint also ships per-tensor scaling factors for an FP8-quantized KV cache.
Downloads 3,153
Release date: 5/20/2024

Model Overview

This is an FP8-quantized build of Meta-Llama-3-8B-Instruct with support for an FP8 KV cache, designed for efficient inference.
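To make "per-tensor quantization" concrete, the sketch below shows the basic idea in PyTorch: a single FP8 scale is derived from the tensor's absolute maximum, then the whole tensor is scaled and cast. This is a minimal illustration of the technique, not the exact recipe used to produce this checkpoint.

```python
import torch

def quantize_fp8_per_tensor(x: torch.Tensor):
    """Quantize a tensor to FP8 (E4M3) with a single per-tensor scale."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for E4M3
    scale = x.abs().max().clamp(min=1e-12) / fp8_max  # one scale per tensor
    x_fp8 = (x / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return x_fp8, scale

w = torch.randn(4096, 4096)                   # stand-in weight matrix
w_fp8, w_scale = quantize_fp8_per_tensor(w)
w_approx = w_fp8.to(torch.float32) * w_scale  # dequantized approximation of w
```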

Model Features

FP8 Quantization
Model weights and activations are quantized to FP8 with a single scale per tensor, halving weight memory relative to FP16 while preserving accuracy
FP8 KV Cache Support
The checkpoint includes per-tensor scaling factors for an FP8-quantized KV cache, which vLLM applies when the FP8 KV cache is enabled (see the example after this list)
Efficient Inference
Optimized for vLLM >= 0.5.0 for high-throughput, memory-efficient serving
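A minimal offline-inference sketch with vLLM (>= 0.5.0) follows. The Hugging Face repo ID RedHatAI/Meta-Llama-3-8B-Instruct-FP8-KV is assumed from the model name above and is not stated on this page; kv_cache_dtype="fp8" is vLLM's switch for storing the KV cache in FP8.

```python
from vllm import LLM, SamplingParams

# kv_cache_dtype="fp8" stores the KV cache in FP8, picking up the
# per-tensor scaling factors shipped with this checkpoint.
llm = LLM(
    model="RedHatAI/Meta-Llama-3-8B-Instruct-FP8-KV",  # assumed repo ID
    kv_cache_dtype="fp8",
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain FP8 quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```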

Model Capabilities

Text generation
Dialogue systems
Instruction following

Use Cases

Dialogue systems
Chatbots: build efficient chatbot applications, for example behind vLLM's OpenAI-compatible server (see the example after this list)
Content generation
Text creation: assist with a range of text-creation tasks
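For the chatbot use case, a common pattern is to serve the model through vLLM's OpenAI-compatible server and query it with the openai client. A sketch, again assuming the repo ID RedHatAI/Meta-Llama-3-8B-Instruct-FP8-KV:

```python
# Start the server first, e.g.:
#   python -m vllm.entrypoints.openai.api_server \
#       --model RedHatAI/Meta-Llama-3-8B-Instruct-FP8-KV \
#       --kv-cache-dtype fp8
from openai import OpenAI

# vLLM listens on port 8000 by default; the api_key is unused locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="RedHatAI/Meta-Llama-3-8B-Instruct-FP8-KV",
    messages=[{"role": "user", "content": "Suggest three names for a coffee shop."}],
)
print(resp.choices[0].message.content)
```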