
Llama 3.1 8B Instruct FP8

Developed by NVIDIA
FP8 quantized version of the Meta Llama 3.1 8B Instruct model: an autoregressive language model with an optimized transformer architecture and support for a 128K context length.
Downloads: 3,700
Released: 8/29/2024

Model Overview

This model is the FP8 quantized version of Meta Llama 3.1 8B Instruct, optimized for inference with TensorRT-LLM and vLLM and suitable for text generation tasks.
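To make the "approximately 50%" memory claim below concrete, here is a minimal back-of-the-envelope sketch. It assumes weights dominate the footprint (activations and KV cache are ignored) and uses 2 bytes per parameter for FP16/BF16 versus 1 byte for FP8:

```python
# Rough weight-memory estimate for an 8B-parameter model.
# Assumption: weights dominate; activations and KV cache are ignored.
PARAMS = 8_000_000_000


def weight_bytes(params: int, bytes_per_param: int) -> int:
    """Bytes needed to store the weights at a given precision."""
    return params * bytes_per_param


fp16_gb = weight_bytes(PARAMS, 2) / 1e9  # FP16/BF16: 2 bytes per weight
fp8_gb = weight_bytes(PARAMS, 1) / 1e9   # FP8: 1 byte per weight

print(f"FP16 weights: ~{fp16_gb:.0f} GB, FP8 weights: ~{fp8_gb:.0f} GB")
```

Halving the bytes per weight halves the weight footprint, which is where the roughly 50% disk and GPU memory reduction comes from.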

Model Features

FP8 Quantization
FP8 quantization reduces model disk size and GPU memory requirements by approximately 50%, with up to a 1.3x inference speedup on H100 GPUs.
Long Context Support
Supports 128K context length, ideal for long-text processing tasks.
High-Performance Inference
Optimized for TensorRT-LLM and vLLM, delivering efficient inference performance.
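A minimal sketch of serving this checkpoint with vLLM is shown below. The model id and the `quantization="modelopt"` flag are assumptions (NVIDIA FP8 checkpoints are commonly produced with TensorRT Model Optimizer), not details taken from this page; running it requires a GPU with FP8 support such as an H100.

```python
# Hypothetical vLLM usage sketch; the model id and quantization flag
# are assumptions, not confirmed by this page.
MODEL_ID = "nvidia/Llama-3.1-8B-Instruct-FP8"  # assumed Hugging Face id


def generate(prompt: str, max_tokens: int = 128) -> str:
    # vLLM is imported lazily so the sketch can be read (and its
    # constants inspected) without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL_ID, quantization="modelopt")
    params = SamplingParams(temperature=0.7, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text


if __name__ == "__main__":
    # Requires an FP8-capable GPU (e.g. H100).
    print(generate("Explain FP8 quantization in one sentence."))
```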

Model Capabilities

Text Generation
Long Text Processing
Instruction Following

Use Cases

Content Generation
Article Continuation
Generates coherent article content based on given prompts
Dialogue Systems
Builds intelligent conversational assistants
Education
Problem-Solving Assistance
Helps solve problems in subjects like math and science
Achieves 83.1% accuracy on the GSM8K dataset