
Llama 3.1 405B Instruct FP8

Developed by NVIDIA
The NVIDIA Llama 3.1 405B Instruct FP8 model is a quantized version of Meta's Llama 3.1 405B Instruct model. It is an autoregressive language model built on an optimized Transformer architecture. This model can be used for commercial or non-commercial purposes.
Downloads 10.91k
Release Date: 8/29/2024

Model Overview

This model is the FP8-quantized version of Meta-Llama-3.1-405B-Instruct. Quantizing to FP8 reduces disk size and GPU memory requirements and delivers a 1.7x speedup on the NVIDIA H200. The model supports two inference engines: TensorRT-LLM and vLLM.
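To make the FP8 idea concrete, here is a minimal, illustrative sketch of per-tensor FP8 (E4M3) weight quantization in NumPy. This is an assumption-laden simplification, not the model's actual quantization recipe: real FP8 kernels (e.g. in TensorRT-LLM) handle denormals, NaN encoding, and activation calibration far more carefully.

```python
import numpy as np

# Illustrative sketch only: simplified E4M3 rounding (normals only, no
# denormals or NaN handling), with a single per-tensor scale.
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def round_to_e4m3(x):
    """Round values to the nearest representable E4M3 number (normals only)."""
    x = np.clip(x, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    nonzero = x != 0
    safe = np.where(nonzero, np.abs(x), 1.0)   # avoid log2(0)
    exp = np.floor(np.log2(safe))              # power-of-two bucket
    step = 2.0 ** (exp - 3)                    # 3 mantissa bits -> 8 steps/bucket
    return np.where(nonzero, np.round(x / step) * step, 0.0)

def quantize_fp8(weights):
    """Per-tensor scaling: map the tensor's max magnitude onto the FP8 range."""
    scale = np.abs(weights).max() / FP8_E4M3_MAX
    q = round_to_e4m3(weights / scale)
    return q, scale                            # dequantize with q * scale
```

With 3 mantissa bits, the round-trip error of any element is at most about 1/16 of its magnitude, which is why FP8 weights remain accurate enough for inference while occupying half the bytes of BF16.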

Model Features

FP8 Quantization Optimization
Quantizing weights and activations to the FP8 data type reduces disk size and GPU memory requirements and delivers a 1.7x speedup on the H200.
Multi-Platform Support
It supports two inference engines, TensorRT-LLM and vLLM, and NVIDIA GPU microarchitectures including Blackwell, Hopper, and Lovelace.
Commercially Available
This model can be used for commercial or non-commercial purposes.
High Performance
It performs strongly on benchmarks such as MMLU, GSM8K (CoT), and ARC Challenge.
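The memory benefit of FP8 can be checked with back-of-envelope arithmetic. The figures below are assumptions for illustration, not from the model card: 405B parameters (from the model name), a BF16 baseline at 2 bytes per parameter, FP8 at 1 byte per parameter, and KV-cache/activation memory ignored.

```python
# Back-of-envelope weight-memory estimate (illustrative assumptions only).
PARAMS = 405e9               # parameter count implied by the model name

bf16_gb = PARAMS * 2 / 1e9   # ~810 GB of weights at 2 bytes/param (BF16)
fp8_gb = PARAMS * 1 / 1e9    # ~405 GB of weights at 1 byte/param (FP8)

print(f"BF16 weights: {bf16_gb:.0f} GB, FP8 weights: {fp8_gb:.0f} GB")
```

Halving the weight footprint is what lets the quantized model fit on fewer GPUs per node, which in turn contributes to the throughput gains reported on the H200.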

Model Capabilities

Text Generation
Language Understanding
Question Answering
Content Creation

Use Cases

General Text Generation
Content Continuation: generates coherent, fluent follow-on text from a given fragment.
Question Answering
Answers a wide range of user questions accurately.
Education
Mathematical Problem Solving: solves complex mathematical problems, reaching 96.2% accuracy on the GSM8K (CoT) benchmark.
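For the instruction-following use cases above, prompts must follow the Llama 3.1 chat template. The sketch below hand-builds that template for a single system/user turn; in practice, serving stacks such as vLLM or TensorRT-LLM apply it automatically via the tokenizer's chat template, so this is for illustration only.

```python
# Sketch of the Llama 3.1 instruct prompt format (single system + user turn).
# Inference servers normally apply this template for you, so hand-building
# it is rarely necessary.
def build_prompt(system: str, user: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt(
    "You are a helpful assistant.",
    "Continue this story: The ship left port at dawn",
)
```

The trailing assistant header signals the model to generate its reply; generation stops when the model emits its end-of-turn token.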