
Meta Llama 3 70B Instruct FP8

Developed by RedHatAI
Meta-Llama-3-70B-Instruct-FP8 is a quantized version of Meta-Llama-3-70B-Instruct. FP8 quantization reduces its disk footprint and GPU memory requirements while maintaining high performance. It is intended for commercial and research use in English.
Downloads: 22.10k
Release date: 5/24/2024

Model Overview

This model is the FP8-quantized version of Meta-Llama-3-70B-Instruct, suited to assistant-like chat scenarios and intended for commercial and research use in English.

Model Features

FP8 Quantization
Quantizing weights and activations to the FP8 data type significantly reduces disk size and GPU memory requirements while maintaining high performance.
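The scaling scheme behind FP8 quantization can be sketched as follows. This is a simplified per-tensor symmetric-scale simulation, not the quantizer the developers actually used, and it omits the rounding of each value onto the 8-bit E4M3 grid:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value in the OCP FP8 E4M3 format


def fp8_scale(w):
    """Per-tensor symmetric scale mapping max |w| onto the FP8 range."""
    scale = np.abs(w).max() / FP8_E4M3_MAX
    # Divide by the scale and clip to the representable range; a real
    # quantizer would also round each value to the nearest FP8 number.
    q = np.clip(w / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale


def dequantize(q, scale):
    """Recover an approximation of the original tensor."""
    return q * scale


w = np.array([0.5, -2.0, 1.0])
q, s = fp8_scale(w)
print(np.abs(q).max())  # 448.0: the tensor now spans the full FP8 range
```

Storing `q` in 8 bits instead of 16 is where the memory saving comes from; the per-tensor `scale` is the only extra metadata kept.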
High Performance
It averages 79.16 on the OpenLLM benchmark suite, close to the unquantized model's 79.51, for a recovery rate of 99.55%.
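The recovery rate is simple arithmetic over the two averages; computed from the rounded figures above it comes out marginally higher than the published 99.55%, presumably because the official number uses unrounded per-task scores:

```python
quantized_avg = 79.16  # FP8 model, OpenLLM average (from the card above)
baseline_avg = 79.51   # unquantized Meta-Llama-3-70B-Instruct

# Recovery rate: quantized score as a percentage of the baseline score.
recovery = 100 * quantized_avg / baseline_avg
print(f"{recovery:.2f}%")  # 99.56% from these rounded averages
```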
Efficient Deployment
It supports efficient deployment on the vLLM backend, which exposes an OpenAI-compatible API.
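A deployment along these lines could look as follows; the Hugging Face repo id and GPU count are assumptions, not taken from this page:

```shell
# Start an OpenAI-compatible server with vLLM (assumed repo id; a 70B
# FP8 model still needs substantial GPU memory, spread here over 2 GPUs).
python -m vllm.entrypoints.openai.api_server \
  --model neuralmagic/Meta-Llama-3-70B-Instruct-FP8 \
  --tensor-parallel-size 2
```

Any OpenAI SDK client can then be pointed at the server's `/v1` endpoint (by default `http://localhost:8000/v1`).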

Model Capabilities

English Text Generation
Chat Assistant
Business and Research Purposes

Use Cases

Business and Research
Chat Assistant
Used to build assistant-like chatbots that support English conversation.
Its OpenLLM benchmark performance is close to that of the unquantized model.