Llama 3.3 70B Instruct Quantized.w8a8

Developed by RedHatAI
This is a quantized version of Llama-3.3-70B-Instruct. It supports multilingual text generation, is intended for commercial and research use, and scores close to the unquantized model on several benchmarks.
Downloads 19.02k
Release Time: 1/20/2025

Model Overview

This model is a quantized version of Llama-3.3-70B-Instruct. Both weights and activations are quantized to INT8, which reduces GPU memory requirements and improves computational throughput while keeping accuracy close to that of the original model.
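
As a rough illustration of what INT8 quantization does to a set of values, here is a minimal sketch in plain Python. It assumes a symmetric scheme with one scale per tensor; this page does not state the exact scheme used for this model, and the function names and sample values are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale (symmetric scheme, an assumption)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate floats from the INT8 codes."""
    return [q * scale for q in quantized]


# Hypothetical sample weights, not taken from the model.
weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The round-trip error per value is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, max_error <= scale / 2 + 1e-12)
```

Each value is stored in one byte plus a shared scale, which is where the memory saving comes from; the cost is the small rounding error measured by `max_error`.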

Model Features

Multilingual Support
Supports text generation in multiple languages such as English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Quantization Optimization
Quantizes weights and activations to INT8, which reduces GPU memory and disk size requirements by approximately 50% each and roughly doubles matrix multiplication throughput.
Extensive Evaluation
Evaluated on several benchmarks, including OpenLLM v1, OpenLLM v2, HumanEval, and HumanEval+, with scores close to those of the non-quantized model.
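
The approximately 50% memory figure above can be checked with back-of-the-envelope arithmetic. The sketch below assumes a nominal 70 billion parameters (taken from the "70B" in the model name) and assumes the unquantized checkpoint stores each weight in 16 bits; it counts weights only and ignores the KV cache, activations, and any layers left unquantized.

```python
PARAMS = 70e9  # nominal parameter count, an assumption based on the "70B" name


def weight_gb(params, bytes_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


bf16_gb = weight_gb(PARAMS, 2)  # 16-bit weights: 2 bytes each (assumed baseline)
int8_gb = weight_gb(PARAMS, 1)  # INT8 weights: 1 byte each
saving = 1 - int8_gb / bf16_gb

print(f"16-bit: {bf16_gb:.0f} GB, INT8: {int8_gb:.0f} GB, saving: {saving:.0%}")
```

Under these assumptions the weights shrink from about 140 GB to about 70 GB, which matches the stated halving of memory and disk size.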

Model Capabilities

Multilingual Text Generation
Commercial and Research Use
Chat Assistant Scenarios

Use Cases

Business and Research
Multilingual Chat Assistant
Used to build a chat assistant supporting multiple languages, suitable for global commercial and research scenarios.
On the multilingual MMLU test, the quantized model has a recovery rate close to 100% relative to the unquantized model.
Code Generation
Used to generate and complete code, supporting multiple programming languages.
The pass@1 score exceeds 80% in the HumanEval and HumanEval+ tests.
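
The recovery rate mentioned above can be computed as follows. This is a small sketch that assumes recovery means the quantized model's score expressed as a percentage of the unquantized model's score; the page does not define the term, and the scores used here are hypothetical placeholders, not results for this model.

```python
def recovery_rate(quantized_score, baseline_score):
    """Quantized score as a percentage of the baseline score (assumed definition)."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return 100.0 * quantized_score / baseline_score


# Hypothetical benchmark scores for illustration only.
baseline = 80.0
quantized = 79.6
rate = recovery_rate(quantized, baseline)
print(f"recovery: {rate:.1f}%")
```

A value near 100% means quantization cost little accuracy on that benchmark; a value above 100% is possible when the quantized model happens to score slightly higher.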