L

Llama 3.2 3B Instruct FP8 Dynamic

Developed by RedHatAI
FP8 quantized version of Llama-3.2-3B-Instruct, suitable for multilingual commercial and research purposes, particularly ideal for assistant-like chat scenarios.
Downloads 986
Release Time : 9/25/2024

Model Overview

This model is a quantized version of Meta-Llama-3.2-3B-Instruct, reducing disk size and GPU memory requirements by approximately 50% by quantizing weights and activations to FP8 data type.

Model Features

FP8 quantization
Quantization of weights and activations to FP8 data type, reducing disk size and GPU memory requirements by approximately 50%.
Multilingual support
Supports multiple languages, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Efficient inference
Optimized model suitable for efficient inference with the vLLM backend.

Model Capabilities

Text generation
Multilingual chat
Commercial and research purposes

Use Cases

Chatbot
Multilingual chat assistant
Ideal for assistant-like chat scenarios with multilingual support.
Achieved an average score of 50.88 in the OpenLLM benchmark.
Business applications
Business consultation
Provides business consultation and Q&A services.
Featured Recommended AI Models
AIbase
Empowering the Future, Your AI Solution Knowledge Base
Š 2025AIbase