
Qwen3 235B A22B FP8 Dynamic

Developed by RedHatAI
An FP8-quantized version of the Qwen3-235B-A22B model that reduces GPU memory requirements and improves computational throughput, suitable for a wide range of natural language processing scenarios.
Downloads 2,198
Release Time: 5/4/2025

Model Overview

This model is an FP8-quantized version of Qwen3-235B-A22B. Quantization reduces GPU memory requirements and improves computational throughput, and the model can be used in various natural language processing scenarios such as inference and function calling.

Model Features

FP8 Quantization
Applies FP8 quantization to both weights and activations, reducing GPU memory requirements by approximately 50%, roughly doubling matrix-multiplication throughput, and cutting disk space requirements by approximately 50%.
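As a rough illustration of how an FP8-dynamic checkpoint of this kind is typically produced, the sketch below uses the llm-compressor library's FP8_DYNAMIC scheme (static per-channel weight scales, dynamic per-token activation scales). The library choice, recipe details, ignored modules, and import paths are assumptions for illustration, not details published on this page.

```python
# Hypothetical sketch: producing an FP8-dynamic checkpoint with llm-compressor.
# The recipe below is illustrative, not the published RedHatAI recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor import oneshot  # older releases expose this as llmcompressor.transformers.oneshot

model_id = "Qwen/Qwen3-235B-A22B"  # base model from which the FP8 variant is derived

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# FP8_DYNAMIC quantizes Linear weights statically and activations dynamically
# per token, so no calibration dataset is required.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],  # modules assumed to stay in higher precision
)

oneshot(model=model, recipe=recipe)

model.save_pretrained("Qwen3-235B-A22B-FP8-dynamic", save_compressed=True)
tokenizer.save_pretrained("Qwen3-235B-A22B-FP8-dynamic")
```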
Efficient Deployment
Supports efficient deployment with the vLLM backend and can be served through an OpenAI-compatible API.
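A minimal sketch of serving the model with vLLM and querying it through the OpenAI-compatible endpoint is shown below. The repository id RedHatAI/Qwen3-235B-A22B-FP8-dynamic, the tensor-parallel size, and the port are assumptions for illustration.

```python
# Sketch only: query a vLLM OpenAI-compatible server with the openai client.
# Assumed server launch (repo id, TP size, and port are illustrative):
#
#   vllm serve RedHatAI/Qwen3-235B-A22B-FP8-dynamic --tensor-parallel-size 4
#
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="RedHatAI/Qwen3-235B-A22B-FP8-dynamic",  # must match the served model name
    messages=[
        {"role": "user", "content": "Summarize the benefits of FP8 quantization in two sentences."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```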
High Performance
Performs strongly across multiple benchmarks, recovering close to 100% of the unquantized model's accuracy.

Model Capabilities

Text Generation
Function Calling
Multilingual Instruction Following
Translation

Use Cases

Natural Language Processing
Inference
Used for inference tasks such as text generation and question answering.
Function Calling
Supports function calling and can be used to build more complex, tool-using applications.
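A hedged sketch of exercising function calling through the same OpenAI-compatible endpoint follows. The tool schema and the server-side flags for enabling tool parsing (e.g. vLLM's --enable-auto-tool-choice) are assumptions for illustration.

```python
# Sketch: function calling through an OpenAI-compatible endpoint.
# Assumes the vLLM server was started with tool parsing enabled,
# e.g. --enable-auto-tool-choice --tool-call-parser hermes (flags are illustrative).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# A hypothetical tool definition; the model decides whether to call it.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="RedHatAI/Qwen3-235B-A22B-FP8-dynamic",
    messages=[{"role": "user", "content": "What's the weather in Beijing right now?"}],
    tools=tools,
)

# If the model chose to call the tool, the call arrives as structured JSON arguments.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```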
Translation
Supports multilingual translation tasks.