dynamic-minilmv2-L6-H384-squad1.1-int8-static

Developed by Intel
QuaLA-MiniLM is a compact language model developed by Intel, integrating knowledge distillation, length-adaptive transformers, and 8-bit quantization technology. It achieves up to 8.8x acceleration on the SQuAD1.1 dataset with less than 1% accuracy loss.
Downloads 172
Release Date: 11/21/2022

Model Overview

This model enables efficient inference by dynamically adjusting how computational resources are allocated, making it well suited to natural language processing tasks that must balance accuracy against efficiency.

Model Features

Dynamic Computation Allocation
Uses Length-Adaptive Transformer (LAT) technology to dynamically adjust the number of tokens processed at each layer, adapting to varying computation budgets.
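The core idea behind length-adaptive inference is that low-importance tokens can be dropped as the sequence moves through the layers, shrinking the work done per layer. A minimal sketch of that pruning schedule (the importance scores and per-layer budgets here are hypothetical, not the model's actual values):

```python
def length_adaptive_forward(tokens, importance, keep_ratios):
    """Illustrative token pruning: at each layer, keep only the
    highest-importance tokens according to that layer's budget."""
    kept = list(range(len(tokens)))
    for ratio in keep_ratios:  # one keep-ratio per layer
        budget = max(1, int(round(len(kept) * ratio)))
        # rank surviving tokens by importance, keep the top `budget`
        kept = sorted(sorted(kept, key=lambda i: -importance[i])[:budget])
    return [tokens[i] for i in kept]

tokens = ["[CLS]", "what", "is", "the", "capital", "of", "france", "?", "[SEP]"]
importance = [9.0, 2.0, 0.5, 0.4, 3.0, 0.3, 4.0, 1.0, 8.0]
# shrink the sequence over three layers (hypothetical budgets)
print(length_adaptive_forward(tokens, importance, [0.8, 0.7, 0.6]))
# → ['[CLS]', 'france', '[SEP]']
```

In the actual model the importance scores come from attention statistics and the per-layer budgets are searched for a given latency target, but the shrinking-budget mechanism is the same.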
Efficient Quantization
Employs 8-bit quantization to reduce model size; the quantized model is only about 30% of the original size.
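Storing weights as int8 instead of fp32 gives roughly a 4x reduction for the weight tensors themselves (the quoted 30% overall figure also reflects other parts of the serialized model). A minimal sketch of symmetric per-tensor int8 quantization, not the model's actual calibration procedure:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((384, 384)).astype(np.float32)  # hidden size 384
q, scale = quantize_int8(w)
err = np.abs(w - dequantize(q, scale)).max()
# int8 stores 1 byte/weight vs 4 bytes for fp32 -> 4x smaller weights
print(f"max abs error: {err:.4f}, size ratio: {q.nbytes / w.nbytes:.2f}")
```

Static quantization (as in this model's name) additionally fixes the activation scales ahead of time from calibration data, so no scale computation happens at inference.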
Knowledge Distillation
Distills knowledge from the RoBERTa-Large teacher model to maintain high accuracy in the compact model.
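Distillation trains the small student to match the teacher's temperature-softened output distribution rather than only the hard labels. A minimal sketch of the Hinton-style distillation loss (the logits and temperature below are hypothetical):

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=np.float64) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * np.log(p / q)))

teacher = [3.2, 1.1, -0.5]  # hypothetical RoBERTa-Large logits
student = [2.9, 1.4, -0.2]  # hypothetical compact-student logits
print(f"distillation loss: {distillation_loss(student, teacher):.4f}")
```

The loss is zero only when the student reproduces the teacher's distribution exactly, which is what pushes the compact model toward teacher-level accuracy.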

Model Capabilities

Text understanding
Question answering
Efficient inference

Use Cases

Intelligent Q&A
Wikipedia Content Q&A
Question answering applications based on the SQuAD1.1 dataset
Achieves an 8.8x speedup while maintaining an F1 score of 87.68%
Edge Computing
Mobile Q&A System
Deployment of efficient language models on resource-constrained devices
Quantized model size is only 84.86MB
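The F1 figure quoted above is the standard SQuAD token-overlap metric between the predicted answer span and the ground truth. A simplified sketch of how it is computed (the official script also lowercases and strips punctuation/articles, which is omitted here):

```python
def squad_f1(prediction, ground_truth):
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.split()
    gold_tokens = ground_truth.split()
    gold_counts = {}
    for t in gold_tokens:
        gold_counts[t] = gold_counts.get(t, 0) + 1
    common = 0  # multiset intersection of the two token bags
    for t in pred_tokens:
        if gold_counts.get(t, 0) > 0:
            common += 1
            gold_counts[t] -= 1
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(squad_f1("the capital of france", "capital of france"))
```

The reported 87.68% is this score averaged over all SQuAD1.1 dev-set questions, taking the best match across reference answers.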