Phi-4 Multimodal Instruct ONNX
An ONNX version of the Phi-4 multimodal model, quantized to int4 precision for accelerated inference with ONNX Runtime. It accepts text, image, and audio inputs and generates text output.
Release Time: 2/25/2025
Model Overview
Phi-4 Multimodal Instruct is a lightweight open-source multimodal foundation model that builds on the language, vision, and speech research behind the Phi-3.5 and 4.0 models, and supports a context length of 128K tokens.
Model Features
Multimodal support
Accepts text, image, and audio inputs and produces text output.
Efficient inference
Quantized to int4 precision and accelerated with ONNX Runtime; a loading-and-generation sketch follows this list.
Long context support
Supports a context length of 128K tokens.
Lightweight
A compact open-source multimodal foundation model; the int4 ONNX build keeps the memory footprint low enough for resource-constrained deployments.
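
As a concrete illustration of the int4 + ONNX Runtime path, here is a minimal text-only generation sketch using the onnxruntime-genai Python package. The model directory path, prompt wording, and search options are placeholders, and API details vary between onnxruntime-genai releases, so treat this as a starting point rather than the official sample.

```python
# Text-only generation sketch with onnxruntime-genai.
# The model path is a placeholder for the downloaded int4 ONNX directory.
import onnxruntime_genai as og

model = og.Model("./phi-4-multimodal-instruct-onnx")  # hypothetical local path
processor = model.create_multimodal_processor()
stream = processor.create_stream()

# Phi-4 chat template: a user turn, then an open assistant turn to complete.
prompt = "<|user|>Summarize int4 quantization in one sentence.<|end|><|assistant|>"
inputs = processor(prompt)  # no images or audio bound for plain text

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.set_inputs(inputs)

# Stream decoded tokens as they are produced.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```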
Model Capabilities
Text generation
Image analysis
Speech recognition
Speech summarization
Speech translation
Visual question answering
Use Cases
Speech processing
Automatic speech recognition
Convert speech to text.
Speech summarization
Generate summaries of speech content.
Speech translation
Translate speech content into other languages. (A prompt sketch covering these speech tasks follows this list.)
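
The speech tasks above differ mainly in the instruction placed next to the audio clip. Below is a hedged sketch of automatic speech recognition via the onnxruntime-genai multimodal processor; the file name is a placeholder, and the `<|audio_1|>` prompt convention and `Audios` API follow recent onnxruntime-genai releases, which may differ from the version you have installed.

```python
# Speech-to-text sketch: same generation loop, but the processor binds audio.
import onnxruntime_genai as og

model = og.Model("./phi-4-multimodal-instruct-onnx")  # hypothetical local path
processor = model.create_multimodal_processor()
stream = processor.create_stream()

# <|audio_1|> marks where the audio clip is injected into the prompt.
# Swap the instruction for the other speech tasks, e.g.
#   "Summarize the audio." or "Translate the audio to French."
prompt = "<|user|><|audio_1|>Transcribe the audio to text.<|end|><|assistant|>"
audios = og.Audios.open("speech_sample.wav")  # placeholder file name
inputs = processor(prompt, audios=audios)

params = og.GeneratorParams(model)
params.set_search_options(max_length=1024)

generator = og.Generator(model, params)
generator.set_inputs(inputs)

while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```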
Visual processing
Visual question answering
Answer questions based on image content; see the sketch after this list.
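
Visual question answering follows the same pattern, with an image placeholder in the prompt instead of an audio one. A minimal sketch under the same assumptions (placeholder paths, recent onnxruntime-genai API):

```python
# Visual question answering sketch: bind an image, then ask about it.
import onnxruntime_genai as og

model = og.Model("./phi-4-multimodal-instruct-onnx")  # hypothetical local path
processor = model.create_multimodal_processor()
stream = processor.create_stream()

# <|image_1|> marks where the image is injected into the prompt.
prompt = "<|user|><|image_1|>What objects are visible in this image?<|end|><|assistant|>"
images = og.Images.open("photo.jpg")  # placeholder file name
inputs = processor(prompt, images=images)

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)

generator = og.Generator(model, params)
generator.set_inputs(inputs)

while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```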