
Phi-4 Multimodal Instruct ONNX

Developed by Microsoft
An ONNX version of the Phi-4 multimodal model, quantized to int4 precision for accelerated inference via ONNX Runtime and supporting text, image, and audio inputs.
Downloads: 159
Release Date: 2/25/2025

Model Overview

Phi-4 Multimodal Instruct is a lightweight, open-source multimodal foundation model that builds on the language, vision, and speech research behind the Phi-3.5 and Phi-4.0 models, and it supports a context length of 128K tokens.

Model Features

Multimodal support
Processes text, image, and audio inputs and generates text output.
Efficient inference
Quantized to int4 precision, with accelerated inference via ONNX Runtime (see the sketch after this list).
Long context support
Supports a context length of 128K tokens.
Lightweight
A lightweight, open-source multimodal foundation model suited to a wide range of application scenarios.
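
As a minimal sketch of what int4 inference via ONNX Runtime looks like in practice, the snippet below runs text-only generation through the onnxruntime-genai Python bindings. The model directory path and prompt are illustrative, and the calls follow Microsoft's published onnxruntime-genai samples; the exact API surface can vary slightly between releases.

```python
# Minimal text-only generation sketch using onnxruntime-genai
# (pip install onnxruntime-genai). The model directory path is
# illustrative: point it at the downloaded int4 ONNX files.
import onnxruntime_genai as og

model = og.Model("./phi-4-multimodal-instruct-onnx")  # loads the int4 ONNX weights
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Phi-4 chat format: a user turn followed by an empty assistant turn.
prompt = "<|user|>\nExplain ONNX Runtime in one sentence.<|end|>\n<|assistant|>\n"

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)
generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Stream tokens to stdout as they are generated.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```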

Model Capabilities

Text generation
Image analysis
Speech recognition
Speech summarization
Speech translation
Visual question answering

Use Cases

Speech processing
Automatic speech recognition
Convert speech to text (a runnable sketch for this path follows this subsection).
Speech summarization
Generate summaries of spoken content.
Speech translation
Translate speech content into other languages.
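
The sketch below transcribes a WAV file through the model's multimodal processor. It is modeled on the onnxruntime-genai Phi-4 multimodal sample; the model path, audio file name, and instruction text are assumptions for illustration, not values taken from this page.

```python
# Speech-to-text sketch using the multimodal processor from
# onnxruntime-genai (API per its Phi-4 multimodal sample; the model
# path and audio file name are illustrative).
import onnxruntime_genai as og

model = og.Model("./phi-4-multimodal-instruct-onnx")
processor = model.create_multimodal_processor()
stream = processor.create_stream()

# <|audio_1|> marks where the attached clip is injected into the prompt.
audios = og.Audios.open("meeting.wav")
prompt = "<|user|>\n<|audio_1|>\nTranscribe the audio to text.<|end|>\n<|assistant|>\n"
inputs = processor(prompt, audios=audios)

params = og.GeneratorParams(model)
params.set_search_options(max_length=4096)
generator = og.Generator(model, params)
generator.set_inputs(inputs)

while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```

Changing the instruction in the prompt (for example, "Summarize the audio" or "Translate the audio to French") covers the summarization and translation use cases with the same loop.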
Visual processing
Visual question answering
Answer questions based on image content (see the sketch below).
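
The visual path follows the same pattern, swapping the audio attachment for an image and the <|audio_1|> tag for <|image_1|>. Again a sketch under the same assumptions, with an illustrative image path and question.

```python
# Visual question answering sketch: same generation loop as above,
# with an image attachment instead of audio (paths are illustrative).
import onnxruntime_genai as og

model = og.Model("./phi-4-multimodal-instruct-onnx")
processor = model.create_multimodal_processor()
stream = processor.create_stream()

images = og.Images.open("chart.png")
prompt = "<|user|>\n<|image_1|>\nWhat trend does this chart show?<|end|>\n<|assistant|>\n"
inputs = processor(prompt, images=images)

params = og.GeneratorParams(model)
params.set_search_options(max_length=4096)
generator = og.Generator(model, params)
generator.set_inputs(inputs)

while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```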