
Phi 3 Small 8k Instruct Onnx Cuda

Developed by Microsoft
Phi-3 Small is a lightweight, cutting-edge open model with 7B parameters, converted to ONNX format and optimized for NVIDIA GPUs, supporting an 8K-token context length with strong reasoning capabilities.
Downloads: 115
Release date: 5/19/2024

Model Overview

This model is the ONNX Runtime conversion of Phi-3 Small-8K-Instruct, and it runs via ONNX Runtime on GPU devices across server platforms, Windows, and Linux.
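Before sending text to the model, the input must be wrapped in the Phi-3 chat template. The sketch below is a minimal, hedged illustration assuming the `<|user|>` / `<|assistant|>` / `<|end|>` markers used by the Phi-3 family; the helper name and exact template are assumptions, not part of this model card, so consult the official tokenizer configuration before relying on them.

```python
from typing import Optional


def format_phi3_prompt(user_message: str, system_message: Optional[str] = None) -> str:
    """Build a prompt in the assumed Phi-3 chat template.

    The <|system|>/<|user|>/<|assistant|>/<|end|> markers follow the
    convention used by the Phi-3 family; verify against the shipped
    tokenizer config before use.
    """
    parts = []
    if system_message:
        parts.append(f"<|system|>\n{system_message}<|end|>")
    parts.append(f"<|user|>\n{user_message}<|end|>")
    parts.append("<|assistant|>")
    return "\n".join(parts)


prompt = format_phi3_prompt("Summarize ONNX Runtime in one sentence.")
print(prompt)
```

The trailing `<|assistant|>` marker signals the model to begin generating its reply immediately after it.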

Model Features

High-performance inference
The FP16 CUDA version is up to 4x faster than PyTorch; the INT4 CUDA version is up to 10.9x faster
Lightweight design
A 7B-parameter scale maintains strong performance while reducing resource consumption
Long context support
Supports an 8K-token context length, suitable for long-text tasks
Multi-platform compatibility
Runs on a variety of devices and operating systems through ONNX Runtime
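To make the quoted speedups concrete, the sketch below projects throughput from a hypothetical PyTorch baseline using the 4x (FP16) and 10.9x (INT4) multipliers stated above; the baseline of 10 tokens/s is an assumption for illustration only, not a measured figure.

```python
def projected_throughput(pytorch_tps: float, speedup: float) -> float:
    """Project ONNX Runtime throughput from a PyTorch baseline
    and a quoted speedup multiplier."""
    return pytorch_tps * speedup


# Hypothetical PyTorch baseline (assumed, not from the model card):
baseline = 10.0  # tokens/s

fp16_tps = projected_throughput(baseline, 4.0)   # FP16 CUDA: 40.0 tokens/s
int4_tps = projected_throughput(baseline, 10.9)  # INT4 CUDA: 109.0 tokens/s
print(fp16_tps, int4_tps)
```

Actual gains depend on hardware, batch size, and sequence length, so the multipliers should be read as upper bounds ("up to").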

Model Capabilities

Text generation
Instruction following
Common-sense reasoning
Language understanding
Mathematical computation
Code generation
Logical reasoning

Use Cases

Dialogue systems
Intelligent assistants
Build high-performance, low-latency conversational assistants; achieves about 74.62 tokens per second generation speed on an A100 GPU
Content generation
Long-text generation
Generate coherent long-form content by exploiting the 8K-token context length
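The quoted 74.62 tokens/s on an A100 translates directly into response latency. A small sketch of that arithmetic, using only the figure from the use case above:

```python
def generation_time_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to generate num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_second


# At the quoted 74.62 tokens/s, a 1,000-token response takes roughly 13.4 s.
t = generation_time_seconds(1000, 74.62)
print(round(t, 1))
```

This ignores the one-time prompt-processing (prefill) cost, so real end-to-end latency will be somewhat higher, especially for long 8K-context prompts.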