An ONNX-optimized version of Phi-3-Medium-4K-Instruct, designed for high-speed, efficient inference on NVIDIA GPUs, available in FP16 and INT4 precisions (INT4 via quantization) for enhanced performance.
Phi-3-Medium-4K-Instruct ONNX-CUDA is a high-performance AI model from Microsoft, optimized for fast execution on NVIDIA GPUs using ONNX Runtime. This version ships in FP16 and INT4 precision variants, enabling low-latency, high-throughput inference while retaining the 4K-token context length for structured reasoning and AI-powered instruction-following tasks; a separate 128K-context variant of Phi-3-Medium is published independently.
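A minimal sketch of running such an ONNX export with the `onnxruntime-genai` Python package, which is the usual runtime for Microsoft's ONNX model releases. The model directory name is a hypothetical placeholder (download the actual CUDA INT4 or FP16 folder first), and the generation loop only runs if that directory is present; the Phi-3 chat-template helper reflects the `<|user|>`/`<|assistant|>` format documented for Phi-3 instruct models.

```python
# Sketch: Phi-3-Medium-4K-Instruct ONNX inference via onnxruntime-genai.
# Assumes: `pip install onnxruntime-genai-cuda` and a downloaded model folder.
import os

def phi3_prompt(user_message: str) -> str:
    # Phi-3 instruct chat template: user turn, end marker, assistant cue.
    return f"<|user|>\n{user_message}<|end|>\n<|assistant|>\n"

MODEL_DIR = "phi3-medium-4k-instruct-cuda-int4"  # hypothetical local path

if os.path.isdir(MODEL_DIR):
    import onnxruntime_genai as og

    model = og.Model(MODEL_DIR)          # loads genai_config.json + ONNX weights
    tokenizer = og.Tokenizer(model)
    params = og.GeneratorParams(model)
    params.set_search_options(max_length=512)

    generator = og.Generator(model, params)
    generator.append_tokens(
        tokenizer.encode(phi3_prompt("Explain ONNX Runtime in one sentence."))
    )
    while not generator.is_done():
        generator.generate_next_token()
    print(tokenizer.decode(generator.get_sequence(0)))
```

The prompt-template helper is pure string formatting, so it can be reused with any runtime; only the guarded block depends on the GPU package and model files.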
MIT
Microsoft
Text Generation
N.A.
Open
Sector Agnostic
12/03/25 06:35:39
0