A state-of-the-art multimodal AI model optimized for vision and text-based reasoning. It supports tasks such as image understanding, OCR, and chart analysis, and is designed for both research and commercial applications.
Phi-3.5-Vision-Instruct is a lightweight, high-performance multimodal AI model from Microsoft, trained on synthetic data and high-quality filtered datasets to strengthen combined visual and text-based reasoning. It belongs to the Phi-3 model family and supports a 128K-token context length for long-context comprehension. The model underwent supervised fine-tuning and direct preference optimization (DPO) to ensure robust instruction adherence and precise responses.
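As a rough illustration of how such a model is typically invoked, the sketch below uses the Hugging Face `transformers` library with the `microsoft/Phi-3.5-vision-instruct` checkpoint. It is a minimal sketch, not an official recipe: the helper name `build_messages`, the choice of generation parameters, and the assumption that `transformers` and `Pillow` are installed (and that sufficient GPU memory is available) are all illustrative.

```python
# Hypothetical usage sketch for Phi-3.5-Vision-Instruct via Hugging Face
# transformers. The numbered <|image_N|> placeholders follow the prompt
# convention documented for the Phi-3 vision models.

def build_messages(question: str, num_images: int = 1) -> list:
    """Build a chat message list with numbered image placeholders.

    The model expects one <|image_N|> placeholder per attached image,
    inserted into the user turn ahead of the question text.
    """
    placeholders = "".join(f"<|image_{i}|>\n" for i in range(1, num_images + 1))
    return [{"role": "user", "content": placeholders + question}]


def run_inference(image_paths: list, question: str) -> str:
    """Load the model and answer a question about the given images."""
    # Heavy imports are kept inside the function so the prompt-building
    # helper above remains usable without downloading the model.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = "microsoft/Phi-3.5-vision-instruct"
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    images = [Image.open(p) for p in image_paths]
    messages = build_messages(question, num_images=len(images))
    prompt = processor.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(prompt, images, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    # Strip the prompt tokens so only the generated answer is decoded.
    out = out[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(out, skip_special_tokens=True)[0]
```

For example, `run_inference(["chart.png"], "What trend does this chart show?")` would return the model's textual analysis of the chart, exercising the chart-analysis capability described above.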
MIT
Microsoft
Multimodal Language Model
N.A.
Open
Sector Agnostic