Indian Flag
Government Of India
A-
A
A+

Phi-4-Multimodal-Instruct - Multimodal Foundation Model

A lightweight multimodal AI model that processes text, image, and audio inputs, optimized for multilingual reasoning, speech recognition, vision-language tasks, and generative AI applications.

About Model

Phi-4-Multimodal-Instruct is an advanced multimodal foundation model developed by Microsoft, designed to integrate language, vision, and speech for research and commercial applications. It builds upon the Phi-3.5 and Phi-4 models, supporting 128K token context length and incorporating supervised fine-tuning, direct preference optimization, and reinforcement learning from human feedback (RLHF) to enhance performance and safety. Key Features: Supports multiple modalities: Text: 24 languages, including Arabic, Chinese, English, French, Spanish, and more. Vision: Optimized for English image understanding. Audio: Supports English, Chinese, German, French, Italian, Japanese, Spanish, and Portuguese speech processing. Enhanced capabilities: Speech recognition and speech translation (outperforms WhisperV3 and SeamlessM4T). Strong reasoning in math, logic, and general knowledge. Vision-language understanding (chart/table comprehension, optical character recognition). Multi-image comparison and summarization. Speech summarization and QA. Function and tool calling for AI agents. State-of-the-art performance: Ranked #1 on the HuggingFace OpenASR leaderboard for speech recognition (March 2025). Vision processing benchmarks surpass models like Gemini-1.5-Pro and InternOmni-7B. Optimized for real-world applications: Works in memory-constrained environments and low-latency scenarios. Trained on 5 trillion text tokens, 2.3 million speech hours, and 1.1 trillion image-text tokens. Intended Uses: Phi-4-Multimodal-Instruct is designed for broad multilingual and multimodal research and commercial applications, including: 1. General AI assistants for reasoning and knowledge retrieval. 2. Speech AI models for transcription, translation, and summarization. 3. Computer vision AI for image-text comprehension and optical character recognition (OCR). 4. Medical AI research for language-vision understanding. 5. Education and coding AI for knowledge-based tasks.

Phi-4-Multimodal-Instruct - Multimodal Foundation Model

Metadata Metadata

MIT

Microsoft

Multimodal Language Model

N.A.

Open

Sector Agnostic

12/03/25 06:35:15

0

Activity Overview Activity Overview

  • Downloads0
  • Redirect 3
  • Views 186
  • File Size 0

Tags Tags

  • Multimodal
  • Text Generation
  • Speech Recognition
  • Visual Question Answering
  • Multilingual
  • Audio Processing
  • Transformers
  • NLP
  • Microsoft

License Control License Control

MIT

More Models from Microsoft Corporation (India) Pvt. Ltd. More Models from Microsoft Corporation (India) Pvt. Ltd.

TAPEX: Large SQL Execution Model (Table Pre-training via Learning a Neural SQL Executor)
A large-sized TAPEX model pre-trained to simulate neural SQL execution, enabling the execution of SQL queries on given tables.
Transformers
SQLExecution
PreTrainedModel
TAPEX
DataRetrieval
NeuralExecutor
BART
  • See Upvoters0
  • Downloads5
  • File Size0
  • Views143
Updated 7 month(s) ago

MICROSOFT CORPORATION (INDIA) PVT. LTD.

TAPEX: Large Model (Table Pre-training via Learning a Neural SQL Executor)
A large-sized pre-trained model designed to enhance table-based question answering and fact verification tasks.
BART
TableQuestionAnswering
FactVerification
PreTrainedModel
LargeModel
  • See Upvoters0
  • Downloads5
  • File Size0
  • Views96
Updated 7 month(s) ago

MICROSOFT CORPORATION (INDIA) PVT. LTD.

TAPEX: TabFact Data enabled Large Finetuned (Table Pre-training via Learning a Neural SQL Executor) Model
A large-sized TAPEX model fine-tuned on the TabFact dataset, designed to enhance performance in table-based fact verification tasks.
FactVerification
NaturalLanguageProcessing
Transformers
BART
DataValidation
FineTunedModel
TabFact
TAPEX
  • See Upvoters0
  • Downloads5
  • File Size0
  • Views73
Updated 7 month(s) ago

MICROSOFT CORPORATION (INDIA) PVT. LTD.

TAPEX (Table Pre-training via Learning a Neural SQL Executor) Large Finetuned Model
A large-sized TAPEX model fine-tuned on the WikiTableQuestions dataset, designed to enhance performance in table-based question answering tasks.
TAPEX
TableQuestionAnswering
NaturalLanguageProcessing
Transformers
BART
DataExtraction
FineTunedModel
WikiTableQuestions
  • See Upvoters0
  • Downloads2
  • File Size0
  • Views96
Updated 7 month(s) ago

MICROSOFT CORPORATION (INDIA) PVT. LTD.

TAPEX: Base Model (Table Pre-training via Learning a Neural SQL Executor)
A base-sized pre-trained model designed to enhance table-based question answering and fact verification tasks.
BART
TableQuestionAnswering
FactVerification
PreTrainedModel
TabularData
  • See Upvoters0
  • Downloads3
  • File Size0
  • Views69
Updated 7 month(s) ago

MICROSOFT CORPORATION (INDIA) PVT. LTD.

TAPEX: WikiTable Questions Data enabled Base Finetuned (Table Pre-training via Learning a Neural SQL Executor) Model
A base-sized TAPEX model fine-tuned on the WikiTableQuestions dataset, designed to enhance performance in table-based question answering tasks.
NaturalLanguageProcessing
TableQuestionAnswering
TAPEX
WikiTableQuestions
FineTunedModel
DataExtraction
BART
Transformers
  • See Upvoters0
  • Downloads3
  • File Size0
  • Views81
Updated 7 month(s) ago

MICROSOFT CORPORATION (INDIA) PVT. LTD.

TAPEX: WikiSQL Data enabled Base Finetuned (Table Pre-training via Learning a Neural SQL Executor) Model
A large-sized TAPEX model fine-tuned on the WikiSQL dataset, optimized for translating natural language questions into SQL queries for effective table-based question answering.
Transformers
NaturalLanguageProcessing
SQLQueryGeneration
TAPEX
WikiSQL
FineTunedModel
DataRetrieval
BART
  • See Upvoters0
  • Downloads5
  • File Size0
  • Views82
Updated 7 month(s) ago

MICROSOFT CORPORATION (INDIA) PVT. LTD.

TAPEX: TabFact Data enabled Base Finetuned (Table Pre-training via Learning a Neural SQL Executor) Model
A base-sized TAPEX model fine-tuned on the TabFact dataset, tailored for verifying the factual accuracy of textual statements against tabular data.
FactVerification
TAPEX
TabFact
FineTunedModel
DataValidation
BART
Transformers
NaturalLanguageProcessing
  • See Upvoters0
  • Downloads4
  • File Size0
  • Views74
Updated 7 month(s) ago

MICROSOFT CORPORATION (INDIA) PVT. LTD.

TAPEX: Base Finetuned (Table Pre-training via Learning a Neural SQL Executor) Model
A base-sized TAPEX model fine-tuned on the WikiSQL dataset, designed to enhance performance in table-based question answering tasks.
DataExtraction
NaturalLanguageProcessing
Transformers
BART
TableQuestionAnswering
FineTunedModel
WikiSQL
TAPEX
  • See Upvoters0
  • Downloads5
  • File Size0
  • Views104
Updated 7 month(s) ago

MICROSOFT CORPORATION (INDIA) PVT. LTD.

BiomedBERT - Domain-Specific Biomedical Language Model
A biomedical NLP model pre-trained from scratch on abstracts and full-text articles from PubMed and PubMed Central, achieving state-of-the-art performance on biomedical language understanding tasks.
Transformers
inference endpoints
exbert
Bert
English
JAX
PyTorch
Fill-Mask
  • See Upvoters0
  • Downloads72
  • File Size0
  • Views967
Updated 10 month(s) ago

MICROSOFT CORPORATION (INDIA) PVT. LTD.