Supports video-to-text modeling, summarization, and question answering over video clips.
The PE Video Dataset (PVD) is a large-scale video dataset released by Meta AI, designed to support multimodal video understanding. It includes video clips paired with rich textual annotations, enabling learning across visual and language modalities. The dataset captures diverse activities, scenes, and interactions, focusing on temporal reasoning and semantic understanding of video content.
PVD is used for training and evaluating video–language models on tasks such as video captioning, summarization, and video question answering. It supports research in multimodal reasoning, temporal alignment between video and text, and instruction following from visual input. The dataset is valuable for advancing video-based foundation models and multimodal AI systems.
Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
© 2026 - Copyright AIKosh. All rights reserved. This portal is developed by National e-Governance Division for AIKosh mission.