Visual Genome

A dataset that connects structured image concepts to language, offering detailed annotations of objects, attributes, and relationships within images, facilitating scene understanding.

About Dataset

Visual Genome is a densely annotated vision–language dataset that connects images with objects, attributes, relationships, region descriptions, and question–answer pairs. Developed by researchers at Stanford University, the dataset provides structured representations of visual scenes beyond simple object labels. It aims to capture detailed scene understanding by modeling how objects relate to one another within images.

Purpose of Dataset

Visual Genome Is Used To Train And Evaluate Models On Complex Visual Reasoning, Scene Graph Generation, And Multimodal Understanding. It Supports Tasks Such As Visual Question Answering, Relationship Detection, And Grounded Language Learning. The Dataset Helps Models Move Beyond Object Recognition Toward Deeper Semantic And Relational Understanding Of Visual Scenes.