FERMAT

FERMAT is a benchmark designed to test the multimodal reasoning and auto-evaluation capabilities of vision-language models (VLMs) using real-world handwritten math problems.

About Dataset

Each solution features realistic student mistakes categorized along four key axes:

🧑‍💻 Computational Errors
🤔 Conceptual Misunderstandings
✍️ Notation Errors
📑 Presentation Issues

Additionally, some solutions contain superficial variations that do not affect correctness (e.g., "16 cm" vs. "16.0 cm"), which makes them useful for testing whether a model can tell genuine errors from harmless differences in formatting.
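To illustrate what a superficial variation means in practice, here is a minimal Python sketch that treats two answers as equivalent when they differ only in numeric formatting. The helper names `parse_quantity` and `same_answer` are hypothetical and are not part of FERMAT or its evaluation code. The sketch also assumes answers are a single plain number followed by an optional alphabetic unit.

```python
import re
from decimal import Decimal

# Hypothetical pattern (an assumption, not FERMAT's format): a signed decimal
# number followed by an optional alphabetic unit, e.g. "16.0 cm".
_QUANTITY = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*$")


def parse_quantity(text):
    """Split an answer such as '16.0 cm' into (Decimal('16.0'), 'cm')."""
    match = _QUANTITY.match(text)
    if match is None:
        raise ValueError(f"not a simple quantity: {text!r}")
    return Decimal(match.group(1)), match.group(2)


def same_answer(a, b):
    """Return True when two answers have equal value and the same unit."""
    # Decimal compares by numeric value, so 16 and 16.0 are equal.
    return parse_quantity(a) == parse_quantity(b)
```

Under these assumptions, `same_answer("16 cm", "16.0 cm")` is true, while `same_answer("16 cm", "16.5 cm")` and `same_answer("16 cm", "16 m")` are false. Handwritten solutions vary far more than this, so the check only conveys the idea and is not an evaluation method.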

Dataset Metadata

License

Attribution 4.0 International (CC BY 4.0)

Geographical coverage

Sector

Sector Agnostic

Author

Oikantik Nath

Source Organisation

AI4Bharat

Uploaded by

Nikhil Narasimhan

Data Quality Score (Beta)

Dataset type

Structured

Frequency

Static

Time Granularity

Year range

N.A.

Date & Time

10/07/25 04:55:21

Visibility

Open

Hosted / Redirected

Hosted
