FERMAT

FERMAT is a benchmark designed to test the multimodal reasoning and auto-evaluation capabilities of vision-language models (VLMs) using real-world handwritten math problems.

About Dataset

FERMAT has 244 handwritten math solutions, carefully annotated across core mathematical domains: Arithmetic | Algebra | Geometry | Mensuration | Probability | Statistics | Trigonometry | Calculus.

Each solution features realistic student mistakes categorized along four key axes:

  1. 🧑‍💻 Computational Errors
  2. 🤔 Conceptual Misunderstandings
  3. ✍️ Notation Errors
  4. 📑 Presentation Issues

Additionally, some solutions contain superficial variations that don't actually affect correctness (e.g., "16 cm" vs. "16.0 cm"), which makes them perfect for testing the subtlety of your models!
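
To make the idea of a superficial variation concrete, here is a minimal sketch of a check that treats "16 cm" and "16.0 cm" as the same answer while still flagging a changed value or unit. It is an illustration only and not part of FERMAT: the names `parse_answer` and `superficially_equal` are hypothetical, and the sketch assumes answers of the simple form "number followed by an optional unit".

```python
import re
from decimal import Decimal

# Hypothetical pattern (an assumption, not a FERMAT format):
# a decimal number followed by an optional alphabetic unit, e.g. "16.0 cm".
_ANSWER = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s*([A-Za-z%]*)\s*$")


def parse_answer(text):
    """Split an answer into (numeric value, unit), or return None if it does not fit."""
    match = _ANSWER.match(text)
    if match is None:
        return None
    # Decimal compares by value, so Decimal("16") == Decimal("16.0").
    return Decimal(match.group(1)), match.group(2)


def superficially_equal(a, b):
    """True when two answers differ only in formatting, not in value or unit."""
    parsed_a, parsed_b = parse_answer(a), parse_answer(b)
    if parsed_a is None or parsed_b is None:
        # Fall back to a whitespace-insensitive string comparison.
        return a.strip() == b.strip()
    return parsed_a == parsed_b


print(superficially_equal("16 cm", "16.0 cm"))  # same value, same unit
print(superficially_equal("16 cm", "16 m"))     # unit changed
```

A model under evaluation would ideally accept the first pair as equivalent and flag the second as a real error; the units are compared case-sensitively on purpose, since case can change a unit's meaning.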

Activity Overview

  • Downloads 0
  • Redirect 7
  • Views 49
  • File Size 0

Tags

  • Benchmark
  • vision-language model
  • natural language processing (NLP)
  • Document Processing
  • visual-document-understanding

License Control

Attribution 4.0 International (CC BY 4.0)