
Model Hubs and Beyond: Analyzing Model Popularity, Performance, and Documentation
A dataset designed to study sentiment analysis models hosted on Hugging Face, especially in relation to model popularity, performance, and documentation quality.
Size:
~168,000 examples.
Tasks & Subsets:
Sentiment classification on Reddit comments.
Two main splits:
human_annotated: manually labeled subset for gold-standard evaluation.
raw_reddit: broader, unannotated raw content.
Annotations:
Around 80,000 human annotations collected to evaluate sentiment models.
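As a minimal sketch of how a sentiment model's predictions might be scored against the human-annotated labels, the snippet below computes accuracy and macro-averaged F1 in plain Python. The three-class label set and the toy label lists are assumptions made for illustration; they are not taken from the dataset, and the source does not state which metrics the authors report.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of examples where the prediction equals the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            true_pos[g] += 1
        else:
            false_pos[p] += 1  # predicted label p was wrong
            false_neg[g] += 1  # gold label g was missed
    labels = sorted(set(gold) | set(predicted))
    scores = []
    for label in labels:
        denom = 2 * true_pos[label] + false_pos[label] + false_neg[label]
        scores.append(2 * true_pos[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (assumed three-class scheme), not drawn from the dataset.
gold = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
predicted = ["positive", "negative", "positive", "positive", "neutral", "neutral"]

print(f"accuracy = {accuracy(gold, predicted):.3f}")
print(f"macro-F1 = {macro_f1(gold, predicted):.3f}")
```

Macro-F1 weights every class equally, which can matter when sentiment classes are imbalanced; accuracy alone may hide poor performance on a rare class.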
Purpose:
To empirically investigate:
Whether high popularity (e.g., downloads, likes, recency) correlates with model performance.
Whether model documentation completeness predicts performance.
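One way to probe the first question is a rank correlation between a popularity signal and a performance score across models. The sketch below implements Spearman's rank correlation in plain Python (average ranks for ties, then Pearson correlation on the ranks). The choice of Spearman is an assumption; the source does not say which statistic the authors use, and the download counts and scores are hypothetical placeholders, not results from the paper.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-model values for illustration only.
downloads = [120_000, 45_000, 9_800, 3_100, 650]
macro_f1_scores = [0.61, 0.72, 0.58, 0.70, 0.66]

rho = spearman(downloads, macro_f1_scores)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based statistic is a reasonable fit here because download counts tend to be heavily skewed; the same function could be applied to likes, recency, or a documentation-completeness score.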
Citation:
@misc{kadasi2025modelhubsbeyondanalyzing,
title={Model Hubs and Beyond: Analyzing Model Popularity, Performance, and Documentation},
author={Pritam Kadasi and Sriman Reddy and Srivathsa Vamsi Chaturvedula and Rudranshu Sen and Agnish Saha and Soumavo Sikdar and Sayani Sarkar and Suhani Mittal and Rohit Jindal and Mayank Singh},
year={2025},
eprint={2503.15222},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2503.15222},
}
License:
Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)