Model Hubs and Beyond: Analyzing Model Popularity, Performance, and Documentation

About Dataset

This dataset supports the study of sentiment analysis models hosted on Hugging Face, in particular how model popularity, performance, and documentation quality relate to one another.

  • Size:

    • ~168,000 examples.

  • Tasks & Subsets:

    • Sentiment classification on Reddit comments.

    • Two main splits:

      • human_annotated: manually labeled subset for gold-standard evaluation.

      • raw_reddit: broader, unannotated raw content.

  • Annotations:

    • Around 80,000 human annotations collected to evaluate sentiment models.

  • Purpose:

    • To empirically investigate:

      1. Whether high popularity (e.g., downloads, likes, recency) correlates with model performance.

      2. Whether model documentation completeness predicts performance.
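Both questions above ask whether one per-model quantity tracks another. A minimal sketch of one common way to check this is a Spearman rank correlation, shown here with the standard library only. This is an illustration, not necessarily the analysis used by the dataset authors, and the function names and the toy numbers are hypothetical placeholders, not values from the dataset.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical placeholder values, NOT taken from the dataset:
# one popularity signal and one performance score per model.
toy_downloads = [120, 4500, 30, 980]
toy_accuracy = [0.71, 0.69, 0.64, 0.75]
rho = spearman(toy_downloads, toy_accuracy)
```

A value of `rho` near 1 would mean more popular models tend to score higher, a value near 0 would mean popularity says little about performance. The same function could be applied to a documentation-completeness score in place of downloads.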

Citation:

If you use this dataset, please cite the following work:

@misc{kadasi2025modelhubsbeyondanalyzing,
      title={Model Hubs and Beyond: Analyzing Model Popularity, Performance, and Documentation}, 
      author={Pritam Kadasi and Sriman Reddy and Srivathsa Vamsi Chaturvedula and Rudranshu Sen and Agnish Saha and Soumavo Sikdar and Sayani Sarkar and Suhani Mittal and Rohit Jindal and Mayank Singh},
      year={2025},
      eprint={2503.15222},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.15222}, 
}

Activity Overview

  • Downloads 4
  • Views 94
  • File Size 18.34 MB

Tags

  • model-evaluation

License Control

Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Version Control

Version 1 (18.34 MB)
  • admin · 6 months ago
    • raw_reddit.parquet
    • reddit_human_annotated.parquet
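
The two Parquet files listed above can be read with pandas once downloaded. The sketch below assumes the files sit together in a local folder and that a Parquet engine such as pyarrow is installed. The helper names `split_path` and `load_split` are hypothetical, and since this page does not document the column schema, no column names are assumed.

```python
from pathlib import Path

# File names as listed under Version 1; keys follow the split names
# used in the dataset description.
SPLIT_FILES = {
    "raw_reddit": "raw_reddit.parquet",
    "human_annotated": "reddit_human_annotated.parquet",
}


def split_path(data_dir, split):
    """Return the expected local path of a split's Parquet file."""
    if split not in SPLIT_FILES:
        raise ValueError(f"unknown split {split!r}; expected one of {sorted(SPLIT_FILES)}")
    return Path(data_dir) / SPLIT_FILES[split]


def load_split(data_dir, split):
    """Read one split into a pandas DataFrame."""
    # Imported here so the path helper works even without pandas installed.
    import pandas as pd

    return pd.read_parquet(split_path(data_dir, split))


# Example usage, assuming the files were saved to ./downloads:
#   df = load_split("downloads", "human_annotated")
#   print(df.shape)
#   print(df.columns.tolist())  # inspect the schema before relying on any column
```

Mapping split names to file names in one place helps because the `human_annotated` split is stored as `reddit_human_annotated.parquet`, so the two names do not match exactly.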