Hugging Face Dataset Cards are documentation cards that accompany datasets hosted on the Hugging Face Hub, including many NLP datasets. They promote responsible use of datasets for ML by alerting users to potential biases in a given dataset. Like Datasheets for Datasets, Dataset Cards document the provenance, creation, and use of ML datasets. Unlike datasheets, however, they are displayed directly in the Hugging Face interface and are built into the process of uploading a dataset to the Hub.
Fun Fact! Dataset Cards were inspired by the Model Cards proposed by Mitchell and colleagues (which we will cover in the next module!).
ML practitioners and dataset creators or curators can write their own dataset card with an interactive web application built with React, a JavaScript library for building user interfaces.
Explore the application and read more about dataset cards here.
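Whatever tool produces it, a dataset card on the Hub is stored as a Markdown file named README.md in the dataset repository, optionally opening with a YAML metadata block between two --- lines. The sketch below assembles a minimal card by hand using only the Python standard library, to make that structure visible. The function render_dataset_card, the metadata values, and the section headings are hypothetical illustrations, loosely modeled on the Hugging Face card template rather than copied from it.

```python
# Hypothetical sketch: assemble a minimal dataset card (README.md) by hand.
# Metadata values and section headings are illustrative assumptions,
# not an official Hugging Face template.
from __future__ import annotations


def render_dataset_card(metadata: dict[str, list[str] | str], sections: dict[str, str]) -> str:
    """Return README.md text: a YAML front-matter block followed by Markdown sections."""
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            # Emit a YAML block sequence, e.g. "language:\n- en"
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    # Each section becomes a level-2 Markdown heading followed by its body text.
    for heading, body in sections.items():
        lines.extend(["", f"## {heading}", "", body])
    return "\n".join(lines) + "\n"


card_text = render_dataset_card(
    metadata={
        "language": ["en"],
        "license": "cc-by-4.0",
        "task_categories": ["text-classification"],
    },
    sections={
        "Dataset Description": "What the data is and how it was collected.",
        "Bias, Risks, and Limitations": "Known skews in who wrote the text and how it was labeled.",
    },
)

print(card_text)
```

In practice you would let the interactive application, or the DatasetCard class in the huggingface_hub library, generate this file; writing it by hand only shows where the machine-readable metadata ends and the human-readable documentation, including the discussion of biases, begins.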
As an example, the Stanford Natural Language Inference (SNLI) corpus (version 1.0) is a collection of roughly 570,000 human-written English sentence pairs, each manually labeled as entailment, contradiction, or neutral.
Explore its dataset card on Hugging Face!
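A dataset card is a natural place to report properties such as label balance. The sketch below shows the premise/hypothesis/label layout of SNLI-style records and tallies the gold labels. It assumes the label encoding described on the Hugging Face snli card (0 = entailment, 1 = neutral, 2 = contradiction, -1 = no gold label); the records themselves are made-up examples, not rows from SNLI, and label_distribution is a hypothetical helper.

```python
from collections import Counter

# Assumed label encoding, following the Hugging Face `snli` dataset card:
# 0 = entailment, 1 = neutral, 2 = contradiction, -1 = no gold label.
LABEL_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}

# Hypothetical records in the premise/hypothesis/label layout (not taken from SNLI).
records = [
    {"premise": "A dog runs across a field.", "hypothesis": "An animal is outside.", "label": 0},
    {"premise": "A dog runs across a field.", "hypothesis": "The dog is chasing a ball.", "label": 1},
    {"premise": "A dog runs across a field.", "hypothesis": "The dog is asleep indoors.", "label": 2},
    {"premise": "Two people sit on a bench.", "hypothesis": "They are friends.", "label": -1},
]


def label_distribution(rows):
    """Count gold labels by name, skipping pairs without a gold label (-1)."""
    return Counter(LABEL_NAMES[row["label"]] for row in rows if row["label"] != -1)


print(label_distribution(records))
```

Filtering out the -1 pairs before counting matters: they carry no gold label, so including them would distort any class-balance figure a card reports.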