Mini Section:
Dataset Nutrition Label
Key Idea: What is a Dataset Nutrition Label?
The Dataset Nutrition Label (DNL), developed by the Data Nutrition Project, is a dataset documentation summary modeled on the Nutrition Facts label found on food products. Like a nutrition label, a DNL highlights the ‘ingredients’ of a dataset so that users can judge whether the dataset is appropriate for a particular statistical use case. The goal is to mitigate the potential harms of using inappropriate data to train or inform automated systems. DNLs provide at-a-glance information about a given dataset, mapped to a set of common use cases.
Optional Reading: The Concept Paper
The first and second iterations of the Dataset Nutrition Label paper provide insight into the need for standardized dataset disclosure methods.
Read the first concept paper by Sarah Holland et al. and the second concept paper by Kasia Chmielinski et al.
Cite as:
Holland S, Hosny A, Newman S, Joseph J, Chmielinski K. The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards. Published online 2018. doi:10.48550/ARXIV.1805.03677
Chmielinski KS, Newman S, Taylor M, et al. The Dataset Nutrition Label (2nd Gen): Leveraging Context to Mitigate Harms in Artificial Intelligence. Published online 2022. doi:10.48550/ARXIV.2201.03954
Explore: Dataset Nutrition Label Overview
Explore the Dataset Nutrition Label overview page to learn about the anatomy of a DNL, including sections outlining common use cases, "badges," and modeling alerts.
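To make the anatomy described above more concrete, the following is a minimal sketch of how the three named parts of a label (use cases, badges, and modeling alerts) might be held in a simple data structure. It is an illustration only: the class names, field names, and placeholder values are assumptions made for this example and do not reflect the Data Nutrition Project's actual schema or tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelingAlert:
    """Hypothetical alert: a known issue a modeler should weigh before use."""
    title: str
    description: str


@dataclass
class DatasetLabel:
    """Illustrative container; field names are assumptions, not the DNL schema."""
    dataset_name: str
    use_cases: List[str] = field(default_factory=list)
    badges: List[str] = field(default_factory=list)
    alerts: List[ModelingAlert] = field(default_factory=list)

    def summary(self) -> str:
        # At-a-glance counts of each section of the label.
        return (
            f"{self.dataset_name}: {len(self.use_cases)} use case(s), "
            f"{len(self.badges)} badge(s), {len(self.alerts)} alert(s)"
        )


# Placeholder values only; these do not describe any real dataset.
label = DatasetLabel(
    dataset_name="Example dataset",
    use_cases=["placeholder use case"],
    badges=["placeholder badge"],
    alerts=[ModelingAlert("placeholder alert", "placeholder description")],
)
print(label.summary())
```

A real label carries far more context than counts; the sketch only shows how the three sections relate to a single dataset.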
Explore: SIIM-ISIC Melanoma Classification Challenge Dataset Nutrition Label
The 2020 Society for Imaging Informatics in Medicine (SIIM)-International Skin Imaging Collaboration (ISIC) Melanoma Classification dataset was created to train models to identify melanoma in lesion images.
Explore its Dataset Nutrition Label here!