Mini Section:
Data Statements for NLP
Key Idea: What are Data Statements for NLP?
Data Statements for Natural Language Processing (NLP) are documented characterizations of a dataset. They provide the context needed to judge how well experimental NLP results generalize, how software can be appropriately deployed, and what biases might be reflected in systems built on that software.
Read: The Concept Paper
Read the Data Statements for NLP concept paper written by Emily Bender and Batya Friedman.
Consider the similarities between data statements and other dataset disclosures you have learned about thus far (datasheets for datasets, dataset cards, dataset nutrition labels).
Cite as: Bender EM, Friedman B. Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science. TACL. 2018;6:587-604. doi:10.1162/tacl_a_00041
Explore: Data Statements for NLP Reading Summary
On his website, Morgan Klaus Scheuerman offers an excellent high-level summary of the Data Statements for NLP concept paper.
Review the summary and pay particular attention to the condensed data statement schema and provided definitions.
Cite as: Scheuerman MK. Summary of Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science. Morgan Klaus.
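Sketch: A Data Statement as a Structured Record
To make the schema concrete, the following minimal Python sketch treats a data statement as a record with one free-text field per schema element. The field names follow the elements listed in the Bender and Friedman paper; the class name `DataStatement`, the helper `missing_sections`, and the placeholder text are hypothetical and appear in neither the paper nor the summary. It is an illustration under those assumptions, not an official format.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DataStatement:
    """Hypothetical container: one free-text field per schema element."""
    curation_rationale: str = ""
    language_variety: str = ""
    speaker_demographic: str = ""
    annotator_demographic: str = ""
    speech_situation: str = ""
    text_characteristics: str = ""
    recording_quality: str = ""
    other: str = ""
    provenance_appendix: str = ""


def missing_sections(statement: DataStatement) -> List[str]:
    """Return the names of schema elements that are still empty."""
    return [
        f.name
        for f in fields(statement)
        if not getattr(statement, f.name).strip()
    ]


# Placeholder values only; a real statement would describe an actual dataset.
draft = DataStatement(
    curation_rationale="Placeholder: why these texts were selected.",
    language_variety="Placeholder: language tag plus a prose description.",
)
print(missing_sections(draft))
```

A completeness check like this only flags blank elements; whether each element is described well enough still requires human judgment.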