Matching Uses and Protections for Government Data Releases: Presentation at the Simons Institute
(The blog had been on hiatus during 2019 as CREOS launched. This post is part of a series of catch-up blog posts summarizing talks presented over the last 10 months.)
In the work included below, presented at the Simons Institute, we describe work in progress that aims to align emerging methods of data protection with research uses. We use the American Community Survey as an exemplar case for examining the range of ways that government data is used for research. We identify the range of research uses by combining evidence of use from multiple sources, including research articles, national and local media coverage, social media, and research proposals. We then employ human and computer-assisted coding methods to characterize the range of data analysis methodologies that researchers employ. Then, building on previous work that surveys and characterizes computational and technical controls for privacy, we match these methodologies to available and emerging privacy and data security controls.
Our preliminary analysis suggests that tiered access to government data will be necessary to support current and new research in the social and health sciences. We argue that discovery research (currently) requires access beyond the limits of formal protections: empirically guided exploratory research, theory generation, process tracing, novel syntheses, and similar activities are incompletely understood and formalized. This is in part because, in the analysis of privacy tradeoffs, ‘worst-case’ analysis is used for risks, but average-case analysis is used for benefits.
For those interested in these questions and related areas, writing on modern approaches to privacy principles and protections is linked from my web site.