Characterizing Data and Software for Social Science Research
This presentation, invited for a workshop on data preservation for open science held at JCDL 2013, gives a brief tour of a large topic: how we understand the types of data and software used in social science research. I characterize the intellectual landscape across nine dimensions of data structure, content, measure, and use. I then use this framework to characterize three interesting use cases.
These use cases illustrate particular challenges for long-term access to and replication of social science research, including the use of “messy” human sensors; a wide mix of data types, structures, and sparsity; complex legal constraints; pervasive use of manual and computer-aided coding; use of niche commercial and bespoke software; and very long-term access requirements.