Best (Or Probably Not-Bad) Practices for Sharing Data
Best practices aren’t.
The core issue is that there are few models for the systematic valuation of data: we have no robust, general, proven way of answering the question of how much data X will be worth to community Y at time Z. Thus the “bestness” (optimality) of a practice is generally strongly dependent on operational context, and the context of data sharing is currently both highly complex and dynamic. Until there is systematic descriptive evidence that best practices are used, predictive evidence that best practices are associated with desired future outcomes, and causal evidence that applying best practices yields improved outcomes, we cannot be sure that any practice is “best”.
Nevertheless, one should use established “not-bad” practices, for a number of reasons: first, to avoid practices that are clearly bad; second, because using such practices documents operational and tacit knowledge; third, because selecting practices can help elicit the underlying assumptions under which they are applied; and finally, because not-bad practices provide a basis for auditing, evaluation, and eventual improvement.
Specific not-bad practices for data sharing fall roughly into three categories:
- Analytic practices: lifecycle analysis & requirements analysis
- Policy practices for data dissemination, licensing, privacy, availability, citation, and reproducibility
- Technical practices for sharing and reproducibility, including fixity, replication, and provenance
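To make one of the technical practices concrete: fixity means being able to show that a shared file has not changed since it was deposited. A common way to do this is to record a checksum for each file in a manifest at deposit time and recompute it later. The sketch below assumes SHA-256 as the checksum and uses hypothetical function names (`sha256_of`, `make_manifest`, `verify_manifest`) and a hypothetical file name; it is a minimal illustration, not a complete preservation system.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(files: list[Path]) -> dict[str, str]:
    """Record a fixity value (checksum) for each file at deposit time."""
    return {str(p): sha256_of(p) for p in files}


def verify_manifest(manifest: dict[str, str]) -> list[str]:
    """Return paths that are missing or whose checksum no longer matches."""
    return [
        p
        for p, expected in manifest.items()
        if not Path(p).exists() or sha256_of(Path(p)) != expected
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        data = Path(d) / "survey.csv"  # hypothetical shared data file
        data.write_text("id,income\n1,52000\n")
        manifest = make_manifest([data])
        print(verify_manifest(manifest))  # [] : file unchanged
        data.write_text("id,income\n1,53000\n")
        print(verify_manifest(manifest))  # lists the altered file
```

Reading in fixed-size chunks keeps memory use constant for large datasets; storing the manifest alongside the data (or in a separate registry) is what lets a later audit detect silent corruption or unrecorded edits.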
This presentation, given at the Second Open Economics International Workshop (sponsored by the Sloan Foundation, MIT, and OKFN), provides an overview of these categories of practice and links to specific practice recommendations, standards, and tools: