Comments on National Science Board Data Policies Report
The National Science Board recently invited comments on the draft report ‘Digital Research Data Sharing and Management’ by the task force on data policies. The following is my individual perspective on the report.
Response to request for public comments
Dr. Micah Altman
Senior Research Scientist, IQSS, Harvard U. (until 2/29)
Director of Research; Head/Scientist, Program on Information Research — MIT Library, Massachusetts Institute of Technology (as of 3/1/2012)
Non-Resident Senior Fellow, The Brookings Institution
(Writing in a personal capacity)
Thank you for the opportunity to respond to this report. I believe this report will advance the discussion of research data sharing and management. It raises many thoughtful questions and makes recommendations that will have a positive impact on the conduct of scientific research.
As a practicing social scientist, I have worked with collaborators to replicate and extend research in my field, and we have published on the challenges of reproducibility [Altman et al. 2003]. In my role as an administrator, I have led projects and contributed to community-wide efforts to build and maintain open infrastructure and standards for the documentation, dissemination, and preservation of research data [Altman et al. 2001; Altman & King 2007]. I offer these comments from that perspective.
The task force may wish to take note of broad-based and thoughtful commentary on data management that has emerged from the research community recently, such as the following:
- Numerous responses to the recent Advance Notice of Proposed Rulemaking (ANPRM) on proposed changes to the Common Rule commented on the relationship between data sharing and privacy. Notably, two responses by data privacy and computer science researchers provide a roadmap for simultaneously increasing data sharing and privacy protections by leveraging advances in theoretical computer science, and by establishing mechanisms for accountability and transparency in data sharing. [Sweeney et al. 2010; Vadhan et al. 2010]
- Numerous responses to the recent OSTP Request for Information on Public Access to Digital Data Resulting from Federally Funded Scientific Research comment on the benefits of data access and draw attention to protocols and standards for open data access and interoperability. Notably, the responses of the National Digital Stewardship Alliance and of the Data-Preservation Alliance for the Social Sciences point to successful exemplars of community-based standards for open data dissemination, discovery, and preservation. [NDSA 2011; Data-PASS 2011]
- The NRC’s recently released prepublication report on Communicating Science and Engineering Data in the Information Age (which supersedes the letter report cited by the task force) develops a number of recommendations that, although directed at NCSES, are readily applicable to research data management and dissemination in general. Specifically, recommendations 3-1, 3-2, 3-3, and 3-4 together represent good practice for data management in general: published results are more likely to be reliable when management of the data supporting them incorporates versioning, open formats and protocols, machine-actionable metadata, and management of provenance from data collection through publication. [NRC 2011]
The intent of the responses that follow is not to dispute the recommendations of the task force, but to identify areas of emerging standardization that could be used to further refine and extend the recommendations.
Recommendation 1, and the discussion related to it, calls for NSF to provide leadership in policy development, notes the diversity of stakeholder communities, and cautions against one-size-fits-all solutions. This point is well-taken, as each discipline should be empowered to set priorities for embargo policies, documentation standards, and the like. Nevertheless, as the NDSA recommendations emphasize, some baseline requirements should be applied to all research data management:
“Notwithstanding, there are still baseline conditions or requirements that apply to all data regardless of discipline, particularly as they relate to archiving and preservation. For most data, “open access” is needed not only for the short term, but for the long term. And scientific disciplines have focused primarily on short-term access. There are critical standards for metadata exchange, fixity information and verification, and persistent citation that can support long-term access to data, preservation, and the long-term reproducibility of public results.” [NDSA 2011]
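To make "fixity information and verification" concrete, the following is a minimal sketch of one common approach: record a cryptographic digest for each file in a dataset, then recompute and compare the digests later. It is an illustration only and is not drawn from the NDSA response. The choice of SHA-256, the manifest layout, and the function names `sha256_of`, `build_manifest`, and `verify_manifest` are all assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Record a digest for every file under data_dir, keyed by relative path.

    The manifest layout (relative path -> hex digest) is a hypothetical
    convention chosen for this sketch, not a published standard.
    """
    return {
        path.relative_to(data_dir).as_posix(): sha256_of(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


def verify_manifest(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = data_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

An archive could store the manifest alongside the deposited data and rerun `verify_manifest` on a schedule. An empty result suggests the files are unchanged since deposit, while a non-empty result lists the files that need attention.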
Recommendation 2 calls for grantees to make data, methods and techniques available to verify and extend figures, tables, findings, and conclusions. The recommendation also notes that data should be shared using persistent electronic identifiers.
This point is also well taken, and would greatly accelerate scientific progress in many fields. The task force may also wish to consider the emerging body of work demonstrating that scientific publications should, in addition to including persistent identifiers for data, treat references to data in a manner consistent with references to other scientific works: publications should include full citations to data in the standard reference section, and these should be indexed along with other references. [Data-PASS 2011; Altman & King 2007]
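As a rough illustration of what a full data citation in the reference section might carry, the sketch below assembles a reference-list entry from a small structured record. The field names, their ordering, the punctuation, and the placeholder identifier are assumptions made for this example. The sketch does not reproduce the specific format proposed in any of the cited works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical record holding the parts of a citation to a dataset."""

    authors: tuple[str, ...]
    year: int
    title: str
    persistent_id: str  # e.g. a handle or DOI; the value below is a placeholder
    distributor: Optional[str] = None
    version: Optional[str] = None

    def render(self) -> str:
        """Format the record as a single reference-list entry."""
        parts = [f"{'; '.join(self.authors)} ({self.year}). {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        if self.distributor:
            parts.append(f"{self.distributor} [distributor].")
        parts.append(self.persistent_id)
        return " ".join(parts)


# Made-up values for illustration only; this is not a real dataset.
example = DataCitation(
    authors=("Example, A.", "Sample, B."),
    year=2012,
    title="Hypothetical Survey Data",
    persistent_id="hdl:0000/EXAMPLE",
    version="2",
)
print(example.render())
```

Keeping the citation as structured fields rather than free text is one way to let publishers and indexing services treat data references like any other reference, since the identifier and version can be extracted reliably.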
Recommendations 4 and 5, and the discussion related to them, emphasize the need for the stakeholders to convene and explore business models; the need for an expansion of sustainable data management; and the lack of sufficient standards and business models.
This is clearly right. Nevertheless, a number of successful standards and models have emerged in different communities, and the task force may wish to consider these as exemplars. Moreover, standards and business models alone are insufficient. As the NDSA response points out, it is critical that the capability for data management be demonstrated, rather than asserted:
“Memory institutions such as archives, libraries and museums have an extensive track record with these functions and collaborative organizations such as NDSA could serve the essential purpose of developing or implementing frameworks that thoroughly test and certify assertions.” [NDSA 2011]
References
Altman, M., Gill, J., & McDonald, M. (2003). Numerical issues in statistical computing for the social scientist. New York: John Wiley & Sons.
Altman, M., & King, G. (2007). A Proposed Standard for the Scholarly Citation of Quantitative Data. D-Lib Magazine, 13(3/4). Available from: http://www.dlib.org/dlib/march07/altman/03altman.html
Data-PASS 2011. “Response to Office of Science and Technology Policy Request for Information on Public Access to Digital Data Resulting from Federally Funded Scientific Research”. Available from: http://www.data-pass.org/sites/default/files/datapass-otsp-rfi-response.pdf
NDSA 2011. “Response to Office of Science and Technology Policy Request for Information on Public Access to Digital Data Resulting from Federally Funded Scientific Research”. Available from: http://digitalpreservation.gov/documents/NDSA_ResponseToOSTP.pdf
NRC. (2011). Communicating Science and Engineering Data in the Information Age. National Academies Press. Available from: http://www.nap.edu/catalog.php?record_id=13282
Sweeney, L., et al. 2010. “Comments from Data Privacy Researchers”. Available from: http://dataprivacylab.org/projects/irb/DataPrivacyResearchers.pdf
Vadhan, S., et al. 2010. “Re: Advance Notice of Proposed Rulemaking: Human Subjects Research Protections”. Available from: http://dataprivacylab.org/projects/irb/Vadhan.pdf