My colleague Nancy McGovern, who is Head of Curation and Preservation Services, presented this as part of the Program on Information Science Brown Bag Series.
DIPIR employs qualitative and quantitative data collection to investigate the reuse of digital data in quantitative social sciences, archaeology, and zoology. Its main research focus is on significant properties.
The team has also recently published an evaluation of researchers' perceptions of what constitutes a trustworthy repository. In DIPIR's sample, perceptions of trust were influenced by transparency, metadata quality, data cleaning, and reputation with colleagues. Notably absent are such things as certifications, sustainable business models, etc. Also, as in most studies of trust in this area, the context of "trust" is left open: the factors that make an entity trustworthy as a source of information are different from those that might cause one to trust an entity to preserve data deposited with it. Since researchers tend to use data repositories for both, it's difficult to tease these apart.
Personal information is ubiquitous, and it is becoming increasingly easy to link information to individuals. Laws, regulations, and policies governing information privacy are complex, but most intervene through either access restriction or anonymization at the time of data publication.
Trends in information collection and management, including cloud storage, "big" data, and debates about the right to limit access to published but personal information, complicate data management and make traditional approaches to managing confidential data decreasingly effective.
This session, presented as part of the Program on Information Science seminar series, examines trends in information privacy. It also discusses emerging approaches and research around managing confidential research information throughout its lifecycle.
Much of what we know about scholarly communication and the "science of science" relies on the scholarly record of journal publications, monographs, and books, and upon the patterns of findings, evidence, and collaborations that analysis of this record reveals. In contrast, research data, in its current state, represents a type of "scholarly dark matter" that underlies the currently visible evidentiary relationships among publications. Improved data citation practices have the potential to make this dark matter visible.
Yesterday the Data Science Journal published a special issue devoted to data citations: Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data. This is a comprehensive review of data citation principles, practices, infrastructure, policy, and research. And I'm very pleased to have contributed to writing and researching this document as part of the CODATA-ICSTI Task Group on Data Citation Standards and Practices.
This is a rapidly evolving area, and representatives from the CODATA-ICSTI task group, Force 11, the Research Data Alliance and a number of other groups, have formed a synthesis group which is developing an integrated statement of principles to promote broad adoption of a consistent policy for data citation across disciplines and venues.
My collaborator Michael McDonald and I have been analyzing the data that resulted from the crowdsourced participatory electoral mapping projects we were involved in, and from other public redistricting efforts; this blog includes two earlier articles from this line of research. In this research article, to appear in the Proceedings of the 47th Annual Hawaii International Conference on System Sciences (IEEE/Computer Society Press), we reflect on initial lessons learned about public participation and technology from the last round of U.S. electoral mapping.
Three major factors influenced the effectiveness of efforts to increase public input into the political process through crowdsourcing. First, open electoral mapping tools were a practical necessity to enable substantially greater public participation. Second, the interest and capacity of local grassroots organizations was critical to catalyzing the public to engage using these tools. Finally, the permeability of government authorities to public input was needed for such participation to have a significant effect.
The impermeability of government to public input in a democratic state can take a number of more-or-less subtle forms, each of which was demonstrated in the last round of electoral mapping. Authorities blatantly resist public input by providing no recognized channel for it; by creating a nominal channel but leaving it devoid of funding or process; or by procedurally accepting input while substantively ignoring it.
Authorities can also resist public participation and transparency indirectly through the way they make essential information available to the public. For example, mapping authorities that do not wish to have the potential political consequences of their plans easily evaluated publicly will not provide election results merged with census geography, although they assuredly use such merged information for internal evaluation of their plans. Redistricting authorities may purposefully restrict the scope of the information they make available; for example, a number of states chose to make available only the boundaries and information related to the approved plan. Another subtle way authorities can hinder transparency is by releasing plans in a non-machine-readable format. An even more subtle, but substantial, barrier is the interface through which representations of plans are made available.
This resistance appears to have been in large part effective. Public participation increased by an order of magnitude in the last round of redistricting. However, except in a few exemplary cases, the visible direct effects on policy outcomes appear modest. You can find more details in the article.
Data citation supports attribution, provenance, discovery, and persistence. It is not (and should not be) sufficient for all of these things, but it's an important component. In the last two years, there have been several major efforts to standardize data citation practices, build citation infrastructure, and analyze data citation practices.
This session, presented as part of the Program on Information Science seminar series, examines data citation from an information lifecycle approach: what are the use cases, requirements, and research opportunities? It also discusses emerging infrastructure and standardization efforts around data citation.
A number of principles have emerged for citation. The most central is that data citations should be treated consistently with citations to other objects: data citations should at least provide the minimal core elements expected in other modern citations; should be included in the references section along with citations to other elements; and should be indexed in the same way.
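As a rough illustration of what "minimal core elements" means in practice, the sketch below assembles a citation string from a commonly recommended element set (authors, date, title, version, repository, and a persistent identifier). The element set, ordering, and all names and values here are assumptions for illustration only; actual styles vary by venue and discipline, and the identifier shown is hypothetical.

```python
def format_data_citation(authors, year, title, version, publisher, identifier):
    """Render a data citation string from minimal core elements.

    The element set and ordering are illustrative assumptions,
    not a prescribed standard; venues differ on both.
    """
    return (f"{authors} ({year}). {title} [Data set], {version}. "
            f"{publisher}. {identifier}")

# Hypothetical example values, including a made-up DOI.
citation = format_data_citation(
    authors="Smith, J.; Doe, A.",
    year=2013,
    title="Example Survey Data",
    version="Version 2",
    publisher="Example Data Repository",
    identifier="https://doi.org/10.1234/example",
)
print(citation)
```

The key design point is that a persistent identifier (such as a DOI) is carried alongside the same author/date/title elements used for article citations, so the reference can be indexed the same way.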
Adoption of data citation by journals can provide positive and sustainable incentives for more reproducible science and more complete attribution. This would act to brighten the dark matter of science — revealing connections among evidence bases that are not now visible through citations of articles.
Digital stewardship is vital for the authenticity of public records, the reliability of scientific evidence, and the enduring accessibility of our cultural heritage. Knowledge of ongoing research, practice, and organizational collaborations has been distributed widely across disciplines, sectors, and communities of practice. A few days ago I was honored to officially announce the NDSA's National Agenda for Digital Stewardship at Digital Preservation 2013. It identifies the highest-impact opportunities to advance the state of the art, the state of practice, and the state of collaboration within the next 3-5 years.
The 2014 Agenda integrates the perspective of dozens of experts and hundreds of institutions, convened through the Library of Congress. It outlines the challenges and opportunities related to digital preservation activities in four broad areas: Organizational Roles, Policies, and Practices; Digital Content Areas; Infrastructure Development; and Research Priorities.
Slides and video of the short (5-min) talk below:
Read the full report here:
This presentation, invited for a workshop on data preservation for open science held at JCDL 2013, gives a brief tour of a large topic: how do we understand the types of data and software used in social science research? In this presentation I characterize the intellectual landscape across nine dimensions of data structure, content, measure, and use. I then use this framework to characterize three interesting use cases.
This illustrates some particular challenges for long-term access to and replication of social science research, including the use of "messy" human sensors; the wide mix of data types, structures, and sparsity; complex legal constraints; pervasive use of manual and computer-aided coding; use of niche commercial software and bespoke software; and very long-term access requirements.