Early Results from Auditing Distributed Preservation Networks
I was pleased to participate in the 2012 PLN Community Meeting.
Over the last decade, replication has become a required practice for digital preservation. Now, Distributed Digital Preservation (DDP) networks are emerging as a vital strategy to ensure long-term access to the scientific evidence base and cultural heritage. A number of DDP networks are currently in production, including CLOCKSS, Data-PASS, MetaArchive, COPPUL, Lukll, PeDALS, Synergies, Data One, and new networks, such as DFC and DPN are being developed.
These networks were created to mitigate the risk of content loss by diversifying across software architectures, organizational structures, geographic regions, as well as legal, political, and economic environments. And many of these networks have been successful at replicating a diverse set of content.
However, the point of the replication enterprise is recovery. Archival recovery is an even harder problem because one needs to validate not only that a set of objects is recoverable, but also that the collection recovered also contains sufficient metadata and contextual information to remain interpretable! A demonstration of the difficulty of this was the AIHT exercise sponsored by the Library of Congress, which demonstrated that many collections thought to be substantially “complete” could not be successfully re-ingested (i.e. recovered) by another archive, even in the absence of bit-level failures.
In a presentation co-authored with Jonathan Crabtree, we summarized some lessons learned from trial audits of several production distributed digital preservation networks. These audits were conducted using the open source SafeArchive system (www.safearchive.org), which enables automated auditing of a selection of TRAC criteria related to replication and storage. An analysis of the trial audits demonstrates both the complexities of auditing modern replicated storage networks, and reveals common gaps between archival policy and practice. It also reveals gaps in the auditing tools we have available. Our presentation, below, focused on the importance of designing auditing systems to provide diagnostic information that can be used to diagnose non-confirmations of audited policies. Tom Lipkis followed with specific planned and possible extensions in LOCKSS that would enhance diagnosis and auditing.
You may also be interested in the other presentations from the workshop, which are posted on the PLN2012 Website.