Amazon recently announced integration of their core S3 service with their low-cost storage system, Glacier. This facilitates the ability to add rules to S3 (or their reduced redundancy store) based on age, date, and S3 bucket prefix.
Regular incremental improvement and integration is a signature of Amazon’s modus operandi for its cloud services: Amazon has a pattern of announcing updates every few weeks that add services, integrate existing services, or (occasionally) lower prices. And they have introduced incremental improvements to the AWS platform over a dozen times since Glacier was announced at the end of August.
Interestingly, Glacier is an apt metaphor for this low-cost service in that it not only signifies “cold” storage, but also signals a massive object making its way slowly but inexorably across a space, covering everything in its path.
Why Glacier is Important
Why is Glacier important? First, as James Hamilton (disclosure, James is VP and Distinguished Engineer at Amazon) aptly summarizes, Glacier provides the volume economics for multi-site, replicated, cold storage (near-line) to small-scale and medium-scale users. While do-it-yourself solutions based on automated tape libraries can still beat Glacier’s price by a huge margin, the sweet spot for this approach has been shifted out so that only very large enterprises are likely to beat the price on Glacier by rolling out their own solutions using tape libraries, etc.
Second, many businesses, and also many services, are built upon or backed up through AWS and S3. Amazon’s continued integration of Glacier into AWS will make it increasingly straightforward to integrate low-cost cold-storage replication into preservation services such as DuraCloud, backup services such as Zmanda, and even into simple software tools like Cyberduck.
Overall, I’m optimistic that this is a Good Thing, and will improve the likelihood of meaningful future access to digital content. However, there are a number of substantial issues to keep in mind when considering Glacier as part of a digital preservation solution.
Issue 1. Technical infrastructure does not guarantee long term durability
Although some commenters have stated that Glacier will “probably outlive us all“, these claims are based on little evidence. The durability of institutions and services relies as much upon economic models, business models, organizational models, and organizational mission as upon technology. Based on the history of technology companies, one must consider that there is a substantial probability that Amazon itself will not be in existence in fifty years, and the future existence of any specific Amazon service is even more doubtful.
Issue 2. Lock-in and future cost projections
As Wired dramatically illustrated, the costs of retrieving all of one’s data from Glacier can be quite substantial. Further, as David Rosenthal has repeatedly pointed out, the long-term cost-competitiveness of preservation services depends “not on their initial pricing, but on how closely their pricing tracks the Kryder’s Law decrease in storage media costs”. And that “It is anyone’s guess how quickly Amazon will drop Glacier’s prices as the underlying storage media costs drop.” The importance of this future price-uncertainty is magnified by the degree of lock-in exhibited by the Glacier service.
Issue 3. Correlated failures
Amazon claims a ‘design reliability’ of ‘99.999999999%‘. This appears to be an extremely optimistic number without any formal published analysis backing it. The number appears to be based on a projection of theoretical failure rates for storage hardware (and such rates are wildly optimistic under production conditions), together with the (unrealistic) assumption that all such failures are statistically independent. Moreover, this ‘design reliability’ claim is unsupported (at time of writing) by Glacier’s terms of service, SLA, or customer agreement. To the contrary, the agreements appear to indemnify Amazon against any loss of damage, does not appear to offer a separate SLA for Glacier, and limits recovery under existing SLA’s (for services, such as S3) to refund of fees for periods the service was unavailable. If Amazon were highly confident in the applicability of the quoted ‘design reliability’ to production settings, one might expect a stronger SLA. However, despite these caveats, my guess is that Glacier’s will still turn out to be, in practice, substantially more reliable than the DIY solutions that most individual organizations can afford to implement entirely in-house.
Nevertheless, as previously discussed (most recently at Digital Preservation 2012), a large part of risk mitigation for digital assets is to diversify against sources of correlated failure. Although implementation details are not complete, Glacier does appear to diversify against some common risks to bits — primarily media failure, hardware failure, and localized natural disaster (such as fire, flood). This is good, but far from complete. A number of likely single-point (or highly correlated) vulnerabilities remain, including software failure (e.g. a bug in the AWS software for its control backplane might result in permanent loss that would go undetected for a substantial time; or cause other cascading failures — analogous to those we’ve seen previously); legal threats (leading to account lock-out — such as this, deletion, or content removal); or other institutional threats (such as a change in Amazon’s business model). It is critical that diversification against these additional failures be incorporated into a digital preservation strategy.
To sum up, Glacier is an important service, and appears to be a solid option for cold storage, but institutions that are responsible for digital preservation and long-term access should not use the quoted design reliability in modeling likelihood of loss, nor rely on Glacier as the sole archival mechanism for their content.