Archive for January, 2014

3D Printing for Fun, Science & Libraries

January 31, 2014

“Could new Maker Spaces together with a reinforced commitment to learning-by-doing create the next generation of tinkerers, fluent in advanced manufacturing and rapid prototyping techniques?” [1]

Rapid fabrication resonates particularly well with “mens et manus”, the MIT philosophy of combining learning and doing. And Neil Gershenfeld has noted that MIT has had a long-standing joke that a student is allowed to graduate only when their thesis can walk out of the printer.

For the last year, the Institute has been thoughtfully reflecting on the future of education, and how “doing” will remain a part of it. One exciting vision involves organizing around a combination of academic villages and maker spaces that catalyze and combine on-line activities, in-person interactions and hands-on experiences.

My colleague Matt Bernhardt was prevailed upon to give an overview of some of the key technologies that promise to enable this future. Matt, who is the Libraries’ current Web Developer and is acting as an expert advisor, was trained as an architect and founded and ran a fabrication space at the University of Ohio. We collaborated to organize a workshop summarizing the current generation of rapid fabrication technologies, offered as an IAP session and as part of the Program on Information Science Brown Bag Series.

Matt’s excellent talk provided a general overview of the digitization-fabrication cycle and the broad categories of technologies for 3D scanning and for rapid fabrication: subtractive, deformative, and additive methods, and their variants. His talk also provided exemplars of the state of the practice in additive fabrication technologies, emerging methods, and the range of capabilities (e.g., for scale, materials, precision) currently available in practice. (These slides are embedded below.)

For thousands of years, libraries have had a major role in the discovery, management, and sharing of information. Rapid fabrication can be seen, in a way, as offering the ability to materialize information. So the question of what roles libraries might take on in supporting fabrication and managing the intellectual assets produced and used is of natural interest from a library and information science point of view. And it is not just of theoretical interest: a recent survey by the Gardner-Harvey Library found that a substantial proportion of libraries were providing or planning to provide at least some support for “Maker Spaces”.

As a complement to Matt’s talk, I outlined in the presentation below how fabrication fits into the research information life cycle, and some of the models for library support:

Clearly this area is of interest to MIT affiliates. The IAP talk rapidly reached its cap of 35 registrants, with nearly 70 more on the wait list, and participants in the session discussed exploring how rapid fabrication can be used in a variety of ongoing research and scholarly projects, such as collaborative design of a satellite, rapid development of robots, modeling human anatomy for biomedical engineering, and fashion!

More generally, fabrication technologies are now used in production for medical implants, prosthetics, teaching aids, information visualization, research on rare/fragile objects, architecture, art, and advanced manufacturing. And use for creating custom pharmaceuticals, “printing” or “faxing” biological systems, and for printing active objects with embedded sensors and electronics is on the horizon. Having a dissertation “walk out of the printer” won’t be a joke for much longer.

As Lipson and Kurman [3] astutely point out, advances in fabrication technologies are rapidly lowering a number of barriers faced by researchers (and others). These barriers had previously made it prohibitively difficult for most individuals, researchers, or organizations to manufacture objects without substantial investment in manufacturing skills and equipment; to manufacture complex objects; to offer a wide variety of different objects; to easily customize and individualize manufacturing; to manufacture objects locally, or on-site; to manufacture objects with little lead time (or just-in-time); or to easily and precisely replicate physical objects. Furthermore, as they point out, additive fabrication technologies open up new forms of design (“design spaces”) such as localized design (based on local conditions and needs), reactive design (where objects are manufactured that collect sensor information that is then used to manufacture better objects), generative design (physical objects based on mathematical patterns and processes), and the application of sample-remix-and-burn to physical objects.

Increasingly, fabrication is becoming part of various stages of the research lifecycle. These technologies may be used early on as part of prototyping for research interventions or to embed sensors for research data collection, or later on as part of analysis or research collaboration (e.g., by materializing models for examination and sharing). And, naturally, these technologies produce intellectual assets (sensor data and digitizations, models and methods) that are potentially valuable to other researchers for future reuse and replication. The Library may have a useful role to play in managing these assets.

And this is only the beginning. Current technologies allow control over shape. Emerging technologies (as Matt’s talk shows) are beginning to allow control over material composition. And as any avid science-fiction reader could tell you, control over the behavior of matter is next, and a real replicator should be able to print a table that can turn into a chair at the press of a button. (Or, for aficionados of ’70s TV, a floor wax that can turn itself into a dessert topping.)

Libraries have a number of core competencies that are complementary to fabrication.

  • Libraries have special competency in managing information. Fabrication technologies make information material and help make material objects into information.
  • Libraries support the research process. Use of fabrication technologies requires a core set of skills and knowledge (such as databases of models) that lies outside specific research domains and is not the sole domain of any one discipline.
  • Libraries promote literacy broadly. And the use of fabrication technologies promotes design, science, technology, engineering, art, and mathematics.
  • Libraries are responsible for maintaining the scholarly record. The digitizations, designs, and models produced as part of rapid fabrication approaches can constitute unique & valuable parts of the scholarly record.
  • Libraries create physical spaces designed for research and learning. Successful ‘makerspaces’ bring together accessible locations; thoughtfully designed space; curated hardware & software; skilled staff; local information management; and global ‘reference’ knowledge.

The seminars provoked a lively discussion, and this is a promising area for further experiments and pilot projects. The Program has invested in a MakerBot and a 3D scanner for use in further exploration, and our program intern is currently conducting a review of existing websites, policies, and documentation in support of rapid fabrication at other libraries.


[1] Institute-Wide Task Force on the Future of MIT Education, Preliminary Report.


[3] Lipson & Kurman, 2013. Fabricated. Wiley.

Categories: Uncategorized

What’s new in managing confidential research data this year?

January 18, 2014

For MIT’s Independent Activities Period (IAP), the Program on Information Science regularly leads a practical workshop on managing confidential data. This is in part a result of research through the Privacy Tools project. As I was updating the workshop for this semester, I had an opportunity to reflect upon what’s new on the pragmatic side of managing confidential information.

Most notably, because of the publicity surrounding the NSA, more people (and in higher places) are paying attention. (And as an information scientist, I note that one benefit of the NSA scandal is that everyone now recognizes the term “metadata”.)

Also, generally, personal information continues to become more available, and it is increasingly easy to link information to individuals. New laws, regulations, and policies governing information privacy continue to emerge, increasing the complexity of management. Trends in information collection and management (cloud storage, “big” data, and debates about the right to limit access to published but personal information) complicate data management and make traditional approaches to managing confidential data decreasingly effective.

On the pragmatic side, new privacy laws continue to emerge at the state level. Probably the most notable is the California “right to be forgotten” for teens. In 2013, California became the first state to pass a law (“The Privacy Rights for California Minors in the Digital World”) that gives (some) individuals the right to remove (some) content they have posted online.

The California law takes effect next year (Jan 1, 2015), by which time we’re likely to see new information privacy initiatives in some other states. This year we are also likely to see the release of specific data sharing requirements from federal funders (as a result of the OSTP “Holdren Memo”, NIH’s Big Data to Knowledge initiative, and related efforts), from journals, and from professional societies. Farther off in the wings looms the possibility of a general right-to-be-forgotten law in the EU; changes to how the “Common Rule” evaluates information risks and controls (on which subject the NAS recently issued a new set of recommendations); and possible “sectoral” privacy laws targeted at “revenge porn”, “mug-shot” databases, mobile-phone data, or other issues du jour.

These competing pressures toward privacy and toward data sharing create an interesting tension and will require increasingly sophisticated approaches that can provide both privacy and appropriate access. From a policy point of view, one possible way of setting this balance is by using “least restrictive terms” language; the OKF’s Open Economics Principles may provide a viable approach.

In a purely operational sense, the biggest change in confidential data management for researchers is the wider availability of “safe-sharing” services for exchanging research data within remote collaborations:
  • On the do-it-yourself front, the increasing flexibility of the FISMA-certified Amazon Web Services GovCloud makes running a remote, secure research computing environment easier and more economical. This is still complex and expensive to maintain, however, and one still has to trust Amazon, although the FISMA certifications make that trust better justified.
  • The second widely used option, combining file-sharing services like Dropbox with encrypted filesystems like TrueCrypt, also received a boost this year with the success of a crowdfunded effort to independently audit the TrueCrypt source. This is good news, and the transparency and verifiability of TrueCrypt is its big strength. The approach remains limited in practice to secure publishing of information: it doesn’t support simultaneous remote updates (not unless you like filesystem corruption), multiple keys for different users or portions of the filesystem, key distribution, etc.
  • A number of simpler solutions have emerged this year.
    – BitTorrent Sync provides “secure” P2P replication and sharing based on a secret private key.
    – SpiderOak Hive;; and BoxCryptor all offer zero-knowledge, client-side encrypted cloud storage and data sharing. The ease of use and functionality of these systems for secure collaboration is very attractive compared to the other available solutions. BoxCryptor offers an especially wide range of enterprise features, such as key distribution, revocation, and master and group-key chaining, that would make managing sharing among heterogeneous groups easier. However, the big downside is the amount of “magic” in these systems. None are open source, nor are any sufficiently well documented (at least externally) or certified (no FISMA there) to engender trust among us untrusting folk. (Although SpiderOak in particular seems to have a good reputation for trustworthiness, and the others no doubt have pure hearts, I’d rest easier with the ability to audit source code, peer-reviewed algorithms, etc.)

For those interested in the meat of the course, which gives an overview of legal, policy, information technology/security, research design, and statistical pragmatics, the new slides are here:

The Future of the Future of Digital Stewardship

January 8, 2014

In December, my colleagues from NDSA and I had the pleasure of attending CNI to present the 2014 National Agenda for Digital Stewardship and to lead a discussion of priorities for 2015. We were gratified to have the company of a packed room of engaged attendees, who participated in a thoughtful and lively discussion.

For those who were unable to attend CNI, the presentation is embedded below.

(Additionally, the Agenda will be discussed this Spring at NERCOMP, in a session I am leading especially for Higher Education IT leaders; at IASSIST in a poster session, represented by Coordination Committee member Jonathan Crabtree; and at IDCC, in a poster session represented by Coordination Committee member Helen Tibbo.)

Discussions of the Agenda at CNI were a first step in the input gathering for the next version of the Agenda. In January, NDSA will start an intensive and systematic process of revising the Agenda for priorities in 2015 and beyond. We expect to circulate these revisions for peer and community review in April and present a final or near-final version (depending on review comments) at the annual Digital Preservation conference in July.

Part of the discussions at our CNI session echoed selected themes in Cliff Lynch’s opening plenary “Perspective” talk, particularly his statements that:

  • We [as a stewardship community] don’t know how well we’re doing with our individual preservation efforts, in general. We don’t have an inventory of the class of content that is out there, what is covered, and where the highest risks are.

  • There is a certain tendency to “go after the easy stuff”, rather than what’s at risk – our strategy needs to become much more systematic.

In our discussion session these questions were amended and echoed in different forms:

  • What are we doing in the stewardship community, and especially what are we doing well?

  • What makes for collaborative success, and how do we replicate that?

I was gratified that Cliff’s questions resonated well with the summary we’d articulated in the current edition of the National Agenda. The research section, in particular, lays out key questions about information value, risk assessment, and success evaluation, and outlines the types of approaches that are most likely to lead to the development of a systematic, general evidence base for the stewardship community. Moreover, the Agenda calls attention to many examples of things we are doing well.

That said, a question that was posed at our session, and that I heard echoed repeatedly in side conversations during CNI, was “Where are we (as a group, community, project, etc.) getting stuck in the weeds?”

This question is phrased in a way that attracts negative answers — a potentially positive and constructive rephrasing is: What levels of analysis are most useful for the different classes of problems we face in the stewardship community?

As an information scientist and a formally (and mathematically) trained social scientist, I tend to spend a fair amount of time thinking about and building models of individual, group, and institutional behaviors, tracing the empirical implications of these models, and designing experiments (or seeking natural experiments) that have the potential to provide evidence to distinguish among competing plausible models. In this general process, and in approaching interventions, institutions, and policies generally, I’ve found the following levels of abstraction perennially useful:

The first level of analysis concerns local engineering problems, in which one’s decisions neither affect the larger ecosystem nor provoke strategic reactions by other actors. For example, the digital preservation problem of selecting algorithms for fixity, replication, and verification to yield cost-effective protection against hardware and media failures is best treated at this level in most cases. For this class of problem, the tools of decision theory [1] (of which “cost-benefit” analysis is a subset), economic comparative statics, statistical forecasting, Monte Carlo simulation, and causal inference [2] are helpful.
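As a rough illustration of this first level, the replication question can be explored with a small Monte Carlo simulation. What follows is only a minimal sketch: it assumes independent copy failures, a hypothetical per-copy annual failure probability, and a yearly fixity audit that repairs failed copies from any surviving one. The function names and numbers are invented for illustration and do not come from the Agenda.

```python
import random

def simulate_loss_probability(copies, annual_failure_prob, years, trials, seed=0):
    """Estimate the chance that every replica fails within the same audit period.

    Assumptions (illustrative only): failures are independent, each copy fails
    with probability `annual_failure_prob` per year, and a yearly fixity audit
    restores failed copies as long as at least one copy survived that year.
    """
    rng = random.Random(seed)
    losses = 0
    for _ in range(trials):
        for _ in range(years):
            failed = sum(rng.random() < annual_failure_prob for _ in range(copies))
            if failed == copies:  # no intact copy left to repair from
                losses += 1
                break
    return losses / trials

def analytic_loss_probability(copies, annual_failure_prob, years):
    """Closed-form answer under the same assumptions, used as a sanity check."""
    yearly_loss = annual_failure_prob ** copies
    return 1 - (1 - yearly_loss) ** years

if __name__ == "__main__":
    for copies in (1, 2, 3):
        estimate = simulate_loss_probability(copies, 0.05, years=10, trials=20000)
        exact = analytic_loss_probability(copies, 0.05, years=10)
        print(f"{copies} copies: simulated {estimate:.4f}, analytic {exact:.4f}")
```

Under these simplifying assumptions the closed form is available, so the simulation is redundant; it earns its keep once the assumptions are relaxed (correlated failures, less frequent audits, varying media), which is where a cost-benefit comparison of replication strategies would normally start.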

The second level concerns tactical problems, in which other actors react and adapt to your decisions (e.g., to compete, or to avoid compliance), but the ecosystem (market, game structure, rules, norms) remains essentially fixed. For example, the problem that a single institution faces in setting (internal/external) prices (fees/fines) or usage and service policies is a tactical one. For tactical problems, applying the first-level tools described above is likely to yield misleading results, and some form of modeling is essential: models from game theory, microeconomics, behavioral economics, mechanism design, and sociology are often most appropriate. Causal inference remains useful, but must be combined with modeling.
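To make the contrast with first-level tools concrete, here is a minimal sketch of a two-stage (leader-follower) game solved by backward induction: a library commits to a fine policy, and patrons then choose how to respond. The payoff numbers and every name in the code (`PAYOFFS`, `naive_choice`, `strategic_choice`, and so on) are hypothetical, chosen only to show how a choice that treats patron behavior as fixed can differ from one that anticipates their reaction.

```python
# Hypothetical payoffs: (library payoff, patron payoff) for each strategy pair.
PAYOFFS = {
    ("low fine", "return on time"): (3, 2),
    ("low fine", "return late"): (1, 3),
    ("high fine", "return on time"): (2, 2),
    ("high fine", "return late"): (0, 0),
}

LIBRARY_MOVES = ("low fine", "high fine")
PATRON_MOVES = ("return on time", "return late")

def best_response(payoffs, library_move, patron_moves):
    """Patron's payoff-maximizing reply to a given library policy."""
    return max(patron_moves, key=lambda move: payoffs[(library_move, move)][1])

def naive_choice(payoffs, library_moves, assumed_patron_move):
    """First-level (decision-theoretic) choice: patron behavior is held fixed."""
    return max(library_moves, key=lambda move: payoffs[(move, assumed_patron_move)][0])

def strategic_choice(payoffs, library_moves, patron_moves):
    """Backward induction: anticipate the patron's best reply to each policy."""
    def library_payoff(move):
        reply = best_response(payoffs, move, patron_moves)
        return payoffs[(move, reply)][0]
    return max(library_moves, key=library_payoff)

if __name__ == "__main__":
    # Assuming patrons keep returning on time, a low fine looks best.
    print(naive_choice(PAYOFFS, LIBRARY_MOVES, "return on time"))
    # Once patrons' reactions are modeled, the preferred policy changes.
    print(strategic_choice(PAYOFFS, LIBRARY_MOVES, PATRON_MOVES))
```

With these made-up payoffs, the fixed-behavior analysis selects the low fine, while backward induction selects the high fine, because patrons would respond to a low fine by returning late.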

The Agenda itself is not aimed at these two levels of analysis; however, many of the NDSA working groups’ projects and interests are at the first, local-engineering level: NDSA publications such as content case studies, the digital preservation in a box toolkit, and the levels of preservation taxonomy may provide guidance for first-level decisions. Many of the other working group outputs, such as storage, web, and staffing surveys, although they do not describe tactical models, do provide baseline information and peer comparisons to inform decisions at the tactical level.

The third level is systems design (in this case legal-socio-technical systems). In these types of problems, the larger environment (market, game structure, rules, norms) can be changed in meaningful ways. Systems analysis involves understanding an entire system of simultaneous decisions and their interactions and/or designing an optimal (or at least improved) system. Examples of systems analysis are common in theory: any significant government regulation/legislation should be based on systems analysis. For institutional-scale systems analysis, a number of conceptual tools are useful, particularly market design and market failure [3]; constitutional design [4]; and the co-design of institutions and norms to manage “commons” [5].

Working at this level of analysis is difficult to do well: One must avoid the twin sins of getting lost in the weeds (too low a level of analysis for the problem) and having one’s head in the clouds (thinking at such a level of generality that analysis cannot be practically applied, or worse, is vacuous). Both the Agenda and Cliff’s landscape talk are aimed at this level of analysis and manage to avoid both sins to a reasonable degree.

Academics often do not go beyond this level of designing systems that would be optimal (or at least good) and stable if actually implemented. However, it’s exceedingly rare that a single actor (or unified group of actors) has the opportunity to design entire systems at institutional scale; notable examples are the authoring of constitutions, and (perhaps) the use of intellectual property law to create new markets.

Instead, policy makers, funders, and other actors with substantial influence at the institutional level are faced with a fourth level of analysis, represented by the question “Where do I put attention, pressure, or resources to create sustainable positive change?” System design alone doesn’t answer this: Design is essential for identifying where one wants to go, but policy analysis and political analysis are required to understand what actions to take in order to get (closer to) there.

This last question, of the “where do we push now” variety, is what I’ve come to expect (naturally) from my boss, and from other leaders in the field. When pressed, I’ve thus far managed to come up with (after due deliberation) some recommendations (or, at least, hypotheses) for action, but these generally seem like the hardest level of solution to get right, or even to assess as good or bad. I think the difficulty comes from having to hold a coherent high-level vision (from the systems design level) while simultaneously getting back “down into the details” (though not “in the weeds”) to understand the current arrangements and limitations of power, resources, capacity, mechanism, attention, knowledge, and stakeholders.

Notwithstanding, although we started by aiming more at systems design than policy intervention, some recommendations of the policy intervention sort are to be found in the current Agenda. I expect that this year’s revisions, and the planned phases of external review and input, will add more breadth to these recommendations, but that it will require years more of reflection, iteration, and refinement to identify specific policy recommendations across the entire breadth of issues covered by the Agenda at the systems level.


[1] For an accessible broad overview of decision theory, game theory, and related approaches see M. Peterson [2009], An Introduction to Decision Theory. For a classic introduction to policy applications see Stokey & Zeckhauser 1978, A Primer for Policy Analysis.

[2] There are many good textbooks on statistical inference, ranging from the very basic, accessible and sensible Problem Solving by Chatfield (1995) to the sophisticated and modern Bayesian Data Analysis 3rd edition, by Gelman et al. (2013). There are relatively few good textbooks on causal inference: Judea Pearl’s (2009) Causality: Models, Reasoning and Inference 2nd edition is as definitive as a textbook can be, but challenging; Counterfactuals and Causal Inference: Methods and Principles for Social Research, Morgan & Winship’s textbook, is more accessible.

[3] Market failure is a broad topic, and most articles and even books address only some of the potential conditions for functioning markets. A good, accessible overview is Stiglitz’s Regulation and Failure, but it doesn’t cover all areas. For information stewardship policy the economics of non-consumptive goods is particularly relevant (see Foray, The Economics of Knowledge (2006)), and increasing returns and path dependence are particularly important in social/information network economies (see Arthur 1994, Increasing Returns and Path Dependence in the Economy).

[4] See Lijphart, Arend. “Constitutional design for divided societies.” Journal of Democracy 15.2 (2004): 96-109; and Shugart, Matthew Soberg, and John M. Carey. Presidents and assemblies: Constitutional design and electoral dynamics. Cambridge University Press, 1992.

[5] The late Lin Ostrom’s work was fundamental in this area. See, for example, Ostrom, Elinor. Understanding institutional diversity. Princeton University Press, 2009.
