Guest Post: Diana Hellyar on Library Use of New Visualization Technologies

April 26, 2016

Diana Hellyar, who is a Graduate Research Intern in the program, reflects on her investigations into augmented reality, virtual reality, and related technologies.

Libraries Can Use New Visualization Technology to Engage Readers

My research as a Research Intern for the MIT Libraries Program on Information Science focuses on applications of emerging virtual reality and visualization technology to library information discovery. Virtual reality and related visualization technologies are a rapidly changing field, and staying on top of them and applying them in libraries can be difficult because there is little research on the topic. While researching the uses of virtual reality in libraries, I came across an example of how some libraries were able to incorporate augmented reality into their children's departments. Out of a dozen examples, this one caught my attention for several reasons: it is not just a prototype, it is being used in multiple libraries; it was easily adopted by non-technical librarians; and it is simple enough to be used by children.

The Mythical Maze app (available here) has been downloaded more than 10,000 times to date. Across the United Kingdom, children participated in the Reading Agency's 2014 Summer Reading Challenge, Mythical Maze, by downloading the Mythical Maze app on their mobile devices. Liz McGettigan discusses the app in an article published on the Chartered Institute of Library and Information Professionals website, explaining how it uses augmented reality to make posters and legend cards around the library come to life. The article links to The Reading Agency's promotional video (watch it here). The video shows how mythical creatures are hidden around the library and how children can look for them with the app; when they find a creature, they can use the app to unlock mini-games. The app also lets children scan stickers they receive for reading books, which unlocks rewards and allows them to learn more about the mythical creatures.

Using apps and integrating augmented reality is a fun way to run a summer reading challenge. The Reading Agency reported that 2014 was a record-breaking year for the program: participation increased by 3.6%, and 81,908 children joined the library to participate, up 22.7% from the previous year. These statistics suggest that children are responding positively to augmented reality in their libraries.
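As a quick sanity check on those figures, the 22.7% rise implies that roughly 66,800 children joined in the previous year. The prior-year number below is derived for illustration; it is not reported by The Reading Agency.

```python
# Derive the implied 2013 sign-up figure from the reported 2014 statistics.
joined_2014 = 81_908   # children reported to have joined the library in 2014
increase = 0.227       # reported 22.7% rise over the previous year

joined_2013 = joined_2014 / (1 + increase)
print(f"Implied 2013 sign-ups: {joined_2013:,.0f}")  # about 66,755
```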

I think the best part of this app is that it allows the children's room to come alive. Children can interact with the library in a way they never have before. Encouraging children to use their devices in the library in a fun and educational way is groundbreaking; they may never have been allowed to play with and learn from their devices at the library before.

The article about the summer reading challenge also discussed the idea of “transliteracy”. The author, Liz McGettigan, says that transliteracy is defined as the “ability to read, write and interact across a range of platforms and tools”. It’s important to encourage children to learn how to use their devices to find the information they are looking for. Encouraging children to use their devices for the summer reading challenge helps them to learn how to do this.

What can libraries do with this? I think that libraries can learn from this example and not just for a summer reading program. The librarians can create scavenger hunts for kids that are either for fun or to help them learn about the library and its services. Children can collect prizes for the things they find in the library using the app. Librarians can even use it to have kids react to and rate the books they read. An app can be designed so that if a child hovers their device over a book they can see other children’s ratings and comments about the book. They can do any of these things and more to create new excitement for their library.
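As a rough sketch of the ratings idea, the back end of such an app could map a scanned book identifier to stored reader reactions. Everything here is hypothetical; the ISBN, data store, and function are invented for illustration and are not part of the Mythical Maze app.

```python
# Hypothetical sketch: look up crowd-sourced ratings for a book a child has
# scanned with an AR-enabled library app. A real service would use a database
# and an AR marker or barcode scanner; this toy version uses an in-memory dict.
from statistics import mean

RATINGS = {
    "9780141354936": [
        {"stars": 5, "comment": "Loved the minotaur!"},
        {"stars": 4, "comment": "A bit scary but fun."},
    ],
}

def lookup_book(isbn: str) -> dict:
    """Return the average rating and comments for a scanned book, if any."""
    reviews = RATINGS.get(isbn, [])
    return {
        "isbn": isbn,
        "average_stars": round(mean(r["stars"] for r in reviews), 1) if reviews else None,
        "comments": [r["comment"] for r in reviews],
    }

print(lookup_book("9780141354936"))
```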

One way for this to work would be for publishers to team up with libraries to create content for similar apps. That would open up many more possibilities for interactive content without worrying about copyright issues. Libraries could create a small section of books that interact with the app; then, when a child hovers a device over one of those books, the story comes to life and is read aloud.

There are so many possibilities for teaching, learning, and reading while using augmented reality in children's departments of libraries. The Mythical Maze summer reading program is hopefully only the beginning of using this technology to engage children. Given the success of the summer reading challenge, I hope other libraries will consider including augmented reality in their programming. Using this technology will only enhance learning and create fun new ways to get children excited about reading.

This example illustrates how augmented reality can serve as an entry point to new visualization technologies. Many types of libraries can implement this technology and allow their users to interact with physical materials in a way they never have before.


Guest Post: Lucy Taylor on LibrePlanet 2016, Software Curation, and Preservation

April 21, 2016

Lucy Taylor, who is a Graduate Research Intern in the program, reflects on software curation at the recent LibrePlanet conference:

LibrePlanet 2016, Software Curation and Preservation

This year’s LibrePlanet conference, organized by the Free Software Foundation, touched on a number of themes that relate to research on software curation and preservation taking place at MIT’s Program on Information Science.

The two-day conference, hosted at MIT, aimed to “examine how free software creates the opportunity of a new path for its users, allows developers to fight the restrictions of a system dominated by proprietary software by creating free replacements, and is the foundation of a philosophy of freedom, sharing, and change.” In a similar way, at the MIT Program on Information Science, we are investigating the ways in which sustainable software might positively impact academic communities and shape future scholarly research practices. The conference was a great opportunity to compare and contrast the concerns and goals of the Free Software movement with those of people who use software in research.

A number of recurring themes emerged over the course of the weekend that could inform research on software curation. The event kicked off with a conversation between Edward Snowden and Daniel Kahn Gillmor. They tackled privacy and security, and spoke at length about how current digital infrastructures limit our freedoms. Interestingly, they also touched on how to expand the Free Software community and raise awareness among non-technical folks about the need to create, and use, Free Software. A lack of incentives for “newbies” inhibits the growth of the Free Software movement; Free Software needs to compete with proprietary software's low barriers to entry and polished user experience. Similarly, the growth of sustainable, reusable academic software through better documentation, storage, and visibility is inhibited by a lack of incentives for researchers and libraries to improve software development practices and create curation services.

The talks “Copyleft for the next decade: a comprehensive plan” by Bradley Kuhn and “Will there be a next great Copyright Act?” by Peter Higgins both examined the ways in which licensing and copyright are affecting the Free Software movement. The future seems somewhat bleak for GPL licensing and copyleft, with developers being discouraged from using this license and instead putting their work under more permissive licenses, which then allow companies to use and profit from others' software. In comparison, research gateways like NanoHub and HubZero encounter the same difficulties in encouraging researchers to make their software freely available for others to use and modify. As both speakers touched on, the general lack of understanding, and also fear, surrounding copyright needs to be remedied. Sci-Hub was also mentioned as an example of a tool that, whilst breaking copyright law, is also revolutionary in nature, in that no library has ever aggregated more scientific literature on one platform. How can we create technologies that make scholarly communication more open in the future? Will the curation of software contribute to these aims? Within wider discussions of open access, it is also worthwhile to think about how software can often be a research object in its own right that merits the same curation and concern as journal papers and datasets.

The ideas discussed in the session “Getting the academy to support free software and open science” had many parallels to the research being carried out here at the MIT Program on Information Science. The three speakers spoke about Free Software activities at their home institutions and the barriers created by the heavy use of proprietary software at universities. Not only does the continued use of this software result in high costs and the perpetuation of a “centralized web” that relies on companies like Google, Microsoft, and Apple, but it also encourages students to think passively about the technologies they use. Instead, how can we encourage students to think of software as something they can build on and modify through the use of Free Software? Can we develop more engaged academic communities that think about and use software critically through the development of software curation services and sustainable software practices? This was a really interesting discussion that explored problematic infrastructures in higher education.

Finally, Alison Macrina and Nima Fatemi’s talk on the “Library Freedom Project: the long overdue partnership between libraries and free software” put the library front and centre in the role of engaging the wider community in Free Software and advocating for better privacy and more freedom. The Library Freedom Project not only educates librarians and patrons on internet privacy but has also rolled out Tor browsers in a few public libraries. What can academic libraries do to build on this important work and to increase awareness about online freedom within our communities?

The conference was a great way to gain insight into the wider activities of the software community and to talk with others from a multitude of different disciplines. It was interesting to think about how research on software curation services could be informed by these broader discussions on the future of Free Software. Academic librarians should also think about how they can advocate for Free Software in their institutions to encourage better understanding of privacy and to foster environments in which software is critically evaluated to meet user needs. Can libraries embrace the Free Software movement as they have the Open Access movement?


Why Search Is Not a Solved (by Google) Problem, and Why Universities Should Care: Ophir Frieder's Talk

March 18, 2016

Ophir Frieder, who holds the Robert L. McDevitt, K.S.G., K.C.H.S. and Catherine H. McDevitt L.C.H.S. Chair in Computer Science and Information Processing at Georgetown University and is Professor of Biostatistics, Bioinformatics, and Biomathematics at the Georgetown University Medical Center, gave this talk on Searching in Harsh Environments as part of the Program on Information Science Brown Bag Series.

In the talk, illustrated by the slides below, Ophir rebuts the myth that “Google has solved search” and discusses the challenges of searching for complex objects, through hidden collections, and in harsh environments.

In his abstract, Ophir summarizes as follows:

Many consider “searching” a solved problem, and for digital text processing, this belief is factually based.  The problem is that many “real world” search applications involve “complex documents”, and such applications are far from solved.  Complex documents, or less formally, “real world documents”, comprise of a mixture of images, text, signatures, tables, etc., and are often available only in scanned hardcopy formats.   Some of these documents are corrupted.  Some of these documents, particularly of historical nature, contain multiple languages.  Accurate search systems for such document collections are currently unavailable.

The talk discussed three projects. The first project involved developing methods to search collections of complex digitized documents which varied in format, length, genre, and digitization quality; contained diverse fonts, graphical elements, and handwritten annotations; and were subject to errors due to document deterioration and from the digitization process. A second project involved developing methods to enable searchers who arrive with sparse, fragmentary, error-ridden clues about places and people to successfully find relevant connected information in the Archives Section of the United States Holocaust Memorial Museum. A third project involved monitoring Twitter for public health events without relying on a prespecified hypothesis.
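To give a flavor of why searching scanned, error-ridden text is harder than searching clean digital text, here is a minimal sketch of error-tolerant term matching using edit distance. It illustrates the general idea only; it is not the method used in the projects Frieder described.

```python
# Illustrative only: fuzzy term matching for OCR-corrupted text.
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def fuzzy_find(query: str, tokens: list[str], max_dist: int = 2) -> list[str]:
    """Return tokens within max_dist edits of the query term."""
    return [t for t in tokens if edit_distance(query.lower(), t.lower()) <= max_dist]

ocr_tokens = ["Holocavst", "Memorlal", "Museum", "archlves"]
print(fuzzy_find("Holocaust", ocr_tokens))  # ['Holocavst']
```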

Across these projects, Frieder raised a number of themes:

  • Searching on complex objects is very different from searching the web. Substantial portions of complex objects are invisible to current search, and current search engines do not understand the semantics of relationships within and among objects, making the right answers hard to find.
  • Searching across most online content now depends on proprietary algorithms, indices, and logs.
  • Researchers need to be able to search collections of content that may never be made available publicly online by Google or other companies.

Despite the increasing amount of born-digital material, I speculate that these issues will become more salient to research, and that libraries have a role to play in addressing them.

While much of the “scholarly record” is currently being produced in the form of PDFs, which are amenable to the Google searching approach, much web-based content is dynamically generated and customized, and scholarly publications are increasingly incorporating dynamic and interactive features. Searching these effectively will require engaging with scientific output as complex objects.

Further, some areas of science, such as the social sciences, increasingly rely on proprietary collections of big data from commercial sources. Much of this growing evidence base is currently accessible only through proprietary APIs. To meet heightened requirements for transparency and reproducibility, stewards are needed for these data who can ensure nondiscriminatory long-term research access.

More generally, it is increasingly well recognized that the evidence base of science includes not only published articles and community datasets (and benchmarks), but may also extend to scientific software, replication data, workflows, and even electronic lab notebooks. The article produced at the end is simply a summary description of one pathway through the evidence reflected in these scientific objects. Validating, reproducing, and building on science may increasingly require access to, search over, and understanding of this entire complex set.


Roles for Digital Scholarship

March 4, 2016

Julia Flanders, who is the Director of the Digital Scholarship Group in the Northeastern University Library and a Professor of Practice in Northeastern's English Department, gave a talk on Jobs, Roles, Skills, Tools: Working in the Digital Academy as part of the Program on Information Science Brown Bag Series.

In the talk, illustrated by the slides below, Julia discusses the evolving landscape of digital humanities (and digital scholarship more broadly) and considers the relationship between technology, tool development, and professional roles.

In her abstract, Julia summarizes as follows:

Twenty-five years ago, jobs in humanities computing were largely interstitial: located in fortuitous, anomalous corners and annexes where specific people with idiosyncratic skill profiles happened to find a niche. One couldn’t train for such jobs, let alone locate them in a market. The emergence of the field of “digital humanities” since that time may appear to be a disciplinary and methodological phenomenon, but it also has to do with labor: with establishing a new set of jobs for which people can be trained and hired, and which define the contours of the work we define as “scholarship.”

In the research described in her talk, Julia identifies seven different roles involved in digital humanities scholarship: developer, administrator, manager, scholar, analyst, data creator, and information manager. She then describes the skills and metaknowledge required for each and how these roles interact.

(I will note here that the libraries and press have conducted complementary research and engaged in standardization around describing contributorship roles. For more information on this see the Project CREDIT site.)

The talk notes the tensions that develop when these roles are out of balance in a project, and particularly the need for balance among the scholar, developer, and analyst roles. Her talk notes that a combination of scholar, developer, and analyst in a single person is very productive but rare. More typically, early career researchers start as data creators/coders, learn a particular tool set, and evolve into scholars. In the absence of a strong analyst role this creates “a peculiar relationship with tools: a kind of distance (on the scholar’s part) and on the other hand an intensive proximity (on the coder’s part) that may not yet have critical distance or meta-knowledge: the awareness needed to use the tools in a fully knowing way.”

Having observed commercial and research software development projects over thirty years, I find that one of the most common causes of catastrophic failure is the gap between the developer's understanding of the problem being solved and the customer's understanding of the same problem. A good analyst (often holding a “product manager” title in the corporate world) has the skills to understand both the business and technical domains well enough to probe for these misunderstandings and ensure that discussion converges on a common understanding. In addition, the analyst helps abstract both the technical and domain problems so that the eventual software solution not only meets the needs of the small number of customers in the loop but is broad enough for a target community. Moreover, librarians often have knowledge of components of the technical domain and of the subject domain, which gives libraries a particular competitive advantage in developing people for these critical bridge roles.


The Importance of Tacit Knowledge in Academia, and the Role of Libraries

February 25, 2016

Chaoqun Ni, who is an Assistant Professor in the School of Library Science at Simmons, presented a talk on Transformative Interactions in the Scientific Workplace as part of the Program on Information Science Brown Bag Series.

In the talk, illustrated by the slides below, Chaoqun uses bibliometric data to analyze the sociality, equality, and dynamicity of the scientific workforce.

In her abstract, Chaoqun describes her argument as follows:

I argue that, for a country to be scientifically competitive, it must maximize its human intellectual capital-base and support this workforce equitably and efficiently. I propose here a large-scale and heterogeneous analysis of the sociality, equality, and dynamicity of the scientific workforce through novel computational models for understanding and predicting the career trajectory of scientists based on their transformative interactions, gender, and levels of funding. This analysis will be able to isolate factors that contribute to the health and well-being of the scientific workforce. The computational models will quantify the impact of those transformative events and interactions and provide models to predict the career trajectory of scientists based on their gender, the size and position of the social network, and other demographic factors.

According to the talk, there are three types of events that are particularly likely to transform scholarly careers: being mentored, publishing, and receiving grants. Of these, mentoring occurs earliest in a scholar's career and has a persistent effect on publication and grants. The relationship is not simple and automatic: mentees do not automatically inherit their mentors' success in publication and grant funding. Instead, the mentoring relationship is mediated by the transfer of knowledge, norms, advice, and connections. And gender disparities are persistent and visible.
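As a hedged illustration of what a “computational model to predict career trajectory” can look like in practice, the toy example below fits a logistic regression on synthetic features for mentorship, coauthor network size, and early funding. The features, data, and coefficients are invented for illustration; this is not Chaoqun's model or data.

```python
# Toy career-trajectory model on synthetic data; illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
# Synthetic features: was_mentored (0/1), coauthor network size, early funding (0/1)
X = np.column_stack([
    rng.integers(0, 2, n),
    rng.poisson(8, n),
    rng.integers(0, 2, n),
])
# Synthetic outcome: "still research-active ten years later"
logits = 0.8 * X[:, 0] + 0.1 * X[:, 1] + 0.6 * X[:, 2] - 1.5
y = rng.random(n) < 1 / (1 + np.exp(-logits))

model = LogisticRegression().fit(X, y)
print(dict(zip(["mentored", "network_size", "funded"], model.coef_[0].round(2))))
```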

This talk resonated with a number of areas in which the Program and Library engage:

First, diversity is a core library value, and this research suggests ways in which the libraries can support a more diverse academic community. The success of early career scholars depends in part on developing a substantial number of specialized career skills that are not part of a specific scientific discipline, including, among many other things (see, for example, these slides on reputation and communication), navigating the scholarly publishing process, writing grant proposals, managing bibliographies, and curating data. Much of this knowledge is tacit: it is not explicitly taught but instead transferred through personal mentoring. Libraries are one of the rare parts of the university that are able to successfully capture this tacit knowledge and make it more widely available across the community. The Libraries' IAP courses are an excellent example of this.

Second, most of the data used for this research comes from Library-mediated collections: citations drawn from journal collections and metadata from dissertation collections. Further, as pressure grows on universities for quantitative evaluation, and as the desire grows to actively catalyze collaboration and productivity, there is an increasing need for rich access to Library collections as data, for guidance on tools and approaches (see, for an overview, our class on citation analysis), and for expert assistance. Since few researchers have methodological or domain expertise related to bibliometric and scientometric data, this presents an unusual opportunity for libraries to be entrepreneurial in collaborating on new research.
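As a small example of the kind of computation that “collections as data” enables, here is a minimal h-index calculation over per-paper citation counts. The numbers are toy values, not drawn from any Library collection.

```python
# Minimal h-index: the largest h such that h papers each have at least h citations.
def h_index(citations: list[int]) -> int:
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

print(h_index([25, 8, 5, 3, 3, 1]))  # 3: the top three papers each have >= 3 citations
```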

Third, during this talk Chaoqun noted that the most laborious and time-consuming phase of the research was data cleaning and linking, particularly name disambiguation. ORCID, in which the library serves a leadership role (and which MIT has adopted), aims to eliminate this problem. ORCID has spread widely; just within this month, over a dozen major publishers announced their intent to require ORCID iDs for journal submissions.
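To make the disambiguation point concrete, the toy sketch below shows why name strings collide where persistent identifiers do not. The records and identifiers are invented; real metadata would come from ORCID's public API rather than this in-memory lookup.

```python
# Toy illustration of why ORCID iDs disambiguate where author name strings cannot.
records = [
    {"name": "J. Smith", "orcid": "0000-0002-0000-0001", "affiliation": "MIT"},
    {"name": "J. Smith", "orcid": "0000-0002-0000-0002", "affiliation": "Georgetown"},
]

def by_name(name: str) -> list[dict]:
    return [r for r in records if r["name"] == name]

def by_orcid(orcid: str) -> list[dict]:
    return [r for r in records if r["orcid"] == orcid]

print(len(by_name("J. Smith")))                            # 2 -- ambiguous
print(by_orcid("0000-0002-0000-0001")[0]["affiliation"])   # 'MIT' -- unambiguous
```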

 


Brown Bag Report: Kim Dulin on Perma.CC

July 6, 2015

Kim Dulin, who is Director of the Harvard Library Innovation Lab, Associate Director for Collection Development and Digitization for the Harvard Law School Library, and former co-director of the Harvard Library Innovation Lab, presented a talk on Taking on Link Rot: Harvard Innovation Lab's Perma.CC as part of the Program on Information Science Brown Bag Series.

In the talk, illustrated by the slides below, Kim discusses how libraries can mitigate link rot in legal scholarship by coordinating on digital preservation.

In her abstract, Kim describes her talk as follows:

Perma.cc (http://perma.cc) is a web archiving platform and service developed by the Harvard Library Innovation Lab (LIL) to help combat link rot. Link rot occurs when links to websites point to web pages whose content has changed or disappeared. Perma.cc allows authors and editors to create permanent links for citations to web sources that will not rot. Upon direction from an author, Perma.cc will retrieve and save the contents of a cited web page and assign it a permanent Perma.cc link. The Perma.cc link is then included in the author’s references. When users later follow those references, they will have the option of proceeding to the website as it currently exists or viewing the cached version of the website as the creator of the Perma.cc link saw it. Regardless of what happens to the website in the future, the content will forever be accessible for scholarly and educational purposes via Perma.cc.

According to the talk, link rot in law publications is very high: approximately fifty percent of links in Supreme Court of the United States opinions are rotten, and the situation is worse in law journals. Perma.cc has been successful in part because durability is a very important selling point for attorneys. A signal of this is that the latest edition of the official editorial manual for law publications (the “Bluebook”) now recommends that links included in legal publications be archived.
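The scale of the problem is straightforward to probe. Below is a hedged sketch of a minimal link-rot check over a list of cited URLs using the widely used requests library; it detects only links that no longer resolve, whereas the content-drift half of the problem requires comparison against an archived copy, which is what Perma.cc supplies.

```python
# Minimal link-rot probe: flag cited URLs that no longer resolve cleanly.
import requests

def check_links(urls: list[str], timeout: float = 10.0) -> dict[str, str]:
    results = {}
    for url in urls:
        try:
            r = requests.head(url, allow_redirects=True, timeout=timeout)
            results[url] = "ok" if r.status_code < 400 else f"broken ({r.status_code})"
        except requests.RequestException as exc:
            results[url] = f"unreachable ({type(exc).__name__})"
    return results

if __name__ == "__main__":
    print(check_links(["https://perma.cc", "http://example.invalid/cited-page"]))
```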

Perma provides a workable, library-centered solution to a problem of real concern. In her talk, Kim focuses on the diverse roles that libraries play: they act as gatekeepers for the content to be preserved, as long-term custodians of the content (and, technically, as mirrors), and as direct access points.

(I will also note that libraries are critical in conducting research and developing standards in this area. The MIT Libraries are engaged in developing practices for collaborative stewardship as a member of the National Digital Stewardship Alliance, and the Program is engaged in research on data management and stewardship.)

The talk discusses a number of new directions for Perma, including Perma link plugins for Word and WordPress; an API to the service; a private LOCKSS network to replicate archival content; and a formal structure of governance, archival policies, and sustainability (funding and resources).

These directions resonate with me. Perma.cc is a project that has been very successful at approaching the very general problem of link rot within a specific community of practice. That success has much to do with knowledge of, connections with, and adaptation to that community. It will be interesting to see how governance and sustainability evolve to enable the transition from a project to community-supported infrastructure.


Scientific Reliability from an Informatics Perspective

June 18, 2015
It is an old saw that science is founded on reproducibility. However, the truth is that reproducibility has always been more difficult than generally assumed, even where the underlying phenomena are robust. Since Ioannidis's PLOS article in 2005, there has been increasing attention in medical research to the issue of reproducibility; and attention has been unprecedented in the last two years, with even the New York Times commenting on “jarring” instances of irreproducible, unreliable, or fraudulent research results.
Scientific reproducibility is most often viewed through a methodological or statistical lens and, increasingly, through a computational lens. (See, for example, our book on reproducible statistical computation.) Over the last several years, I've taken part in collaborations that approach reproducibility from the perspective of informatics: as a flow of information across a lifecycle that spans collection, analysis, publication, and reuse.
I had the opportunity to present a sketch of this approach at a recent workshop on reproducibility at the National Academy of Sciences, and at one of our Program on Information Science brown bag talks.
The slides from the brown bag talk discuss some definitions of reproducibility, and outline a model for understanding reproducibility as an information flow:
 
(Also see these videos from the workshop on informatics approaches, and other definitions of reproducibility.)
In the talk, I show how reproducibility claims, as generally discussed in science, are not crisply defined, and how the same reproducibility terminology is used to refer to very different sorts of assertions about the world, experiments, and systems. I outline an approach that takes each type of reproducibility claim and asks: What are the use cases involving this claim? What does the claim imply for information properties, flows, and systems? What proposed or potential interventions in information systems would strengthen the claim?
For example, a set of reproducibility issues is associated with validation of results. There are several distinct use cases and claims embedded in this — one of which I label as “fact-checking” because of its similarities to the eponymous journalistic use case:
  • Use Case: Post-publication reviewer wants to establish that published claims correspond to analysis method performed.
  • Reproducibility claim: Given public data identifier & analysis algorithm, an independent application of the algorithm yields a new estimate that is within the originally reported uncertainty.
  • Some potential supporting informatics claims:
    1. Instance of data retrieved via identifier is semantically equivalent to instance of data used to support published claim
    2. Analysis algorithm is robust to the choice of reasonable alternative implementations
    3. Implementation of the algorithm is robust to reasonable choices of execution details and context
    4. Published direct claims about the data are semantically equivalent to a subset of the claims produced by the author's previous application of the analysis
  • Some potential informatic interventions:
    • In support of claim 1:
      • Detailed provenance history for data from collection through analysis and deposition.
      • Automatic replication of direct data claims from deposited source
      • Cryptographic evidence
        (e.g., a cryptographically signed {analysis output, including a cryptographic hash of the data} compared against the {cryptographic hash of the data retrieved via the identifier}; see the fixity-checking sketch after this list)
    • In support of claim 2:
      • Standard implementation, subject to community review
      • Report of results of application of implementation on standard testbed
      • Availability of implementation for inspection
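As a hedged sketch of the cryptographic-evidence intervention for claim 1, the example below verifies fixity by comparing the SHA-256 hash of a newly retrieved dataset against the hash recorded alongside the deposited analysis output. Byte-level fixity is a stricter stand-in for the semantic-equivalence claim, signature verification is omitted, and the file name and recorded hash are placeholders.

```python
# Illustrative fixity check supporting claim 1: is the dataset retrieved via its
# identifier byte-identical to the dataset hashed when the analysis was deposited?
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# In a full implementation this value would be extracted from a cryptographically
# signed analysis output deposited at publication time; here it is a placeholder.
RECORDED_HASH = "<sha256 recorded and signed at deposit time>"

retrieved = Path("dataset_retrieved_via_identifier.csv")  # placeholder file name
if retrieved.exists():
    verdict = "PASS" if sha256_of(retrieved) == RECORDED_HASH else "FAIL"
    print(f"fixity check: {verdict}")
else:
    print("dataset not found; retrieve it via its identifier first")
```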
Overall, my conjecture is that if we wish to support reproducibility broadly in information systems, there are a number of properties and design principles for information systems that will enhance reproducibility. Within information systems, I conjecture that we should design to maintain the properties of transparency, auditability, provenance, fixity, identification, durability, integrity, repeatability, non-repudiation, and self-documentation. When designing the policies, incentives, and human interactions with these systems, we should consider barriers to entry, ease of use, support for intellectual communities of practice, personalization, credit and attribution, security, performance, sustainability, cost, and trust engineering.