Science photographer Felice Frankel, a research scientist in the Center for Materials Science and Engineering at the Massachusetts Institute of Technology, gave this talk, The Visual Component: More Than Pretty Pictures, as part of the Program on Information Science Brown Bag Series.
In the talk, illustrated through the slides below, Felice made the argument that images and figures are first class intellectual objects — and should be considered just as important as words in publication, learning, and thinking.
In her abstract, Felice summarizes as follows:
Visual representations of all kinds are becoming more important in our ever-growing image-based society, especially in science and technology. Yet there has been little emphasis on developing standards in creating or critiquing those representations. We must begin to consider images as more than tangential components of information and find ways to seamlessly search for accurate and honest depictions of complex scientific phenomena. I will discuss a few ideas to that end and show my own process of making visual representations in science and engineering. I will also make the case that representations are just as “intellectual” as text.
The talk presented many visual representations from a huge variety of scientific domains and projects. Across these projects, the talk returned to a number of key themes.
- When you develop a visual representation, it is vital to identify the overall purpose of the graphic (for example, whether it is explanatory or exploratory); the key ideas that the representation should communicate about the science; and the context in which the representation will be viewed.
- A number of components of visual design are universal across subject domains, including composition, abstraction, coloring, and layering. Small, incremental refinements can dramatically improve the quality of a representation.
- The process of developing visual representations engages both students and researchers in critical thinking about science; and this process can be used as a mechanism for research collaboration.
- Representations are not the science, they are communications of the science; and all representations involve design and manipulation. Maintaining scientific integrity requires transparency about what is included, what is excluded, and what manipulations were used in preparing the representation.
In my observation, information visualization is becoming increasingly popular, and tools for creating visualizations are increasingly accessible to a broad set of contributors. Universities would benefit from supporting students and faculty in visual design for research, publication, and teaching; and in supporting the discovery and curation of collections of representations.
Library engagement in this area is nascent, and there are many possible routes for engagement. Library support for scientific representations is often limited — especially compared to the support for PDF documents or bibliographic citations. I speculate that there are at least five productive avenues for involvement.
- Libraries could support researchers in curating personal collections of representations; in sharing them for collaboration; and in publishing them as part of research and educational content. Further, researchers have increasing opportunities to cycle between physical and virtual representations of information, so support for curating information representations can dovetail with library support for making and makerspaces.
- Library information systems seldom incorporate information visualization effectively in support of resource discovery and navigation. New information and visualization technologies and methods offer increased opportunities to make libraries more accessible and more engaging.
- Image-based searching is another area that demonstrates that search is not a solved problem. Image-based search provides a powerful means of discovering content that is almost completely absent from current library information systems.
- Visual design and communication skills are seldom explicitly documented or transmitted in the academy. Libraries have a vital role to play in making accessible the body of hidden (“tacit”) knowledge and skills that are critical for success in developing careers.
- Libraries have a role in helping researchers to engage in evolving systems of credit and attribution. For example, the CRediT taxonomy (which we helped to develop, and which is being adopted by scholarly journals such as Cell and PLOS) provides a way to formally record attribution for those who contribute scientific visualizations.
Diana Hellyar, a Graduate Research Intern in the program, reflects on her investigations into augmented reality, virtual reality, and related technologies.
Libraries Can Use New Visualization Technology to Engage Readers
My research as a Research Intern for the MIT Libraries Program on Information Science focuses on applications of emerging virtual reality and visualization technology to library information discovery. Virtual reality and related visualization technologies form a rapidly changing field, and staying on top of them and applying them in libraries can be difficult since there is little research on the topic. While researching the uses of virtual reality in libraries, I came across an example of how some libraries incorporated augmented reality into their children’s departments. Out of a dozen examples, this one caught my attention for several reasons: it is not just a prototype but is in use in multiple libraries; it was easily adopted by non-technical librarians; and it was simple enough to be used by children.
The Mythical Maze app (available here) has been downloaded more than 10,000 times to date. Across the United Kingdom, children participated in the Reading Agency’s 2014 Summer Reading Challenge, Mythical Maze, by downloading the app on their mobile devices. Liz McGettigan discusses the app in an article published on the Chartered Institute of Library and Information Professionals website, explaining how it uses augmented reality to make posters and legend cards around the library come to life. The article links to The Reading Agency’s promotional video (watch it here). The video shows how mythical creatures are hidden around the library and how children can hunt for them with the app; finding a creature unlocks mini-games. The app also allows children to scan stickers they receive for reading books, which unlocks rewards and lets them learn more about the mythical creatures.
Using apps and integrating augmented reality is a fun way to run a summer reading challenge. The Reading Agency reported that 2014 was a record-breaking year for their program: participation increased by 3.6%, and 81,908 children joined the library to take part, up 22.7% from the previous year. These statistics suggest that children are responding positively to augmented reality in their libraries.
I think the best part of this app is that it allows the children’s room to come alive. Children can interact with the library in a way they never have before. Encouraging children to use their devices in the library in a fun and educational way is groundbreaking; they may never before have been allowed to play with and learn from their devices at the library.
The article about the summer reading challenge also discusses the idea of “transliteracy”, which the author, Liz McGettigan, defines as the “ability to read, write and interact across a range of platforms and tools”. It’s important to encourage children to learn how to use their devices to find the information they are looking for, and the summer reading challenge helps them do exactly that.
What can libraries do with this? I think that libraries can learn from this example and not just for a summer reading program. The librarians can create scavenger hunts for kids that are either for fun or to help them learn about the library and its services. Children can collect prizes for the things they find in the library using the app. Librarians can even use it to have kids react to and rate the books they read. An app can be designed so that if a child hovers their device over a book they can see other children’s ratings and comments about the book. They can do any of these things and more to create new excitement for their library.
One way for this to work would be for publishers to team up with libraries to create content for similar apps; then there would be many more possibilities for interactive content without worrying about copyright issues. Libraries could create a small section of books that interact with the app, so that when a child hovers a device over one of these books, the story comes to life and is read aloud.
There are so many possibilities for teaching, learning, and reading while using augmented reality in children’s departments of libraries. The Mythical Maze summer reading program is hopefully only the beginning in terms of using this technology to engage children. With the success of the summer reading challenge, I hope other libraries will consider including it in their programming. Using this technology will only enhance learning and will create fun new ways to get children excited about reading.
This example illustrates the possibility of using augmented reality and related visualization technologies to engage readers. Many types of libraries can implement this technology and allow their users to interact with physical materials in a way they never have before.
LibrePlanet 2016, Software Curation and Preservation
This year’s LibrePlanet conference, organized by the Free Software Foundation, touched on a number of themes that relate to research on software curation and preservation taking place at MIT’s Program on Information Science.
The two-day conference hosted at MIT aimed to “examine how free software creates the opportunity of a new path for its users, allows developers to fight the restrictions of a system dominated by proprietary software by creating free replacements, and is the foundation of a philosophy of freedom, sharing, and change.” In a similar way, at the MIT Program on Information Science, we are investigating the ways in which sustainable software might positively impact academic communities and shape future scholarly research practices. The conference was a great opportunity to compare and contrast the concerns and goals of the Free Software movement with those of people who use software in research.
A number of recurring themes emerged over the course of the weekend that could inform research on software curation. The event kicked off with a conversation between Edward Snowden and Daniel Kahn Gillmor. They tackled privacy and security, and spoke at length about how current digital infrastructures limit our freedoms. Interestingly, they also touched on how to expand the Free Software community and raise awareness among non-technical folks about the need to create, and use, Free Software. A lack of incentives for “newbies” inhibits the growth of the Free Software movement; Free Software needs to compete with proprietary software’s low barriers to entry and polished user experience. Similarly, the growth of sustainable, reusable academic software through better documentation, storage, and visibility is inhibited by a lack of incentives for researchers and libraries to improve software development practices and create curation services.
The talks “Copyleft for the next decade: a comprehensive plan” by Bradley Kuhn and “Will there be a next great Copyright Act?” by Peter Higgins both examined the ways in which licensing and copyright are impacting the Free Software movement. The future seems somewhat bleak for GPL licensing and copyleft, with developers being discouraged from using this license and instead putting their work under more permissive licenses, which then allow companies to use and profit from others’ software. In comparison, research gateways like NanoHub and HubZero encounter the same difficulties in encouraging researchers to make their software freely available for others to use and modify. As both speakers touched on, the general lack of understanding, and also fear, surrounding copyright needs to be remedied. Sci-Hub was also mentioned as an example of a tool that, whilst breaking copyright law, is also revolutionary in nature: no library has ever aggregated more scientific literature on one platform. How can we create technologies that make scholarly communication more open in the future? Will the curation of software contribute to these aims? Within wider discussions on open access, it is also worthwhile to think about how software can often be a research object in its own right that merits the same curation and concern as journal papers and datasets.
The ideas discussed in the session “Getting the academy to support free software and open science” had many parallels to the research being carried out here at the MIT Program on Information Science. The three speakers spoke about Free Software activities within their home institutions and the barriers that are created by the heavy use of proprietary software at universities. Not only does the continued use of this software result in high costs and the perpetuation of the “centralized web” that relies on companies like Google, Microsoft, and Apple, but this also encourages students to think passively about the technologies they use. Instead, how can we encourage students to think of software as something they can build on and modify through the use of Free Software? Can we develop more engaged academic communities who think and use software critically through the development of software curation services and sustainable software practices? This was a really interesting discussion that explored problematic infrastructures in higher education.
Finally, Alison Macrina and Nima Fatemi’s talk on the “Library Freedom Project: the long overdue partnership between libraries and free software” put the library front and centre in the role of engaging the wider community in Free Software and advocating for better privacy and more freedom. The Library Freedom Project not only educates librarians and patrons on internet privacy but has also rolled out Tor browsers in a few public libraries. What can academic libraries do to build on this important work and to increase awareness about online freedom within our communities?
The conference was a great way to gain insight into the wider activities of the software community and to talk with others from a multitude of different disciplines. It was interesting to think about how research on software curation services could be informed by these broader discussions on the future of Free Software. Academic librarians should also think about how they can advocate for Free Software in their institutions to encourage better understanding of privacy and to foster environments in which software is critically evaluated to meet user needs. Can libraries embrace the Free Software movement as they have the Open Access movement?
Why Search is Not a Solved (by Google) Problem, and Why Universities Should Care: Ophir Frieder’s Talk
Ophir Frieder, who holds the Robert L. McDevitt, K.S.G., K.C.H.S. and Catherine H. McDevitt L.C.H.S. Chair in Computer Science and Information Processing at Georgetown University and is Professor of Biostatistics, Bioinformatics, and Biomathematics at the Georgetown University Medical Center, gave this talk on Searching in Harsh Environments as part of the Program on Information Science Brown Bag Series.
In the talk, illustrated by the slides below, Ophir rebuts the myth that “Google has solved search”, and discusses the challenges of searching for complex objects, through hidden collections, and in harsh environments.
In his abstract, Ophir summarizes as follows:
Many consider “searching” a solved problem, and for digital text processing, this belief is factually based. The problem is that many “real world” search applications involve “complex documents”, and such applications are far from solved. Complex documents, or less formally, “real world documents”, comprise a mixture of images, text, signatures, tables, etc., and are often available only in scanned hardcopy formats. Some of these documents are corrupted. Some, particularly those of a historical nature, contain multiple languages. Accurate search systems for such document collections are currently unavailable.
The talk discussed three projects. The first project involved developing methods to search collections of complex digitized documents which varied in format, length, genre, and digitization quality; contained diverse fonts, graphical elements, and handwritten annotations; and were subject to errors due to document deterioration and from the digitization process. A second project involved developing methods to enable searchers who arrive with sparse, fragmentary, error-ridden clues about places and people to successfully find relevant connected information in the Archives Section of the United States Holocaust Memorial Museum. A third project involved monitoring Twitter for public health events without relying on a prespecified hypothesis.
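As a minimal illustration of why exact-match search fails on such collections (this is a toy sketch, not the projects’ actual methods), consider a query against OCR output, where character confusions such as “i” → “1” or “rn” → “m” are common. A sliding-window fuzzy match can still locate the term that exact search misses:

```python
from difflib import SequenceMatcher

def fuzzy_find(query, document, threshold=0.75):
    """Slide a query-length window across the document and return the
    spans whose similarity to the query clears the threshold."""
    query, doc = query.lower(), document.lower()
    n = len(query)
    hits = []
    for i in range(len(doc) - n + 1):
        score = SequenceMatcher(None, query, doc[i:i + n]).ratio()
        if score >= threshold:
            hits.append((i, doc[i:i + n], round(score, 2)))
    return sorted(hits, key=lambda h: -h[2])  # best match first

# typical OCR confusions: "i" -> "1", "rn" -> "m"
ocr_text = "the comm1ttee exammed the scanned docurnents"
print("committee" in ocr_text)               # exact match fails
print(fuzzy_find("committee", ocr_text)[0])  # fuzzy match still finds the span
```

Real systems for degraded collections go far beyond this, but the sketch shows why search over complex, error-ridden documents cannot simply reuse the exact-indexing machinery that works for clean digital text.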
Across these projects, Frieder raised a number of themes:
- Searching over complex objects is very different from searching the web. Substantial portions of complex objects are invisible to current search, and current search engines do not understand the semantics of relationships within and among objects — making the right answers hard to find.
- Searching across most online content now depends on proprietary algorithms, indices, and logs.
- Researchers need to be able to search collections of content that may never be made available publicly online by Google or other companies.
Despite the increasing amount of born-digital material, I speculate that these issues will become more salient to research, and that libraries have a role to play in addressing them.
While much of the “scholarly record” is currently produced in the form of PDFs, which are amenable to the Google search approach, much web-based content is dynamically generated and customized, and scholarly publications increasingly incorporate dynamic and interactive features. Searching these effectively will require engaging with scientific output as complex objects.
Further, some areas of science, such as the social sciences, increasingly rely on proprietary collections of big data from commercial sources. Much of this growing evidence base is currently accessible only through proprietary APIs. To meet heightened requirements for transparency and reproducibility, stewards are needed for these data who can ensure nondiscriminatory long-term research access.
More generally, it is increasingly well recognized that the evidence base of science includes not only published articles and community datasets (and benchmarks), but may also extend to scientific software, replication data, workflows, and even electronic lab notebooks. The article produced at the end is simply a summary description of one pathway through the evidence reflected in these scientific objects. Validating, reproducing, and building on science may increasingly require access to, search over, and understanding of this entire complex set.
Julia Flanders, who is the Director of the Digital Scholarship Group in the Northeastern University Library and a Professor of Practice in Northeastern’s English Department, gave a talk on Jobs, Roles, Skills, Tools: Working in the Digital Academy as part of the Program on Information Science Brown Bag Series.
In the talk, illustrated by the slides below, Julia discusses the evolving landscape of digital humanities (and digital scholarship more broadly) and considers the relationship between technology, tool development, and professional roles.
In her abstract, Julia summarizes as follows:
Twenty-five years ago, jobs in humanities computing were largely interstitial: located in fortuitous, anomalous corners and annexes where specific people with idiosyncratic skill profiles happened to find a niche. One couldn’t train for such jobs, let alone locate them in a market. The emergence of the field of “digital humanities” since that time may appear to be a disciplinary and methodological phenomenon, but it also has to do with labor: with establishing a new set of jobs for which people can be trained and hired, and which define the contours of the work we define as “scholarship.”
In the research described in her talk Julia identifies seven different roles involved in digital humanities scholarship: developer, administrator, manager, scholar, analyst, data creator, and information manager. She then describes the various skills and metaknowledge required for each and how these roles interact.
(I will note here that the libraries and press have conducted complementary research and engaged in standardization around describing contributorship roles. For more information, see the Project CRediT site.)
The talk notes the tensions that develop when these roles are out of balance in a project, and particularly the need for balance among the scholar, developer, and analyst roles. Julia notes that a combination of scholar, developer, and analyst in a single person is very productive but rare. More typically, early career researchers start as data creators/coders, learn a particular tool set, and evolve into scholars. In the absence of a strong analyst role this creates “a peculiar relationship with tools: a kind of distance (on the scholar’s part) and on the other hand an intensive proximity (on the coder’s part) that may not yet have critical distance or meta-knowledge: the awareness needed to use the tools in a fully knowing way.”
In my observation of commercial and research software development projects over thirty years, one of the most common causes of catastrophic failure is the gap between the developer’s understanding of the problem being solved and the customer’s understanding of the same problem. A good analyst (often holding a “product manager” title in the corporate world) has the skills to understand both the business and technical domains sufficiently to probe for these misunderstandings and ensure that discussion converges on a common understanding. In addition, the analyst helps abstract both the technical and domain problems so that the eventual software solution not only meets the needs of the small number of customers in the loop, but is broad enough for a target community. Moreover, librarians often have knowledge of components of the technical domain and of the subject domain — which can give libraries a particular competitive advantage in developing people for these critical bridge roles.
Chaoqun Ni, who is an Assistant Professor in the School of Library and Information Science at Simmons, presented a talk on Transformative Interactions in the Scientific Workplace as part of the Program on Information Science Brown Bag Series.
In the talk, illustrated by the slides below, Chaoqun uses bibliometric data to analyze the sociality, equality, and dynamicity of the scientific workforce.
In her abstract, Chaoqun describes her argument as follows:
I argue that, for a country to be scientifically competitive, it must maximize its human intellectual capital base and support this workforce equitably and efficiently. I propose here a large-scale and heterogeneous analysis of the sociality, equality, and dynamicity of the scientific workforce through novel computational models for understanding and predicting the career trajectory of scientists based on their transformative interactions, gender, and levels of funding. This analysis will be able to isolate factors that contribute to the health and well-being of the scientific workforce. The computational models will quantify the impact of those transformative events and interactions and provide models to predict the career trajectory of scientists based on their gender, the size and position of the social network, and other demographic factors.
According to the talk, three types of events are particularly likely to transform scholarly careers: being mentored, publishing, and receiving grants. Of these, mentoring occurs earliest in a scholar’s career and has a persistent effect on publication and grants. The relationship is not simple and automatic — mentees do not automatically inherit their mentors’ success in publication and grant funding. Instead, the mentoring relationship is mediated by the transfer of knowledge, norms, advice, and connections. And gender disparities are persistent and visible.
This talk resonated with a number of areas in which the Program and Library engage:
First, diversity is a core library value, and this research suggests ways in which libraries can support a more diverse academic community. The success of early career scholars depends in part on developing a substantial number of specialized career skills that are not part of a specific scientific discipline — including, among many other things (see, for example, these slides on reputation and communication), navigating the scholarly publishing process, writing grant proposals, managing bibliographies, and curating data. Much of this knowledge is tacit — it is not explicitly taught but instead transferred through personal mentoring. Libraries are one of the rare parts of the university that are able to successfully capture this tacit knowledge and make it more widely available across the community. The Libraries’ IAP courses are an excellent example of this.
Second, most of the data used for this research is based on Library-mediated collections — citations drawn from journal collections and metadata from dissertation collections. Further, as there is increasing pressure on universities for quantitative evaluation, and increasing desire to actively catalyze collaboration and productivity, there is an increasing need for rich access to Library collections as data, for guidance on tools and approaches (see, for an overview, our class on citation analysis), and for expert assistance. Since few researchers have methodological or domain expertise related to bibliometric and scientometric data, this presents an unusual opportunity for libraries to be entrepreneurial in collaborating on new research.
Third, during this talk, Chaoqun noted that the most laborious and time-consuming phase of the research was data cleaning and linking — particularly name disambiguation. ORCID, in which the library serves a leadership role (and which MIT has adopted), aims to eliminate this problem. ORCID has spread widely — just this month, over a dozen major publishers announced their intent to require ORCIDs for journal submissions.
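To make concrete why name disambiguation is so laborious (a toy illustration, not the methods used in the research described in the talk): even grouping author records by a crude normalized key shows both why variant spellings must be reconciled and why simple keys incorrectly merge distinct people.

```python
import unicodedata
from collections import defaultdict

def name_key(name):
    """Crude blocking key: (last name, first initial), lowercased and
    accent-stripped. Real disambiguation needs far more evidence,
    such as coauthors, venues, and affiliations."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    if "," in ascii_name:                      # "Last, First" form
        last, _, first = ascii_name.partition(",")
    else:                                      # "First ... Last" form
        tokens = ascii_name.split()
        last, first = tokens[-1], " ".join(tokens[:-1])
    first = first.strip()
    return (last.strip().lower(), first[0].lower() if first else "")

records = ["Smith, John", "J. Smith", "Smith, Jane", "Müller, Ana", "Ana Muller"]
clusters = defaultdict(list)
for author in records:
    clusters[name_key(author)].append(author)

# Variants cluster together ("Müller, Ana" with "Ana Muller") -- but so do
# distinct people ("Smith, John" with "Smith, Jane"), which is why the
# cleaning phase demands so much manual effort.
print(dict(clusters))
```

Persistent identifiers such as ORCID sidestep the whole problem by attaching a unique ID to each researcher at the source, rather than reconstructing identity from name strings after the fact.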
Kim Dulin, who is director of the Harvard Library Innovation Lab and Associate Director for Collection Development and Digitization for the Harvard Law School Library, presented a talk on Taking on Link Rot: Harvard Innovation Lab’s Perma.CC as part of the Program on Information Science Brown Bag Series.
In the talk, illustrated by the slides below, Kim discusses how libraries can mitigate link rot in legal scholarship by coordinating on digital preservation.
In her abstract, Kim describes her talk as follows:
Perma.cc (http://perma.cc) is a web archiving platform and service developed by the Harvard Library Innovation Lab (LIL) to help combat link rot. Link rot occurs when links to websites point to web pages whose content has changed or disappeared. Perma.cc allows authors and editors to create permanent links for citations to web sources that will not rot. Upon direction from an author, Perma.cc will retrieve and save the contents of a cited web page and assign it a permanent Perma.cc link. The Perma.cc link is then included in the author’s references. When users later follow those references, they will have the option of proceeding to the website as it currently exists or viewing the cached version of the website as the creator of the Perma.cc link saw it. Regardless of what happens to the website in the future, the content will forever be accessible for scholarly and educational purposes via Perma.cc.
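The workflow the abstract describes — submit a cited URL, get back a permanent link — could be scripted against the service’s API. The sketch below only assembles the request; the API root, path, and field names are assumptions based on the service description, so consult the current Perma.cc API documentation rather than relying on them.

```python
import json

# Assumption: Perma.cc exposes a REST endpoint for creating archives;
# the root, path, and field names below are illustrative, not verified.
API_ROOT = "https://api.perma.cc/v1"

def build_archive_request(target_url, api_key):
    """Assemble the HTTP request for archiving target_url.
    Returns the endpoint and a JSON body ready to POST (e.g. with
    urllib.request); the response would carry the permanent link to
    include in the author's references."""
    endpoint = f"{API_ROOT}/archives/?api_key={api_key}"
    body = json.dumps({"url": target_url})
    return endpoint, body

endpoint, body = build_archive_request("https://example.com/cited-page", "MY_KEY")
```

The point of the sketch is that archiving can happen at citation time, as a single call from an author’s or editor’s toolchain, rather than as a separate preservation step after publication.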
According to the talk, link rot in law publications is very high — approximately fifty percent of links in US Supreme Court opinions are rotten, and the situation is worse in law journals. Perma.cc has been successful in part because durability is a very important selling point for attorneys. A signal of this is that the latest edition of the official editorial manual for law publications (the “Bluebook”) now recommends that links included in legal publications be archived.
Perma.cc provides a workable, library-centered solution to a problem of real concern. In her talk Kim focuses on the diverse roles that libraries play: libraries act as gatekeepers for the content to be preserved; as long-term custodians of the content (and technically as mirrors); and as direct access points.
(I will also note that libraries are critical in conducting research and developing standards in this area. The MIT Library is engaged in developing practices for collaborative stewardship as a member of the National Digital Stewardship Alliance, and the Program is engaged in research on data management and stewardship.)
The talk discusses a number of new directions for Perma, including perma-link plugins for Word and WordPress; an API to the service; creating a private LOCKSS network to replicate archival content; and establishing a formal structure of governance, archival policies, and sustainability (funding and resources).
These directions resonate with me. Perma.cc is a project that has been very successful at approaching the very general problem of link rot within a specific community of practice. Its success has partly to do with its knowledge of, connections with, and adaptation to a specific community. It will be interesting to see how governance and sustainability evolve to enable the transition from a project to community-supported infrastructure.