A History of the Internet : Commentary on Scott Bradner’s Program on Information Science Talk

December 1, 2017


Scott Bradner is a Berkman Center affiliate who worked for 50 years at Harvard in the areas of computer programming, system management, networking, IT security, and identity management. He was involved in the design, operation, and use of data networks at Harvard University from the early days of the ARPANET, and he served in many leadership roles in the IETF. He presented the talk recorded below, entitled A History of the Internet, as part of the Program on Information Science Brown Bag Series:

Bradner abstracted his talk as follows:

In a way the Russians caused the Internet. This talk will describe how that happened (hint it was not actually the Bomb) and follow the path that has led to the current Internet of (unpatchable) Things (the IoT) and the Surveillance Economy.

The talk contained a rich array of historical details — far too many to summarize here. Much more detail can be found in the slides and video above, in his publications, and in his IETF talks. (And for those interested in recent Program on Information Science research on related issues of open information governance, see our published reports.)

Bradner describes how the space race, exemplified by the launch of Sputnik, spurred national investments in research and technology — and how the arms race created the need for a communication network that was decentralized and robust enough to survive a nuclear first-strike.

Bradner argues that the internet has been revolutionary, in part because of its end-to-end design. The internet as a whole was designed so that most of the “intelligence” is encapsulated at host endpoints, connected by a “stupid” network carrier that just transports packets. As a result, Bradner argues, the carrier cannot own the customer, which, critically, enables customers to innovate without permission.

ARPANET, as originally conceived, was focused on solving what was then a grand challenge in digital communications research: To develop techniques and obtain experience on interconnecting computers in such a way that a very broad class of interactions are possible, and to improve and increase computer research productivity through resource sharing.

Bradner argues that the internet succeeded because, despite the scope of the problem, solutions were allowed to evolve chaotically: ARPA was successful in innovating because it required no peer review. The large incumbent corporations in the computing and networking field ignored the internet because they believed it couldn’t succeed (and they believed it couldn’t succeed because its design did not allow for the level of control and reliability that the incumbents believed necessary to making communications work). And since the Internet was viewed as irrelevant, there were no efforts to regulate it. It was not until after the Internet achieved success, and catalyzed disruptive innovation, that policymakers deemed it “too important to leave to the people that know how it works.”

Our upcoming Summit, supported by a generous grant from the Mellon Foundation, will probe for grand challenge questions in scholarly discovery, digital curation and preservation, and open scholarship. Is it possible that the ideas that could catalyze innovation in these areas are, like the early Internet, currently viewed as impractical or irrelevant?

Categories: Uncategorized

Safety Nets (for information): Commentary on Jefferson Bailey’s Program on Information Science Talk

November 7, 2017

Jefferson Bailey is Director of Web Archiving at Internet Archive. Jefferson joined Internet Archive in Summer 2014 and manages Internet Archive’s web archiving services, including Archive-It, used by over 500 institutions to preserve the web. He also oversees contract and domain-scale web archiving services for national libraries and archives around the world. He works closely with partner institutions on collaborative technology development, digital preservation, data research services, educational partnerships, and other programs. He presented the talk recorded below, entitled Safety Nets: Rescue And Revival For Endangered Born-digital Records, as part of the Program on Information Science Brown Bag Series:

Bailey abstracted his talk as follows:

The web is now firmly established as the primary communication and publication platform for sharing and accessing social and cultural materials. This networked world has created both opportunities and pitfalls for libraries and archives in their mission to preserve and provide ongoing access to knowledge. How can the affordances of the web be leveraged to drastically extend the plurality of representation in the archive? What challenges are imposed by the intrinsic ephemerality and mutability of online information? What methodological reorientations are demanded by the scale and dynamism of machine-generated cultural artifacts? This talk will explore the interplay of the web, contemporary historical records, and the programs, technologies, and approaches by which libraries and archives are working to extend their mission to preserve and provide access to the evidence of human activity in a world distinguished by the ubiquity of born-digital materials.

Bailey eloquently stated the importance of web archiving: “No future scholarship can study our era without considering materials published (only) on the web.” Further, he emphasized the importance of web archiving for social justice: Traditional archives disproportionately reflect social architectures of power, and the lived experiences of the advantaged. Web crawls capture a much broader (although not nearly complete) picture of the human experience.

The talk ranged over an impressively wide portfolio of initiatives — far too many to do justice to in a single blog post. Much more detail on these projects can be found in the slides and video above, in Bailey’s professional writings, and on the Internet Archive blog, experiments page, and Archive-It blog.

A unified argument ran through Bailey’s presentation. At the risk of oversimplifying, I’ll restate the premises of the argument here:

  1. Understanding our era will require research, using large portions of the web, linked across time.
  2. The web is big — but not too big to collect (a substantial portion of) it.
  3. Providing simple access (e.g. retrieval, linking) is more expensive than collection;
    enabling discovery (e.g. search) is much harder than simple access;
    and supporting computational research (which requires analysis at web-scale, and over time)
    is much, much harder than discovery.
  4. Research libraries should help with this (hardest) part.

I find the first three parts of the argument largely convincing. Increasingly, new discoveries in social science are based on analysis of massive collections of data that are generated as a result of people’s public communications, and depend on tracing these actions and their consequences over time. The Internet Archive’s success to date establishes that much of this public communication can be collected and retained over time. And the history of database design (as well as my and my colleagues’ experiences in archiving and digital libraries) testifies to the challenges of effective discovery and access at scale.

I hope that we, as research libraries, will step up to the challenges of enabling large-scale, long-term research over content such as this. Research libraries already have a stake in this problem because most of the core ideas and fundamental methods (although not the operational platforms) for analysis of data at this scale come from research institutions with which we are affiliated. Moreover, if libraries lead the design of these platforms, participation in research will be far more open and equitable than if these platforms are ceded entirely to commercial actors.

For this among other reasons, we are convening a Summit on Grand Challenges in Information Science & Scholarly Communication, supported by a generous grant from the Mellon Foundation. During this summit we will develop community research agendas in the areas of scholarly discovery at scale; digital curation and preservation; and open scholarship. For those interested in these questions and related areas of interest, we have published Program on Information Science reports and blog posts on some of the challenges of digital preservation at scale.

Categories: Uncategorized

Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Information Science Talk

October 6, 2017


Cassidy Sugimoto is Associate Professor in the School of Informatics and Computing, Indiana University Bloomington, where she researches scholarly communication and scientometrics, examining the formal and informal ways in which knowledge producers consume and disseminate scholarship. She presented this talk, entitled Labor And Reward In Science: Do Women Have An Equal Voice In Scholarly Communication? A Brown Bag With Cassidy Sugimoto, as part of the Program on Information Science Brown Bag Series.

In her talk, illustrated by the slides below, Sugimoto highlights the roots of gender disparities in science.

 

Sugimoto abstracted her talk as follows:

Despite progress, gender disparities in science persist. Women remain underrepresented in the scientific workforce and under rewarded for their contributions. This talk will examine multiple layers of gender disparities in science, triangulating data from scientometrics, surveys, and social media to provide a broader perspective on the gendered nature of scientific communication. The extent of gender disparities and the ways in which new media are changing these patterns will be discussed. The talk will end with a discussion of interventions, with a particular focus on the roles of libraries, publishers, and other actors in the scholarly ecosystem.

In her talk, Sugimoto stressed a number of patterns in scientific publication:

  • The demise of single authorship complicates notions of credit, rewards, labor, and responsibility
  • There are distinct patterns of gender disparity in scientific publications: male-authored publications predominate in most fields (with a few exceptions, such as Library Science); women collaborate more domestically than internationally on publications; and woman-authored publications tend to be cited less (even within the same tier of journals).
  • Looking across categories of contribution — the most isolated is performing the experiment, and women are most likely to fill this role. Further, if we look across male- and female-led teams, we see that the distribution of work across these teams varies dramatically.
  • When surveying teams, women tended to value all forms of contribution more than men, with one exception: women judge technical work, which is more likely to be conducted by women, as less valuable.
  • Composition of authorship has consequences for what is studied. Women’s research focuses more often than men’s on areas relevant to both genders or to women.

Sugimoto notes that these findings are consistent with pervasive gender discrimination. Further, women as well as men frequently discriminate against other women — for example, in evaluations of professionalism, evaluations of work, and in salary offers.

Much more detail on these points can be found in Sugimoto’s professional writings.

Sugimoto’s talk drew on a variety of sources: publication data in the Web of Science, and acknowledgement and authorship statements in PLOS journals. Open bibliometric data, such as that produced by PLOS, the Initiative for Open Citations, and various badging initiatives, can help us to more readily bring disparities to light.

At the conclusion of her talk, Sugimoto suggested the following roles for librarians:


  • Use and promote open access in training sessions
  • Provide programming that lessens barriers to participation for women and minorities
  • Advocate for contributorship models which recognize the diversity of knowledge production
  • Approach new metrics with productive skepticism
  • Encourage engagement between students and scholars
  • Evaluate and contribute to the development of new tools

Reflecting the themes of Sugimoto’s talk, the research we conduct here in the Program on Information Science is strongly motivated by issues of diversity and inclusion — particularly approaches to bias-reducing systems design. Our previous work in participative mapping aimed at increasing broad public participation in electoral processes. Our current NSF-supported work in educational research focuses on using eye-tracking and other biological signals to track fine-grained learning across populations of neurologically diverse learners. And, under a recently-awarded IMLS award, we will be hosting a workshop to develop principles for supporting diversity and inclusion through information architecture in information systems. For those interested in these and other projects, we have published blog posts and reports in these areas.

Categories: Uncategorized

Guest Post: Towards Strategies for Making Legacy Software Curation-Ready

September 27, 2017

Alex Chassanoff is a CLIR/DLF Postdoctoral Fellow in the Program on Information Science and continues a series of posts on software curation.

In this blog post, I am going to reflect upon potential strategies that institutions can adopt for making legacy software curation-ready.  The notion of “curation-ready” was first articulated by the “Curation Ready Working Group”, which formed in 2016 as part of the newly emerging Software Preservation Network (SPN).  The goal of the group was to “articulate a set of characteristics of curation-ready software, as well as activities and responsibilities of various stakeholders in addressing those characteristics, across a variety of different scenarios”[1].  Drawing on inventories at our own institutions, the working group explored different strategies and criteria that would make software “curation-ready” for representative use cases.  In my use case, I looked specifically at the GRAPPLE software program and wrote about particular uses and users for the materials.

This work complements the ongoing research I’ve been doing as a Software Curation Fellow at MIT Libraries [2] to envision curation strategies for software.  Over the past six months, I have conducted an informal assessment of representative types of software in an effort to identify baseline characteristics of materials, including functions and uses.

Below, I briefly characterize the state of legacy software at MIT.

  • Legacy software often exists among hybrid collections of materials, and can be spread across different domains.
  • Different components (e.g., software dependencies, hardware) may or may not be co-located.
  • Legacy software may or may not be accessible on original media. Materials are stored in various locations, ranging from climate-controlled storage to departmental closets.
  • Legacy software may exist in multiple states with multiple contributors over multiple years.
  • Different entities (e.g., MIT Museum, Computer Science and Artificial Intelligence Laboratory, Institute Archives & Special Collections) may have administrative purview over legacy software with no centralized inventory available.
  • Collected materials may contain multiple versions of source code housed in different formats (e.g., paper print outs, on multiple diskettes) and may or may not consist of user manuals, requirements definitions, data dictionaries, etc.
  • Legacy software has a wide range of possible scholarly use and users for materials. These may include the following: research on institutional histories (e.g., government-funded academic computing research programs), biographies (e.g., notable developers and/or contributors of software),  socio-technical inquiries (e.g., extinct programming languages, implementation of novel algorithms), and educational endeavors (e.g., reconstruction of software).

We define curation-ready legacy software as having the following characteristics: being discoverable, usable/reusable, interpretable, citable, and accessible.  Our approach views curation as an active, nonlinear, iterative process undertaken throughout the life (and lives) of a software artifact.

Steps to increase curation-readiness for legacy software

Below, I briefly describe some of the strategies we are exploring as potential steps in making legacy software curation-ready.  Each of these strategies should be treated as suggestive rather than prescriptive at this stage in our exploration.

Identify appraisal criteria. Establishing appraisal criteria is an important first step that can be used to guide decisions about selection of relevant materials for long-term access and retention. As David Bearman writes, “Framing a software collecting policy begins with the definition of a schema which adequately depicts the universe of software in which the collection is to be a subset.”[3]  It is important to note that for legacy software, determining appraisal criteria will necessarily involve making decisions about both the level of access and preservation desired.  Decision-making should be guided by an institutional understanding of what constitutes a fully-formed collection object. In other words, what components of software should be made accessible? What will be preserved? Does the software need to be executable? What levels of risk assessment should be conducted throughout the lifecycle?  Making these decisions institutionally will in turn help guide the identification of appropriate preservation strategies (e.g., emulation, migration, etc) based on desired outcomes.

Identify, assemble, and document relevant materials. A significant challenge with legacy software lies in the assembling of relevant materials to provide necessary context for meaningful access and use.  Locating and inventorying related materials (e.g., memos, technical requirements, user manuals) is an initial starting point. In some cases, meaningful materials may be spread across the web at different locations.  While it remains a controversial method in archival practice, documentation strategy may provide useful framing guidance on principles of documentation [4].

Identify stakeholders. Identifying the various stakeholders, either inside or outside of the institution, can help ensure proper transfer and long-term care of materials, along with managing potential rights issues where applicable.  Here we draw on Carlson’s work developing the Data Curation Profiles Toolkit and define stakeholders as any group, organization, individual, or other party having an investment in the software that you would feel the need to consult regarding access, care, use, and reuse of the software [5].

Describe and catalog materials. Curation-readiness can be increased by thoroughly describing and cataloging select materials, with an emphasis on preserving relationships among entities. In some cases, this may consist of describing aspects of the computing environment and relationships between hardware, software, dependencies, and/or versions. Although the software itself may not be accessible, describing related materials (i.e., printouts of source code, technical requirements documentation) adequately can provide important points of access. It may be useful to consider the different conceptual models of software that have been developed in the digital preservation literature and decide which perspective aligns best with your institutional needs [6].
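To make the emphasis on preserving relationships among entities a bit more concrete, here is a minimal sketch of what a descriptive record for a legacy software collection might look like, expressed as a plain Python dictionary. The field names and values are illustrative assumptions drawn from the GRAPPLE example discussed in this series, not a prescribed schema.

```python
# Illustrative sketch only: field names are assumptions, not a prescribed schema.
grapple_record = {
    "identifier": "MC-499-grapple",  # hypothetical local identifier
    "title": "GRAPPLE graphical programming system",
    "dates": "circa 1981-1984",
    "components": [
        {"type": "source code printout", "extent": "about 40 pages", "medium": "paper"},
        {"type": "computer tapes", "status": "not yet formatted for access"},
        {"type": "user manual", "versions": ["interim", "final"]},
    ],
    "environment": {
        "programming_language": "MDL",
        "developed_at": "MIT Laboratory for Computer Science",
    },
    "related_materials": ["correspondence with the Department of Defense"],
}

# Relationships among entities can then be traversed, indexed, or exported to a catalog.
for component in grapple_record["components"]:
    print(component["type"])
```

Even a lightweight record like this captures the relationships among software, hardware, versions, and documentation on which later preservation and access decisions depend.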

Digitize and OCR paper materials. Paper printouts of source code and related documentation can be digitized according to established best practice workflows[7].  The use of optical character recognition (OCR) programs produces machine-readable output, enabling easy indexing of content to enhance discoverability and/or textual transcriptions.  The latter option can make historical source code more portable for use in simulations or reconstructions of software.
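As a rough illustration of the OCR step, the sketch below uses the Pillow and pytesseract libraries to produce machine-readable text from a scanned page; the file path is hypothetical, and a production workflow would add image cleanup and manual correction of the OCR output.

```python
from PIL import Image
import pytesseract  # requires the Tesseract OCR engine to be installed separately

# Hypothetical path to one digitized page of a source-code printout
scan_path = "scans/source_code_page_001.tiff"

# Extract machine-readable text from the scanned image
page_image = Image.open(scan_path)
text = pytesseract.image_to_string(page_image)

# Save the OCR output alongside the scan so it can be indexed for discovery
with open(scan_path + ".txt", "w", encoding="utf-8") as out:
    out.write(text)
```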

Migrate media. Legacy software often resides on unstable media such as floppy disks or magnetic tape. In cases where access to the software itself is desirable, migrating and/or extracting media contents (where possible) to a more stable medium is recommended [8].
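One small piece of a migration workflow is recording fixity information for the resulting disk image so that future copies can be verified. The sketch below assumes an image file has already been produced by a separate imaging tool; the path is hypothetical.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum by reading the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical disk image produced from a floppy or tape by an imaging tool
image_path = Path("images/legacy_disk_01.img")

# Record the checksum next to the image so later copies can be verified
checksum = sha256_of(image_path)
Path(str(image_path) + ".sha256").write_text(f"{checksum}  {image_path.name}\n")
print(checksum)
```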

Reflections

As an active practice, software curation means anticipating future use and uses of resources from the past. Recalling an earlier blog post, our research aims to produce software curation strategies that embrace Reagan Moore’s theoretical view of digital preservation, whereby “information generated in the past is sent into the future”[9]. As the born-digital record increases in scope and volume, libraries will necessarily have to address significant changes in the ways in which we use and make use of new kinds of resources.  Technological quandaries of storage and access will likely prove less burdensome than the social, cultural, and organizational challenges of adapting to new forms of knowledge-making. Legacy software represents this problem space for libraries/archives today.  Devising curation strategies for software helps us to learn more about how knowledge-embedded practices are changing and gives us new opportunities for building healthy infrastructures [10].

References

[1] Rios, F., Almas, B., Contaxis, N., Jabloner, P., & Kelly, H. (2017). Exploring curation-ready software: use cases. doi:10.17605/OSF.IO/8RZ9E

[2] These are some of the open research questions being addressed by the initial cohort of CLIR/DLF Software Curation Fellows in different institutions across the country.

[3] Bearman, D. (1985). Collecting software: a new challenge for archives & museums. Archives & Museum Informatics, Pittsburgh, PA.

[4] Documentation strategy approaches archival practice as a collaborative work among record creators, archivists, and users.  It often traverses institutions and represents an alternative approach by prompting extensive documentation organized around an “ongoing issue or activity or geographic area.” See:  Samuels, H. (1991). “Improving our disposition: Documentation strategy,” Archivaria 33, http://web.utk.edu/~lbronsta/Samuels.pdf.

[5] Carlson, J. (2010). “The Data Curation Profiles toolkit: Interviewer’s manual,” http://dx.doi.org/10.5703/1288284315651.

[6] The results of two applied research projects provide examples from the digital preservation literature.  In 2002, the Agency to Research Project at the National Archives of Australia developed a conceptual model based on software performance as a measure of the effectiveness of digital preservation strategies. See: Heslop, H., Davis, S., Wilson, A. (2002). “An approach to the preservation of digital records,” National Archives of Australia, 2002; in their 2008 JISC report, the authors proposed a composite view of software with the following four entities: package, version, variant, and download. See: Matthew, B., McIlwrath, B., Giaretta, D., Conway, E. (2008). “The significant properties of software: A study,” https://epubs.stfc.ac.uk/manifestation/9506.

[7]  Technical guidelines for digitizing archival materials for electronic access: Creation of production master files–raster images. (2005). Washington, D.C.: Digital Library Federation, https://lccn.loc.gov/2005015382/

[8] For a good overview of storage recommendations for magnetic tape, see: https://www.clir.org/pubs/reports/pub54/Download/pub54.pdf. To read more about the process of reformatting analog media, see: Pennington, S., and Rehberger D. (2012). The preservation of analog video through digitization. In D. Boyd, S. Cohen, B. Rakerd, & D. Rehberger (Eds.), Oral history in the digital age. Institute of Library and Museum Services. Retrieved from http://ohda.matrix.msu.edu/2012/06/preservation-of-analog-video-through-digitization/.

[9] Moore, R. (2008). “Towards a theory of digital preservation”, International Journal of Digital Curation 3(1).

[10] Thinking about software as infrastructure provides a useful framing for envisioning strategies for curation.  Infrastructure perspectives advocate “adopting a long term rather than immediate timeframe and thinking about infrastructure not only in terms of human versus technological components but in terms of a set of interrelated social, organizational, and technical components or systems (whether the data will be shared, systems interoperable, standards proprietary, or maintenance and redesign factored in).”  See: Bowker, G.C., Baker, K., Millerand, F. & Ribes, D. (2010). “Toward information infrastructure studies: Ways of knowing in a networked environment.” In J. Hunsinger, L. Klastrup, & M. Allen (Eds.), International handbook of Internet research. Dordrecht: Springer, 97-117.

 

Guest Post: Building Trust: A Primer on Privacy for Librarians

August 6, 2017

Margaret Purdy is a Graduate Research Intern in the Program on Information Science, researching the area of library privacy.



Building Trust: A Primer on Privacy for Librarians

Privacy Protections Build Mutual Trust Between Patrons and Librarians

Librarians have accepted privacy as a central tenet of their professional ethics and responsibilities for nearly eight decades. In 2017, privacy as a human right is simultaneously being strengthened and reaffirmed, defended and rebuffed; yet rarely do we as librarians take the time to step back and ask why privacy truly matters and what we can do to protect it.

The American Library Association and the International Federation of Library Associations have both asserted that patrons have the right to privacy while seeking information.1 The ALA in particular ties privacy to intellectual freedom – patrons’ ability to consume information knowing they will not face repercussions, such as punishment or judgment, based on what they read. Librarians are in the business of disseminating information in order to stimulate knowledge growth. One major stimulus for such growth is the mutual trust between the library and the patron – trust that the patron will not use the knowledge in a destructive way, and trust that the library will not judge the patron for their information interests. Ensuring patron privacy is one way for the library to earn that trust. Similarly, the IFLA2 emphasizes the right to privacy in its ethics documentation. In addition to the rights of patron privacy that the ALA ensures, the IFLA also calls for as much transparency as possible into “public bodies, private sector companies and all other institutions whose activities effect [sic] the lives of individuals and society as a whole.” This is yet another way to establish trust between the library and its patrons, ultimately ensuring intellectual freedom and growth of knowledge.

Globally, internet privacy and surveillance are also receiving much more notice and debate, and government regulations, such as the EU General Data Protection Regulation (GDPR)3, are working to strengthen individuals’ abilities to control their own data and ensure it does not end up being used against them. The GDPR is slated to go into effect in 2018 and will broadly protect the data privacy rights of EU citizens. It will certainly be a policy to watch, especially as a litmus test for how effective major legislation can be in asserting privacy protections. Even more practically, the GDPR protects EU citizens even when the party collecting data is outside the EU. This will potentially affect many libraries across the United States and the world at large, as an added level of awareness is required to ensure that any collaboration with or service to EU citizens is properly protected.

Libraries Face a Double-Barreled Threat from Government Surveillance and Corporate Tracking

In addition to the ALA and IFLA codes of ethics, which commit librarians to protecting patrons’ right to privacy, multiple governmental codes deal with the right to information privacy. In the United States, the Fourth Amendment protects the right to remain free from unreasonable searches and seizures, and has often been cited as a protection of privacy. Similarly, federal legislation such as FERPA, which protects the privacy rights of students, and HIPAA, which protects medical records, has reasserted that privacy is a vital right. Essentially every US state also has some provisions about privacy, many of which directly relate to the right to privacy in library records.4

However, in recent years, many of the federal government’s protections have begun to slip away. Immediately after 9/11, the USA PATRIOT Act passed, giving the government much broader abilities to track patron library records. More recently, as digital information became easier to track, programs such as PRISM and other government tracking efforts arose. Both directly threaten the ability of library patrons to conduct research and seek information in private.

Businesses have also learned ways of tracking their users’ behaviors online and using that data for practices such as targeted advertising. While the vast majority of this data is encrypted and could not easily be linked back to personally identifiable information, it is still personal data that is not necessarily kept in the most secure way possible. And while breaches do happen, even without them it is not out of the question for an experienced party to reconstruct an individual from the data collected, and to know not only that individual’s browsing history and location, but also potentially information such as health conditions, bank details, or other sensitive information.

While this information is often used for simple outreach, including Customer Relationship Marketing, where a company will recommend new products based on previous purchases, it can also be used in more invasive ways. In 2012, Target sent out a promotional mailing containing deals on baby products to a teenage girl.5 Based on the data Target had tracked about her purchases, its algorithm had determined, correctly, that she was highly likely to be pregnant. While this story received extensive media attention, businesses of all types, including retailers, hotels, and even healthcare systems, participate in similar practices, using data to personalize the experience. However, when stored irresponsibly, this data can lead to unintentional and unwanted sharing of information – potentially including embarrassing web browsing or shopping habits, dates that homes will be empty for thieves, medical conditions that could increase insurance rates, and more.

Growing Public Concern

One of the most pressing risks to privacy protections currently is user behavior and expectations. With the information industry becoming much more digital, information is becoming easier to access, spread, and consume. However, the tradeoff is that users, and the information they view, are much easier to track, by both corporate and government entities, friendly or malicious. Moreover, because much of this tracking and surrendering of privacy, including saved passwords, CRM, targeted algorithms, and more, makes browsing the internet more convenient, many patrons willingly give up the right to privacy in favor of convenience.

A recent poll6 showed that between 70% and 80% of internet users are aware that practices such as saving passwords, agreeing to privacy policies and terms of use without reading them, and accepting free information in exchange for advertising or surrendered data are risks to privacy. However, a large majority of users still participate in those practices. There are several theories as to why users agree to forgo privacy, including the idea that accepting the risks makes browsing the internet much more convenient, and users are hesitant to give up that convenience. Another theory is that there really is no alternative to accepting the risks: many sites will not allow use without acceptance of the terms of use and/or privacy policy. A 2008 study7 calculated how much time users would spend reading privacy policies were they to actually read all of them, and found that, on average, a user would spend nearly two weeks a year just reading policies, not to mention the time taken to fully understand the legalese and complicated implications.

Another similar poll8 shows that more than half of Americans are concerned about privacy risks, and over 80% have taken some precautionary action. However, most of that 80% are unaware of what more they can do to protect themselves. This is true for both government surveillance and corporate tracking: the public has similar levels of awareness and concern about both, but is unaware of how to better protect itself, and thus is more likely to allow it to happen.

Best Practices for Librarians

 

Given the increasing public concern and awareness, as well as librarians’ longstanding focus on privacy, librarians have a perfect opportunity to intervene: to re-establish users’ trust that their information will not be shared, and to meet the professional ethical model of always protecting privacy. There are nearly endless resources that outline in great detail what librarians should do to defend their patrons against attacks on privacy, whether from government surveillance or corporate tracking. Some involve systematic evaluations of all touchpoints in the library and recommendations for implementing best practices. These exist even for areas that do not seem like obvious avenues for privacy violations, such as anti-theft surveillance on surrounding buildings or third-party content vendors.

By dedicating library resources to systematically check for privacy practices, librarians can take some of the burden of inconvenience off of the individual patron. Many of these best practices involve taking the time to change computer settings, read and understand privacy policies, and negotiate with vendors, which few, if any, individuals would do on their own. With the muscle of the library working on it, though, the patrons will still benefit, without needing to dedicate the same amount of time. This serves a dual function as well, as in addition to actual steps to protect patrons, librarians can also serve as an educational resource to help patrons learn simple steps to take to protect their personal systems.

Some examples of protective moves are policies on library computers that ensure that as little information from user sessions as possible is saved. There are several incredibly simple steps that, while they reduce convenience slightly, ensure users a safe and private experience. These include settings that clear cookies, the cache, and user details after each session (akin to “incognito mode”), and the clearing of patron checkout records once a book is returned.
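As a rough sketch of what such a session reset might look like on a public workstation, the script below removes common per-session browser artifacts from a (hypothetical) kiosk profile directory at logout. The file and directory names are illustrative and vary by browser and version; in practice, a browser’s built-in clear-on-exit settings are often the simpler route.

```python
import shutil
from pathlib import Path

# Hypothetical location of the public workstation's browser profile
PROFILE_DIR = Path("/home/kiosk/.mozilla/firefox/kiosk.default")

# Per-session artifacts to clear; names are illustrative and vary by browser and version
SESSION_FILES = ["cookies.sqlite", "formhistory.sqlite", "sessionstore.jsonlz4"]
SESSION_DIRS = ["cache2"]

def clear_session(profile: Path) -> None:
    """Delete cookies, form history, session state, and cache left by the previous user."""
    for name in SESSION_FILES:
        target = profile / name
        if target.exists():
            target.unlink()
    for name in SESSION_DIRS:
        target = profile / name
        if target.is_dir():
            shutil.rmtree(target)

if __name__ == "__main__":
    clear_session(PROFILE_DIR)
```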

In addition to those tweaks, the ALA and LITA offer checklists of privacy best practices to systematically implement in libraries. These cover everything from data exchanges, OPACs and patron borrowing records, protection for children, and more in great detail. NISO also provides overarching design principles for approaching library privacy in a digital age. Additionally, there are recommended security audits, many of which Bruce Shuman mentions in his book, Library Security and Safety Handbook: Prevention, Policies, and Procedures.

Additionally, the library, already known for educational programs and community-oriented programming, could serve as a location to educate the public about the real risks of tracking and surveillance. There is a definite gap between the public’s awareness of the risks and the public’s action to mitigate those risks. While librarians cannot force behavior, and most would not want to, offering patrons trustworthy information about the risks and how to avoid them in their personal browsing helps re-establish privacy as a core value and gives patrons a reason to trust the library. This recent post from Nate Lord at Digital Guardian offers simple and more in-depth steps that patrons can take to ensure their digital information is secure. If a library offered some of these in a training course or as a takeaway, it could serve as a valuable resource in narrowing the gap between patron awareness and activity.

Ultimately, privacy is often a word that many people give lip service to; without fully understanding the risks and consequences, the motivation to give up convenience in order to protect privacy is not always there. However, we as librarians, who value privacy as one of the profession’s core tenets, have a real opportunity to help protect patrons’ data against these threats. Resources, such as the aforementioned privacy checklists and audit guides, exist to help librarians ensure their library is in compliance with current best practices. The threats against privacy are growing, and librarians are well-suited to intervene and ensure patron protection.

Recommended Resources

 


References

1. ALA Code of Ethics. (1939). http://www.ala.org/advocacy/sites/ala.org.advocacy/files/content/proethics/codeofethics/Code%20of%20Ethics%20of%20the%20American%20Library%20Association.pdf

2. IFLA Code of Ethics. https://www.ifla.org/publications/node/11092

3. GDPR Portal (2016). http://www.eugdpr.org/

4. Adams, H., et al. (2005). Privacy in the 21st century. Westport, Conn.: Libraries Unlimited.

5. Hill, K. (2012). How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did. Forbes.com. https://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/#1bd0d38d6668

6. Ayala, D. (2017). Security and Privacy for Libraries in 2017. Online Searcher, 41(3).

7. Cranor, L. (2008). The Cost of Reading Privacy Policies. I/S: A Journal Of Law And Policy For The Information Society.

8. Rainie, L. (2017). The state of privacy in post-Snowden America. Pew Research Center. http://www.pewresearch.org/fact-tank/2016/09/21/the-state-of-privacy-in-america/

Categories: Uncategorized

Guest Post: Software as a Collection Object

July 18, 2017

Alex Chassanoff is a CLIR/DLF Postdoctoral Fellow in the Program on Information Science and continues a series of posts on software curation.

As I described in my first post, an initial challenge at MIT Libraries was to align our research questions with the long-term collecting goals of the institution. As it happens, MIT Libraries had spent the last year working on a task force report to begin to formulate answers to just these sorts of questions. In short, the task force envisions MIT Libraries as a global platform for scholarly knowledge discovery, acquisition, and use. Such goals may at first appear lofty. However, the acquisition of knowledge through public access to resources has been a central organizing principle of libraries since their inception. In his opening statement at the first national conference of librarians in 1853, Charles Coffin Jewett proclaimed, “We meet to provide for the diffusion of a knowledge of good books and for enlarging the means of public access to them.” [1]

Archivists and professionals working in special collections have long been focused on providing access to, and preservation of, local resources at their institutions. What is perhaps most distinctive about the past decade is the broadened institutional focus on locally-created content. This shift in perspective towards looking inwards is a trend noted by Lorcan Dempsey, who describes it thus:

In the inside-out model, by contrast, the university, and the library, supports resources which may be unique to an institution, and the audience is both local and external. The institution’s unique intellectual products include archives and special collections, or newly generated research and learning materials (e-prints, research data, courseware, digital scholarly resources, etc.), or such things as expertise or researcher profiles. Often, the goal is to share these materials with potential users outside the institution. [2]

Arguably, this shift in emphasis can be attributed to the affordances of the contemporary networked research environment, which has broadened access to both resources and tools. Archival collections previously considered “hidden” have been made more accessible for historical research through digitization. Scholars are also able to ask new kinds of historical questions using aggregate data, and answer historical questions in new kinds of ways.

This raises the question: what unique and/or interesting content does an institution with a rich history of technology and innovation already have in its possession?

Exploring Software in MIT Collections

MIT has of course played a foundational role in the development and history of computing. Since the 1940s, the Institute has excelled in the creation and production of software and software-based artifacts. Project Whirlwind, Sketchpad, and Project MAC are just a few of the monumental research computing projects conducted here. As such, the Institute Archives & Special Collections has over time acquired a significant number of materials related to software developed at MIT.

In our quest to understand how software may be used (and made useful) as an institutional asset, we engaged in a two-pronged approach. First, we aimed to identify the types of software that MIT may consider providing access to: What are the different functions and purposes that software at MIT is created, used, and reused for? Second, we aimed to understand more about the active practices of researchers creating, using, and/or reusing software. We anticipated that this combined approach might help us develop a robust understanding of existing practices and potential user needs. At the same time, we recognized that identifying and exposing potential pain points could guide and inform future curation strategies. After an initial period of exploratory work, we identified representative software cases found in various pockets across the MIT campus.

Collection #1: The JCR Licklider Papers and the GRAPPLE software

Materials in the collection were first acquired by the Institute Archives & Special Collections in 1996. Licklider was a psychologist and renowned computer scientist who came to MIT in 1950. He is widely hailed as an influential figure for his visionary ideas around personal computing and human-computer interaction.

In my exploration of archival materials, I looked specifically at boxes 13-18 in the collection, which contained documentation about GRAPPLE, a dynamic graphical programming system developed while Licklider was at the MIT Laboratory for Computer Science. According to the user manual, GRAPPLE focused on “the development of a graphical form of a language that already exists as a symbolic programming language.” [3] Programs could be written using computer-generated icons and then monitored by an interpreter.


Figure 1. Folder view, box 16, J.C.R. Licklider Papers, 1938-1995 (MC 499), Institute Archives and Special Collections, MIT Libraries, Cambridge, Massachusetts.

Materials in the collection related to GRAPPLE include:

  • Printouts of GRAPPLE source code
  • GRAPPLE program description
  • GRAPPLE interim user manual
  • GRAPPLE user manual
  • GRAPPLE final technical report
  • Undated and unidentified computer tapes
  • Assorted correspondence between Licklider and the Department of Defense

Each of the documents has multiple versions included in the collection, typically distinguished by date and filename (where visible). The printouts of GRAPPLE source code totaled around forty pages. The computer tapes have not yet been formatted for access.

While the software may be cumbersome to access on existing media, the materials in the collection contain substantial amounts of useful information about the function and nature of software in the early 1980s. Considering the documentation related to GRAPPLE in different social contexts helped to illuminate the value of the collection in relationship to the history of early personal computing.

Historians of programming languages would likely be interested in studying the evolution of the coding syntax contained in the collection. The GRAPPLE team used the now-defunct programming language MDL (which stands for “More Datatypes than Lisp”); the extensive documentation provides examples of MDL “in action” through printouts of code packages.


Figure 2. Computer file printout, “eraser.mud.1”, 31 May 1983, box 14, J.C.R. Licklider Papers, 1938-1995 (MC 499), Institute Archives and Special Collections, MIT Libraries, Cambridge, Massachusetts.

The challenges facing the GRAPPLE team at the time of coding and development would be interesting to revisit today. One obstacle to successful implementation that the team noted was the limitations of graphical display environments at the time. In their final technical report on the project from 1984, the GRAPPLE team notes the potential of desktop icons for identifying objects and their representational qualities.

Our conclusion is that icons have very significant potential advantages over symbols but that a large investment in learning is required of each person who would try to exploit the advantages fully. As a practical matter, symbols that people already know are going to win out in the short term over icons that people have to learn in applications that require more than a few hundred identifiers. Eventually, new generations of users will come along and learn iconic languages instead of or in addition to symbolic languages, and the intrinsic advantages of icons as identifiers (including even dynamic or kinematic icons) will be exploited. [4]

Despite technological advancement, some fundamental dynamics in human-computer interaction remain relatively unchanged; namely, the powerful relationship between representational symbols and the production of knowledge/knowledge structures. What might it look like to bring to life today software that was conceived in the early days of personal computing? Such aspirations are certainly possible. Consider the journey of the Apollo 11 source code, which was transcribed from digitized code printouts and then put onto GitHub. One can even simulate the Apollo missions using a virtual Apollo Guidance Computer (AGC).

Other collection materials also offer interesting documentation of early conceptions of personal computing while also providing clear evidence that computer scientists such as Licklider regarded abstraction as an essential part of successful computer design. A pamphlet entitled “User Friendliness–And All That” notes the “problem” of mediating between “immediate end users” and “professional computer people” to successfully aid in a “reductionist understanding of computers.”

Figure 3. Pamphlet, “User friendliness-And All That”, undated, box 16, J.C.R. Licklider Papers, 1938-1995 (MC 499), Institute Archives and Special Collections, MIT Libraries, Cambridge, Massachusetts.

These descriptions are useful for illuminating how software was conceived and designed to be a functional abstraction. Such revelations may be particularly relevant in the current climate – where debates over algorithmic decision making are rampant. As the new media scholar Wendy Chun asks, “What is software if not the very effort of making something intangible visible, while at the same rendering the visible (such as the machine) invisible?” [5]

Reflections

Building capacity for collecting software as an institutional asset is difficult work. Expanding collecting strategies presents conceptual, social, and technical challenges that crystallize once scenarios for access and use are envisioned. For example, when is software considered an artifact ready to be “archived and made preservable”? What about research software developed and continually modified over the years in the course of ongoing departmental work? What about printouts of source code – is that software? How do code repositories like GitHub fit into the picture? Should software only be considered as such in its active state of execution? Interesting ontological questions surface when we consider the boundaries of software as a collection object.

Archivists and research libraries are poised to meet the challenges of collecting software. By exploring what makes software useful and meaningful in different contexts, we can more fully envision potential future access and use scenarios. Effectively characterizing software in its dual role as both artifact and active producer of artifacts remains an essential piece of understanding its complex value.

 

References:

[1] “Opening Address of the President.” Norton’s Literary Register And Book Buyers Almanac, Volume 2. New York: Charles B. Norton, 1854.

[2] Dempsey, Lorcan. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26, no. 4 (2016): 338–359. doi:http://doi.org/10.18352/lq.10170.

[3]  GRAPPLE Interim User Manual, 11 October 1981, box 14, J.C.R. Licklider Papers, 1938-1995 (MC 499), Institute Archives and Special Collections, MIT Libraries, Cambridge, Massachusetts.

[4] Licklider, J.C.R. Graphical Programming and Monitoring Final Technical Report, U.S. Government Printing Office, 1988, 17. http://www.dtic.mil/dtic/tr/fulltext/u2/a197342.pdf

[5] Chun, Wendy Hui Kyong. “On Software, or the Persistence of Visual Knowledge.” Grey Room 18 (Winter 2004): 26-51.

Utilizing VR and AR in the Library Space: Commentary on Matthew Bernhardt’s Program on Information Science Talk

June 27, 2017

Matt Bernhardt is a web developer in the MIT Libraries and a collaborator in our program. He presented this talk, entitled Reality Bytes – Utilizing VR and AR in The Library Space, as part of the Program on Information Science Brown Bag Series.

In his talk, illustrated by the slides below, Bernhardt reviews technologies newly available to libraries that enhance the human-computer interface:

Bernhardt abstracted his talk as follows:

Terms like “virtual reality” and “augmented reality” have existed for a long time. In recent years, thanks to products like Google Cardboard and games like Pokemon Go, an increasing number of people have gained first-hand experience with these once-exotic technologies. The MIT Libraries are no exception to this trend. The Program on Information Science has conducted enough experimentation that we would like to share what we have learned, and solicit ideas for further investigation.

Several themes run through Matt’s talk:

  • VR should be thought of broadly as an engrossing representation of physically mediated space. Such a definition encompasses not only VR, AR, and “mixed” reality — but also virtual worlds like Second Life, and a range of games from first-person shooters (e.g. Halo) to textual games that simulate physical space (e.g. “Zork”).
  • A variety of new technologies are now available at a price-point that is accessible for libraries and experimentation — including tools for rich information visualization (e.g. stereoscopic headsets), physical interactions (e.g. body-in-space tracking), and environmental sensing/scanning (e.g. Sense).
  • To avoid getting lost in technical choices, consider the ways in which technologies have the potential to enhance the user-interface experience, and the circumstances in which the costs and barriers to use are justified by potential gains. For example, expensive, bulky VR platforms may be most useful to simulate experiences that would in real life be expensive, dangerous, rare, or impossible.

A substantial part of the research agenda of the Program on Information Science is focused on developing theory and practice to make information discovery and use more inclusive and accessible to all. From my perspective, the talk above naturally raises questions about how the affordances of these new technologies may be applied in libraries to increase inclusion and access: How could VR-induced immersion be used to increase engagement and attention by conveying the sense of place of being in an historic archive? How could realistic avatars be used to enhance social communication, and lower the barriers to those seeking library instruction and reference? How could physical mechanisms for navigating information spaces, such as eye tracking, support seamless interaction with library collections, and enhance discovery?

For those interested in these and other topics, you may wish to read some of the blog posts and reports we have published in these areas. Further, we welcome library staff and researchers who are interested in collaborating with us in research and practice. To support collaboration we offer access to fabrication, interface, and visualization technology through our lab.

Categories: Information Science