Guest Post: Graduate Research Intern, Katherine Montgomery, on the inaugural CHI Science Jam

June 4, 2018

Katie Montgomery is a Graduate Research Intern in the Program on Information Science, researching the areas of usability and accessibility.

 


by Katherine Montgomery

Research libraries are catalysts for interaction with and creation of knowledge. As information and interactions with it become increasingly digital, librarians are increasingly concerned with the way that computers and humans interact. [1]


The ACM’s Special Interest Group on Computer-Human Interaction (SIGCHI) is a community of professionals devoted to studying these interactions. Their annual conference, CHI, is a place where people share the state of the art and learn to use the state of the practice. CHI itself isn’t a standard library conference, but it addresses many of the concerns of librarians in a broader context. For example, focal points include digital privacy (which libraries work to protect), improving UX in virtual and physical realms, gamifying learning interactions, and addressing the pitfalls of automation. The conference is also packed with people the library serves, i.e., academics.

A ‘jam’ or a ‘hackathon’ is distinguished by teams of relative strangers coming together to tackle specific problems in a focused and creative way within a limited time frame. The event fosters personal connections, concrete learning, and pride in the product, and has the potential to generate real-life changes. Libraries aim to nurture precisely these elements and would do well to look to hackathons and jams and adapt their structure to empower patrons. Here at the MIT Libraries, we aim to create and inspire hacks in the great MIT tradition of using ingenuity and teamwork to create something remarkable.

Attending the Science Jam is a great way to start CHI, especially if you’re coming from a library background. The Science Jam enables you to interact with your prototypical patrons on problems that interest both of you and in a fashion that familiarizes you with patron needs. The Science Jam itself is a way to hack the conference. [2]

This is the first year they’ve run the program, and if you’ve never heard of a Science Jam before, here’s the lowdown: it’s essentially a hackathon for scientists. You form teams, come up with a problem, pose a question, create a hypothesis, design a test, run the test, analyze your results, and present your study, all in 36 hours. About 60 people attended this year’s jam. We formed ten teams, broke into two rooms (so we could use each other as test subjects the next day without contaminating our sample with knowledge of the study), and began the stimulating and occasionally frantic process.

My team tackled privacy. Our initial problem? People share other people’s data without thinking about it or even realizing it. Our question was: how could we change this behavior? In order to create something testable, we quickly honed the question to a much more specific issue and hypothesis. When people attend large conferences, festivals, concerts, or other public events, they often take pictures that focus on a screen, a float, or a stage, but include strangers in the foreground or to the sides. They then upload those pictures to their social media accounts where, even if they aren’t tagged, those strangers are vulnerable to facial analysis software and the eyes of the public. We hypothesized that, given cues that they are sharing the faces of strangers, people might change their behavior by altering the photo to obscure those faces.

Our initial hope was to create a digital interface, but time and tech constraints limited us to a paper prototype. We took photographs which contained bystanders but were focused on a different element, in this case a sign or a presenter with slides. We gave our participants the choice of selecting one of these photos to hypothetically upload to their social media account (we asked the participants to imagine that these were pictures they had taken).
After selecting the photo, participants were presented with an upload interface with the option to go back and select another photo, crop the image, or upload the photo. This interface was given to three different groups with three different caveats. The first group was given no textual cues as to the presence of potential bystanders in the photo (our control). The second group was given textual cues that there were potential bystanders in the picture, e.g., “this photo may contain two people, inside, standing up”. The third group was given visual cues that there were potential bystanders, i.e., blown-up images of the faces beneath the main image.

These images were used with the express permission of the people they depict

For the most part, people uploaded the pictures anyway, not bothering to crop out the bystanders and not expressing concern for privacy in the follow-up questionnaire. The cues didn’t make a significant difference between behaviors, but we were surprised that such a technologically enlightened group didn’t take more measures to protect people’s privacy. Of course, our test group only contained 15 people (five per scenario), our prototype was on paper, and there were a number of other potential issues with our methodology, but the question and premise remain sound. How can we help people be aware of the fact that they may be violating other people’s privacy when uploading photographs to social media? And how do we help them alter that behavior?
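For a sense of why five participants per scenario makes statistical significance so hard to reach, here is a rough sketch of how one might compare the cropping rates of two groups with Fisher’s exact test, which suits tiny samples. The counts below are made up for illustration; they are not our study’s actual data.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)

    def p_of(x):
        # Hypergeometric probability of a table with x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = p_of(a)
    lo = max(0, col1 - row2)
    hi = min(col1, row1)
    return sum(p_of(x) for x in range(lo, hi + 1) if p_of(x) <= p_obs + 1e-12)

# Hypothetical counts: of 5 participants per condition, how many cropped
# bystanders out vs. uploaded the photo unchanged.
visual_cue = [2, 3]   # 2 cropped, 3 uploaded as-is
control = [1, 4]      # 1 cropped, 4 uploaded as-is
p = fisher_exact_two_sided([visual_cue, control])
print(f"p = {p:.3f}")  # p = 1.000
```

Even a doubled cropping rate in the cued group yields a p-value near 1 at this sample size, so a null result like ours says little either way.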

The next day I attended a presentation given by Roberto Hoyle about his work testing the efficacy of various photo alterations in protecting privacy. Afterwards, we got to talking and posited an idea. What if Facebook added a feature to their image upload interface that asked a simple question: “Do you want to protect the privacy of the people you don’t know in this picture?” If the person said yes, then Facebook could auto-blur the faces it didn’t recognize as friends. The blur feature could be removed or modified, but it would bring the issue to the attention of the user and make it easy (and hopefully aesthetically pleasing, or at least acceptable) to obscure the faces of strangers.
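As a rough illustration, the core decision such a feature would make, given faces the platform has already detected and matched against the uploader’s friends, might look like the sketch below. This is purely hypothetical logic, not any actual Facebook API; the type names and fields are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in image pixels

@dataclass
class DetectedFace:
    box: Box
    friend_id: Optional[str]  # None when the face matches no known friend

def regions_to_blur(faces: List[DetectedFace], protect_strangers: bool) -> List[Box]:
    """Return the bounding boxes to obscure: every detected face not matched
    to one of the uploader's friends, provided the uploader opted in."""
    if not protect_strangers:
        return []
    return [face.box for face in faces if face.friend_id is None]

# A recognized friend stays visible; the unrecognized bystander is queued for blurring.
faces = [DetectedFace((10, 10, 40, 40), friend_id="alice"),
         DetectedFace((80, 15, 38, 38), friend_id=None)]
print(regions_to_blur(faces, protect_strangers=True))  # [(80, 15, 38, 38)]
```

In a real system the boxes would come from a face detector, and the blur itself could be a reversible overlay, matching the idea that the user can remove or modify it after the fact.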

While we agreed it was probably a moon shot, I decided to go down to the exhibition hall and talk with the Facebook folks at their booth. I was met with a combination of skepticism and interest. Since then I’ve been in touch with a couple of people at Facebook advocating for the idea. If your Facebook interface changes you’ll know it’s been a success. If not? Then the benefits are exclusively mine.

Because of the Science Jam I had the opportunity to meet and work with people I would otherwise never have known, pursue meaningful ideas, improve my teamwork, practice scientific testing and analysis with a tight deadline, exercise my presentation skills, and make friends ahead of the conference itself. Libraries could benefit from implementing a similar model ahead of extended programming. Doing a week of events on graphic novels? Include a Cartoon Jam where people can come in, team up, generate ideas, produce some sketches and storylines, and share them with each other! Running a summer of gardening programs? Engage a couple of professionals in your area and encourage patrons to bring in photographs of their trouble gardens (lots of shade, rocky, hot, snow spill), form groups, hit the books, and pick each other’s minds for solutions. Trying to get the library more involved with the school letterpress? Collaborate with the experts there and run a Book Jam [3], challenging your students to connect e-readers and the early practice of printing. There are any number of ways that libraries can take advantage of the Jam/hackathon model to engage their patrons and further the goal of becoming hubs for creation, not just consumption.

Excited? Inspired? Ready to work up a plan for your own hackathon or Jam? Take a look at the resources below and get cooking.

Notes:

  1. Current research in the Program on Information Science focuses on how measures of attention and emotion could be integrated into these interactions.  
  2. CHI will be in beautiful Scotland next year. Attend the Science Jam. You won’t regret it.  Oh, and if you want to check out some of the documentation from this year’s Science Jam take a look at #ScienceJam #CHI2018 on Twitter.
  3. The very cool Codex Hackathon is already taken.
Categories: Uncategorized

Crosspost: How Big Data Challenges Privacy, and How Science Can Help

May 21, 2018

This originally appeared in the May 8th edition of the Washington DC 100. It was co-written with Alexandra Wood of the Berkman Klein Center, and briefly summarizes our joint paper:

Micah Altman, Alexandra Wood, David R. O’Brien, and Urs Gasser, “Practical approaches to big data privacy over time,” International Data Privacy Law, Vol. 8, No. 1 (2018), https://doi.org/10.1093/idpl/ipx027.


 

The collection of personal information has become broader and more threatening than anyone could have imagined. Our research finds traditional approaches to safeguarding privacy are stretched to the limit as thousands of data points are collected about us every day and maintained indefinitely by a host of technology platforms.

We can do better. Privacy is not the inevitable price of technology. Computer science research provides new methods that protect privacy much more effectively than traditional approaches.

And research practices in health and social sciences show that it is possible to strike a good balance between individual privacy and beneficial public knowledge.

 


Guest Post: Graduate Research Intern, Ada van Tine, on Libraries & Neurodiversity

December 22, 2017

Ada van Tine is a Graduate Research Intern in the Program on Information Science, researching the area of library privacy.

 


Our Libraries and Neurodiversity

By Ada van Tine

Andover-Harvard Theological Library Stacks by Ada van Tine

It is a quiet day at the library where you work; you find it peaceful. But that is not the case for everyone. One of your patrons, Anna, is an 18-year-old woman who falls on the autism spectrum. She needs to do research for her college final paper on W.E.B. Du Bois. She lives with her parents near the school and library, but their house is noisy and full of visiting relatives right now. Anna doesn’t consider the library a calm alternative, however, and is very nervous about going there because the fluorescent lights highly irritate her, their buzzing endlessly permeating her brain and causing nausea. To cope with this she often makes repetitive movements with her hands. In the past, librarians and other patrons have been awkward with her because of her hand movements and her reaction to the lights. But she really needs to get these books for her paper. What will you do as a librarian to help this patron meet her needs? For individuals who are members of a neurominority, libraries can be extremely stressful, upsetting, and in the worst cases traumatic.

In libraries, we understand that we need to accommodate people who are different, but the problem is that sometimes we are not aware of who we might be failing to serve and why. If Anna gives feedback about the library in a suggestion box, you might well schedule a replacement of the fluorescent lights as part of the library’s renovations. That is a small step toward progress; however, we should not wait around for an invitation to make our libraries more bearable, leaving the chance that some patrons might be suffering in silence in the meantime. Librarians need to be radically proactive so as not to make their spaces welcoming only to the part of the population with neurotypical leanings. The solution, however, is not merely a focus on those who are “different” and need some kind of special accommodation.

Rather, the researchers and advocates who talk about neurodiversity now stress that neurodiversity is “the idea that neurological differences like autism and ADHD are the result of normal, natural variation in the human genome.” (Robison, What is Neurodiversity?) Simply said: all humans fall on neurological spectra of traits, and all of us have our own variances from the norm. For each person in the world there exists a different way of perceiving and interacting with other people and information. For instance, people with dyslexia, people with autism, people with ADHD, and people who have not had a good night’s sleep all perceive the world and the library differently. The concept of neurodiversity is another way to recognize that.

Furthermore, new research is continually helping us to evolve our ideas about neurodiversity. Therefore, libraries should stay abreast of advancements in technology for the neurodiverse population because they will benefit every patron. “Actively engaging with neurodiversity is not a question of favoring particular personal or political beliefs; rather, such engagement is an extension of librarians’ professional duties insofar as it enables the provision of equitable information services” (Lawrence, Loud Hands in the Library, 106-107). Librarians are called through the ALA Core Values of Access and Diversity to make all information equitably available to all patrons. To not recognize the existence of neurodiversity would be to ignore a segment of the whole society which we are called to serve.

There are immediate ways that your library can better serve a larger portion of the neurodiverse population. For example, below are some relatively low cost interventions:

  • For dyslexic individuals, have a small reading screen available. Research has shown that those with dyslexia can read more easily and quickly off of smaller screens with small amounts of text per page (Schneps).
  • Audiobooks, text-to-speech, and devices that can show text in a color gradient also help dyslexic patrons with their information needs.
  • For people who are on the autism spectrum, replace the older fluorescent lights in the library, and don’t focus solely on open collaborative spaces in the library layout (Lawrence, Loud Hands, 105). Also train yourself and your employees to recognize and know how to react properly to autistic individuals who may express nonverbal body language such as repetitive movements (Lawrence, Loud Hands, 105).
  • For people with ADHD, have quiet private rooms available so they can better concentrate at the library, as well as audiobooks and text-to-speech programs so that they can listen to their research and reading while doing other things (Hills, Campbell, 462).
  • Train staff never to touch a person who is on the autism spectrum without their explicit permission, to be aware of their sensory needs, and to hold the interview in a quiet place with no background noise (such as an office fountain) and no fluorescent lights. Some people on the autism spectrum are also smell sensitive, so ask staff to refrain from wearing perfume. (http://www.autismacceptancemonth.com/wp-content/uploads/2014/03/AAM-Sensory-Accomodations.pdf)

New technologies and findings in cognitive science are being developed to better adapt to those individuals who are members of a neurominority. For example, a new reading program being developed by Dr. Matthew Schneps combines a reading acceleration program with compressed text-to-speech and visual modifications, and has so far drastically increased the reading speed of dyslexic and non-dyslexic readers alike (Schneps). There are many studies on the ways in which modern technology can be used to better communicate with and educate autistic students. The future is hopeful.

Addressing neurodiversity in our libraries and in our societies is not a solved problem. For example, research and development is being done to reframe digital programs as an ever-growing ecosystem, never in stasis, so that they may better adapt to every user’s needs, and to be transparent about the metadata of programs so that users can know which parts of a system are enabling or disabling their assistive technology (Johnson, 4). There are many steps that can help make the library friendlier to a neurodiverse audience, but the most important thing to keep in mind is that we must all plan to change and adapt, now and over time, to make our society a better, more livable place for everyone. Then maybe when Anna comes to do her research, the library and staff will be prepared to be a little more welcoming than she expected, and maybe she’ll even want to come back.

What to do next:

 

You may feel overwhelmed by the vast and complicated nature of this important task. The first step is always to educate yourself and get a grounding in basic literature about a subject. Many resources are included in the next section to aid in this discovery process.

You may wish to start off by learning about neurodiversity in general (What is Neurodiversity?, Definition of Neurodiversity). If you’ve identified a specific population need in your community, you may want to dig in deeper with resources specific to that neurominority; here are a few (Autism Spectrum, ADHD, Dyslexia).

There are some good books and articles specifically about neurodiversity and libraries included in the resources (Library Services for Youth with Autism Spectrum Disorders, Programming for Children and Teens with Autism Spectrum Disorder, Loud Hands in the Library, Neurodiversity in the Library).

As it turns out, there is a lack of literature on best practices and programming in libraries with respect to neurodiversity. However, to understand and engage with this topic and community, librarians should consider attending events and workshops; a number held by advocacy and research organizations are included below (ADHD, Dyslexia, The A11Y Project, International Society for Augmentative and Alternative Communication, The Center for AAC and Autism).

 

Resources

Reference List

The American Association of People with Disabilities. Retrieved from http://www.aapd.com/about/

Autistic Self Advocacy Network. Retrieved from http://autisticadvocacy.org/

The A11Y project. Retrieved from https://a11yproject.com/about

Campbell, I., Hills, K. (2011). College Programs and Services. In M. DeVries, S. Goldstein, & J. Naglieri (Eds.), Learning and Attention Disorders in Adolescence and Adulthood (457-466). Hoboken, New Jersey: John Wiley & Sons, Inc.

The Center for AAC and Autism. Retrieved from https://www.aacandautism.com/

Children and Adults with Attention-Deficit/Hyperactivity Disorder (CHADD). Retrieved from http://www.chadd.org/

Eng, A. (2017). Neurodiversity in the Library: One Librarian’s Experience. In The Library With The Lead Pipe, 1. http://ezproxy.simmons.edu:2048/login?url=https://search-ebscohost-com.ezproxy.simmons.edu/login.aspx?direct=true&db=lls&AN=124086508&site=ehost-live&scope=site

Farmer, L. S. J. (2013). Library Services for Youth with Autism Spectrum Disorder. Chicago: American Library Association.

How Educators Can Help Autistic People by Sensory Accommodations. Retrieved from http://www.autismacceptancemonth.com/wp-content/uploads/2014/03/AAM-Sensory-Accomodations.pdf

International Dyslexia Association. Retrieved from https://dyslexiaida.org

International Society for Augmentative and Alternative Communication. Retrieved from https://www.isaac-online.org/english/about-isaac/

Johnson, Rick. (2017, Sept 25). Accessibility: Ensuring that Edtech Systems Work Together to Serve All Students. Educause Review. Retrieved from https://er.educause.edu/articles/2017/9/accessibility-ensuring-that-edtech-systems-work-together-to-serve-all-students

 

Klipper, B. (2014). Programming for Children and Teens with Autism Spectrum Disorder. Chicago: American Library Association.

Lawrence, E. (2013). Loud Hands in the Library. Progressive Librarian, (41), 98-109. http://ezproxy.simmons.edu:2048/login?url=https://search.ebscohost.com/login.aspx?direct=true&db=lls&AN=91942766&site=ehost-live&scope=site

Neurodiversity. Retrieved from http://www.autismacceptancemonth.com/resources/101-3/autism-acceptance/neurodiversity/

Ploog, B. O., Scharf, A., Nelson, D., & Brooks, P. J. (2013). Use of computer-assisted technologies (CAT) to enhance social, communicative, and language development in children with autism spectrum disorders. Journal Of Autism And Developmental Disorders, (2), 301. doi:10.1007/s10803-012-1571-3

Robison, John Elder. (2013, Oct 7). What is Neurodiversity? Psychology Today. Retrieved from https://www.psychologytoday.com/blog/my-life-aspergers/201310/what-is-neurodiversity

Schneps, Matthew H. (2015). Using Technology to Break the Speed Barrier of Reading. Scientific American. Retrieved from https://www.scientificamerican.com/article/using-technology-to-break-the-speed-barrier-of-reading/


A History of the Internet : Commentary on Scott Bradner’s Program on Information Science Talk

December 1, 2017


Scott Bradner is a Berkman Center affiliate who worked for 50 years at Harvard in the areas of computer programming, system management, networking, IT security, and identity management. He was involved in the design, operation, and use of data networks at Harvard University since the early days of the ARPANET, and served in many leadership roles in the IETF. He presented the talk recorded below, entitled A History of the Internet, as part of the Program on Information Science Brown Bag Series:

Bradner abstracted his talk as follows:

In a way the Russians caused the Internet. This talk will describe how that happened (hint it was not actually the Bomb) and follow the path that has led to the current Internet of (unpatchable) Things (the IoT) and the Surveillance Economy.

The talk contained a rich array of historical details — far too many to summarize here. Much more detail can be found in the slides and video above, in his publications, and in his IETF talks. (And for those interested in recent Program on Information Science research on related issues of open information governance, see our published reports.)

Bradner describes how the space race, exemplified by the launch of Sputnik, spurred national investments in research and technology — and how the arms race created the need for a communication network that was decentralized and robust enough to survive a nuclear first-strike.

Bradner argues that the internet has been revolutionary, in part because of its end-to-end design. The internet as a whole was designed so that most of the “intelligence” is encapsulated at host endpoints, connected by a “stupid” network carrier that just transports packets. As a result, Bradner argues, the carrier cannot own the customer, which, critically, enables customers to innovate without permission.

ARPANET, as originally conceived, was focused on solving what was then a grand challenge in digital communications research: To develop techniques and obtain experience on interconnecting computers in such a way that a very broad class of interactions are possible, and to improve and increase computer research productivity through resource sharing.

Bradner argues that the internet succeeded because, despite the scope of the problem, solutions were allowed to evolve chaotically: ARPA was successful in innovating because it required no peer review. The large incumbent corporations in the computing and networking field ignored the internet because they believed it couldn’t succeed (and they believed it couldn’t succeed because its design did not allow for the level of control and reliability that the incumbents believed necessary to making communications work). And since the Internet was viewed as irrelevant, there were no efforts to regulate it. It was not until after the Internet achieved success and catalyzed disruptive innovation that policymakers deemed it “too important to leave to the people that know how it works.”

Our upcoming Summit, supported by a generous grant from the Mellon Foundation, will probe for grand challenge questions in scholarly discovery, digital curation and preservation, and open scholarship. Is it possible that the ideas that could catalyze innovation in these areas are, like the early Internet, currently viewed as impractical or irrelevant?


Safety Nets (for information): Commentary on Jefferson Bailey’s Program on Information Science Talk

November 7, 2017

Jefferson Bailey is Director of Web Archiving at Internet Archive. Jefferson joined Internet Archive in Summer 2014 and manages Internet Archive’s web archiving services including Archive-It, used by over 500 institutions to preserve the web. He also oversees contract and domain-scale web archiving services for national libraries and archives around the world. He works closely with partner institutions on collaborative technology development, digital preservation, data research services, educational partnerships, and other programs. He presented the talk recorded below, entitled, Safety Nets: Rescue And Revival For Endangered Born-digital Records — as part of Program on Information Science Brown Bag Series:

Bailey abstracted his talk as follows:

The web is now firmly established as the primary communication and publication platform for sharing and accessing social and cultural materials. This networked world has created both opportunities and pitfalls for libraries and archives in their mission to preserve and provide ongoing access to knowledge. How can the affordances of the web be leveraged to drastically extend the plurality of representation in the archive? What challenges are imposed by the intrinsic ephemerality and mutability of online information? What methodological reorientations are demanded by the scale and dynamism of machine-generated cultural artifacts? This talk will explore the interplay of the web, contemporary historical records, and the programs, technologies, and approaches by which libraries and archives are working to extend their mission to preserve and provide access to the evidence of human activity in a world distinguished by the ubiquity of born-digital materials.

Bailey eloquently stated the importance of web archiving: “No future scholarship can study our era without considering materials published (only) on the web.” Further, he emphasized the importance of web archiving for social justice: Traditional archives disproportionately reflect social architectures of power, and the lived experiences of the advantaged. Web crawls capture a much broader (although not nearly complete) picture of the human experience.

The talk ranged over an impressively wide portfolio of initiatives — far too many to do justice to in a single blog post. Much more detail on these projects can be found in the slides and video above, in Bailey’s professional writings, and in the Archive blog, its experiments page, and the Archive-It blog.

A unified argument ran through Bailey’s presentation. At the risk of oversimplifying, I’ll restate the premises of the argument here:

  1. Understanding our era will require research, using large portions of the web, linked across time.
  2. The web is big — but not too big to collect (a substantial portion of) it.
  3. Providing simple access (e.g. retrieval, linking) is more expansive than collection;
    enabling discovery (e.g. search) is much harder than simple access;
    and supporting computational research (which requires analysis at web-scale, and over time) —
    is much, much harder than discovery.
  4. Research libraries should help with this (hardest) part.

I find the first three parts of the argument largely convincing. Increasingly, new discoveries in social science are based on the analysis of massive collections of data generated by people’s public communications, and depend on tracing these actions and their consequences over time. The Internet Archive’s success to date establishes that much of this public communication can be collected and retained over time. And the history of database design (as well as my colleagues’ and my experiences in archiving and digital libraries) testifies to the challenges of effective discovery and access at scale.

I hope that we, as research libraries, will step up to the challenges of enabling large-scale, long-term research over content such as this. Research libraries already have a stake in this problem because most of the core ideas and fundamental methods (although not the operational platforms) for analysis of data at this scale come from research institutions with which we are affiliated. Moreover, if libraries lead the design of these platforms, participation in research will be far more open and equitable than if these platforms are ceded entirely to commercial actors.

For this among other reasons, we are convening a Summit on Grand Challenges in Information Science & Scholarly Communication, supported by a generous grant from the Mellon Foundation. During this summit we will develop community research agendas in the areas of scholarly discovery at scale; digital curation and preservation; and open scholarship. For those interested in these questions and related areas, we have published Program on Information Science reports and blog posts on some of the challenges of digital preservation at scale.


Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Information Science Talk

October 6, 2017

Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Information Science Talk

Cassidy Sugimoto is Associate Professor in the School of Informatics and Computing, Indiana University Bloomington, where she researches scholarly communication and scientometrics, examining the formal and informal ways in which knowledge producers consume and disseminate scholarship. She presented this talk, entitled Labor And Reward In Science: Do Women Have An Equal Voice In Scholarly Communication?, as part of the Program on Information Science Brown Bag Series.

In her talk, illustrated by the slides below, Sugimoto highlights the roots of gender disparities in science.

 

Sugimoto abstracted her talk as follows:

Despite progress, gender disparities in science persist. Women remain underrepresented in the scientific workforce and under rewarded for their contributions. This talk will examine multiple layers of gender disparities in science, triangulating data from scientometrics, surveys, and social media to provide a broader perspective on the gendered nature of scientific communication. The extent of gender disparities and the ways in which new media are changing these patterns will be discussed. The talk will end with a discussion of interventions, with a particular focus on the roles of libraries, publishers, and other actors in the scholarly ecosystem.

In her talk, Sugimoto stressed a number of patterns in scientific publication:

  • The demise of single authorship complicates notions of credit, reward, labor, and responsibility.
  • There are distinct patterns of gender disparity in scientific publications: male-authored publications predominate in most fields (with a few exceptions, such as Library Science); women collaborate more domestically than internationally on publications; and women-authored publications tend to be cited less (even within the same tier of journals).
  • Looking across categories of contribution, the most isolated is performing the experiment, and women are most likely to fill this role. Further, if we look across male- and female-led teams, we see that the distribution of work across these teams varies dramatically.
  • When surveying teams — women tended to value all of the forms of contributions more than men with one exception. Women judge technical work, which is more likely to be conducted by women, as less valuable.
  • Composition of authorship has consequences for what is studied. Womens’ research focuses more often than men on areas relevant to both genders or to women.

Sugimoto notes that these findings are consistent with pervasive gender discrimination. Further, women as well as men frequently discriminate against other women, for example in evaluations of professionalism, evaluations of work, and salary offers.

Much more detail on these points can be found in Sugimoto's professional writings.

Sugimoto's talk drew on a variety of sources: publication data from the Web of Science, and acknowledgement and authorship statements in PLOS journals. Open bibliometric data, such as that produced by PLOS, the Initiative for Open Citations, and various badging initiatives, can help us bring such disparities to light more readily.

At the conclusion of her talk, Sugimoto suggested the following roles for librarians:

  • Use and promote open access in training sessions
  • Provide programming that lessens barriers to participation for women and minorities
  • Advocate for contributorship models which recognize the diversity of knowledge production
  • Approach new metrics with productive skepticism
  • Encourage engagement between students and scholars
  • Evaluate and contribute to the development of new tools

Reflecting the themes of Sugimoto's talk, the research we conduct here in the Program on Information Science is strongly motivated by issues of diversity and inclusion, particularly approaches to bias-reducing systems design. Our previous work in participative mapping aimed to increase broad public participation in electoral processes. Our current NSF-supported work in educational research uses eye-tracking and other biological signals to track fine-grained learning across populations of neurologically diverse learners. And, under a recently awarded IMLS grant, we will be hosting a workshop to develop principles for supporting diversity and inclusion through information architecture in information systems. For those interested in these and other projects, we have published blog posts and reports in these areas.

Guest Post: Towards Strategies for Making Legacy Software Curation-Ready

September 27, 2017

Alex Chassanoff is a CLIR/DLF Postdoctoral Fellow in the Program on Information Science and continues a series of posts on software curation.

In this blog post, I am going to reflect upon potential strategies that institutions can adopt for making legacy software curation-ready.  The notion of "curation-ready" was first articulated by the Curation Ready Working Group, which formed in 2016 within the newly emerging Software Preservation Network (SPN).  The goal of the group was to "articulate a set of characteristics of curation-ready software, as well as activities and responsibilities of various stakeholders in addressing those characteristics, across a variety of different scenarios"[1].  Drawing on inventories at our own institutions, the working group explored different strategies and criteria that would make software "curation-ready" for representative use cases.  In my use case, I looked specifically at the GRAPPLE software program and wrote about particular uses and users for the materials.

This work complements the ongoing research I’ve been doing as a Software Curation Fellow at MIT Libraries [2] to envision curation strategies for software.  Over the past six months, I have conducted an informal assessment of representative types of software in an effort to identify baseline characteristics of materials, including functions and uses.

Below, I briefly characterize the state of legacy software at MIT.

  • Legacy software often exists among hybrid collections of materials, and can be spread across different domains.
  • Different components (e.g., software dependencies, hardware) may or may not be co-located.
  • Legacy software may or may not be accessible on original media. Materials are stored in various locations, ranging from climate-controlled storage to departmental closets.
  • Legacy software may exist in multiple states with multiple contributors over multiple years.
  • Different entities (e.g., MIT Museum, Computer Science and Artificial Intelligence Laboratory, Institute Archives & Special Collections) may have administrative purview over legacy software with no centralized inventory available.
  • Collected materials may contain multiple versions of source code housed in different formats (e.g., paper print outs, on multiple diskettes) and may or may not consist of user manuals, requirements definitions, data dictionaries, etc.
  • Legacy software has a wide range of possible scholarly uses and users. These may include the following: research on institutional histories (e.g., government-funded academic computing research programs), biographies (e.g., notable developers and/or contributors of software), socio-technical inquiries (e.g., extinct programming languages, implementation of novel algorithms), and educational endeavors (e.g., reconstruction of software).

We define curation-ready legacy software as discoverable, usable/reusable, interpretable, citable, and accessible.  Our approach views curation as an active, nonlinear, iterative process undertaken throughout the life (and lives) of a software artifact.

Steps to increase curation-readiness for legacy software

Below, I briefly describe some of the strategies we are exploring as potential steps in making legacy software curation-ready.  Each of these strategies should be treated as suggestive rather than prescriptive at this stage in our exploration.

Identify appraisal criteria. Establishing appraisal criteria is an important first step that can be used to guide decisions about the selection of relevant materials for long-term access and retention. As David Bearman writes, "Framing a software collecting policy begins with the definition of a schema which adequately depicts the universe of software in which the collection is to be a subset."[3]  It is important to note that for legacy software, determining appraisal criteria will necessarily involve making decisions about both the level of access and the level of preservation desired.  Decision-making should be guided by an institutional understanding of what constitutes a fully formed collection object. In other words, what components of software should be made accessible? What will be preserved? Does the software need to be executable? What levels of risk assessment should be conducted throughout the lifecycle?  Making these decisions institutionally will in turn help guide the identification of appropriate preservation strategies (e.g., emulation, migration, etc.) based on desired outcomes.

Identify, assemble, and document relevant materials. A significant challenge with legacy software lies in the assembling of relevant materials to provide necessary context for meaningful access and use.  Locating and inventorying related materials (e.g., memos, technical requirements, user manuals) is an initial starting point. In some cases, meaningful materials may be spread across the web at different locations.  While it remains a controversial method in archival practice, documentation strategy may provide useful framing guidance on principles of documentation [4].

Identify stakeholders. Identifying the various stakeholders, either inside or outside of the institution, can help ensure proper transfer and long-term care of materials, along with managing potential rights issues where applicable.  Here we draw on Carlson's work developing the Data Curation Profiles Toolkit and define stakeholders as any groups, organizations, individuals, or other parties with an investment in the software that you would feel the need to consult regarding access, care, use, and reuse of the software [5].

Describe and catalog materials. Curation-readiness can be increased by thoroughly describing and cataloging select materials, with an emphasis on preserving relationships among entities. In some cases, this may consist of describing aspects of the computing environment and relationships between hardware, software, dependencies, and/or versions. Although the software itself may not be accessible, describing related materials (i.e., printouts of source code, technical requirements documentation) adequately can provide important points of access. It may be useful to consider the different conceptual models of software that have been developed in the digital preservation literature and decide which perspective aligns best with your institutional needs [6].
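As an illustrative sketch only (the field names, values, and structure below are hypothetical, not drawn from any published metadata schema), a catalog record for a legacy software artifact might make the relationships among entities explicit:

```python
# Hypothetical catalog record for a legacy software artifact.
# All field names and values are illustrative assumptions, not a real schema.

record = {
    "title": "GRAPPLE",
    "artifact_type": "source code printout",
    # Describe the computing environment, even if the software itself
    # is not currently executable or accessible.
    "computing_environment": {
        "hardware": "mainframe (hypothetical)",
        "language": "assembly (hypothetical)",
    },
    # Preserve relationships to related materials that provide context.
    "related_materials": [
        {"type": "user manual", "location": "Institute Archives (hypothetical)"},
        {"type": "technical requirements", "location": "departmental storage (hypothetical)"},
    ],
    # Multiple states/versions over multiple years, per the inventory above.
    "versions": ["paper printout", "diskette copy"],
}

# A record like this can be indexed or serialized for discovery systems.
print(sorted(record.keys()))
```

The point of the sketch is only that relationships (environment, related materials, versions) are captured as first-class fields rather than buried in free-text notes; any real implementation would map these to an institutionally chosen schema.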

Digitize and OCR paper materials. Paper printouts of source code and related documentation can be digitized according to established best-practice workflows [7].  The use of optical character recognition (OCR) produces machine-readable output, enabling indexing of content to enhance discoverability and supporting textual transcription.  The latter can make historical source code more portable for use in simulations or reconstructions of the software.

Migrate media. Legacy software often resides on unstable media such as floppy disks or magnetic tape. In cases where access to the software itself is desirable, migrating and/or extracting media contents (where possible) to a more stable medium is recommended [8].

Reflections

As an active practice, software curation means anticipating future use and uses of resources from the past. Recalling an earlier blog post, our research aims to produce software curation strategies that embrace Reagan Moore’s theoretical view of digital preservation, whereby “information generated in the past is sent into the future”[9]. As the born-digital record increases in scope and volume, libraries will necessarily have to address significant changes in the ways in which we use and make use of new kinds of resources.  Technological quandaries of storage and access will likely prove less burdensome than the social, cultural, and organizational challenges of adapting to new forms of knowledge-making. Legacy software represents this problem space for libraries/archives today.  Devising curation strategies for software helps us to learn more about how knowledge-embedded practices are changing and gives us new opportunities for building healthy infrastructures [10].

References

[1] Rios, F., Almas, B., Contaxis, N., Jabloner, P., & Kelly, H. (2017). Exploring curation-ready software: Use cases. doi:10.17605/OSF.IO/8RZ9E

[2] These are some of the open research questions being addressed by the initial cohort of CLIR/DLF Software Curation Fellows in different institutions across the country.

[3] Bearman, D. (1985). Collecting software: a new challenge for archives & museums. Archives & Museum Informatics, Pittsburgh, PA.

[4] Documentation strategy approaches archival practice as a collaborative work among record creators, archivists, and users.  It often traverses institutions and represents an alternative approach by prompting extensive documentation organized around an “ongoing issue or activity or geographic area.” See:  Samuels, H. (1991). “Improving our disposition: Documentation strategy,” Archivaria 33, http://web.utk.edu/~lbronsta/Samuels.pdf.

[5] Carlson, J. (2010). "The Data Curation Profiles toolkit: Interviewer's manual," http://dx.doi.org/10.5703/1288284315651.

[6] The results of two applied research projects provide examples from the digital preservation literature.  In 2002, the Agency to Research Project at the National Archives of Australia developed a conceptual model based on software performance as a measure of the effectiveness of digital preservation strategies. See: Heslop, H., Davis, S., Wilson, A. (2002). "An approach to the preservation of digital records," National Archives of Australia, 2002. In their 2008 JISC report, the authors proposed a composite view of software with the following four entities: package, version, variant, and download. See: Matthew, B., McIlwrath, B., Giaretta, D., Conway, E. (2008). "The significant properties of software: A study," https://epubs.stfc.ac.uk/manifestation/9506.

[7]  Technical guidelines for digitizing archival materials for electronic access: Creation of production master files–raster images. (2005). Washington, D.C.: Digital Library Federation, https://lccn.loc.gov/2005015382/

[8] For a good overview of storage recommendations for magnetic tape, see: https://www.clir.org/pubs/reports/pub54/Download/pub54.pdf. To read more about the process of reformatting analog media, see: Pennington, S., and Rehberger D. (2012). The preservation of analog video through digitization. In D. Boyd, S. Cohen, B. Rakerd, & D. Rehberger (Eds.), Oral history in the digital age. Institute of Library and Museum Services. Retrieved from http://ohda.matrix.msu.edu/2012/06/preservation-of-analog-video-through-digitization/.

[9] Moore, R. (2008). “Towards a theory of digital preservation”, International Journal of Digital Curation 3(1).

[10] Thinking about software as infrastructure provides a useful framing for envisioning strategies for curation.  Infrastructure perspectives advocate "adopting a long term rather than immediate timeframe and thinking about infrastructure not only in terms of human versus technological components but in terms of a set of interrelated social, organizational, and technical components or systems (whether the data will be shared, systems interoperable, standards proprietary, or maintenance and redesign factored in)."  See: Bowker, G.C., Baker, K., Millerand, F., & Ribes, D. (2010). "Toward information infrastructure studies: Ways of knowing in a networked environment." In J. Hunsinger, L. Klastrup, & M. Allen (Eds.), International handbook of Internet research. Dordrecht: Springer, 97-117.