What’s new in managing confidential research data this year?

For MIT’s Independent Activities Period (IAP), the Program on Information Science regularly leads a practical workshop on managing confidential data. The workshop draws in part on research conducted through the Privacy Tools project. As I was updating the workshop for this semester, I had an opportunity to reflect on what’s new on the pragmatic side of managing confidential information.

Most notably, because of the publicity surrounding the NSA, more people (and people in higher places) are paying attention. (And as an information scientist, I note that one benefit of the NSA scandal is that everyone now recognizes the term “metadata”.)

More generally, personal information continues to become more available, and it is increasingly easy to link that information to individuals. New laws, regulations, and policies governing information privacy continue to emerge, increasing the complexity of management. Trends in information collection and management (cloud storage, “big” data, and debates about the right to limit access to published but personal information) complicate data management and make traditional approaches to managing confidential data decreasingly effective.
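
How easy such linkage can be shows up even in a toy example. The Python sketch below joins a hypothetical “de-identified” health table to a hypothetical public roster (think of a voter list) on ZIP code, birth date, and sex, a classic combination of quasi-identifiers. All records, names, and field names are invented for illustration; the point is only that no direct identifier is needed to re-identify someone.

```python
# Hypothetical toy data: every record, name, and field below is invented for illustration.
deidentified_health = [
    {"zip": "02139", "birth_date": "1975-07-31", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1982-01-15", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02142", "birth_date": "1975-07-31", "sex": "F", "diagnosis": "migraine"},
]

public_roster = [
    {"name": "Alice Example", "zip": "02139", "birth_date": "1975-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1982-01-15", "sex": "M"},
    {"name": "Carol Example", "zip": "02139", "birth_date": "1982-01-15", "sex": "M"},
]

# Attributes that are not identifying on their own but often are in combination.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def quasi_key(record):
    """Project a record onto its quasi-identifier values."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(deidentified, roster):
    """Return (name, diagnosis) pairs whose quasi-identifiers match exactly one roster entry."""
    index = {}
    for person in roster:
        index.setdefault(quasi_key(person), []).append(person["name"])
    matches = []
    for record in deidentified:
        names = index.get(quasi_key(record), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["diagnosis"]))
    return matches


print(link(deidentified_health, public_roster))  # → [('Alice Example', 'asthma')]
```

Only the first record links: the second record's combination is shared by two roster entries, and the third matches none. Requiring every quasi-identifier combination to be shared by at least k people is the intuition behind k-anonymity, and richer, more linkable data make such guarantees harder to maintain.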

On the pragmatic side, new privacy laws continue to emerge at the state level. Probably the most notable is California’s “right to be forgotten” for teens. This year California became the first state to pass a law (“The Privacy Rights for California Minors in the Digital World”) that gives (some) individuals the right to remove (some) content they have posted online.
The California law takes effect next year (January 1, 2015), by which time we are likely to see new information privacy initiatives in other states. This year we are also likely to see specific data-sharing requirements released by federal funders (as a result of the OSTP “Holdren Memo”, NIH’s Big Data to Knowledge initiative, and related efforts), as well as by journals and professional societies. Farther off in the wings loom the possibility of a general right-to-be-forgotten law in the EU; changes to how the “Common Rule” evaluates information risks and controls (on which the NAS recently issued a new set of recommendations); and possible “sectoral” privacy laws targeted at “revenge porn”, mug-shot databases, mobile-phone data, or other issues du jour.
This creates an interesting tension and will require increasingly sophisticated approaches that can provide both privacy and appropriate access. From a policy point of view, one possible way of striking this balance is to use “least restrictive terms” language; the OKF’s Open Economics Principles may provide a viable approach.
In a purely operational sense, the biggest change in confidential data management for researchers is the wider availability of “safe-sharing” services for exchanging research data within remote collaborations:
  • On the do-it-yourself front, the increasing flexibility of the FISMA-certified Amazon Web Services GovCloud makes running a remote, secure research computing environment easier and more economical. This approach is still complex and expensive to maintain, and one still has to trust Amazon, although the FISMA certifications make that trust better justified.
  • The second widely used option, combining file-sharing services like Dropbox with encrypted filesystems like TrueCrypt, also received a boost this year with the success of a crowdfunded effort to independently audit the TrueCrypt source. This is good news: the transparency and verifiability of TrueCrypt are its big strengths. In practice, however, the approach remains limited to securely publishing information. It doesn’t support simultaneous remote updates (not unless you like filesystem corruption), multiple keys for different users or portions of the filesystem, key distribution, and so on.
  • A number of simpler solutions have emerged this year:
    – BitTorrent Sync provides “secure” peer-to-peer replication and sharing based on a secret key.
    – SpiderOak Hive, Sync.com, and BoxCryptor all offer zero-knowledge cloud storage with client-side encrypted data sharing. Their ease of use and functionality for secure collaboration are very attractive compared with the other available solutions. BoxCryptor offers an especially wide range of enterprise features, such as key distribution, revocation, and master- and group-key chaining, that would make managing sharing among heterogeneous groups easier. However, the big downside is the amount of “magic” in these systems. None is open source, and none is sufficiently well documented (at least externally) or certified (no FISMA there) to engender trust among us untrusting folk. (Although SpiderOak in particular seems to have a good reputation for trustworthiness, and the others no doubt have pure hearts, I’d rest easier with the ability to audit source code, peer-reviewed algorithms, and so on.)
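
The client-side (“zero-knowledge”) sharing model these services advertise can be sketched as envelope encryption: each file gets a random data key, the provider stores only ciphertext plus one wrapped copy of that key per member, and revoking a member means re-keying. The Python below is a minimal sketch under loud assumptions. It is not how any of these products actually works (their internals are exactly the undocumented “magic” complained about above). The names `seal`, `open_sealed`, `member_key`, `publish`, `read`, and `revoke` are hypothetical. Real services typically wrap the data key to each member’s public key rather than to a symmetric key, and the HMAC-based cipher here is a standard-library stand-in for a vetted AEAD such as AES-GCM, for illustration only.

```python
import hashlib
import hmac
import os

# Toy cipher (assumption): HMAC-SHA256 in counter mode as a keystream plus
# encrypt-then-MAC. A stand-in for a vetted AEAD such as AES-GCM; illustration only.


def _subkey(key: bytes, label: bytes) -> bytes:
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt and authenticate; returns nonce || ciphertext || tag."""
    nonce = os.urandom(16)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ct = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def open_sealed(key: bytes, blob: bytes) -> bytes:
    """Verify the tag, then decrypt; raises ValueError on a wrong key or tampering."""
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: wrong key or tampered data")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ct))
    return bytes(c ^ s for c, s in zip(ct, stream))


def member_key(secret: str, salt: bytes) -> bytes:
    """Key-encryption key derived on the member's own device from a secret only they hold."""
    return hashlib.pbkdf2_hmac("sha256", secret.encode(), salt, 100_000)


class SharedFolder:
    """Everything the storage provider sees: ciphertext and wrapped keys, never plaintext."""

    def __init__(self) -> None:
        self.blob = b""
        self.wrapped: dict[str, bytes] = {}  # member -> data key sealed under their key


def publish(folder: SharedFolder, plaintext: bytes, member_keys: dict[str, bytes]) -> None:
    """Encrypt under a fresh random data key and wrap that key once per member."""
    data_key = os.urandom(32)
    folder.blob = seal(data_key, plaintext)
    folder.wrapped = {name: seal(k, data_key) for name, k in member_keys.items()}


def read(folder: SharedFolder, name: str, key: bytes) -> bytes:
    data_key = open_sealed(key, folder.wrapped[name])  # unwrap on the client
    return open_sealed(data_key, folder.blob)


def revoke(folder: SharedFolder, member_keys: dict[str, bytes], revoked: str) -> dict[str, bytes]:
    """Drop a member, then re-key: new data key, re-encrypt, re-wrap for the rest."""
    remaining = {n: k for n, k in member_keys.items() if n != revoked}
    if not remaining:
        raise ValueError("cannot revoke the last member")
    reader, reader_key = next(iter(remaining.items()))
    publish(folder, read(folder, reader, reader_key), remaining)
    return remaining


if __name__ == "__main__":
    salt = b"hypothetical-folder-salt"
    keys = {n: member_key(s, salt) for n, s in [("ana", "ana-secret"), ("ben", "ben-secret")]}
    folder = SharedFolder()
    publish(folder, b"interview transcripts", keys)
    print(read(folder, "ben", keys["ben"]))  # b'interview transcripts'
    keys = revoke(folder, keys, "ben")
    print(sorted(folder.wrapped))  # ['ana']
```

Note that revocation only protects versions written after the re-key: a removed member who already read the data, or kept the old data key, still has it. The design also shows why auditability matters. Whether a provider truly never sees the data key depends entirely on client code the user cannot inspect in a closed-source product.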

For those interested in the meat of the course, which gives an overview of legal, policy, information technology/security, research design, and statistical pragmatics, the new slides are here:
