Educause Security Discussion mailing list archives
Re: NSF Data Management Plans
From: Joe St Sauver <joe () OREGON UOREGON EDU>
Date: Thu, 5 Aug 2010 14:31:33 -0700
Steve Brukbacher <sab2 () UWM EDU> mentioned (in connection with new required NSF data sharing plans):

#The changes are designed to address trends and needs in the modern era
#of data-driven science. "Science is becoming data-intensive and
#collaborative," notes Ed Seidel, acting assistant director for NSF's
#Mathematical and Physical Sciences directorate. "Researchers from
#numerous disciplines need to work together to attack complex problems;
#openly sharing data will pave the way for researchers to communicate and
#collaborate more effectively."
#
#We're looking at how to assist researchers with this. Has anyone
#established any security strategies related to this new requirement?

I could see a number of different aspects to that question. Are you primarily concerned about:

-- research dataset provenance and integrity (in a nutshell, how do I know that the dataset I think I just retrieved is the one that I think it is, and that it hasn't been accidentally or intentionally altered since it was created? Yes, you could checksum the file, but is that enough?)

-- dataset documentation (getting a dump of a dataset doesn't do much good if you don't know how the data was collected and coded, including any inherent limitations to the data, etc.). In the bad old days when I was providing statistical research support for faculty members and grad students, I have to admit that I occasionally saw datasets received from offsite that suffered from woefully incomplete and insufficient documentation (presumably because some researchers are like bad programmers, deferring documentation until they "get a couple of minutes," only to never have that quiet time actually turn up).

-- or was it more a matter of ensuring you simultaneously protect any sensitive data elements (such as human subjects data) while also meeting the NSF's new open access requirements (e.g., questions about how to handle data anonymization schemes, access control and logging, or related sorts of things)?
-- other sites might be interested in monitoring data assets for abuse and misuse (conceptually imagine a dataset released for non-profit research use (only), which subsequently gets commercially exploited without permission) -- obviously it can be tricky to find and prove these sorts of things, although reportedly some information providers have been known to "salt" things like maps with harmless but non-existent features they've made up -- if you have the bad luck to blindly copy the fictitious feature, well, they arguably have you dead to rights

-- potentially some research data might be export controlled, and some sites might want to ensure that they don't inadvertently allow proscribed foreign nationals access to export controlled information

-- librarians and archivists take a unique long term view, worrying about accessibility and usability of information assets decades or even centuries in the future, and have been known to insist on multiple distributed copies of information assets for redundancy and survivability in the event of adverse events (whether that's fire, flood, institutions going out of business, people getting rid of their last 9 track tape drives or 8" floppy drives, spinning media crashing or non-archival magnetic media deteriorating over time, etc.)

-- Or is your query specific to system and network security-related datasets that your researchers may be working with? (If the latter, I'd mention that we'll be having the 2nd Data Driven Collaborative Security Workshop for High Performance Networks later this month, and as you might expect from the title, methodological and substantive data-driven collaborative sharing issues relating to security data will likely be "center stage" during those sessions, as they were for the first DDCSW last year, see http://security.internet2.edu/ddcsw/ )

Anyhow, love to hear more about the specific areas related to this topic that you or others may be particularly interested in...
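As an aside on the "is a checksum enough?" question in the first item above, here is a minimal Python sketch of the difference between a plain checksum and a keyed digest. A plain SHA-256 checksum catches accidental corruption, but anyone who alters the dataset can simply recompute it; an HMAC tag cannot be recomputed without the key. The function names and the sample data are hypothetical, and the sketch assumes the data provider and the recipient already share a secret key. For openly published datasets a public-key signature would be the more natural fit, which this sketch does not cover.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Plain checksum: detects accidental changes, but is not tied to any key."""
    return hashlib.sha256(data).hexdigest()


def keyed_tag(data: bytes, key: bytes) -> str:
    """HMAC-SHA256 tag: can only be produced by someone holding the key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_tag(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(keyed_tag(data, key), expected_tag)


if __name__ == "__main__":
    # Hypothetical dataset contents and key, for illustration only.
    original = b"subject_id,score\n1,0.82\n2,0.47\n"
    key = b"shared-secret-distributed-out-of-band"

    published_tag = keyed_tag(original, key)

    # Someone alters a value and recomputes the plain checksum to match.
    tampered = original.replace(b"0.47", b"0.97")
    forged_checksum = sha256_digest(tampered)

    # The forged checksum is self-consistent, so it proves nothing by itself.
    print(forged_checksum == sha256_digest(tampered))
    # Without the key, the published tag no longer matches the altered data.
    print(verify_tag(tampered, key, published_tag))
    print(verify_tag(original, key, published_tag))
```

Even with a keyed tag, this only answers "has the file changed since it was tagged?"; it says nothing about how the data was collected, which is the provenance half of the question.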
I think it's a fascinating (but potentially immense) topic, so narrowing in on the particular aspects you're most interested in would probably be a key first step.

Regards,

Joe St Sauver (joe () oregon uoregon edu or joe () internet2 edu)
Internet2 Security Programs Manager
http://www.uoregon.edu/~joe/