Educause Security Discussion mailing list archives
Re: Sensitive data detection
From: Brad Judy <Brad.Judy () COLORADO EDU>
Date: Sat, 21 Apr 2007 17:04:15 -0600
I definitely recommend a selective SSN regex like we are using for non-delimited SSNs. If you aren't sure what numbers to use, see if you can dump the first three digits of your student SSNs and do a quick analysis to find the most common. While select schools may be pretty balanced, most of us have a heavy weighting toward local residents.

I also wouldn't touch regex without using boundary statements on those expressions. Just adding a \b or \D to the start and end makes a huge difference in reducing false positives if you aren't already using it.

Interestingly, one of the false positives we can't reliably shake is actually hyphenated: Japanese telephone numbers, which appear to be formatted nnn-nn-nnnn.

Brad Judy
IT Security Office
Information Technology Services
University of Colorado at Boulder

Date: Fri, 20 Apr 2007 17:56:34 -0400
From: Wyman Miles
Subject: Re: Sensitive data detection
Content-Type: text/plain;charset=iso-8859-1

Runs of 9 digits are an extremely difficult problem. You can bracket them with a \D (nondigit) or \b (word break), which sometimes helps. Validation against the SSA area/group data helps a little. If your institution draws heavily from a predictable population, you can use the approach Colorado employs and write geographically dependent regexes. But no, there is no silver bullet. SINs are a far easier problem as they're Luhn-derived, like CC#s.
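As a rough illustration of the selective-regex idea described above, the following Python sketch counts the three-digit prefixes in a list of known SSNs and builds a boundary-anchored pattern from the most common ones. The function names and the sample values are hypothetical and made up for this example; this is not the expression Colorado or Cornell actually uses.

```python
import re
from collections import Counter


def top_area_prefixes(ssns, n=5):
    """Return the n most common 3-digit prefixes among 9-digit strings."""
    counts = Counter(s[:3] for s in ssns if len(s) == 9 and s.isdigit())
    return [prefix for prefix, _ in counts.most_common(n)]


def build_selective_regex(prefixes):
    """Build a pattern matching 9-digit runs that start with a chosen prefix.

    The \\b on each side is the boundary statement mentioned in the thread:
    it stops the pattern from matching inside a longer run of digits.
    """
    alternation = "|".join(sorted(re.escape(p) for p in prefixes))
    return re.compile(r"\b(?:" + alternation + r")\d{6}\b")


# Made-up sample values (group "00" is never issued, so none is a real SSN).
sample = [
    "521001234", "521005678", "521009999",
    "522001111", "522002222",
    "123004567",
]

prefixes = top_area_prefixes(sample, n=2)
pattern = build_selective_regex(prefixes)
```

With this sample, `pattern` matches `521001234` when it stands alone in text, but not when it is embedded in a longer digit run such as `5210012345`, and not a 9-digit run with an unselected prefix such as `123004567`.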
I'd be interested in hearing people's feedback about the issues with
high false positive rates and 9-digit SSNs in evaluating these
tools. Most of the datastores I come across here store SSNs without
hyphens, and creating regexes for any combination of 9-digit numbers
has always returned high false positives, so much so that it's borderline
useless. There are some special rules for SSNs, but nothing like
credit card Luhn checks.
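To make the contrast concrete, here is a minimal Python sketch of a Luhn check next to the kind of structural rules that exist for SSNs. The function names are illustrative. The SSN rules shown are only the well-known "never issued" ranges (area 000, 666, or 900-999; group 00; serial 0000); they do not include the SSA area/group table lookups mentioned elsewhere in the thread. Note that most random 9-digit numbers pass the structural test, while only about one in ten passes Luhn.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def ssn_structurally_plausible(ssn: str) -> bool:
    """Return True if a 9-digit string avoids the never-issued SSN ranges."""
    if len(ssn) != 9 or not ssn.isdigit():
        return False
    area, group, serial = ssn[:3], ssn[3:5], ssn[5:]
    if area in ("000", "666") or area >= "900":
        return False
    if group == "00" or serial == "0000":
        return False
    return True
```

For example, `luhn_valid("4111111111111111")` accepts a widely used credit card test number, while `ssn_structurally_plausible` can only reject candidates such as `000123456` or `123450000`; it cannot confirm that a plausible-looking number is really an SSN.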
At 11:15 AM 4/20/2007, Harold Winshel wrote:
We're also looking to use Cornell's Spider program for
Rutgers-Camden Arts & Sciences faculty and staff.
At 01:52 PM 4/20/2007, you wrote:
On 4/20/07, Curt Wilson <[log in to unmask]> wrote:
Dear Educause security community,
For those who are currently working on a project involving the
identification of sensitive data across campus, I have some items of
potential interest. I know that Tenable (Nessus) recently announced a
module that can check (with host credentials) a host for the presence of
selected types of sensitive data, but what we have chosen is
Proventsure's Asarium software. We are in the early stages of testing,
but it looks to be a tremendously helpful tool for such a large task
(depending upon the size of your institution).
Thanks Curt. A freeware package that works in this same area is
the Cornell Spider
http://www.cit.cornell.edu/computer/security/tools/
http://www.cit.cornell.edu/computer/security/tools/spider-cap.html
--
Peter N. Wan ([log in to unmask])
Senior Information Security Engineer
OIT, Information Security
Georgia Institute of Technology
GT FIRST Team Representative
258 Fourth Street, Rich 244
Atlanta, Georgia 30332-0700 USA
+1 (404) 894-7766
AIM: oitispnw
Harold Winshel
Computing and Instructional Technologies
Faculty of Arts & Sciences
Rutgers University, Camden Campus
311 N. 5th Street, Room B10 Armitage Hall
Camden NJ 08102
(856) 225-6669 (O)
------------------------------------------------------------------------
Josh Drummond
Security Architect
Administrative Computing Services, University of California - Irvine
[log in to unmask]
949.824.9574
Wyman Miles
Senior Security Engineer
Cornell University
Current thread:
- Sensitive data detection Curt Wilson (Apr 20)
- <Possible follow-ups>
- Re: Sensitive data detection Peter Wan (Apr 20)
- Re: Sensitive data detection Harold Winshel (Apr 20)
- Re: Sensitive data detection Josh Drummond (Apr 20)
- Re: Sensitive data detection Randy Marchany (Apr 20)
- Re: Sensitive data detection Wyman Miles (Apr 20)
- Re: Sensitive data detection Brad Judy (Apr 21)