Educause Security Discussion mailing list archives

Re: DLP Best Practices


From: Brad Judy <brad.judy () CU EDU>
Date: Tue, 9 Dec 2014 16:49:56 +0000

We just kicked off a project here focused on endpoint data discovery, and I'm putting a lot of thought into how to 
approach it for the best chance of success.  I often hear negative feedback from end users/departments at institutions 
where such tools are deployed in ways that impose a high burden for little perceived gain.

Some key steps I am looking at:


*         Keep the focus on a successful first project phase, not maximum data removal in the first pass.  A well-run 
project can ensure support for future iterations, whereas a poorly executed one might mean you never get very far.

*         Meet with business stakeholders early in the project - get support from the data owners who are accountable 
for the security/privacy of sensitive data, and gather their thoughts on data cleanup.  You'll often be surprised when 
a data owner wants to be stricter than you'd expect.  Figure out what they want to know about data proliferation - 
come with examples of how you might inform them.  What would be their success criteria for this effort?

*         Focus on the highest-risk items first - only check for data types (identity-theft risks like SSNs and CCNs, 
regulated data like HIPAA) and data quantities (pick a threshold of X entries per file) that highlight the highest-risk 
data repositories.  One of the biggest complaints about these projects is staff members being presented with a very 
long list of files to review, which can cause the truly high-risk items to get lost in the noise of lots of small 
items.  A rough sketch of this kind of threshold filtering follows this item.
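
To make the threshold idea concrete, here is a minimal Python sketch of per-file match counting; the patterns and the 
cutoff of 25 are illustrative placeholders, not settings from any particular DLP product:

    import re
    from pathlib import Path

    # Illustrative patterns for highly structured, high-risk data types;
    # a real deployment would use the DLP product's tuned identifiers.
    PATTERNS = {
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "CCN": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
    }

    THRESHOLD = 25  # the "X entries per file" cutoff - pick your own

    def flag_high_risk(root):
        """Yield (path, data type, count) only for files over the threshold."""
        for path in Path(root).rglob("*"):
            if not path.is_file():
                continue
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            for label, pattern in PATTERNS.items():
                count = len(set(pattern.findall(text)))
                if count >= THRESHOLD:
                    yield str(path), label, count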

*         Focus on low-false-positive data types first - Another frustration for users of such tools is dealing with 
high false-positive rates.  Some types of less structured data (like HIPAA data) can be difficult to pinpoint, so stay 
focused on high-quality output that makes the best use of everyone's time while they get familiar with the process, 
tool and concepts.

o   A caveat to the above note - I have spent a lot of time on data-searching patterns and many data sets across 
incidents, investigations, proactive scanning, etc., and I have to emphasize that social security numbers are stored 
as non-delimited strings of nine digits the vast majority of the time, so I don't recommend using the common 
"delimited SSNs only" option to reduce false positives.  One option I have found very successful is to focus 
non-delimited SSN searching on the SSN prefixes for your local area (sketched below).  Most schools have high numbers 
of local employees, students and patients - enough to trigger a flag.  While the SSA has moved to randomized prefixes 
in the past few years, it will be more than 10 years before these new SSN holders become our students.
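
A rough sketch of that prefix-focused, non-delimited search in Python; the area numbers shown are hypothetical 
placeholders, so substitute the SSA area numbers historically assigned to your own state or region:

    import re

    # Hypothetical local prefixes - replace these with the SSA area
    # numbers historically assigned to your state or region.
    LOCAL_AREA_PREFIXES = ("521", "522", "523", "524")

    # Nine digits, no delimiters, not embedded in a longer digit run,
    # and starting with one of the local area numbers.
    NONDELIMITED_SSN = re.compile(
        r"(?<!\d)(?:" + "|".join(LOCAL_AREA_PREFIXES) + r")\d{6}(?!\d)"
    )

    def find_candidate_ssns(text):
        """Return nine-digit runs that begin with a local area number."""
        return NONDELIMITED_SSN.findall(text)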

*         Focus on the easiest items to remediate first - Think about criteria that could highlight the data that 
would be easiest to remediate.  Can you factor file last-modified or last-accessed dates into your criteria (see the 
staleness sketch below)?  Files that aren't in use are both high risk (because people may have forgotten about 
protecting them) and often easy to delete or archive offline.
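
A minimal sketch of such a staleness filter; the two-year cutoff is an illustrative assumption, and access times are 
unreliable on volumes mounted with noatime/relatime, so last-modified is the safer primary signal:

    import time
    from pathlib import Path

    STALE_AFTER = 2 * 365 * 24 * 3600  # illustrative two-year cutoff

    def is_stale(path):
        """True if the file hasn't been modified or accessed recently."""
        st = Path(path).stat()
        # atime may be frozen on noatime/relatime mounts, so take the
        # newer of the two timestamps as the best available signal.
        newest = max(st.st_mtime, st.st_atime)
        return (time.time() - newest) > STALE_AFTER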

*         Consider a theme for each round (quarterly might be a good interval to allow for scanning, informing, 
remediation and follow-up) - maybe focus on one data type per quarter: credit card numbers, social security numbers, 
patient record numbers, clear-text passwords, etc.

*         Provide remediation guidance - Years ago I worked on a matrix providing guidance on remediating sensitive 
information (http://www.colorado.edu/oit/sites/default/files/PrData_QkRef_Table_1.pdf) and I'm thinking of doing 
something similar for this project.  The updated version will likely add something to the effect of seeking approval 
from data owners before keeping immediate access to the data.

*         Revisit the first round's success with stakeholders.  Did you meet their success criteria?  Did they see the 
value?  Take their feedback and find out their interest in a next step.  They might guide you toward more data types, 
smaller data sets, different remediation options, etc.  An iterative process lends itself to a feedback and 
process-improvement loop.

Those are the thoughts off the top of my head.  I hope they are helpful.


Brad Judy

Director of Information Security
University Information Systems
University of Colorado
1800 Grant Street, Suite 300
Denver, CO  80203
Office: (303) 860-4293
Fax: (303) 860-4302
www.cu.edu


From: The EDUCAUSE Security Constituent Group Listserv [mailto:SECURITY () LISTSERV EDUCAUSE EDU] On Behalf Of PERRY 
II, JAMES
Sent: Tuesday, December 09, 2014 9:11 AM
To: SECURITY () LISTSERV EDUCAUSE EDU
Subject: [SECURITY] DLP Best Practices

All,

The University of South Carolina is developing a data loss prevention notification strategy.  Earlier this year we 
selected a DLP product using the endpoint agent approach, and we have an active pilot of ~1,900 reporting 
workstations.  I am interested in a.) identifying other institutions that are also using an endpoint agent-based DLP 
solution; b.) how others are communicating DLP scan results to stakeholders; and c.) how others are leveraging the 
scan results in the implementation of their information security program (i.e., making risk-based decisions using the 
data).

Regards,

James D. Perry II - Chief Information Security Officer
University of South Carolina
1244 Blossom Street
Columbia, SC  29208
Office: (803) 777-9612
Cell: (803) 521-7563