Educause Security Discussion mailing list archives
Re: How much host data collected?
From: "Bridges, Robert A." <bridgesra () ORNL GOV>
Date: Mon, 30 Apr 2018 15:51:59 +0000
Alan,

* What's "host security data?" We are interested in data that can be used for diagnosing both security-related incidents (intrusions, breaches) and misconfigurations: any system logs, host-based IPS logs, etc.

* Are you breaking things out by service? No, but our working assumption has been that workstation monitoring generally differs from server monitoring (more IPs, less data per IP, different data).

* Are you considering the differences in OSes? This has been a necessity. The folks we talk to generally collect larger sets of system logs for Windows workstations than for other OSes. Is that true in your operations (and everyone else's out there)?

* Is compressibility a factor? I haven't considered it. We've been defaulting to whatever the impact is for the resource (memory, disk I/O on the host device, and disk space to store logs, if you store them) in whatever format the data is in.

* Are you interested in event counts or raw byte counts for data? We are interested in bytes per IP per day and in the number of IPs. Ideally we can estimate the cost of security using cloud pricing (for each host we need x amount of memory to run the AV, y amount of disk space, ...), which translates directly to dollars.

Overall, our goal is to understand what host data is collected and how much of it (in bytes per host per day), to inform future research efforts. We'd like to know the cost of security (e.g., how much memory on the host is used? how much disk space is used to store data?) and then see whether we can lower that cost while increasing the signal (e.g., only collect high-fidelity data after some alert has tripped). If anyone has more information about what and how much data you collect, we'd be interested. Likewise for ideas for next-generation tools research could pursue, e.g., turning on audit logs only after some event.
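For concreteness, the cloud-cost translation described above could be sketched roughly as follows. All prices and per-host figures here are hypothetical placeholders for illustration, not measured values; real estimates would substitute actual agent footprints and a provider's actual pricing.

```python
# Back-of-envelope cost-of-security estimate, per host per month.
# All inputs are hypothetical placeholders; substitute real measurements
# and real provider pricing.

GB = 1024 ** 3


def monthly_security_cost(bytes_per_host_per_day,
                          agent_memory_gb,
                          price_per_gb_ram_month=5.00,     # hypothetical $/GB-month
                          price_per_gb_stored_month=0.02,  # hypothetical $/GB-month
                          retention_days=90):
    """Rough $/host/month: memory held by security agents plus log
    storage at the given retention period."""
    stored_gb = bytes_per_host_per_day * retention_days / GB
    return (agent_memory_gb * price_per_gb_ram_month
            + stored_gb * price_per_gb_stored_month)


# Example: a host emitting 50 MB of logs/day with a 0.5 GB resident agent.
cost = monthly_security_cost(50 * 1024 ** 2, agent_memory_gb=0.5)
print(f"~${cost:.2f}/host/month")
```

With these placeholder numbers, memory for the resident agent dominates the bill; storage only catches up at much higher log volumes or longer retention.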
Similarly, if anyone can give costs of an intrusion, that would be interesting for estimating the opposite side of the coin, i.e., when security is insufficient.

Thanks,
Bobby

--
Robert A. Bridges, PhD, Research Mathematician, Cyber & Information Science Research Group, Oak Ridge National Laboratory

On 4/26/18, 3:37 PM, "Alan Amesbury" <amesbury () oitsec umn edu> wrote:

On Apr 19, 2018, at 20:32, Bridges, Robert A. <bridgesra () ORNL GOV> wrote:
> What is the average amount of host security data your SOC collects per host, per day?
[snip]

It's hard to say without knowing the full extent of what "security data" entails. Some questions that come to mind:

* What's "host security data?" There's a great deal of overlap between "security" and "operations" as far as I'm concerned. For example, log data generated by the latter domain will almost certainly contain information the former finds interesting. However, others might consider system logs not to be "security data."

* Are you breaking things out by service? I'm also not sure "average" will suffice as a reasonable measure, given that a web server's logs are likely to be very different from those of another kind of server (mail, DHCP, LDAP, domain controller, etc.). Workstations (i.e., users' hosts) are an entirely different category (maybe several), too.

* Are you considering the differences in OSes? Different OSes log at significantly different levels depending on their settings. Windows hosts, for example, can produce MASSIVE amounts of data compared to a Unix host.

* Is compressibility a factor? Some log formats are binary, which may not compress very well. Text-formatted logs may compress *extremely* well, at better than 10:1.

* Are you interested in event counts or raw byte counts for data? There's a vast difference between storing 1000 events and storing 1000 bytes of event data.

Data can generally be stored pretty cheaply.
Filesystems like ZFS can provide transparent data compression and scale to very large sizes while maintaining data integrity (it checksums the data, checksums the metadata, and then checksums the checksums, if I recall correctly, and can use distributed parity to reconstruct corrupted data). If you're talking about being able to *use* the data, costs tend to go up: tooling ranges from roughly zero software cost to thousands or millions of dollars depending on scale, ease of use, and a host of other factors.

That said, I might be able to give you a rough idea of what we see in terms of event counts from several different sources, although it might make more sense to discuss those specifics off list.

--
Alan Amesbury
University Information Security
http://umn.edu/lookup/amesbury
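The better-than-10:1 figure for text logs is easy to check on a sample with Python's standard zlib. The syslog-style lines below are synthetic stand-ins for real log data; actual ratios depend entirely on the data and the compressor, and highly repetitive logs like these compress far better than varied ones.

```python
import zlib

# Synthetic, highly repetitive syslog-style text as a stand-in for real logs.
lines = [
    f"Apr 30 15:51:{i % 60:02d} host1 sshd[{1000 + i}]: "
    f"Failed password for root from 10.0.0.{i % 255} port 22 ssh2\n"
    for i in range(10_000)
]
raw = "".join(lines).encode()
compressed = zlib.compress(raw, level=9)

ratio = len(raw) / len(compressed)
print(f"{len(raw)} -> {len(compressed)} bytes ({ratio:.1f}:1)")
```

Binary log formats (e.g., packed or already-compressed event records) typically show much lower ratios on the same kind of test, which is the distinction drawn above.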
Current thread:
- How much host data collected? Bridges, Robert A. (Apr 19)
- Re: How much host data collected? Alan Amesbury (Apr 26)
- Re: How much host data collected? Valdis Kletnieks (Apr 26)
- Re: How much host data collected? Bridges, Robert A. (Apr 30)
- Re: How much host data collected? Bridges, Robert A. (Apr 30)
- Re: How much host data collected? Valdis Kletnieks (Apr 26)
- Re: How much host data collected? Alan Amesbury (Apr 26)