Nmap Development mailing list archives
Re: Replacing passwords.lst
From: Brandon Enright <bmenrigh () ucsd edu>
Date: Wed, 17 Mar 2010 01:16:29 +0000
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, 16 Mar 2010 18:58:02 -0600
David Fifield <david () bamsoftware com> wrote:

[...]
> > > I wrote a simple program to sum the counts from several password
> > > files and output the top n passwords. Using the five lists above, I
> > > regenerated our nselib/data/passwords.lst. The program automatically
> > > does bz2 decompression based on filename so keeping compressed lists
> > > isn't inconvenient.
> >
> > Cool, it's good to handle the bz2 compression transparently. I think
> > we can't just sum the lists though without normalizing them to a
> > degree. Otherwise rockyou is weighted too strongly. Ron and I chatted
> > off-list about this a bit. A simple linear weight probably isn't the
> > right choice because things that are only duplicated a few times in
> > phpbb or myspace would get scaled up too much.
>
> I don't understand. All of Ron's lists have counts, not just ranks. So
> if a myspace password has a count of 1 or 2, it will still have a count
> of 1 or 2 in the master list and end up way at the bottom.

Yeah I was referring to normalizing their counts. More on that below.

> To me, each password list is like a sample from a giant population.
> That's not totally accurate because different sites have different
> password policies, but the size of each sample shouldn't matter, right?
Well each is a pretty biased sample of a really huge password population. If our lists were truly random samples from that population then no amount of weighting for sample size would be better than just summing up counts and ordering them.

Since we don't know how biased each list is we should just treat them equally. If our goal is to sum the counts up while keeping them equal we have to normalize those counts.

Put another way, if we had a list with 10 million passwords and 9M of them were "password" that list would clearly be a very biased sample from all passwords available out there. If we wanted to combine that list with our myspace list, we couldn't let 9M "password" be added to the count of "password" for the myspace list. The bias for our 10M word list would just be too significant in the resulting list.

Since rockyou is so huge it dominates the other lists and I think we need to weight them on some factor of their sample size so that our resulting list doesn't just reflect the rockyou list biases. Any weighting we come up with should be a NOP if the lists are unbiased random samples.

I think this is pretty easy and natural to do. I'm pretty sure the code and sample results are going to speak louder than words here. I can probably start working on this and testing this weekend.

Brandon

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAkugLXMACgkQqaGPzAsl94IlLQCgvHqggnTX8XLLKnqEFCv+wwLI
rxYAnAvjsGj0qYfZx+GBeumCs+2eK9dV
=0yOW
-----END PGP SIGNATURE-----
_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://seclists.org/nmap-dev/
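The difference between plain summing and size-based weighting discussed in the message above can be illustrated with a small sketch. This is not the program David wrote, nor the weighting Brandon intends to test; neither appears in this message. Everything here is an assumption made for illustration: the "COUNT PASSWORD" line format, the latin-1 encoding, the function names, and in particular the `alpha` exponent, which is a hypothetical knob. With `alpha=0` the lists are simply summed; with `alpha=1` each list is reduced to relative frequencies (the "simple linear weight" the message doubts is the right choice); values in between weight large lists down less aggressively. The toy counts are made up, apart from the 9M-out-of-10M "password" case taken from the message.

```python
import bz2
from collections import Counter


def open_list(path):
    """Open a password list as text, decompressing if the name ends in .bz2.

    The latin-1 encoding is an assumption, chosen so arbitrary bytes decode.
    """
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="latin-1")
    return open(path, "r", encoding="latin-1")


def parse_counts(lines):
    """Parse lines of the (assumed) form 'COUNT PASSWORD' into a Counter."""
    counts = Counter()
    for line in lines:
        stripped = line.rstrip("\r\n").lstrip()
        count_str, sep, password = stripped.partition(" ")
        if not sep or not count_str.isdigit():
            continue  # skip lines that do not match the assumed format
        counts[password] += int(count_str)
    return counts


def merge_weighted(lists, alpha):
    """Combine per-list counts, scaling each list by (list size) ** -alpha.

    alpha = 0.0 is a plain sum; alpha = 1.0 turns counts into per-list
    relative frequencies. 'alpha' is a hypothetical parameter.
    """
    scores = Counter()
    for counts in lists:
        size = sum(counts.values())
        if size == 0:
            continue
        weight = size ** -alpha
        for password, count in counts.items():
            scores[password] += count * weight
    return scores


def top_n(scores, n):
    """Return the n highest-scoring passwords, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [password for password, _ in ranked[:n]]


# Toy data: one huge, heavily skewed list and one small list.
huge = Counter({"password": 9_000_000, "123456": 1_000_000})
small = Counter({"abc123": 99, "password": 1})

print(top_n(merge_weighted([huge, small], alpha=0.0), 3))
# -> ['password', '123456', 'abc123']
print(top_n(merge_weighted([huge, small], alpha=1.0), 3))
# -> ['abc123', 'password', '123456']
```

With plain summing the huge list decides the ordering on its own; with full normalization the small list's favourite rises to the top, which is the over-scaling of small lists that the quoted text worries about. A choice between these extremes is left open in the message.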