Educause Security Discussion mailing list archives
Re: Password entropy
From: Valdis Kletnieks <Valdis.Kletnieks () VT EDU>
Date: Sun, 23 Jul 2006 16:52:06 -0400
On Fri, 21 Jul 2006 08:26:59 CDT, Graham Toal said:
I'm not real clear on the "entropy" concept but it has something to do with the pattern?
I'm not sure it's the right word in this context, but I believe this is what they're talking about:
Actually, it *is* the right word, and you're basically correct but managed to avoid saying *why* you're correct...
If you have an 8-character password whose characters are chosen randomly, and each character is only lowercase alphabetic, then the number of possible passwords is 26^8.
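A minimal Python sketch of that arithmetic. The helper name `random_password_bits` is made up for illustration. When every character is chosen independently and uniformly, the password space is alphabet_size^length, and its entropy is length times log2(alphabet_size) bits.

```python
import math

def random_password_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a password whose characters are drawn
    independently and uniformly from an alphabet of the given size."""
    return length * math.log2(alphabet_size)

space = 26 ** 8  # 8 lowercase letters, each chosen at random
print(f"{space:,} possible passwords")            # 208,827,064,576
print(f"{random_password_bits(26, 8):.1f} bits")  # 37.6
```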
Now remember, this is 8 *randomly chosen* characters. Well-chosen random characters have *high* entropy, which is a measure of how unpredictable the next one is. (Mathematically, it's very similar to measuring the entropy of, for instance, the molecules in a gas: in both cases, the entropy measures the amount of "disorder".)
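To make "how unpredictable the next one is" concrete, here is a sketch of Shannon's formula, H = -sum(p * log2(p)), in Python. The function name and the skewed example distribution are illustrative assumptions, not data from this thread. The point is that a uniform choice among 26 letters reaches the maximum of log2(26), about 4.70 bits, and any bias pulls it down.

```python
import math
from typing import Iterable

def shannon_entropy(probs: Iterable[float]) -> float:
    """H = -sum(p * log2(p)) in bits; zero-probability outcomes add nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Every lowercase letter equally likely: the most unpredictable case.
uniform = [1 / 26] * 26
print(f"uniform: {shannon_entropy(uniform):.2f} bits")  # 4.70

# Hypothetical skewed source: one letter shows up 90% of the time.
skewed = [0.9] + [0.1 / 25] * 25
print(f"skewed:  {shannon_entropy(skewed):.2f} bits")   # far lower
```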
But what is worse is that there is a pattern involved: to make it easier to remember, you use a grammatically correct phrase, such as "subject verb object". Let's say our vocabulary has 9000 nouns and 1000 verbs; then our password space is only 9000*1000*9000.
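Putting numbers on that example in Python, and assuming (generously) that each word is picked uniformly at random from the stated vocabulary, the three-word phrase comes out slightly *weaker* than 8 random lowercase letters, despite being longer to type:

```python
import math

nouns, verbs = 9000, 1000            # vocabulary sizes from the example
phrases = nouns * verbs * nouns      # "subject verb object", words chosen uniformly
random8 = 26 ** 8                    # 8 random lowercase letters, for comparison

print(f"{phrases:,} phrases = {math.log2(phrases):.1f} bits")    # 81,000,000,000 phrases = 36.2 bits
print(f"{random8:,} passwords = {math.log2(random8):.1f} bits")  # 208,827,064,576 passwords = 37.6 bits
```

Real people don't pick words uniformly (common nouns and verbs dominate), so the effective phrase space is smaller still.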
What chews up more entropy is the patterns *inside* each word. In English, the "next letter" is usually easy to predict (and therefore has little *effective* randomness or entropy). If you're looking at a 'q', the next letter is *almost* guaranteed to be a 'u', so there's very little "uncertainty" there, and that 'u' has a very low entropy. If the two letters you're looking at are 'io', the next one is probably an 'n', less likely to be a 'u' or 'l', and highly unlikely to be a 'z' (looking at some 490K words here):

    [/usr/share/dict]2 grep 'io' linux.words | sed 's/.*\(io.\).*/\1/' | sort | uniq -c | sort -nr | head -15
      15681 ion
       2279 iou
       1124 iol
        711 iot
        693 ios
        571 iop
        450 ior
        438 iom
        406 ioc
        382 iog
        360 iod
        183 ioi
        102 io-
         92 iob
         74 ioe

The *actual* chances are even more biased, since here I treated all words equally (every dictionary word counted once, no matter how often it's really used). The 92 words that have 'iob' include things like thiobacillus, plesiobiosis, and dithiobenzoic. (Of the 92, 17 also contain 'blast', indicating a medical term like 'angioblast'.)
Incidentally, it goes even further: if that next letter is an 'n' as expected, guess what the letter before the 'i' almost always is?

    [/usr/share/dict]2 grep 'ion' linux.words | sed 's/.*\(.ion\).*/\1/' | sort | uniq -c | sort -nr | head
      11889 tion
       1715 sion
        451 lion
        304 nion
        262 hion
        239 rion
        122 pion
        104 cion
        101 gion
         97 xion

Yep, a 't'. By the time you've seen a 'tio', you may as well reserve just one bit to store the next letter, because it's almost always going to be an 'n' (some 11K times). So 95% of the time you can store "yes, it's the expected N" in one bit, and the other 5% of the time store "no, it wasn't" as one bit, follow the "no" with a 5-bit code indicating what it actually was, and *still* save space, as you'll average about 1.25 bits (0.95*1 + 0.05*6). This is why English text compresses so well (and in fact, the entropy of data is *directly* related to the maximum possible compression of the data).
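The same counts can be turned into a number. This Python sketch (names are illustrative) computes the entropy of the letter following 'io' from the fifteen rows of the first `uniq -c` listing, then checks the arithmetic on the 1-bit/6-bit "expected N" scheme. Since only the top 15 rows are included and every dictionary word is weighted equally, treat the entropy figure as an approximation.

```python
import math
from typing import Mapping

# Next-letter counts after "io", copied from the top 15 rows of the
# grep | sed | sort | uniq -c output ('-' is a hyphen).
AFTER_IO = {
    "n": 15681, "u": 2279, "l": 1124, "t": 711, "s": 693,
    "p": 571, "r": 450, "m": 438, "c": 406, "g": 382,
    "d": 360, "i": 183, "-": 102, "b": 92, "e": 74,
}

def entropy_from_counts(counts: Mapping[str, int]) -> float:
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)

h_io = entropy_from_counts(AFTER_IO)
print(f"next letter after 'io': about {h_io:.2f} bits")
print(f"uniformly random letter: {math.log2(26):.2f} bits")  # 4.70

# The "expected N" code after 'tio': 1 bit when the guess is right,
# 1 + 5 = 6 bits when it's wrong.
p_expected = 0.95
avg_bits = p_expected * 1 + (1 - p_expected) * (1 + 5)
print(f"average code length: {avg_bits:.2f} bits")  # 1.25
```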
A bit of thought will reveal a lot of other 2- and 3-character combinations that are a lot more common ('ing', etc.). The end result is that running English text averages about 2.5 to 3 bits of entropy per character, and even skript kiddie 'l33t sp33k' and that obfuscated spam stuff is probably still under 4 bits/character. (I'll go out on a limb and hypothesize that if it's trying to pass itself off as English and has over 3.5 bits/char of entropy, it's been too obfuscated to be easily readable...)
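One rough way to check bits-per-character figures yourself is to lean on the compression connection: compress a sample and divide the compressed size in bits by the number of characters. The sketch below uses Python's standard `zlib` module. It's a swapped-in illustration, not how the figures above were obtained, and because zlib is far from an optimal compressor (and short samples also pay for header overhead), the result is an upper-bound estimate rather than the true entropy. The file path in the commented usage is hypothetical.

```python
import random
import string
import zlib

def compressed_bits_per_char(text: str) -> float:
    """Compressed size in bits divided by the number of characters.

    No lossless compressor can, on average, beat the source's entropy, so
    for long samples this is a rough ceiling on bits/char, not the exact value.
    """
    if not text:
        raise ValueError("need some text to measure")
    compressed = zlib.compress(text.encode("utf-8"), 9)
    return len(compressed) * 8 / len(text)

# Baseline: uniformly random lowercase letters (true entropy log2(26), about 4.70 bits/char).
rng = random.Random(2006)
random_text = "".join(rng.choice(string.ascii_lowercase) for _ in range(100_000))
print(f"random letters: {compressed_bits_per_char(random_text):.2f} bits/char")

# Point it at a large plain-text English file you have on hand (hypothetical path):
# with open("sample.txt", encoding="utf-8") as f:
#     print(f"English sample: {compressed_bits_per_char(f.read()):.2f} bits/char")
```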
By the way, this is why passphrases have to be quite long to have equivalent strength to a password.
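A back-of-the-envelope sketch of how long "quite long" is, assuming the 2.5 to 3 bits/char estimate for running English and taking the 8-character random lowercase password (about 37.6 bits) as the target. The helper name is illustrative.

```python
import math

def passphrase_chars_needed(target_bits: float, bits_per_char: float) -> int:
    """Characters of running text needed to accumulate target_bits of entropy,
    assuming every character contributes bits_per_char on average."""
    return math.ceil(target_bits / bits_per_char)

target = 8 * math.log2(26)  # 8 random lowercase letters: about 37.6 bits
for rate in (2.5, 3.0):     # rough range for running English text
    print(f"{rate} bits/char -> {passphrase_chars_needed(target, rate)} characters")
# 2.5 bits/char -> 16 characters
# 3.0 bits/char -> 13 characters
```

A passphrase people actually pick is usually less random than typical running text, so these lengths are a floor, not a recommendation.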
If we had keyboards and brains and systems that accepted Chinese characters that represent words as single characters, an 8-word passphrase would be as long and nearly as strong as an 8-character random password. The reason the passphrase has to be longer is because you get much less randomness and entropy *per character* in a Latin-charset passphrase... And actually, the high redundancy (the inverse of entropy) of most human languages is a Good Thing - it's what our brains use to figure out what was really meant when we hit the the inevitable typo, or can't hear somebody very well in a bar or other noisy environment. There's even at least one example in this paragraph that you probably didn't even notice (two, if I made an intentional typo ;)
(Now, the entropy of a random number source is something quite different, and I think in that case entropy is the right word to use.)
It's correct in this context as well - and in fact, a good theoretical way to look at passphrases is as the result of a "not very" random source, and what you want to compute is how much data you have to gather before you have gathered a given level of total randomness.
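As a sketch of that way of looking at it, model the source as something simpler than a passphrase: a stream of independent bits that come out 1 with probability p. Each sample then carries the binary entropy H(p) bits, and the amount you need to gather is the target divided by H(p). The 128-bit target and the p values are arbitrary choices for illustration, and real passphrase characters aren't independent samples, so this only shows the shape of the computation.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of one output of a source that emits 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def samples_needed(target_bits: float, p: float) -> int:
    """How many outputs of a biased bit source to gather before their combined
    entropy reaches target_bits (outputs assumed independent)."""
    h = binary_entropy(p)
    if h == 0:
        raise ValueError("a source with no randomness can never reach the target")
    return math.ceil(target_bits / h)

for p in (0.5, 0.9, 0.99):
    print(f"p={p}: {binary_entropy(p):.3f} bits/sample, "
          f"{samples_needed(128, p)} samples for 128 bits")
```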