Nmap Development mailing list archives

Re: favicon survey script


From: Brandon Enright <bmenrigh () ucsd edu>
Date: Thu, 6 Aug 2009 20:26:12 +0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, 6 Aug 2009 11:49:03 -0600
David Fifield <david () bamsoftware com> wrote:

> On Thu, Aug 06, 2009 at 08:27:24AM +0200, Vlatko Kosturjak wrote:
>> David Fifield wrote:
>>> Vlatko, did you ever finish mapping the hashes back to favicons
>>> in your research?
>>
>> Yes, I did. But extracted only top 10 from each survey done
>> (dmoz,80,443) and have summarized that into favicon-db (just updated
>> favicon-db in attachment to reflect survey done).

...snip...

> Awesome. I would prefer to keep only the hashes that we have measured
> to be common. João Correa is going to do some scanning and Brandon
> Enright has been scanning as well.
>
> The hash A8FE5B8AE2C445A33AC41B33CCC9A120 is by far the most common
> one I found in my scanning, and I think in Brandon's too. Just like
> you noted, it is really HTML text:


Indeed, I have been scanning ;-)

Here is what I scanned:

* 100M random IPs (only a small percentage actually listening on port 80)
* 450k IPs resolved from links in Wikipedia (>99% listening on port 80)
* 3M names (not IPs) from the Open Directory (dmoz) (>99% listening on port 80)

I'm making a compressed (7Zip) tarball of the entire favicon directory
available at:

http://noh.ucsd.edu/~bmenrigh/favicon.tar.7z

It compresses to about 670 MB, but be warned: the uncompressed tarball
is 3 GB and, depending on the block size of your filesystem, it could
extract to 5 GB or more.

If you want a list of the most popular hashes, something like:

find hash/ | egrep -v 'hash\/$' | xargs wc -l | egrep -v total | sort -n

should do.
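
If you only want the top N rather than the full sorted list, a variant
along these lines may be handy. It is a sketch, not the command I
actually ran: top_hashes is a made-up helper name, and it assumes the
layout described here (one file per hash under hash/, one host:port per
line) and a find that supports "-exec ... +".

```shell
# top_hashes DIR [N]: print the N largest files under DIR as "<lines> <path>",
# smallest first, so the most common hash ends up on the last line.
# (Hypothetical helper; the name and defaults are assumptions.)
top_hashes() {
    dir=${1:-hash}
    n=${2:-50}
    # -type f skips the directory itself; wc prints one "total" line per
    # batch of files, which we drop before sorting numerically.
    find "$dir" -type f -exec wc -l {} + \
        | grep -v ' total$' \
        | sort -n \
        | tail -n "$n"
}

# Example: top_hashes hash 50
```

Matching on ' total$' rather than a bare 'total' keeps the filter from
touching any path that merely contains that word.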


Here are the top 50 (number of host:port entries, then hash file; most
common last):

   342 hash/68B329DA9893E34099C7D8AD5CB9C940
   344 hash/AF999538CD3D4D0370F3EA92E0A6070F
   353 hash/10BD6AD7B318DF92D9E9BD03104D9B80
   358 hash/A34DEA4BD04BDB816BEA176619C29063
   373 hash/2C0067D9382A7F1751FED2D200F38DB7
   384 hash/63B982EDDD64D44233BAA25066DB6BC1
   404 hash/E9E6C56F63122FB05E6899E1DEDD0734
   406 hash/F30B5ED270A57EABEA60BEB935E2B800
   409 hash/EC49973C1991BF39FCDB53260467F39F
   424 hash/292B586171617B56E77EE694485B1052
   427 hash/E52C40433AA5F9256E521D7C139A05BD
   437 hash/4644F2D45601037B8423D45E13194C93
   458 hash/2C338C26309E13987D315D85F499D7F2
   462 hash/BEFCDED36AEC1E59EA624582FCB3225C
   482 hash/61E029C99ABC5CF058ABC77562A69F98
   487 hash/D16A0DA12074DAE41980A6918D33F031
   494 hash/EDAAEF7BBD3072A3A0C3FB3B29900BCB
   522 hash/A31552D4FCC0EA68D69153E458FE6AB2
   569 hash/73778A17B0D22FFBB7D6C445A7947B92
   582 hash/7194D8AFD9E3A6DD0048149C3F66D60A
   609 hash/D99217782F41E71BCAA8E663E6302473
   618 hash/CA79ABA701B8ED97D4505BCD766DF6F3
   629 hash/B25DBE60830705D98BA3AAF0568C456A
   684 hash/325472601571F31E1BF00674C368D335
   732 hash/0C46689B7D84E977E3C3683C6F316122
   735 hash/81ED5FA6453CF406D1D82233BA355B9A
   752 hash/226FFC5E483B85EC261654FE255E60BE
   866 hash/FF2C8612B75B5F9A6175E016FE4AA609
   899 hash/639B61409215D770A99667B446C80EA1
   903 hash/4EB846F1286AB4E7A399C851D7D84CCA
   924 hash/FA54DBF2F61BD2E0188E47F5F578F736
   942 hash/C1201C47C81081C7F0930503CAE7F71A
  1006 hash/389A8816C5B87685DE7D8D5FEC96C85B
  1277 hash/A5220EF442813C2FC6EE8CF13560278F
  1480 hash/59A0C7B6E4848CCDABCEA0636EFDA02B
  1482 hash/B7EBD6E8609ECBF0F053BAF5F550CB04
  1834 hash/A28EBCAC852795FE30D8E99A23D377C1
  1901 hash/4EE75CA12A52425B9514EE6DE25D23FE
  2347 hash/6F767458B952D4755A795AF0E4E0AA17
  2442 hash/7DBE9ACC2AB6E64D59FA67637B1239DF
  3334 hash/ECAA88F7FA0BF610A5A26CF545DCD3AA
  4040 hash/5B0E3B33AA166C88CEE57F83DE1D4E55
  4225 hash/1CE0C63F8BD1E5D3376EC0AE95A41C08
  4599 hash/E1E8BDC3CE87340AB6EBE467519CF245
  6044 hash/A8FE5B8AE2C445A33AC41B33CCC9A120
  6775 hash/5E1E9CC940D3BFAA59F51282D9FEC510
 11005 hash/64CA706A50715E421B6C2FA0B32ED7EC
 16524 hash/DCEA02A5797CE9E36F19B7590752563E
 25779 hash/9CEAE7A3C88FC451D59E24D8D5F6F166
 72702 hash/D41D8CD98F00B204E9800998ECF8427E
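
A note on the last entry: D41D8CD98F00B204E9800998ECF8427E is the MD5
digest of zero bytes of input. Assuming the file names under hash/ are
MD5 digests of the response body (the 32 hex digits suggest it, though
nothing above says so explicitly), those hosts most likely returned an
empty favicon. A quick way to check the digest, assuming GNU md5sum is
available (empty_md5 is just an illustrative name):

```shell
# Print the MD5 of empty input, upper-cased to match the names in hash/.
empty_md5() {
    printf '' | md5sum | cut -d' ' -f1 | tr 'a-f' 'A-F'
}

empty_md5   # prints D41D8CD98F00B204E9800998ECF8427E
```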

David's directory format is really good, so if you want to know about
EC49973C1991BF39FCDB53260467F39F, for example:

$ file icon/EC49973C1991BF39FCDB53260467F39F.ico
icon/EC49973C1991BF39FCDB53260467F39F.ico: MS Windows icon resource - 1 icon

$ tail hash/EC49973C1991BF39FCDB53260467F39F
www.jvsbaltimore.org:80
www.kospalace.gr:80
www.laraleigh.com:80
www.linstead.com:80
www.parallels.com:80
www.proudfootkennels.com:80
www.rpmcarbidedie.com:80
www.strikersonline.com:80
www.thestateexpress.com:80
www.work-shop.ch:80

It could take some work to figure out which software or site each icon
comes from, but at least we now have some data about them.
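
As a first pass at sorting real icons from things like HTML error pages
saved as favicon.ico, something like the following could tally the
icon/ directory by detected type. This is only a sketch: classify_icons
is a hypothetical helper, it assumes the icon/ layout shown above, and
it relies on a file(1) that accepts -b and --mime-type. The exact MIME
strings reported vary between file versions.

```shell
# classify_icons [DIR]: count the files under DIR by the MIME type that
# file(1) reports, least common type first.
# (Hypothetical helper; the name and default directory are assumptions.)
classify_icons() {
    find "${1:-icon}" -type f -exec file -b --mime-type {} + \
        | sort \
        | uniq -c \
        | sort -n
}

# Example: classify_icons icon
```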

Hope this helps.

Brandon

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.11 (GNU/Linux)

iEYEARECAAYFAkp7PHYACgkQqaGPzAsl94JWzACgk3IlZTFd7Hr5V0cJLQjksUl5
czkAn09OnTvMM24MRjDQp2KCE6KpMonr
=YRXW
-----END PGP SIGNATURE-----

_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org
