Wireshark mailing list archives
Re: filter for ONLY initial get request
From: Jeffs <jeffs () speakeasy net>
Date: Wed, 11 Aug 2010 09:06:53 -0400
On 8/11/2010 6:12 AM, Sake Blok wrote:
> On 10 aug 2010, at 16:48, Jeffs wrote:
>
>> I have come up with the following tshark formula which seems to address my needs. Since I am not interested in the URLs from advertising agencies, videos and other embedded links in web pages, but only the top level domain I use this. Please let me know if anyone sees any gotchas or potential problems with this formula. I'm very new to regex expressions and could use advice.
>>
>> This formula will return only the top level domains and strips out links such as admin.brightcove.com, advertisingserver.amazon.com, tubemogel.videos.com:
>>
>>     tshark -r test.cap -R http.request -T fields -e http.host | sed -e 's/?.*$//' | sed -e 's#^\(.*\)\t\(.*\)$#http://\1\2#' | sort | uniq -c | sort -rn | head -n 300 | sed -n -e '/www/p'
>
> If you're only interested in an overview of visited top-level domains, without caring what the specific hosts and/or URI's were that were visited. You could use something like
>
>     tshark -r test.cap -R http.request -T fields -e http.host | sed -e 's/^.*\.\([^.]*\.[^.]*\)$/\1/' | sort | uniq -c | sort -rn | head -n 100
>
> for the top-100 top-level domains (based on individual hits, not user sessions).
>
> Cheers,
> Sake
> ___________________________________________________________________________
> Sent via: Wireshark-users mailing list <wireshark-users () wireshark org>
> Archives: http://www.wireshark.org/lists/wireshark-users
> Unsubscribe: https://wireshark.org/mailman/options/wireshark-users
> mailto:wireshark-users-request () wireshark org?subject=unsubscribe
Thank you for your reply. The issue I am having, which also occurs with the formula you provided above, is that the output includes domains for links embedded in the web page (mostly advertising and image links). I do not want these because they pollute my results. I only want the domain of the link that was clicked, or of the address typed into the browser's address bar.

For example, the formula you provided above returns:

     71 nytimes.com
     15 propertyshark.com
     13 fbcdn.net
      5 voicefive.com
      5 2mdn.net
      4 brightcove.com
      2 google-analytics.com
      2 doubleclick.net
      1 yahoo.com
      1 imrworldwide.com
      1 facebook.com

The doubleclick.net, brightcove.com, 2mdn.net, and fbcdn.net entries come from advertising and other links embedded in the landing page for the domain that was typed or clicked. They are polluting my results.

This formula, however, returns the results without the links and images embedded in the web page:

    tshark -r test.cap -T fields -e http.host | sed 's/?.*$//' | sed -n '/www./p' | sort | uniq -c | sort -rn | head -n 100

     15 www.propertyshark.com
      8 www.nytimes.com
      2 www.google-analytics.com
      1 www.facebook.com

However, I am new to regex, so I may be missing something or losing some links.

Thank you.
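One regex gotcha in the filters above, offered as a minimal sketch and not as a tested fix for this capture: in sed -n '/www./p' the dot is unescaped, so it matches any character, and the pattern is not anchored, so "www" followed by anything matches anywhere in the host name. The helpers below split the pipeline into small steps that read one host name per line on stdin. The function names (www_hosts, last_two_labels, count_desc) are hypothetical and do not come from the thread. The sed expression in last_two_labels is the one from Sake's reply.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the thread.
# Each reads one host name per line on stdin, for example from:
#   tshark -r test.cap -R http.request -T fields -e http.host
# (assumption: on recent tshark releases the display filter option is -Y)

# Keep only hosts whose first label is exactly "www".
# The pattern is anchored and the dot is escaped, unlike '/www./',
# which also matches names such as swww1.example.com.
www_hosts() {
    grep '^www\.'
}

# Reduce each host to its last two labels (sed expression from Sake's reply).
# Names with a single dot, such as nytimes.com, pass through unchanged.
last_two_labels() {
    sed -e 's/^.*\.\([^.]*\.[^.]*\)$/\1/'
}

# Count identical lines, most frequent first.
count_desc() {
    sort | uniq -c | sort -rn
}
```

Usage, assuming the same capture file as above:

    tshark -r test.cap -R http.request -T fields -e http.host | www_hosts | count_desc | head -n 100

Anchoring on "www." still keeps embedded hosts that happen to use that prefix (www.google-analytics.com in the output above) and drops sites visited without it, so this only tightens the match; it does not identify which request was the one typed or clicked.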