Wireshark mailing list archives
Re: filter for ONLY initial get request
From: Jeffs <jeffs () speakeasy net>
Date: Wed, 11 Aug 2010 16:08:20 -0400
On 8/11/2010 9:35 AM, Thierry Emmanuel wrote:
> -----Original Message-----
> From: wireshark-users-bounces () wireshark org [mailto:wireshark-users-bounces () wireshark org] On Behalf Of Jeffs
> Sent: Wednesday, 11 August 2010 15:07
> To: Community support list for Wireshark
> Subject: Re: [Wireshark-users] filter for ONLY initial get request
>
>> This formula, however, only returns results minus the links and images embedded in the web page:
>>
>>     tshark -r test.cap -T fields -e http.host | sed 's/?.*$//' | sed -n '/www./p' | sort | uniq -c | sort -rn | head -n 100
>>
>>     15 www.propertyshark.com
>>      8 www.nytimes.com
>>      2 www.google-analytics.com
>>      1 www.facebook.com
>>
>> However, I am new to regex so I'm sure I may be missing something or losing some links.
>
> It is a common mistake to assume that every website has its main address on a "www" subdomain. If you want a generic filter, you cannot rely on it. If you want a relevant result, you'll have to build a non-restrictive regexp and manually filter out inappropriate results, possibly adding some rules to exclude well-known advertising sites. A fully automatic solution would be to parse the data, checking that it is a well-formed HTML (or XML or plain-text) document. This will purge videos and images from your results.
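The suggestion quoted above (match permissively, then exclude known noise) could be sketched roughly as follows. This is only an illustration, not something posted in the thread: the function name `top_hosts`, the variable `EXCLUDE_RE`, and the `ads.example.net` entry are hypothetical placeholders, and the exclusion list would need to be tuned by hand for a real capture.

```shell
# Hypothetical exclusion list: one alternation entry per unwanted domain.
# google-analytics.com appears in the output above; ads.example.net is a placeholder.
EXCLUDE_RE='(^|\.)(google-analytics\.com|ads\.example\.net)$'

# Read one host name per line on stdin, drop blank lines and excluded
# domains (including their subdomains), then print the most frequent hosts first.
top_hosts() {
    grep -v '^$' | grep -Ev "$EXCLUDE_RE" | sort | uniq -c | sort -rn | head -n 100
}

# Usage, assuming the same capture file as in the quoted command:
#   tshark -r test.cap -T fields -e http.host | top_hosts
```

Because nothing is required to start with "www", hosts such as example.org are kept, at the cost of maintaining the exclusion list manually.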
I agree that not all websites have their main address on "www". But so far I have been unable to effectively remove all the extra domains that are captured, so I am bringing in a lot of extraneous domain names. I have to choose the lesser of two evils: lose some domains, or pull in a lot of unwanted domain names that totally pollute my desired results.

I wish there were a way to capture ONLY the initially requested URL that is either clicked or typed into the browser address bar. I was thinking that maybe a tap might solve this problem, because it would capture only one half of a duplex conversation on one wire (the outgoing request) and thus only capture the requested URL.

Your suggestion of parsing the data is indeed unique and interesting. Are you suggesting that dumpcap or Ethereal would somehow interrogate the link, follow it, and then make a determination? This sounds like a very interesting prospect, but I'm not sure I fully understand how it would work.

Thank you.
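One rough heuristic for approximating "initially requested" URLs, offered here as an assumption rather than anything proposed in this message, is to keep only GET requests that carry no Referer header. A browser normally sends Referer for embedded images and scripts, and also for clicked links, so this mostly isolates typed or bookmarked addresses and will miss clicked ones. The helper name `initial_hosts` is hypothetical; it expects tab-separated host and referer fields, which is the shape `tshark -T fields` produces when given `-e http.host -e http.referer`.

```shell
# Read "host<TAB>referer" lines on stdin and print only the hosts whose
# referer field is empty, i.e. requests that were not triggered by another page.
initial_hosts() {
    awk -F '\t' '$1 != "" && $2 == "" { print $1 }'
}

# Usage, assuming a capture file named test.cap. Recent tshark releases take
# the display filter with -Y; releases from around 2010 used -R instead.
#   tshark -r test.cap -Y 'http.request.method == "GET"' \
#       -T fields -e http.host -e http.referer | initial_hosts | sort | uniq -c | sort -rn
```

Some clients strip or suppress the Referer header, so the result should be treated as a starting point for manual review rather than a definitive list.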