Wireshark mailing list archives

Re: No tvb_get for string-encoded numbers?


From: Guy Harris <guy () alum mit edu>
Date: Sat, 5 Apr 2014 02:52:16 -0700


On Apr 4, 2014, at 2:01 PM, Hadriel Kaplan <hadriel.kaplan () oracle com> wrote:

> For protocols which are actually truly UTF-8, I'm planning to just assume treating them as ASCII is ok, because as
> far as I know the atoi/strtol/etc. functions don't actually care: if they see the ASCII characters for digits (and
> +/-/etc.) they'll parse it, else not. So any non-ASCII UTF-8 character in the sequence is meaningless to them and
> they stop parsing at that character.

Yes, the only valid octets in a number in any "extended ASCII" would be:

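        0x2b, 0x2d, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37 ('+', '-', and the digits '0' through '7') if the radix is 8, 10, or 16;

        0x38 and 0x39 ('8' and '9') if the radix is 10 or 16;

        0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x61, 0x62, 0x63, 0x64, 0x65, and 0x66 ('A' through 'F' and 'a' through 'f') if the radix is 16;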

so anything with the 8th bit set is not valid, meaning that the same routine can handle ASCII, ISO 8859-n, various 
Windows code pages, various Mac code pages, and UTF-8 - the actual character encoding is irrelevant, as long as ASCII 
characters are encoded as a single octet having the ASCII code point value.
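
As a rough illustration of the octet table above, here is a minimal sketch in C. The names is_number_octet() and number_span() are hypothetical helpers made up for this example, not existing Wireshark or tvbuff routines, and the sketch assumes only radix 8, 10, or 16 is ever requested. It checks octet values only, so it never needs to know which character encoding the string is in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a Wireshark API): true if octet c may appear
 * in a string-encoded number of the given radix (8, 10, or 16). */
static bool
is_number_octet(unsigned char c, int radix)
{
    assert(radix == 8 || radix == 10 || radix == 16);

    if (c == 0x2b || c == 0x2d)          /* '+' and '-' */
        return true;
    if (c >= 0x30 && c <= 0x37)          /* '0'..'7': valid in any radix */
        return true;
    if (c == 0x38 || c == 0x39)          /* '8', '9': radix 10 or 16 only */
        return radix == 10 || radix == 16;
    if ((c >= 0x41 && c <= 0x46) ||      /* 'A'..'F' */
        (c >= 0x61 && c <= 0x66))        /* 'a'..'f' */
        return radix == 16;
    return false;                        /* everything else, including any
                                          * octet with the 8th bit set */
}

/* Hypothetical helper: length of the leading run of octets in buf that
 * could belong to a number in the given radix. */
static size_t
number_span(const unsigned char *buf, size_t len, int radix)
{
    size_t i = 0;

    while (i < len && is_number_octet(buf[i], radix))
        i++;
    return i;
}
```

For the four octets 0x34 0x32 0xc3 0xa9 ("42" followed by a UTF-8 encoded e-acute), number_span() with radix 10 stops at the first octet with the 8th bit set. The sketch only classifies octets; it does not enforce number syntax, such as a sign being allowed only at the start.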


