Wireshark mailing list archives
Re: [Wireshark-commits] rev 53819: /trunk/epan/ /trunk/epan/dissectors/: packet-gadu-gadu.c /trunk/epan/: charsets.c charsets.h proto.h tvbuff.c
From: Guy Harris <guy () alum mit edu>
Date: Sat, 7 Dec 2013 14:42:16 -0800
On Dec 7, 2013, at 2:10 AM, darkjames () wireshark org wrote:
http://anonsvn.wireshark.org/viewvc/viewvc.cgi?view=rev&revision=53819
User: darkjames
Date: 2013/12/07 10:10 AM
Log:
 Add new string proto encoding for windows-1250 (ENC_WINDOWS_1250)
 - Move windows-1250 to unicode encoding table to charset.c
 - Add tvb_get_string_unichar2, tvb_get_stringz_unichar2 functions which recode tvb-string to UTF-8.
Note that

    https://developer.gnome.org/glib/stable/glib-Unicode-Manipulation.html#gunichar2

says of a gunichar2 that it is

    A type which can hold any UTF-16 code point[4].

with the footnote:

    https://developer.gnome.org/glib/stable/glib-Unicode-Manipulation.html#ftn.utf16_surrogate_pairs

saying

    [4] surrogate pairs

This means that a single gunichar2, being 16 bits wide, can hold either

1) a character from the Basic Multilingual Plane (BMP) of Unicode:

    https://en.wikipedia.org/wiki/Basic_Multilingual_Plane#Basic_Multilingual_Plane

or

2) one half of a surrogate pair:

    https://en.wikipedia.org/wiki/UTF-16#Code_points_U.2B10000_to_U.2B10FFFF

so those routines, which map each input character to one gunichar2, can handle only encodings that don't include characters outside the BMP. This is probably true of most non-Unicode encodings, such as the ISO 8859-n encodings, so it's OK for them, but be careful before using those routines for any other encoding.
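To illustrate the BMP limitation, here is a minimal sketch in plain C of how a code point maps to 16-bit UTF-16 units. It is not Wireshark or GLib code: the type unit16 is a stand-in for gunichar2 (which GLib defines as a 16-bit unsigned integer), and encode_utf16 is a hypothetical helper invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for GLib's gunichar2, which is a 16-bit unsigned type. */
typedef uint16_t unit16;

/*
 * Hypothetical helper: encode one Unicode code point as UTF-16.
 * Writes 1 or 2 code units to out[] and returns the count,
 * or 0 if cp is not a valid Unicode scalar value.
 */
static size_t encode_utf16(uint32_t cp, unit16 out[2])
{
    assert(out != NULL);

    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* lone surrogates are not characters */
    if (cp > 0x10FFFF)
        return 0;               /* beyond the Unicode range */

    if (cp < 0x10000) {
        out[0] = (unit16)cp;    /* BMP: fits in a single 16-bit unit */
        return 1;
    }

    cp -= 0x10000;              /* 20 significant bits remain */
    out[0] = (unit16)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (unit16)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

A BMP character such as U+0105 (which windows-1250 can represent) needs one unit, while a character such as U+1F600 needs two, so a table or routine that yields exactly one 16-bit unit per input character cannot express the latter.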