Wireshark mailing list archives

Re: [Wireshark-commits] rev 53819: /trunk/epan/ /trunk/epan/dissectors/: packet-gadu-gadu.c /trunk/epan/: charsets.c charsets.h proto.h tvbuff.c


From: Guy Harris <guy () alum mit edu>
Date: Sat, 7 Dec 2013 14:42:16 -0800


On Dec 7, 2013, at 2:10 AM, darkjames () wireshark org wrote:

http://anonsvn.wireshark.org/viewvc/viewvc.cgi?view=rev&revision=53819

User: darkjames
Date: 2013/12/07 10:10 AM

Log:
Add new string proto encoding for windows-1250 (ENC_WINDOWS_1250)

- Move windows-1250 to unicode encoding table to charset.c
- Add tvb_get_string_unichar2, tvb_get_stringz_unichar2 functions which recode tvb-string to UTF-8.

Note that

        https://developer.gnome.org/glib/stable/glib-Unicode-Manipulation.html#gunichar2

says of a gunichar2 that it is

        A type which can hold any UTF-16 code point[4].

with the footnote:

        https://developer.gnome.org/glib/stable/glib-Unicode-Manipulation.html#ftn.utf16_surrogate_pairs

saying

        [4] surrogate pairs

This means that a gunichar2 can hold either

        1) a character from the Basic Multilingual Plane (BMP) of Unicode:

                https://en.wikipedia.org/wiki/Basic_Multilingual_Plane#Basic_Multilingual_Plane

or

        2) one half (a single 16-bit code unit) of a surrogate pair:

                https://en.wikipedia.org/wiki/UTF-16#Code_points_U.2B10000_to_U.2B10FFFF

so those routines can handle only encodings that don't include characters outside the BMP.

This is probably true of most non-Unicode encodings, such as the ISO 8859-n encodings, so the routines are OK for 
those, but be careful when using them with any encoding that can represent characters outside the BMP.
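
To make the limitation concrete, here is a minimal sketch in plain C. It is not Wireshark or GLib code: unichar2_t is a hypothetical stand-in for gunichar2 (assumed to be a 16-bit unsigned type), and is_surrogate and utf16_encode are names made up for this illustration. It shows that a BMP character fits in one 16-bit unit, while a character outside the BMP needs two units, neither of which is a character by itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GLib's gunichar2, so the sketch builds
 * without GLib. */
typedef uint16_t unichar2_t;

/* Returns nonzero if the 16-bit unit is one half of a surrogate pair
 * (U+D800..U+DFFF) rather than a BMP character. */
static int is_surrogate(unichar2_t u)
{
    return u >= 0xD800 && u <= 0xDFFF;
}

/* Encodes one Unicode code point as UTF-16.
 * Writes 1 unit for a BMP character, or 2 units (high surrogate, then
 * low surrogate) for a character outside the BMP.
 * Returns the number of units written, or 0 if cp is not a valid
 * Unicode scalar value. */
static size_t utf16_encode(uint32_t cp, unichar2_t out[2])
{
    assert(out != NULL);

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                      /* out of range, or a lone surrogate */

    if (cp < 0x10000) {
        out[0] = (unichar2_t)cp;       /* BMP: one unit holds the character */
        return 1;
    }

    cp -= 0x10000;                     /* 20 bits remain, split 10 + 10 */
    out[0] = (unichar2_t)(0xD800 + (cp >> 10));
    out[1] = (unichar2_t)(0xDC00 + (cp & 0x3FF));
    return 2;
}
```

A lookup table with one 16-bit entry per input byte can therefore describe only encodings whose characters all encode to a single unit, i.e. characters in the BMP.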
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev () wireshark org>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-request () wireshark org?subject=unsubscribe

