Wireshark mailing list archives
Re: Display of UTF-8 Characters
From: John Thacker <johnthacker () gmail com>
Date: Sat, 12 Dec 2020 14:24:50 -0500
The problem is the sigma. The black diamond with a question mark is the Unicode REPLACEMENT CHARACTER (U+FFFD), which is being inserted twice, once for each of the two bytes that make up the sigma in UTF-8.

There was an issue with UTF-8 sigma and other Greek letters (https://gitlab.com/wireshark/wireshark/-/issues/17070). It is fixed in the recently released 3.2.9 and 3.4.1 and in master, but it is still broken in 3.3.0, where the character appears exactly as you describe.

A workaround is to use proto_tree_add_string_format_value() with "%s" and your string value again as the last two parameters, which ends up bypassing the flawed format_text() function in that version. Otherwise, upgrade or apply the patch from that issue.

John Thacker

On Sat, Dec 12, 2020 at 1:43 PM <jayrturner99 () gmail com> wrote:
I create a GString str = “A{Dagger}B{Sigma}C”; (i.e. “\x41\xE2\x80\xA0\x42\xCE\xA3\x43” where \xE2\x80\xA0 is Dagger and \xCE\xA3 is Sigma). The Dagger is the correct UTF-8 code (https://www.fileformat.info/info/unicode/char/2020/index.htm) and the Sigma is the correct UTF-8 code (https://www.fileformat.info/info/unicode/char/03a3/index.htm).

I use col_append_lstr(pinfo->cinfo, COL_INFO, str, COL_ADD_LSTR_TERMINATOR); The display is “A{Dagger}B{Sigma}C” where the {Dagger} and {Sigma} are the correct visual single characters.

I use proto_string_add_string(…, str); The display is “A{Dagger}B{black-diamond-with-question-mark}{black-diamond-with-question-mark}C” where the {black-diamond-with-question-mark} is the visual single character of a black diamond with a question mark (and it is displayed twice).

So col_append_lstr handles UTF-8 and proto_string_add_string partially handles UTF-8. How can I get a proto_string_* function that will display UTF-8 correctly like col_append_lstr does? I do not need any string function to validate my UTF-8 bytes (if I make a mistake, that’s my problem). I just want a consistent display.

Environment:
Windows 10 Enterprise (10.0.18363) x64
Microsoft Visual Studio Community 2019 Version 16.7.1
QT v5.15.0 using msvc2019_64
Wireshark 3.3.0 with customer dissector
Wireshark Font Consolas Regular 12.0

___________________________________________________________________________
Sent via: Wireshark-dev mailing list <wireshark-dev () wireshark org>
Archives: https://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://www.wireshark.org/mailman/options/wireshark-dev
mailto:wireshark-dev-request () wireshark org?subject=unsubscribe
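For readers following the byte-level reasoning in this exchange, here is a minimal standalone C sketch. It is not Wireshark code and does not reproduce format_text(); it only models the symptom. The helper names (utf8_seq_len, utf8_decode, displayed_chars) and the idea of a single "rejected" code point are assumptions made for illustration. It shows that the dagger occupies three bytes and the sigma two, and that a formatter which wrongly rejects a code point and substitutes one U+FFFD per byte would display two replacement characters for the sigma.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte b,
 * or 0 if b cannot start a sequence (continuation or invalid byte). */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Decode one code point from s (avail >= 1 bytes readable).
 * Returns the number of bytes consumed, or 0 if the sequence is malformed.
 * Simplified: overlong forms and surrogates are not rejected. */
static size_t utf8_decode(const unsigned char *s, size_t avail,
                          unsigned long *cp)
{
    size_t len = utf8_seq_len(s[0]);
    if (len == 0 || len > avail) return 0;
    if (len == 1) { *cp = s[0]; return 1; }

    /* Payload bits of the lead byte: 5, 4 or 3 bits for len 2, 3, 4. */
    unsigned long v = s[0] & (0xFF >> (len + 1));
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        v = (v << 6) | (s[i] & 0x3F);
    }
    *cp = v;
    return len;
}

/* Hypothetical model of the symptom: count the characters a formatter
 * would display if it treated code point rejected_cp as invalid and
 * emitted one U+FFFD for each of its bytes. Pass 0 to reject nothing
 * (assuming the input contains no NUL characters). */
static size_t displayed_chars(const char *str, size_t n,
                              unsigned long rejected_cp)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t i = 0, out = 0;

    assert(str != NULL);
    while (i < n) {
        unsigned long cp = 0;
        size_t len = utf8_decode(s + i, n - i, &cp);
        if (len == 0) {                 /* malformed byte: one U+FFFD */
            out += 1;
            i += 1;
        } else if (cp == rejected_cp) { /* one U+FFFD per byte */
            out += len;
            i += len;
        } else {                        /* displayed as one character */
            out += 1;
            i += len;
        }
    }
    return out;
}
```

With the eight-byte string from the original post, displayed_chars() counts five characters when nothing is rejected and six when U+03A3 is rejected, which corresponds to the "A{Dagger}B" followed by two replacement characters and "C" reported above.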