Wireshark mailing list archives

Re: FT_STRING, FT_STRINGZPAD, and null padding


From: Guy Harris <gharris () sonic net>
Date: Fri, 4 Sep 2020 23:48:14 -0700

On Sep 4, 2020, at 10:56 PM, John Thacker <johnthacker () gmail com> wrote:

I don't understand this change, which makes me wonder if I understand FT_STRINGZPAD vs FT_STRING in general:

OK, here's the types of strings that there can be in protocols.

1. Counted strings.  A counted string has a count of the number of octets (it might be half the number of octets if 
it's UTF-16 or UCS-2, but it's really a count of octets).  There is nothing special about null characters, other than 
"if some software treats the string as a C string, it won't work correctly if the string happens to include a null 
character, so maybe that software shouldn't treat it as a C string".

2. Null-terminated, non-counted strings.  A null-terminated, non-counted, string has no count of the number of octets; 
it runs until a null terminator (8-bit, 16-bit, whatever) is seen.  Obviously, that string can't contain a null 
terminator.

3. Padded strings in a fixed-size buffer.  If the string is the length of the buffer, there is no null terminator; 
otherwise, there's a null terminator; the protocol may specify that all octets after the null terminator, if the string 
isn't the length of the buffer - 1, must be null, or may say there is no such requirement.

4. Strings that are counted *and* null-terminated.  This is redundant, but maybe somebody wants to make sure that 
there's a null at the end, to make life easier for implementations that use C strings internally.  This doesn't 
*really* make things easier, unless you're in a 100% controlled environment, because, in any environment in which the 
software has to deal with packets from buggy or malicious senders, and uses C strings internally, they have to make 
sure there's a null terminator *anyway* - they *can't* just copy the string and blithely assume that a terminating null 
is part of the counted blob.

For counted strings, it was deemed useful to *warn* if there's a null in the middle of the string, as it might indicate 
a string with which some implementations might have a problem.  The commit that added that check was

        commit 5c36f6166c30b586be3e6cc600f58e1eb5830eb7
        Author: Stig Bjørlykke <stig () bjorlykke org>
        Date:   Mon Aug 27 20:55:36 2018 +0200

            epan: Detect trailing stray characters in strings
    
            Trailing stray characters will not show up in the packet tree item
            when the string is correctly null terminated. This expert info
            will indicate when this occurs, typically from wrongly implemented
            protocol encoders.
    
            This will warn about cases like:
    
              tvb = "foo\0bar"
              proto_tree_add_item(..., tvb, 0, 7, ...)

In the case of a counted string, if the string is a 7-octet value of "foo\0bar", it should show up as exactly that in 
the packet tree item; if not, we should fix it to do so.

For null-terminated, non-counted strings, there's no need to check whether there's a null - if there isn't, we'll run 
past the end of the packet, throw an exception, and correctly report the packet as malformed.

For padded strings in a fixed-size buffer, about the only check worth making is, *if* there's a null character in the 
string, whether there's anything non-null after the null character.  However, that check should *not* be made unless 
the protocol requires it; there are a lot of cases where it does *not* require it, and valid packets don't have it.

For strings that are counted and null-terminated, we should obviously make sure the *last* character is a null, and 
report an error if it's not.

Currently, we use FT_STRING for the first of those, FT_STRINGZ for the second of those, FT_STRINGZPAD for the third of 
those, and FT_STRINGZ for the fourth of those; the difference between the two FT_STRINGZ cases is whether a length was 
specified or not.

Everything in proto.c and tvbuff.c treats FT_STRING and FT_STRINGZPAD exactly the same currently, except for checking 
stray characters. If the length is given as -1, the length to the end of the tvbuff is used. Otherwise, the specified 
length is checked and copied from the tvbuff, and a NUL is added at the end regardless. The last byte ensures that 
they both work even if the buffer is not null terminated.

However, if the string is null terminated somewhere in the middle, all the bytes after that (presumably padding) are 
still copied into the return value, along with the extra '\0'. This doesn't really cause any problems other than 
wasting a little bit of memory and time, since the first null ends the string.

Before that commit, there was absolutely no difference that I could tell. Now the difference is that if the string is 
terminated by a NUL somewhere in the middle before the specified field length, whether or not the bytes in the 
padding after that must be null padding. It is FT_STRING that requires that any padding be zero-padding

That's not padding.  That's characters that might be missed by receiver implementations that might store strings as C 
strings.

(and adds an expert warning if not), whereas FT_STRINGZPAD does not.

While I agree with the commit message that "there are protocols where a string that's not the full length of the part 
of the packet for the string has a null terminator but isn't guaranteed to be fully padded with nulls," and that 
there can be "a separate type for fields where we really *should* check that the padding is all nulls," I don't see 
why FT_STRINGZPAD isn't that latter type. To me, the name FT_STRINGZPAD implies that it is the one where we really 
should check that the padding is zero padding.

The name is historical; we should probably rename it to something that indicates what it really is, and, *IF* there are 
any fields where there's a *requirement* that the padding be all-null, a *new* type be added for that, and a check 
added for that.  There are *definitely* types where there *isn't* such a requirement.

Either that, or those four types should be considered different *encoding* types, to be combined with the character 
encoding and the byte order.

The documentation in README.dissector does not really explain the differences between the use cases,

It needs to be updated.

I have some in-progress work for this.
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev () wireshark org>
Archives:    https://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://www.wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-request () wireshark org?subject=unsubscribe

Current thread: