Re: [PATCH] EmbeddedPkg/GdbSerialLib: avoid left shift of negative quantity

From: Leif Lindholm <leif.lindholm@linaro.org>
To: Laszlo Ersek <lersek@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>, edk2-devel@lists.01.org
Subject: Re: [PATCH] EmbeddedPkg/GdbSerialLib: avoid left shift of negative quantity
Date: Tue, 19 Jun 2018 10:54:11 +0100	[thread overview]
Message-ID: <20180619095411.52siakmtbfokc6mu@bivouac.eciton.net> (raw)
In-Reply-To: <96210413-f241-63d1-3e96-a77757d3b0e4@redhat.com>

On Tue, Jun 19, 2018 at 12:51:49AM +0200, Laszlo Ersek wrote:
> On 06/18/18 23:57, Leif Lindholm wrote:
> > On Mon, Jun 18, 2018 at 10:49:18PM +0200, Ard Biesheuvel wrote:
> >> Clang complains about left shifting a negative value being undefined.
> > 
> > As well it should.
> > 
> >>   EmbeddedPkg/Library/GdbSerialLib/GdbSerialLib.c:151:30:
> >>   error: shifting a negative signed value is undefined [-Werror,-Wshift-negative-value]
> >>   OutputData = (UINT8)((~DLAB<<7)|((BreakSet<<6)|((Parity<<3)|((StopBits<<2)| Data))));
> >>
> >> Redefine all bit pattern constants as unsigned to work around this.
> > 
> > So, I'm totally OK with this and
> > Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org>
> > but ...
> > would it be worth fixing up BIT0-31 in Base.h and use those?
> 
> If we started with the BITxx macros now, I'd agree. Given the current
> tree, I don't :) I've made an argument against the suggestion earlier:
> 
>   UINT64 Value64;
> 
>   Value64 = ~0x1;
>   Value64 = ~0x1U;
> 
> The two assignments produce different results.
> 
> In the first case, the integer constant 0x1 has type "int". Our C
> language implementation uses, for type "int", two's complement
> representation, 1 sign bit, 31 value bits, and no padding bits. So after
> the bitwise complement, we get 0x1111_1111_1111_1110, also with type
> int, and with value (-2). Taking UINT64 for "unsigned long int" or
> "unsigned long long int", after the conversion implied by the assignment
> we get ((UINT64_MAX + 1) + (-2)), aka (UINT64_MAX - 1).

Err, if by UINT64_MAX + 1 you mean 0, as a way of introducing the
second part, I'm with you. Otherwise I think I need to go back to
school.

> In the second case, the constant 0x1U has type "unsigned int". After the
> bitwise complement, we get (UINT32_MAX - 1), also of type "unsigned
> int". Taking UINT64 for "unsigned long int" or "unsigned long long int",
> after the conversion implied by assignment we get the exact same value,
> (UINT32_MAX - 1).

But these examples make it look like the differences are an
unobservable internal side effect. That's not the case.

> In assembly parlance this is called "sign-extended" vs. "zero-extended".
> I dislike those terms when speaking about C; they are not necessary to
> explain what happens. (The direction is the opposite -- the compiler
> uses sign-extension / zero-extension, if the ISA supports those, for
> implementing the C semantics.)

I always saw it as the C language emulates a machine that supports
sign-/zero-extension.

> So, I don't recommend changing BIT0 through BIT30 in Base.h, unless we'd
> like to audit all their uses :)

I would actually still argue for this if we expected the vast majority
of users to be in the open source trees. I don't think that's the case.

> Note: "through BIT30" is not a typo above. BIT31 already has type
> "unsigned int". That's because of how the "type ladder" for integer
> constants works in C. It's easy to look up in the standard (or, well, in
> the final draft), but here's the mental model I like to use for it:
> 
> - The ladder starts with "int", and works towards integer types with
>   higher conversion ranks, until the constant fits.
> 
> - Normally only signed types are considered; however, when using the 0x
>   (hex) or 0 (octal) prefixes, we add unsigned types to the ladder. Each
>   of those will be considered right after the corresponding signed
>   integer type, with equal conversion rank. The lesson here is that the
>   0x and 0 prefixes *extend* the set of candidate types.
> 
> - The suffix "u" (or equivalently "U") *restricts* the ladder to
>   unsigned types, however. (Regardless of prefix.)
> 
> - The suffixes "l" and "ll" (or equivalently, "L" and "LL", resp.) don't
>   affect signedness, instead they affect how high we set our foot on the
>   ladder at first. And, we climb up from there.

Yeah, this bit I did know, but it's arcane enough (and _really_ shows
the origins of C) that I mainly actively try to avoid it by always
being explicit where it matters.
Certainly the most fun I tend to come across with it is with enums.

> Given our "signed int" and "unsigned int" representations (see above),
> BIT30 (0x40000000) fits in "int", so it gets the type "int". However,
> BIT31 (0x80000000) does not fit in "int". Because we use the 0x prefix
> with it, it gets the type "unsigned int", because there it fits. Because
> BIT31 already gets type "unsigned int", we could append the "u" suffix
> to BIT31 (and BIT31 only), without any change in behavior.
> 
> This also means that you already get very different results for the
> following two assignments:
> 
>   Value64 = ~BIT30;
>   Value64 = ~BIT31;
> 
> Now, in an ideal world:
> - all BIT0..BIT31 macros would carry the U suffix,
> - we'd *never* apply bitwise complement to signed integers (even though
>   the result of that is implementation-defined, not undefined or
>   unspecified),

That would certainly be desirable to implement anyway.
I'll have a look and see what -Weverything does with that.
It should certainly be purged from all Tianocore trees.

> - we'd write all expressions similar to the above as
> 
>   Value64 = ~(UINT64)BIT30;
>   Value64 = ~(UINT64)BIT31;
> 
> I don't think we can audit all such uses now, however.
> 
> The present patch differs because it's -- probably -- not hard to review
> all uses of the macros being modified here.

Really not.

Right, so thanks for the persuasive and educational argument.
I now agree we shouldn't change bits 0-31 (including 31 - what would
be the point?).

I shall commence scheming on replacing them instead.

Regards,

Leif

> Thanks
> Laszlo
> 
> > 
> > /
> >     Leif
> > 
> >> Contributed-under: TianoCore Contribution Agreement 1.1
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >> ---
> >>  EmbeddedPkg/Library/GdbSerialLib/GdbSerialLib.c | 10 +++++-----
> >>  1 file changed, 5 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/EmbeddedPkg/Library/GdbSerialLib/GdbSerialLib.c b/EmbeddedPkg/Library/GdbSerialLib/GdbSerialLib.c
> >> index 069d87ca780d..7931d1ac4e2b 100644
> >> --- a/EmbeddedPkg/Library/GdbSerialLib/GdbSerialLib.c
> >> +++ b/EmbeddedPkg/Library/GdbSerialLib/GdbSerialLib.c
> >> @@ -40,11 +40,11 @@
> >>  //---------------------------------------------
> >>  // UART Register Bit Defines
> >>  //---------------------------------------------
> >> -#define LSR_TXRDY               0x20
> >> -#define LSR_RXDA                0x01
> >> -#define DLAB                    0x01
> >> -#define ENABLE_FIFO             0x01
> >> -#define CLEAR_FIFOS             0x06
> >> +#define LSR_TXRDY               0x20U
> >> +#define LSR_RXDA                0x01U
> >> +#define DLAB                    0x01U
> >> +#define ENABLE_FIFO             0x01U
> >> +#define CLEAR_FIFOS             0x06U
> >>  
> >>  
> >>  
> >> -- 
> >> 2.17.1
> >>
> > _______________________________________________
> > edk2-devel mailing list
> > edk2-devel@lists.01.org
> > https://lists.01.org/mailman/listinfo/edk2-devel
> > 
>