Re: [PATCH] EmbeddedPkg/GdbSerialLib: avoid left shift of negative quantity

From: Laszlo Ersek <lersek@redhat.com>
To: Leif Lindholm <leif.lindholm@linaro.org>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: edk2-devel@lists.01.org
Subject: Re: [PATCH] EmbeddedPkg/GdbSerialLib: avoid left shift of negative quantity
Date: Tue, 19 Jun 2018 00:51:49 +0200	[thread overview]
Message-ID: <96210413-f241-63d1-3e96-a77757d3b0e4@redhat.com> (raw)
In-Reply-To: <20180618215758.vsoetpww4ax64v6g@bivouac.eciton.net>

On 06/18/18 23:57, Leif Lindholm wrote:
> On Mon, Jun 18, 2018 at 10:49:18PM +0200, Ard Biesheuvel wrote:
>> Clang complains about left shifting a negative value being undefined.
> 
> As well it should.
> 
>>   EmbeddedPkg/Library/GdbSerialLib/GdbSerialLib.c:151:30:
>>   error: shifting a negative signed value is undefined [-Werror,-Wshift-negative-value]
>>   OutputData = (UINT8)((~DLAB<<7)|((BreakSet<<6)|((Parity<<3)|((StopBits<<2)| Data))));
>>
>> Redefine all bit pattern constants as unsigned to work around this.
> 
> So, I'm totally OK with this and
> Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org>
> but ...
> would it be worth fixing up BIT0-31 in Base.h and use those?

If we started with the BITxx macros now, I'd agree. Given the current
tree, I don't :) I've made an argument against the suggestion earlier:

  UINT64 Value64;

  Value64 = ~0x1;
  Value64 = ~0x1U;

The two assignments produce different results.

In the first case, the integer constant 0x1 has type "int". Our C
language implementation uses, for type "int", two's complement
representation, 1 sign bit, 31 value bits, and no padding bits. So after
the bitwise complement, we get 0x1111_1111_1111_1110, also with type
int, and with value (-2). Taking UINT64 for "unsigned long int" or
"unsigned long long int", after the conversion implied by the assignment
we get ((UINT64_MAX + 1) + (-2)), aka (UINT64_MAX - 1).

In the second case, the constant 0x1U has type "unsigned int". After the
bitwise complement, we get (UINT32_MAX - 1), also of type "unsigned
int". Taking UINT64 for "unsigned long int" or "unsigned long long int",
after the conversion implied by assignment we get the exact same value,
(UINT32_MAX - 1).

In assembly parlance this is called "sign-extended" vs. "zero-extended".
I dislike those terms when speaking about C; they are not necessary to
explain what happens. (The direction is the opposite -- the compiler
uses sign-extension / zero-extension, if the ISA supports those, for
implementing the C semantics.)

So, I don't recommend changing BIT0 through BIT30 in Base.h, unless we'd
like to audit all their uses :)

Note: "through BIT30" is not a typo above. BIT31 already has type
"unsigned int". That's because of how the "type ladder" for integer
constants works in C. It's easy to look up in the standard (or, well, in
the final draft), but here's the mental model I like to use for it:

- The ladder starts with "int", and works towards integer types with
  higher conversion ranks, until the constant fits.

- Normally only signed types are considered; however, when using the 0x
  (hex) or 0 (octal) prefixes, we add unsigned types to the ladder. Each
  of those will be considered right after the corresponding signed
  integer type, with equal conversion rank. The lesson here is that the
  0x and 0 prefixes *extend* the set of candidate types.

- The suffix "u" (or equivalently "U") *restricts* the ladder to
  unsigned types, however. (Regardless of prefix.)

- The suffixes "l" and "ll" (or equivalently, "L" and "LL", resp.) don't
  affect signedness, instead they affect how high we set our foot on the
  ladder at first. And, we climb up from there.

Given our "signed int" and "unsigned int" representations (see above),
BIT30 (0x40000000) fits in "int", so it gets the type "int". However,
BIT31 (0x80000000) does not fit in "int". Because we use the 0x prefix
with it, it gets the type "unsigned int", because there it fits. Because
BIT31 already gets type "unsigned int", we could append the "u" suffix
to BIT31 (and BIT31 only), without any change in behavior.

This also means that you already get very different results for the
following two assignments:

  Value64 = ~BIT30;
  Value64 = ~BIT31;

Now, in an ideal world:
- all BIT0..BIT31 macros would carry the U suffix,
- we'd *never* apply bitwise complement to signed integers (even though
  the result of that is implementation-defined, not undefined or
  unspecified),
- we'd write all expressions similar to the above as

  Value64 = ~(UINT64)BIT30;
  Value64 = ~(UINT64)BIT31;

I don't think we can audit all such uses now, however.

The present patch differs because it's -- probably -- not hard to review
all uses of the macros being modified here.

Thanks
Laszlo

> 
> /
>     Leif
> 
>> Contributed-under: TianoCore Contribution Agreement 1.1
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  EmbeddedPkg/Library/GdbSerialLib/GdbSerialLib.c | 10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/EmbeddedPkg/Library/GdbSerialLib/GdbSerialLib.c b/EmbeddedPkg/Library/GdbSerialLib/GdbSerialLib.c
>> index 069d87ca780d..7931d1ac4e2b 100644
>> --- a/EmbeddedPkg/Library/GdbSerialLib/GdbSerialLib.c
>> +++ b/EmbeddedPkg/Library/GdbSerialLib/GdbSerialLib.c
>> @@ -40,11 +40,11 @@
>>  //---------------------------------------------
>>  // UART Register Bit Defines
>>  //---------------------------------------------
>> -#define LSR_TXRDY               0x20
>> -#define LSR_RXDA                0x01
>> -#define DLAB                    0x01
>> -#define ENABLE_FIFO             0x01
>> -#define CLEAR_FIFOS             0x06
>> +#define LSR_TXRDY               0x20U
>> +#define LSR_RXDA                0x01U
>> +#define DLAB                    0x01U
>> +#define ENABLE_FIFO             0x01U
>> +#define CLEAR_FIFOS             0x06U
>>  
>>  
>>  
>> -- 
>> 2.17.1
>>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
>