From: Laszlo Ersek <lersek@redhat.com>
To: Marvin Häuser, edk2-devel@lists.01.org
Cc: michael.d.kinney@intel.com, liming.gao@intel.com
Subject: Re: [PATCH 1/2] MdePkg/Base.h: Ensure safe bitwise operations.
Date: Tue, 27 Feb 2018 20:54:14 +0100
Message-ID: <62c9363b-7f27-cfff-492a-560660727b86@redhat.com>

On 02/27/18 17:47, Marvin Häuser wrote:
> As per the C standard, bit-level operations on signed integers are
> either undefined or implementation-defined. Hence, mark all BIT
> defines and shifts as unsigned to safely allow such operations.

Sigh, this is why threading is important in patch sets. :/ This patch
comes first, and it sort of invalidates my review for patch #2 in this
series.

I think this change is a good idea *in theory* (arguably the macros
should have been introduced like this in the first place). However, at
this point, quite a bit of code could silently see its meaning changed.

Let me give you a pathologic example (it *does* occur in the wild,
although maybe not in edk2):

  {
    UINT64 MyMask;

    MyMask = ~BIT0;
  }

Before your patch, MyMask is assigned the value 0xFFFF_FFFF_FFFF_FFFE.
This is because:

- BIT0 has type "int" and value 1

- Due to our (implementation-defined) representation of negative
  integers, namely two's complement, ~BIT0 (also having type "int")
  stands for value (-2).

- When this value is converted to UINT64 ("unsigned long long"), we get
  18446744073709551615 + 1 + (-2) == 18446744073709551614. That's
  0xFFFF_FFFF_FFFF_FFFE, which is the mask value that we intended.

(An alternative way people sometimes put this is that ~BIT0 is "sign
extended". I strongly dislike this term when speaking about C. It is
appropriate for (x86?) assembly language, but not for C, in my
opinion.)

In the above example, the "clever" programmer consciously exploited the
two's complement representation for the bit-neg, and the conversion of
the negative value to "unsigned long long" (as dictated by the
standard).

Now, if you change BIT0 to have type "unsigned int" (aka UINT32), then
~BIT0 will have value 4294967294 (0xFFFF_FFFE), and type "unsigned
int". When this ~BIT0 is converted to UINT64, the value will be
preserved, and MyMask will end up with 0x0000_0000_FFFF_FFFE. Likely
not the mask that the clever programmer intended.

(An alternative way to state this is that ~BIT0 is "zero extended".
Again, I think this is inappropriate for C language discussion; it's
fine for (x86?) assembly.)
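For reference, both outcomes can be reproduced with a small standalone
program (plain C, not edk2 code; the BIT0_AS_TODAY / BIT0_AS_PATCHED
names are made up for the comparison, and a 32-bit "int" / 64-bit
"unsigned long long" model is assumed):

  #include <stdio.h>

  typedef unsigned long long UINT64;

  #define BIT0_AS_TODAY    0x00000001    /* type int, as in current Base.h      */
  #define BIT0_AS_PATCHED  0x00000001U   /* type unsigned int, as in this patch */

  int
  main (void)
  {
    UINT64 MaskToday   = ~BIT0_AS_TODAY;    /* (int)-2      -> 0xFFFF_FFFF_FFFF_FFFE */
    UINT64 MaskPatched = ~BIT0_AS_PATCHED;  /* 0xFFFF_FFFE  -> 0x0000_0000_FFFF_FFFE */

    printf ("%016llX\n%016llX\n", MaskToday, MaskPatched);
    return 0;
  }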
My point is that such innocent-looking (and otherwise really good!)
changes for *central* macros require a large audit effort.

More comments below:

> For the SIGNATURE macros, add several casts to account for int
> promotions, which might be signed.
>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Marvin Haeuser
> ---
>  MdePkg/Include/Base.h | 160 ++++++++++----------
>  1 file changed, 80 insertions(+), 80 deletions(-)
>
> diff --git a/MdePkg/Include/Base.h b/MdePkg/Include/Base.h
> index a94182f08886..f108ed92eb0b 100644
> --- a/MdePkg/Include/Base.h
> +++ b/MdePkg/Include/Base.h
> @@ -404,38 +404,38 @@ struct _LIST_ENTRY {
>  #define MIN_INT32  (((INT32) -2147483647) - 1)
>  #define MIN_INT64  (((INT64) -9223372036854775807LL) - 1)
>
> -#define BIT0   0x00000001
> -#define BIT1   0x00000002
> -#define BIT2   0x00000004
> -#define BIT3   0x00000008
> -#define BIT4   0x00000010
> -#define BIT5   0x00000020
> -#define BIT6   0x00000040
> -#define BIT7   0x00000080
> -#define BIT8   0x00000100
> -#define BIT9   0x00000200
> -#define BIT10  0x00000400
> -#define BIT11  0x00000800
> -#define BIT12  0x00001000
> -#define BIT13  0x00002000
> -#define BIT14  0x00004000
> -#define BIT15  0x00008000
> -#define BIT16  0x00010000
> -#define BIT17  0x00020000
> -#define BIT18  0x00040000
> -#define BIT19  0x00080000
> -#define BIT20  0x00100000
> -#define BIT21  0x00200000
> -#define BIT22  0x00400000
> -#define BIT23  0x00800000
> -#define BIT24  0x01000000
> -#define BIT25  0x02000000
> -#define BIT26  0x04000000
> -#define BIT27  0x08000000
> -#define BIT28  0x10000000
> -#define BIT29  0x20000000
> -#define BIT30  0x40000000
> -#define BIT31  0x80000000
> +#define BIT0   0x00000001U
> +#define BIT1   0x00000002U
> +#define BIT2   0x00000004U
> +#define BIT3   0x00000008U
> +#define BIT4   0x00000010U
> +#define BIT5   0x00000020U
> +#define BIT6   0x00000040U
> +#define BIT7   0x00000080U
> +#define BIT8   0x00000100U
> +#define BIT9   0x00000200U
> +#define BIT10  0x00000400U
> +#define BIT11  0x00000800U
> +#define BIT12  0x00001000U
> +#define BIT13  0x00002000U
> +#define BIT14  0x00004000U
> +#define BIT15  0x00008000U
> +#define BIT16  0x00010000U
> +#define BIT17  0x00020000U
> +#define BIT18  0x00040000U
> +#define BIT19  0x00080000U
> +#define BIT20  0x00100000U
> +#define BIT21  0x00200000U
> +#define BIT22  0x00400000U
> +#define BIT23  0x00800000U
> +#define BIT24  0x01000000U
> +#define BIT25  0x02000000U
> +#define BIT26  0x04000000U
> +#define BIT27  0x08000000U
> +#define BIT28  0x10000000U
> +#define BIT29  0x20000000U
> +#define BIT30  0x40000000U
> +#define BIT31  0x80000000U
>  #define BIT32  0x0000000100000000ULL
>  #define BIT33  0x0000000200000000ULL
>  #define BIT34  0x0000000400000000ULL
> @@ -469,28 +469,28 @@ struct _LIST_ENTRY {
>  #define BIT62  0x4000000000000000ULL
>  #define BIT63  0x8000000000000000ULL
>
> -#define SIZE_1KB    0x00000400
> -#define SIZE_2KB    0x00000800
> -#define SIZE_4KB    0x00001000
> -#define SIZE_8KB    0x00002000
> -#define SIZE_16KB   0x00004000
> -#define SIZE_32KB   0x00008000
> -#define SIZE_64KB   0x00010000
> -#define SIZE_128KB  0x00020000
> -#define SIZE_256KB  0x00040000
> -#define SIZE_512KB  0x00080000
> -#define SIZE_1MB    0x00100000
> -#define SIZE_2MB    0x00200000
> -#define SIZE_4MB    0x00400000
> -#define SIZE_8MB    0x00800000
> -#define SIZE_16MB   0x01000000
> -#define SIZE_32MB   0x02000000
> -#define SIZE_64MB   0x04000000
> -#define SIZE_128MB  0x08000000
> -#define SIZE_256MB  0x10000000
> -#define SIZE_512MB  0x20000000
> -#define SIZE_1GB    0x40000000
> -#define SIZE_2GB    0x80000000
> +#define SIZE_1KB    0x00000400U
> +#define SIZE_2KB    0x00000800U
> +#define SIZE_4KB    0x00001000U
> +#define SIZE_8KB    0x00002000U
> +#define SIZE_16KB   0x00004000U
> +#define SIZE_32KB   0x00008000U
> +#define SIZE_64KB   0x00010000U
> +#define SIZE_128KB  0x00020000U
> +#define SIZE_256KB  0x00040000U
> +#define SIZE_512KB  0x00080000U
> +#define SIZE_1MB    0x00100000U
> +#define SIZE_2MB    0x00200000U
> +#define SIZE_4MB    0x00400000U
> +#define SIZE_8MB    0x00800000U
> +#define SIZE_16MB   0x01000000U
> +#define SIZE_32MB   0x02000000U
> +#define SIZE_64MB   0x04000000U
> +#define SIZE_128MB  0x08000000U
> +#define SIZE_256MB  0x10000000U
> +#define SIZE_512MB  0x20000000U
> +#define SIZE_1GB    0x40000000U
> +#define SIZE_2GB    0x80000000U
>  #define SIZE_4GB    0x0000000100000000ULL
>  #define SIZE_8GB    0x0000000200000000ULL
>  #define SIZE_16GB   0x0000000400000000ULL
> @@ -524,28 +524,28 @@ struct _LIST_ENTRY {
>  #define SIZE_4EB    0x4000000000000000ULL
>  #define SIZE_8EB    0x8000000000000000ULL
>
> -#define BASE_1KB    0x00000400
> -#define BASE_2KB    0x00000800
> -#define BASE_4KB    0x00001000
> -#define BASE_8KB    0x00002000
> -#define BASE_16KB   0x00004000
> -#define BASE_32KB   0x00008000
> -#define BASE_64KB   0x00010000
> -#define BASE_128KB  0x00020000
> -#define BASE_256KB  0x00040000
> -#define BASE_512KB  0x00080000
> -#define BASE_1MB    0x00100000
> -#define BASE_2MB    0x00200000
> -#define BASE_4MB    0x00400000
> -#define BASE_8MB    0x00800000
> -#define BASE_16MB   0x01000000
> -#define BASE_32MB   0x02000000
> -#define BASE_64MB   0x04000000
> -#define BASE_128MB  0x08000000
> -#define BASE_256MB  0x10000000
> -#define BASE_512MB  0x20000000
> -#define BASE_1GB    0x40000000
> -#define BASE_2GB    0x80000000
> +#define BASE_1KB    0x00000400U
> +#define BASE_2KB    0x00000800U
> +#define BASE_4KB    0x00001000U
> +#define BASE_8KB    0x00002000U
> +#define BASE_16KB   0x00004000U
> +#define BASE_32KB   0x00008000U
> +#define BASE_64KB   0x00010000U
> +#define BASE_128KB  0x00020000U
> +#define BASE_256KB  0x00040000U
> +#define BASE_512KB  0x00080000U
> +#define BASE_1MB    0x00100000U
> +#define BASE_2MB    0x00200000U
> +#define BASE_4MB    0x00400000U
> +#define BASE_8MB    0x00800000U
> +#define BASE_16MB   0x01000000U
> +#define BASE_32MB   0x02000000U
> +#define BASE_64MB   0x04000000U
> +#define BASE_128MB  0x08000000U
> +#define BASE_256MB  0x10000000U
> +#define BASE_512MB  0x20000000U
> +#define BASE_1GB    0x40000000U
> +#define BASE_2GB    0x80000000U
>  #define BASE_4GB    0x0000000100000000ULL
>  #define BASE_8GB    0x0000000200000000ULL
>  #define BASE_16GB   0x0000000400000000ULL
> @@ -974,7 +974,7 @@ typedef UINTN RETURN_STATUS;
>    @return The value specified by StatusCode with the highest bit set.
>
>  **/
> -#define ENCODE_ERROR(StatusCode)  ((RETURN_STATUS)(MAX_BIT | (StatusCode)))
> +#define ENCODE_ERROR(StatusCode)  ((RETURN_STATUS)(MAX_BIT | (StatusCode##ULL)))
>

MAX_BIT always has type UINTN (UINT64 aka "unsigned long long" on
64-bit platforms, and UINT32 aka "unsigned int" on 32-bit platforms).
This means that ENCODE_ERROR results in a UINTN, right now. After the
change, it would always result in a UINT64.
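To illustrate just the width aspect, here is a minimal sketch (plain C,
not edk2 code; MAX_BIT_32 is a made-up stand-in for MAX_BIT as it would
look on a 32-bit build, and the status value is an arbitrary literal):

  #include <stdio.h>

  #define MAX_BIT_32  0x80000000U   /* stand-in for MAX_BIT on a 32-bit build */

  int
  main (void)
  {
    /* Current macro body: the OR is carried out at UINTN (here: 32-bit) width. */
    printf ("%zu\n", sizeof (MAX_BIT_32 | 3U));    /* typically 4 */

    /* With "StatusCode##ULL", the status literal becomes unsigned long long,
       so the whole inner expression is computed at 64-bit width. */
    printf ("%zu\n", sizeof (MAX_BIT_32 | 3ULL));  /* 8 */

    return 0;
  }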
I think that could be a problem for 32-bit platforms, especially if you
push open-coded RETURN_xxx status codes to the stack, for example for
printing with "%r". Pushing a UINT64 arg instead of a UINT32 arg
through the ellipsis (...) might throw off the rest of the argument
processing.

So, I wouldn't recommend this change. What is the exact motivation for
it?

>  /**
>    Produces a RETURN_STATUS code with the highest bit clear.
> @@ -1221,7 +1221,7 @@ typedef UINTN RETURN_STATUS;
>    @return A 16-bit value built from the two ASCII characters specified by A and B.
>
>  **/
> -#define SIGNATURE_16(A, B)  ((A) | (B << 8))
> +#define SIGNATURE_16(A, B)  ((UINT16)(A) | (UINT16)((UINT16)(B) << 8U))

This might not have the intended effect. Due to "int" (INT32) being
capable of representing the entire domain of "unsigned short" (UINT16),
both operands of the bitwise OR operator would be promoted to "int".
The result would have type "int". Then, even if you cast the entire
expression to UINT16 in the replacement text, it would still be
promoted back to "int" in almost every context. (One exception would be
the operand of the "sizeof" operator -- the expression operand of the
"sizeof" operator is only evaluated if it has variable length array
(VLA) type).

I don't think this change is helpful.

>
>  /**
>    Returns a 32-bit signature built from 4 ASCII characters.
> @@ -1238,7 +1238,7 @@ typedef UINTN RETURN_STATUS;
>    C and D.
>
>  **/
> -#define SIGNATURE_32(A, B, C, D)  (SIGNATURE_16 (A, B) | (SIGNATURE_16 (C, D) << 16))
> +#define SIGNATURE_32(A, B, C, D)  ((UINT32)SIGNATURE_16 (A, B) | (UINT32)((UINT32)SIGNATURE_16 (C, D) << 16U))

This looks better; the original sub-expression SIGNATURE_16 (C, D) << 16
invokes undefined behavior *if* (SIGNATURE_16 (C, D)) has bit#15 set.
In practice that should never happen though: the documentation says
that the macro arguments have to be "ASCII characters", and bit#7 is
clear in those.

Anyway, if we want to be pedantic, the fix can be expressed more
simply, like this:

  (SIGNATURE_16 (A, B) | ((UINT32)SIGNATURE_16 (C, D) << 16))
                          ^^^^^^^^

Then the UINT32 type ("unsigned int") will "cascade" through the entire
expression -- (SIGNATURE_16 (A, B)) will be converted from "int" to
"unsigned int" for the bitwise OR, and the result will also have type
UINT32:

  (INT32_VAL  | (UINT32)INT32_VAL << 16)
  (INT32_VAL  | UINT32_VAL)
  (UINT32_VAL | UINT32_VAL)
  (UINT32_VAL)
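If it helps, the simpler form can be sanity-checked standalone (plain
C, not against the edk2 tree; 'T', 'E', 'S', 'T' are just arbitrary
ASCII arguments):

  #include <stdio.h>

  typedef unsigned int UINT32;

  #define SIGNATURE_16(A, B)        ((A) | (B << 8))
  #define SIGNATURE_32(A, B, C, D)  (SIGNATURE_16 (A, B) | ((UINT32)SIGNATURE_16 (C, D) << 16))

  int
  main (void)
  {
    /* The single UINT32 cast makes the left operand of << unsigned, so the
       shift can no longer touch the sign bit of "int", and the OR is carried
       out at UINT32 width. */
    printf ("%08X\n", SIGNATURE_32 ('T', 'E', 'S', 'T'));  /* expect 54534554 */
    return 0;
  }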
>
>  /**
>    Returns a 64-bit signature built from 8 ASCII characters.
> @@ -1260,7 +1260,7 @@ typedef UINTN RETURN_STATUS;
>
>  **/
>  #define SIGNATURE_64(A, B, C, D, E, F, G, H) \
> -  (SIGNATURE_32 (A, B, C, D) | ((UINT64) (SIGNATURE_32 (E, F, G, H)) << 32))
> +  ((UINT64)SIGNATURE_32 (A, B, C, D) | ((UINT64) ((UINT64)SIGNATURE_32 (E, F, G, H)) << 32U))
>
>  #if defined(_MSC_EXTENSIONS) && !defined (__INTEL_COMPILER) && !defined (MDE_CPU_EBC)
>    void * _ReturnAddress(void);
>

Assuming that SIGNATURE_32 is *either* fixed (so that it produces a
UINT32 value) *or* used in accordance with the documentation (i.e.,
only ASCII characters as arguments), the replacement text for
SIGNATURE_64 is already correct, and this hunk is unneeded.

In the former case (SIGNATURE_32 fixed), we have:

  (UINT32_VAL | (UINT64)UINT32_VAL << 32)
  (UINT32_VAL | UINT64_VAL)
  (UINT64_VAL | UINT64_VAL)
  (UINT64_VAL)

In the latter case -- SIGNATURE_32 left unchanged, but call sites only
use ASCII char args --, we have:

  (INT32_VAL  | (UINT64)INT32_VAL << 32)
  (INT32_VAL  | UINT64_VAL)
  (UINT64_VAL | UINT64_VAL)
  (UINT64_VAL)

which does what we want, because both INT32 --> UINT64 conversions
start on non-negative values (i.e., bit#31 is clear in both
SIGNATURE_32() expansions).

So, I think we don't need to touch SIGNATURE_64().

Thanks,
Laszlo