From: Laszlo Ersek <lersek@redhat.com>
To: Marvin Häuser, edk2-devel@lists.01.org
Cc: michael.d.kinney@intel.com, liming.gao@intel.com
Subject: Re: [PATCH 1/2] MdePkg/Base.h: Ensure safe bitwise operations.
Date: Tue, 27 Feb 2018 20:54:14 +0100
Message-ID: <62c9363b-7f27-cfff-492a-560660727b86@redhat.com>

On 02/27/18 17:47, Marvin Häuser wrote:
> As per the C standard, bit-level operations on signed integers are
> either undefined or implementation-defined. Hence, mark all BIT
> defines and shifts as unsigned to safely allow such operations.

Sigh, this is why threading is important in patch sets. :/ This patch
comes first, and it sort of invalidates my review for patch #2 in this
series.

I think this change is a good idea *in theory* (arguably the macros
should have been introduced like this in the first place). However, at
this point, quite a bit of code could silently see its meaning changed.

Let me give you a pathologic example (it *does* occur in the wild,
although maybe not in edk2):

  {
    UINT64 MyMask;

    MyMask = ~BIT0;
  }

Before your patch, MyMask is assigned the value 0xFFFF_FFFF_FFFF_FFFE.
This is because:

- BIT0 has type "int" and value 1

- Due to our (implementation-defined) representation of negative
  integers, namely two's complement, ~BIT0 (also having type "int")
  stands for value (-2).

- When this value is converted to UINT64 ("unsigned long long"), we get
  18446744073709551615 + 1 + (-2) == 18446744073709551614. That's
  0xFFFF_FFFF_FFFF_FFFE, which is the mask value that we intended.

(An alternative way people sometimes put this is that ~BIT0 is "sign
extended". I strongly dislike this term when speaking about C. It is
appropriate for (x86?) assembly language, but not for C, in my
opinion.)

In the above example, the "clever" programmer consciously exploited the
two's complement representation for the bit-neg, and the conversion of
the negative value to "unsigned long long" (as dictated by the
standard).

Now, if you change BIT0 to have type "unsigned int" (aka UINT32), then
~BIT0 will have value 4294967294 (0xFFFF_FFFE), and type "unsigned
int". When this ~BIT0 is converted to UINT64, the value will be
preserved, and MyMask will end up with 0x0000_0000_FFFF_FFFE. Likely
not the mask that the clever programmer intended.

(An alternative way to state this is that ~BIT0 is "zero extended".
Again, I think this is inappropriate for C language discussion; it's
fine for (x86?) assembly.)
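For reference, both outcomes can be reproduced with a small standalone
program (plain C, not edk2 code; the BIT0_AS_TODAY / BIT0_AS_PATCHED
names are made up for the comparison, and a 32-bit "int" / 64-bit
"unsigned long long" model is assumed):

  #include <stdio.h>

  typedef unsigned long long UINT64;

  #define BIT0_AS_TODAY    0x00000001    /* type int, as in current Base.h      */
  #define BIT0_AS_PATCHED  0x00000001U   /* type unsigned int, as in this patch */

  int
  main (void)
  {
    UINT64 MaskToday   = ~BIT0_AS_TODAY;    /* (int)-2      -> 0xFFFF_FFFF_FFFF_FFFE */
    UINT64 MaskPatched = ~BIT0_AS_PATCHED;  /* 0xFFFF_FFFE  -> 0x0000_0000_FFFF_FFFE */

    printf ("%016llX\n%016llX\n", MaskToday, MaskPatched);
    return 0;
  }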
My point is that such innocent-looking (and otherwise really good!)
changes for *central* macros require a large audit effort.

More comments below:

> For the SIGNATURE macros, add several casts to account for int
> promotions, which might be signed.
>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Marvin Haeuser
> ---
>  MdePkg/Include/Base.h | 160 ++++++++++----------
>  1 file changed, 80 insertions(+), 80 deletions(-)
>
> diff --git a/MdePkg/Include/Base.h b/MdePkg/Include/Base.h
> index a94182f08886..f108ed92eb0b 100644
> --- a/MdePkg/Include/Base.h
> +++ b/MdePkg/Include/Base.h
> @@ -404,38 +404,38 @@ struct _LIST_ENTRY {
>  #define MIN_INT32  (((INT32) -2147483647) - 1)
>  #define MIN_INT64  (((INT64) -9223372036854775807LL) - 1)
>
> -#define BIT0   0x00000001
> -#define BIT1   0x00000002
> -#define BIT2   0x00000004
> -#define BIT3   0x00000008
> -#define BIT4   0x00000010
> -#define BIT5   0x00000020
> -#define BIT6   0x00000040
> -#define BIT7   0x00000080
> -#define BIT8   0x00000100
> -#define BIT9   0x00000200
> -#define BIT10  0x00000400
> -#define BIT11  0x00000800
> -#define BIT12  0x00001000
> -#define BIT13  0x00002000
> -#define BIT14  0x00004000
> -#define BIT15  0x00008000
> -#define BIT16  0x00010000
> -#define BIT17  0x00020000
> -#define BIT18  0x00040000
> -#define BIT19  0x00080000
> -#define BIT20  0x00100000
> -#define BIT21  0x00200000
> -#define BIT22  0x00400000
> -#define BIT23  0x00800000
> -#define BIT24  0x01000000
> -#define BIT25  0x02000000
> -#define BIT26  0x04000000
> -#define BIT27  0x08000000
> -#define BIT28  0x10000000
> -#define BIT29  0x20000000
> -#define BIT30  0x40000000
> -#define BIT31  0x80000000
> +#define BIT0   0x00000001U
> +#define BIT1   0x00000002U
> +#define BIT2   0x00000004U
> +#define BIT3   0x00000008U
> +#define BIT4   0x00000010U
> +#define BIT5   0x00000020U
> +#define BIT6   0x00000040U
> +#define BIT7   0x00000080U
> +#define BIT8   0x00000100U
> +#define BIT9   0x00000200U
> +#define BIT10  0x00000400U
> +#define BIT11  0x00000800U
> +#define BIT12  0x00001000U
> +#define BIT13  0x00002000U
> +#define BIT14  0x00004000U
> +#define BIT15  0x00008000U
> +#define BIT16  0x00010000U
> +#define BIT17  0x00020000U
> +#define BIT18  0x00040000U
> +#define BIT19  0x00080000U
> +#define BIT20  0x00100000U
> +#define BIT21  0x00200000U
> +#define BIT22  0x00400000U
> +#define BIT23  0x00800000U
> +#define BIT24  0x01000000U
> +#define BIT25  0x02000000U
> +#define BIT26  0x04000000U
> +#define BIT27  0x08000000U
> +#define BIT28  0x10000000U
> +#define BIT29  0x20000000U
> +#define BIT30  0x40000000U
> +#define BIT31  0x80000000U
>  #define BIT32  0x0000000100000000ULL
>  #define BIT33  0x0000000200000000ULL
>  #define BIT34  0x0000000400000000ULL
> @@ -469,28 +469,28 @@ struct _LIST_ENTRY {
>  #define BIT62  0x4000000000000000ULL
>  #define BIT63  0x8000000000000000ULL
>
> -#define SIZE_1KB    0x00000400
> -#define SIZE_2KB    0x00000800
> -#define SIZE_4KB    0x00001000
> -#define SIZE_8KB    0x00002000
> -#define SIZE_16KB   0x00004000
> -#define SIZE_32KB   0x00008000
> -#define SIZE_64KB   0x00010000
> -#define SIZE_128KB  0x00020000
> -#define SIZE_256KB  0x00040000
> -#define SIZE_512KB  0x00080000
> -#define SIZE_1MB    0x00100000
> -#define SIZE_2MB    0x00200000
> -#define SIZE_4MB    0x00400000
> -#define SIZE_8MB    0x00800000
> -#define SIZE_16MB   0x01000000
> -#define SIZE_32MB   0x02000000
> -#define SIZE_64MB   0x04000000
> -#define SIZE_128MB  0x08000000
> -#define SIZE_256MB  0x10000000
> -#define SIZE_512MB  0x20000000
> -#define SIZE_1GB    0x40000000
> -#define SIZE_2GB    0x80000000
> +#define SIZE_1KB    0x00000400U
> +#define SIZE_2KB    0x00000800U
> +#define SIZE_4KB    0x00001000U
> +#define SIZE_8KB    0x00002000U
> +#define SIZE_16KB   0x00004000U
> +#define SIZE_32KB   0x00008000U
> +#define SIZE_64KB   0x00010000U
> +#define SIZE_128KB  0x00020000U
> +#define SIZE_256KB  0x00040000U
> +#define SIZE_512KB  0x00080000U
> +#define SIZE_1MB    0x00100000U
> +#define SIZE_2MB    0x00200000U
> +#define SIZE_4MB    0x00400000U
> +#define SIZE_8MB    0x00800000U
> +#define SIZE_16MB   0x01000000U
> +#define SIZE_32MB   0x02000000U
> +#define SIZE_64MB   0x04000000U
> +#define SIZE_128MB  0x08000000U
> +#define SIZE_256MB  0x10000000U
> +#define SIZE_512MB  0x20000000U
> +#define SIZE_1GB    0x40000000U
> +#define SIZE_2GB    0x80000000U
>  #define SIZE_4GB    0x0000000100000000ULL
>  #define SIZE_8GB    0x0000000200000000ULL
>  #define SIZE_16GB   0x0000000400000000ULL
> @@ -524,28 +524,28 @@ struct _LIST_ENTRY {
>  #define SIZE_4EB    0x4000000000000000ULL
>  #define SIZE_8EB    0x8000000000000000ULL
>
> -#define BASE_1KB    0x00000400
> -#define BASE_2KB    0x00000800
> -#define BASE_4KB    0x00001000
> -#define BASE_8KB    0x00002000
> -#define BASE_16KB   0x00004000
> -#define BASE_32KB   0x00008000
> -#define BASE_64KB   0x00010000
> -#define BASE_128KB  0x00020000
> -#define BASE_256KB  0x00040000
> -#define BASE_512KB  0x00080000
> -#define BASE_1MB    0x00100000
> -#define BASE_2MB    0x00200000
> -#define BASE_4MB    0x00400000
> -#define BASE_8MB    0x00800000
> -#define BASE_16MB   0x01000000
> -#define BASE_32MB   0x02000000
> -#define BASE_64MB   0x04000000
> -#define BASE_128MB  0x08000000
> -#define BASE_256MB  0x10000000
> -#define BASE_512MB  0x20000000
> -#define BASE_1GB    0x40000000
> -#define BASE_2GB    0x80000000
> +#define BASE_1KB    0x00000400U
> +#define BASE_2KB    0x00000800U
> +#define BASE_4KB    0x00001000U
> +#define BASE_8KB    0x00002000U
> +#define BASE_16KB   0x00004000U
> +#define BASE_32KB   0x00008000U
> +#define BASE_64KB   0x00010000U
> +#define BASE_128KB  0x00020000U
> +#define BASE_256KB  0x00040000U
> +#define BASE_512KB  0x00080000U
> +#define BASE_1MB    0x00100000U
> +#define BASE_2MB    0x00200000U
> +#define BASE_4MB    0x00400000U
> +#define BASE_8MB    0x00800000U
> +#define BASE_16MB   0x01000000U
> +#define BASE_32MB   0x02000000U
> +#define BASE_64MB   0x04000000U
> +#define BASE_128MB  0x08000000U
> +#define BASE_256MB  0x10000000U
> +#define BASE_512MB  0x20000000U
> +#define BASE_1GB    0x40000000U
> +#define BASE_2GB    0x80000000U
>  #define BASE_4GB    0x0000000100000000ULL
>  #define BASE_8GB    0x0000000200000000ULL
>  #define BASE_16GB   0x0000000400000000ULL
> @@ -974,7 +974,7 @@ typedef UINTN RETURN_STATUS;
>    @return The value specified by StatusCode with the highest bit set.
>
>  **/
> -#define ENCODE_ERROR(StatusCode)  ((RETURN_STATUS)(MAX_BIT | (StatusCode)))
> +#define ENCODE_ERROR(StatusCode)  ((RETURN_STATUS)(MAX_BIT | (StatusCode##ULL)))
>

MAX_BIT always has type UINTN (UINT64 aka "unsigned long long" on
64-bit platforms, and UINT32 aka "unsigned int" on 32-bit platforms).
This means that ENCODE_ERROR results in a UINTN, right now. After the
change, it would always result in a UINT64.
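To illustrate just the width aspect, here is a minimal sketch (plain C,
not edk2 code; MAX_BIT_32 is a made-up stand-in for MAX_BIT as it would
look on a 32-bit build, and the status value is an arbitrary literal):

  #include <stdio.h>

  #define MAX_BIT_32  0x80000000U   /* stand-in for MAX_BIT on a 32-bit build */

  int
  main (void)
  {
    /* Current macro body: the OR is carried out at UINTN (here: 32-bit) width. */
    printf ("%zu\n", sizeof (MAX_BIT_32 | 3U));    /* typically 4 */

    /* With "StatusCode##ULL", the status literal becomes unsigned long long,
       so the whole inner expression is computed at 64-bit width. */
    printf ("%zu\n", sizeof (MAX_BIT_32 | 3ULL));  /* 8 */

    return 0;
  }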
I think that could be a problem for 32-bit platforms, especially if you
push open-coded RETURN_xxx status codes to the stack, for example for
printing with "%r". Pushing a UINT64 arg instead of a UINT32 arg
through the ellipsis (...) might throw off the rest of the argument
processing.

So, I wouldn't recommend this change. What is the exact motivation for
it?

>  /**
>    Produces a RETURN_STATUS code with the highest bit clear.
> @@ -1221,7 +1221,7 @@ typedef UINTN RETURN_STATUS;
>    @return A 16-bit value built from the two ASCII characters specified by A and B.
>
>  **/
> -#define SIGNATURE_16(A, B)  ((A) | (B << 8))
> +#define SIGNATURE_16(A, B)  ((UINT16)(A) | (UINT16)((UINT16)(B) << 8U))

This might not have the intended effect. Due to "int" (INT32) being
capable of representing the entire domain of "unsigned short" (UINT16),
both operands of the bitwise OR operator would be promoted to "int".
The result would have type "int". Then, even if you cast the entire
expression to UINT16 in the replacement text, it would still be
promoted back to "int" in almost every context. (One exception would be
the operand of the "sizeof" operator -- the expression operand of the
"sizeof" operator is only evaluated if it has variable length array
(VLA) type).

I don't think this change is helpful.

>
>  /**
>    Returns a 32-bit signature built from 4 ASCII characters.
> @@ -1238,7 +1238,7 @@ typedef UINTN RETURN_STATUS;
>    C and D.
>
>  **/
> -#define SIGNATURE_32(A, B, C, D)  (SIGNATURE_16 (A, B) | (SIGNATURE_16 (C, D) << 16))
> +#define SIGNATURE_32(A, B, C, D)  ((UINT32)SIGNATURE_16 (A, B) | (UINT32)((UINT32)SIGNATURE_16 (C, D) << 16U))

This looks better; the original sub-expression SIGNATURE_16 (C, D) << 16
invokes undefined behavior *if* (SIGNATURE_16 (C, D)) has bit#15 set.
In practice that should never happen though: the documentation says
that the macro arguments have to be "ASCII characters", and bit#7 is
clear in those.

Anyway, if we want to be pedantic, the fix can be expressed more
simply, like this:

  (SIGNATURE_16 (A, B) | ((UINT32)SIGNATURE_16 (C, D) << 16))
                          ^^^^^^^^

Then the UINT32 type ("unsigned int") will "cascade" through the entire
expression -- (SIGNATURE_16 (A, B)) will be converted from "int" to
"unsigned int" for the bitwise OR, and the result will also have type
UINT32:

  (INT32_VAL  | (UINT32)INT32_VAL << 16)
  (INT32_VAL  | UINT32_VAL)
  (UINT32_VAL | UINT32_VAL)
  (UINT32_VAL)
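If it helps, the simpler form can be sanity-checked standalone (plain
C, not against the edk2 tree; 'T', 'E', 'S', 'T' are just arbitrary
ASCII arguments):

  #include <stdio.h>

  typedef unsigned int UINT32;

  #define SIGNATURE_16(A, B)        ((A) | (B << 8))
  #define SIGNATURE_32(A, B, C, D)  (SIGNATURE_16 (A, B) | ((UINT32)SIGNATURE_16 (C, D) << 16))

  int
  main (void)
  {
    /* The single UINT32 cast makes the left operand of << unsigned, so the
       shift can no longer touch the sign bit of "int", and the OR is carried
       out at UINT32 width. */
    printf ("%08X\n", SIGNATURE_32 ('T', 'E', 'S', 'T'));  /* expect 54534554 */
    return 0;
  }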
>
>  /**
>    Returns a 64-bit signature built from 8 ASCII characters.
> @@ -1260,7 +1260,7 @@ typedef UINTN RETURN_STATUS;
>
>  **/
>  #define SIGNATURE_64(A, B, C, D, E, F, G, H) \
> -  (SIGNATURE_32 (A, B, C, D) | ((UINT64) (SIGNATURE_32 (E, F, G, H)) << 32))
> +  ((UINT64)SIGNATURE_32 (A, B, C, D) | ((UINT64) ((UINT64)SIGNATURE_32 (E, F, G, H)) << 32U))
>
>  #if defined(_MSC_EXTENSIONS) && !defined (__INTEL_COMPILER) && !defined (MDE_CPU_EBC)
>    void * _ReturnAddress(void);
>

Assuming that SIGNATURE_32 is *either* fixed (so that it produces a
UINT32 value) *or* used in accordance with the documentation (i.e.,
only ASCII characters as arguments), the replacement text for
SIGNATURE_64 is already correct, and this hunk is unneeded.

In the former case (SIGNATURE_32 fixed), we have:

  (UINT32_VAL | (UINT64)UINT32_VAL << 32)
  (UINT32_VAL | UINT64_VAL)
  (UINT64_VAL | UINT64_VAL)
  (UINT64_VAL)

In the latter case -- SIGNATURE_32 left unchanged, but call sites only
use ASCII char args --, we have:

  (INT32_VAL  | (UINT64)INT32_VAL << 32)
  (INT32_VAL  | UINT64_VAL)
  (UINT64_VAL | UINT64_VAL)
  (UINT64_VAL)

which does what we want, because both INT32 --> UINT64 conversions
start on non-negative values (i.e., bit#31 is clear in both
SIGNATURE_32() expansions).

So, I think we don't need to touch SIGNATURE_64().

Thanks,
Laszlo