Re: [PATCH 1/2] MdePkg/Base.h: Ensure safe bitwise operations.

public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed

From: "Marvin Häuser" <Marvin.Haeuser@outlook.com>
To: "edk2-devel@lists.01.org" <edk2-devel@lists.01.org>
Cc: Laszlo Ersek <lersek@redhat.com>,
	"michael.d.kinney@intel.com" <michael.d.kinney@intel.com>,
	"liming.gao@intel.com" <liming.gao@intel.com>
Subject: Re: [PATCH 1/2] MdePkg/Base.h: Ensure safe bitwise operations.
Date: Tue, 27 Feb 2018 20:31:55 +0000	[thread overview]
Message-ID: <AM4PR06MB14912DF47968C748F6BEB7A280C00@AM4PR06MB1491.eurprd06.prod.outlook.com> (raw)
In-Reply-To: <62c9363b-7f27-cfff-492a-560660727b86@redhat.com>

Hey Laszlo,

Thank you very much for your review, again. Comments are inline.

Regards,
Marvin.

> -----Original Message-----
> From: Laszlo Ersek <lersek@redhat.com>
> Sent: Tuesday, February 27, 2018 8:54 PM
> To: Marvin Häuser <Marvin.Haeuser@outlook.com>; edk2-
> devel@lists.01.org
> Cc: michael.d.kinney@intel.com; liming.gao@intel.com
> Subject: Re: [edk2] [PATCH 1/2] MdePkg/Base.h: Ensure safe bitwise
> operations.
> 
> On 02/27/18 17:47, Marvin Häuser wrote:
> > As per the C standard, bit-level operations on signed integers are
> > either undefined or implementation-defined. Hence, mark all BIT
> > defines and shifts as unsigned to safely allow such operations.
> 
> Sigh, this is why threading is important in patch sets. :/ This patch comes first,
> and it sort of invalidates my review for patch #2 in this series.
> 
> I think this change is a good idea *in theory* (arguably the macros should
> have been introduced like this in the first place). However, at this point, quite
> a bit of code could silently see its meaning changed.

You are right that likely at least some vendor has code that relies on the defines being signed. However, you seem to agree that bit definitions are naturally assumed to be unsigned and the same as this change could break code, not making the change could lead to non-functioning code that looks valid at first sight.
Regarding your comment on the 'truncating patches', saying the code relies on the definitions being signed: I honestly doubt the code was intentionally written with that in mind, but rather intuitively and it happened to work. I might be wrong of course.

> 
> Let me give you a pathologic example (it *does* occur in the wild, although
> maybe not in edk2):
> 
> {
>   UINT64 MyMask;
> 
>   MyMask = ~BIT0;
> }
> 
> Before your patch, MyMask is assigned the value 0xFFFF_FFFF_FFFF_FFFE.
> 
> This is because:
> 
> - BIT0 has type "int" and value 1
> 
> - Due to our (implementation-defined) representation of negative integers,
> namely two's complement, ~BIT0 (also having type "int") stands for value (-
> 2).

There is pretty much where my issue is. I don't think triggering implementation-defined behavior is a good idea, not to speak about relying on it. EDK no longer targets solely x86, there is even a new architecture (Arch-V or something?) on the way, so I consider relying on compiler specifics very dangerous.

> 
> - When this value is converted to UINT64 ("unsigned long long"), we get
> 18446744073709551615 + 1 + (-2) == 18446744073709551614. That's
> 0xFFFF_FFFF_FFFF_FFFE, which is the mask value that we intended.
> 
> (An alternative way people sometimes put this is that ~BIT0 is "sign
> extended". I strongly dislike this term when speaking about C. It is
> appropriate for (x86?) assembly language, but not for C, in my opinion.)
> 
> 
> In the above example, the "clever" programmer consciously exploited the
> two's complement representation for the bit-neg, and the conversion of the
> negative value to "unsigned long long" (as dictated by the standard).
> 
> 
> Now, if you change BIT0 to have type "unsigned int" (aka UINT32), then
> ~BIT0 will have value 4294967294 (0xFFFF_FFFE), and type "unsigned int".
> 
> When this ~BIT0 is converted to UINT64, the value will be preserved, and
> MyMask will end up with 0x0000_0000_FFFF_FFFE. Likely not the mask that
> the clever programmer intended.
> 
> (An alternative way to state this is that ~BIT0 is "zero extended".
> Again, I think this is inappropriate for C language discussion; it's fine for (x86?)
> assembly.)
> 
> My point is that such innocent-looking (and otherwise really good!) changes
> for *central* macros require a large audit effort.

Indeed that audit effort is required. How would that be handled for non-edk2 code anyway? Might there be a note in the UDK changelog, if accepted?

> 
> More comments below:
> 
> > For the SIGNATURE macros, add several casts to account for int
> > promotions, which might be signed.
> >
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Marvin Haeuser <Marvin.Haeuser@outlook.com>
> > ---
> >  MdePkg/Include/Base.h | 160 ++++++++++----------
> >  1 file changed, 80 insertions(+), 80 deletions(-)
> >
> > diff --git a/MdePkg/Include/Base.h b/MdePkg/Include/Base.h index
> > a94182f08886..f108ed92eb0b 100644
> > --- a/MdePkg/Include/Base.h
> > +++ b/MdePkg/Include/Base.h
> > @@ -404,38 +404,38 @@ struct _LIST_ENTRY {  #define MIN_INT32
> > (((INT32) -2147483647) - 1)  #define MIN_INT64  (((INT64)
> > -9223372036854775807LL) - 1)
> >
> > -#define  BIT0     0x00000001
> > -#define  BIT1     0x00000002
> > -#define  BIT2     0x00000004
> > -#define  BIT3     0x00000008
> > -#define  BIT4     0x00000010
> > -#define  BIT5     0x00000020
> > -#define  BIT6     0x00000040
> > -#define  BIT7     0x00000080
> > -#define  BIT8     0x00000100
> > -#define  BIT9     0x00000200
> > -#define  BIT10    0x00000400
> > -#define  BIT11    0x00000800
> > -#define  BIT12    0x00001000
> > -#define  BIT13    0x00002000
> > -#define  BIT14    0x00004000
> > -#define  BIT15    0x00008000
> > -#define  BIT16    0x00010000
> > -#define  BIT17    0x00020000
> > -#define  BIT18    0x00040000
> > -#define  BIT19    0x00080000
> > -#define  BIT20    0x00100000
> > -#define  BIT21    0x00200000
> > -#define  BIT22    0x00400000
> > -#define  BIT23    0x00800000
> > -#define  BIT24    0x01000000
> > -#define  BIT25    0x02000000
> > -#define  BIT26    0x04000000
> > -#define  BIT27    0x08000000
> > -#define  BIT28    0x10000000
> > -#define  BIT29    0x20000000
> > -#define  BIT30    0x40000000
> > -#define  BIT31    0x80000000
> > +#define  BIT0     0x00000001U
> > +#define  BIT1     0x00000002U
> > +#define  BIT2     0x00000004U
> > +#define  BIT3     0x00000008U
> > +#define  BIT4     0x00000010U
> > +#define  BIT5     0x00000020U
> > +#define  BIT6     0x00000040U
> > +#define  BIT7     0x00000080U
> > +#define  BIT8     0x00000100U
> > +#define  BIT9     0x00000200U
> > +#define  BIT10    0x00000400U
> > +#define  BIT11    0x00000800U
> > +#define  BIT12    0x00001000U
> > +#define  BIT13    0x00002000U
> > +#define  BIT14    0x00004000U
> > +#define  BIT15    0x00008000U
> > +#define  BIT16    0x00010000U
> > +#define  BIT17    0x00020000U
> > +#define  BIT18    0x00040000U
> > +#define  BIT19    0x00080000U
> > +#define  BIT20    0x00100000U
> > +#define  BIT21    0x00200000U
> > +#define  BIT22    0x00400000U
> > +#define  BIT23    0x00800000U
> > +#define  BIT24    0x01000000U
> > +#define  BIT25    0x02000000U
> > +#define  BIT26    0x04000000U
> > +#define  BIT27    0x08000000U
> > +#define  BIT28    0x10000000U
> > +#define  BIT29    0x20000000U
> > +#define  BIT30    0x40000000U
> > +#define  BIT31    0x80000000U
> >  #define  BIT32    0x0000000100000000ULL
> >  #define  BIT33    0x0000000200000000ULL
> >  #define  BIT34    0x0000000400000000ULL
> > @@ -469,28 +469,28 @@ struct _LIST_ENTRY {
> >  #define  BIT62    0x4000000000000000ULL
> >  #define  BIT63    0x8000000000000000ULL
> >
> > -#define  SIZE_1KB    0x00000400
> > -#define  SIZE_2KB    0x00000800
> > -#define  SIZE_4KB    0x00001000
> > -#define  SIZE_8KB    0x00002000
> > -#define  SIZE_16KB   0x00004000
> > -#define  SIZE_32KB   0x00008000
> > -#define  SIZE_64KB   0x00010000
> > -#define  SIZE_128KB  0x00020000
> > -#define  SIZE_256KB  0x00040000
> > -#define  SIZE_512KB  0x00080000
> > -#define  SIZE_1MB    0x00100000
> > -#define  SIZE_2MB    0x00200000
> > -#define  SIZE_4MB    0x00400000
> > -#define  SIZE_8MB    0x00800000
> > -#define  SIZE_16MB   0x01000000
> > -#define  SIZE_32MB   0x02000000
> > -#define  SIZE_64MB   0x04000000
> > -#define  SIZE_128MB  0x08000000
> > -#define  SIZE_256MB  0x10000000
> > -#define  SIZE_512MB  0x20000000
> > -#define  SIZE_1GB    0x40000000
> > -#define  SIZE_2GB    0x80000000
> > +#define  SIZE_1KB    0x00000400U
> > +#define  SIZE_2KB    0x00000800U
> > +#define  SIZE_4KB    0x00001000U
> > +#define  SIZE_8KB    0x00002000U
> > +#define  SIZE_16KB   0x00004000U
> > +#define  SIZE_32KB   0x00008000U
> > +#define  SIZE_64KB   0x00010000U
> > +#define  SIZE_128KB  0x00020000U
> > +#define  SIZE_256KB  0x00040000U
> > +#define  SIZE_512KB  0x00080000U
> > +#define  SIZE_1MB    0x00100000U
> > +#define  SIZE_2MB    0x00200000U
> > +#define  SIZE_4MB    0x00400000U
> > +#define  SIZE_8MB    0x00800000U
> > +#define  SIZE_16MB   0x01000000U
> > +#define  SIZE_32MB   0x02000000U
> > +#define  SIZE_64MB   0x04000000U
> > +#define  SIZE_128MB  0x08000000U
> > +#define  SIZE_256MB  0x10000000U
> > +#define  SIZE_512MB  0x20000000U
> > +#define  SIZE_1GB    0x40000000U
> > +#define  SIZE_2GB    0x80000000U
> >  #define  SIZE_4GB    0x0000000100000000ULL
> >  #define  SIZE_8GB    0x0000000200000000ULL
> >  #define  SIZE_16GB   0x0000000400000000ULL
> > @@ -524,28 +524,28 @@ struct _LIST_ENTRY {
> >  #define  SIZE_4EB    0x4000000000000000ULL
> >  #define  SIZE_8EB    0x8000000000000000ULL
> >
> > -#define  BASE_1KB    0x00000400
> > -#define  BASE_2KB    0x00000800
> > -#define  BASE_4KB    0x00001000
> > -#define  BASE_8KB    0x00002000
> > -#define  BASE_16KB   0x00004000
> > -#define  BASE_32KB   0x00008000
> > -#define  BASE_64KB   0x00010000
> > -#define  BASE_128KB  0x00020000
> > -#define  BASE_256KB  0x00040000
> > -#define  BASE_512KB  0x00080000
> > -#define  BASE_1MB    0x00100000
> > -#define  BASE_2MB    0x00200000
> > -#define  BASE_4MB    0x00400000
> > -#define  BASE_8MB    0x00800000
> > -#define  BASE_16MB   0x01000000
> > -#define  BASE_32MB   0x02000000
> > -#define  BASE_64MB   0x04000000
> > -#define  BASE_128MB  0x08000000
> > -#define  BASE_256MB  0x10000000
> > -#define  BASE_512MB  0x20000000
> > -#define  BASE_1GB    0x40000000
> > -#define  BASE_2GB    0x80000000
> > +#define  BASE_1KB    0x00000400U
> > +#define  BASE_2KB    0x00000800U
> > +#define  BASE_4KB    0x00001000U
> > +#define  BASE_8KB    0x00002000U
> > +#define  BASE_16KB   0x00004000U
> > +#define  BASE_32KB   0x00008000U
> > +#define  BASE_64KB   0x00010000U
> > +#define  BASE_128KB  0x00020000U
> > +#define  BASE_256KB  0x00040000U
> > +#define  BASE_512KB  0x00080000U
> > +#define  BASE_1MB    0x00100000U
> > +#define  BASE_2MB    0x00200000U
> > +#define  BASE_4MB    0x00400000U
> > +#define  BASE_8MB    0x00800000U
> > +#define  BASE_16MB   0x01000000U
> > +#define  BASE_32MB   0x02000000U
> > +#define  BASE_64MB   0x04000000U
> > +#define  BASE_128MB  0x08000000U
> > +#define  BASE_256MB  0x10000000U
> > +#define  BASE_512MB  0x20000000U
> > +#define  BASE_1GB    0x40000000U
> > +#define  BASE_2GB    0x80000000U
> >  #define  BASE_4GB    0x0000000100000000ULL
> >  #define  BASE_8GB    0x0000000200000000ULL
> >  #define  BASE_16GB   0x0000000400000000ULL
> > @@ -974,7 +974,7 @@ typedef UINTN RETURN_STATUS;
> >    @return The value specified by StatusCode with the highest bit set.
> >
> >  **/
> > -#define ENCODE_ERROR(StatusCode)     ((RETURN_STATUS)(MAX_BIT |
> (StatusCode)))
> > +#define ENCODE_ERROR(StatusCode)     ((RETURN_STATUS)(MAX_BIT |
> (StatusCode##ULL)))
> >
> 
> MAX_BIT always has type UINTN (UINT64 aka "unsigned long long" on 64-bit
> platforms, and UINT32 aka "unsigned int" on 32-bit platforms). This means
> that ENCODE_ERROR results in a UINTN, right now.

Good point, I didn't keep that in mind. It probably should be solely 'U'.

> 
> After the change, it would always result in a UINT64. I think that could be a
> problem for 32-bit platforms, especially if you push open-coded RETURN_xxx
> status codes to the stack, for example for printing with "%r". Pushing a
> UINT64 arg instead of a UINT32 arg through the ellipsis
> (...) might throw off the rest of the argument processing.

Actually the cast to RETURN_STATUS is covering the entire operation, so in the end it should be UINTN again, no?

> 
> So, I wouldn't recommend this change. What is the exact motivation for it?

If I am not terribly mistaken, the operation results in implementation-defined behavior as-is, which the patch as a whole intends to avoid.

> 
> >  /**
> >    Produces a RETURN_STATUS code with the highest bit clear.
> > @@ -1221,7 +1221,7 @@ typedef UINTN RETURN_STATUS;
> >    @return A 16-bit value built from the two ASCII characters specified by A
> and B.
> >
> >  **/
> > -#define SIGNATURE_16(A, B)        ((A) | (B << 8))
> > +#define SIGNATURE_16(A, B)        ((UINT16)(A) | (UINT16)((UINT16)(B) <<
> 8U))
> 
> This might not have the intended effect. Due to "int" (INT32) being capable
> of representing the entire domain of "unsigned short" (UINT16), both
> operands of the bitwise OR operator would be promoted to "int". The result
> would have type "int".

To me, what type the result has was less important than making sure both operands are unsigned to ensure a well-defined operation result.

> 
> Then, even if you cast the entire expression to UINT16 in the replacement
> text, it would still be promoted back to "int" in almost every context. (One
> exception would be the operand of the "sizeof"
> operator -- the expression operand of the "sizeof" operator is only evaluated
> if it has variable length array (VLA) type).
> 
> I don't think this change is helpful.
> 
> >
> >  /**
> >    Returns a 32-bit signature built from 4 ASCII characters.
> > @@ -1238,7 +1238,7 @@ typedef UINTN RETURN_STATUS;
> >            C and D.
> >
> >  **/
> > -#define SIGNATURE_32(A, B, C, D)  (SIGNATURE_16 (A, B) |
> > (SIGNATURE_16 (C, D) << 16))
> > +#define SIGNATURE_32(A, B, C, D)  ((UINT32)SIGNATURE_16 (A, B) |
> > +(UINT32)((UINT32)SIGNATURE_16 (C, D) << 16U))
> 
> This looks better; the original sub-expression
> 
>   SIGNATURE_16 (C, D) << 16
> 
> invokes undefined behavior *if* (SIGNATURE_16 (C, D)) has bit#15 set.
> 
> In practice that should never happen though: the documentation says that
> the macro arguments have to be "ASCII characters", and bit#7 is clear in
> those.

Thanks for this hint! While this is true of course, I usually prefer to show 'clear intentions' instead of relying on potentially unknown premises.

> 
> Anyway, if we want to be pedantic, the fix can be expressed more simply,
> like this:
> 
>   (SIGNATURE_16 (A, B) | ((UINT32)SIGNATURE_16 (C, D) << 16))
>                           ^^^^^^^^
> 
> Then the UINT32 type ("unsigned int") will "cascade" through the entire
> expression -- (SIGNATURE_16 (A, B)) will be converted from "int" to
> "unsigned int" for the bitwise OR, and the result will also have type
> UINT32:
> 
>   (INT32_VAL  | (UINT32)INT32_VAL << 16)
>   (INT32_VAL  | UINT32_VAL)
>   (UINT32_VAL | UINT32_VAL)
>   (UINT32_VAL)

This is only true if int <= INT32, isn't it?

> 
> >
> >  /**
> >    Returns a 64-bit signature built from 8 ASCII characters.
> > @@ -1260,7 +1260,7 @@ typedef UINTN RETURN_STATUS;
> >
> >  **/
> >  #define SIGNATURE_64(A, B, C, D, E, F, G, H) \
> > -    (SIGNATURE_32 (A, B, C, D) | ((UINT64) (SIGNATURE_32 (E, F, G, H)) <<
> 32))
> > +    ((UINT64)SIGNATURE_32 (A, B, C, D) | ((UINT64)
> > + ((UINT64)SIGNATURE_32 (E, F, G, H)) << 32U))
> >
> >  #if defined(_MSC_EXTENSIONS) && !defined (__INTEL_COMPILER) &&
> !defined (MDE_CPU_EBC)
> >    void * _ReturnAddress(void);
> >
> 
> Assuming that the SIGNATURE_32 is *either* fixed (so that it produces a
> UINT32 value) *or* used in accordance with the documentation (ie., only
> ASCII characters as arguments), the replacement text for SIGNATURE_64 is
> already correct, and this hunk is unneeded.

I think this is also only true if int <= INT64.

> 
> In the former case (SIGNATURE_32 fixed), we have:
> 
>   (UINT32_VAL | (UINT64)UINT32_VAL << 32)
>   (UINT32_VAL | UINT64_VAL)
>   (UINT64_VAL | UINT64_VAL)
>   (UINT64_VAL)
> 
> In the latter case -- SIGNATURE_32 left unchanged, but call sites only use
> ASCII char args --, we have:
> 
>   (INT32_VAL  | (UINT64)INT32_VAL << 32)
>   (INT32_VAL  | UINT64_VAL)
>   (UINT64_VAL | UINT64_VAL)
>   (UINT64_VAL)
> 
> which does what we want, because both INT32 --> UINT64 conversions start
> on non-negative values (i.e., bit#15 is clear in both SIGNATURE_32()
> expansions).
> 
> So, I think we don't need to touch SIGNATURE_64().
> 
> Thanks,
> Laszlo

next prev parent reply	other threads:[~2018-02-27 20:25 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-27 16:47 [PATCH 1/2] MdePkg/Base.h: Ensure safe bitwise operations Marvin Häuser
2018-02-27 19:54 ` Laszlo Ersek
2018-02-27 20:31   ` Marvin Häuser [this message]
2018-02-28 11:00     ` Laszlo Ersek
2018-02-28 11:43       ` Marvin Häuser
2018-02-28 13:57         ` Laszlo Ersek
2018-02-28 14:01           ` Laszlo Ersek
2018-02-28 14:21           ` Marvin Häuser
2018-02-28 18:37             ` Kinney, Michael D
2018-02-28 18:52               ` Marvin Häuser
2018-03-01  1:41                 ` Kinney, Michael D
2018-03-01 11:10                   ` Marvin Häuser
2018-03-01 17:18                     ` Kinney, Michael D
2018-03-01 17:28                       ` Marvin Häuser
2018-02-28 18:45             ` Marvin Häuser
2018-02-28 21:07               ` Marvin Häuser
2018-03-01 10:39                 ` Laszlo Ersek
2018-03-01 11:25                   ` Marvin Häuser

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AM4PR06MB14912DF47968C748F6BEB7A280C00@AM4PR06MB1491.eurprd06.prod.outlook.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox