Re: [PATCH 1/2] MdePkg/Base.h: Ensure safe bitwise operations.

public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed

From: Laszlo Ersek <lersek@redhat.com>
To: "Marvin Häuser" <Marvin.Haeuser@outlook.com>,
	"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>
Cc: "michael.d.kinney@intel.com" <michael.d.kinney@intel.com>,
	"liming.gao@intel.com" <liming.gao@intel.com>
Subject: Re: [PATCH 1/2] MdePkg/Base.h: Ensure safe bitwise operations.
Date: Wed, 28 Feb 2018 14:57:20 +0100	[thread overview]
Message-ID: <2b22bfbd-24ce-e26c-9f1c-e5ba2816b48f@redhat.com> (raw)
In-Reply-To: <AM4PR06MB149112EED610A2DBA423963480C70@AM4PR06MB1491.eurprd06.prod.outlook.com>

On 02/28/18 12:43, Marvin Häuser wrote:

> Actually, your explanations and the rest of the bit defines made me
> wonder, whether marking all BIT defines and bit masks of any kind,
> anywhere, as ULL, might be the best solution.

For a new project just being started, that could be one of the safest /
wisest choices. :)

For edk2, about the only counter-argument I'd have right now is that
many assignments and function arguments could suddenly require
down-casts.

> +1) Inversion can be used for any integer size uncasted,

Right, you could just use ~BITn, and truncate it to whatever unsigned
type is needed.

> as edk2 does not support vendor extensions such as __int128 anyway.

Not *yet*, I guess :) UEFI 2.7 does list UINT128 / INT128, in table 5,
"Common UEFI Data Types". I believe those typedefs may have been added
for RISC-V.

> +2) Binary operations (AND, OR, ...) should not raise any problems at
> all (except for our fellow using VS2005x86 :) )

Haha :) In earnest though, you are right.

> +3) Assignments and passing-by-arguments will result in compiler-time
> errors when not done properly in the sense of this patch, which
> eliminates the need to audit all usages manually, but just compile the
> entire codebase and fix the errors one-by-one (which is of course
> still not a quick task, but if the package authors agree with the
> proposal, I might attempt it).

Good point; it didn't occur to me that a large number of the "client
sites" would be flagged by the compiler at once.

... I've just remembered another possible problem: I think some EBC
compiler cannot use 64-bit integers as case labels in switch statements.
See commit ada385843b94 ("MdeModulePkg/PciBus: Change switch-case to
if-else to fix EBC build", 2018-01-10). Some case labels currently built
of BIT[0..31] macros could break in EBC builds. OTOH, the (EBC) compiler
would catch those as well; no silent breakage.

> -1) The 'truncating constant value' warning would probably need to be
> disabled globally, however I don't understand how an explicit cast is
> a problem anyway.
>
> Did I overlook anything contra regarding that?

Hmmm... Do you think it could have a performance impact on 32-bit
platforms? (I don't think so, at least not in optimized / RELEASE
builds.)

>> (3) The clever code: is aware that BIT macros are (mostly) signed,
>>     and exploits that fact for various ends.

>> If we don't think (3) is a real risk, I'm fine with the approach of
>> these patches. (I don't think I'll be able to send a real R-b for
>> them, because that would require me to evaluate all uses, and that's
>> a Herculean task I just can't take on; apologies.)
>
> I hope that the 'all ULL' proposal uncovers them all. Would there be
> cases, where no error would be raised?

Dunno :) I quite like the idea!

>>>>> -#define ENCODE_ERROR(StatusCode)     ((RETURN_STATUS)(MAX_BIT |
>>>> (StatusCode)))
>>>>> +#define ENCODE_ERROR(StatusCode)     ((RETURN_STATUS)(MAX_BIT
>> |
>>>> (StatusCode##ULL)))

>> Given the current definition, if StatusCode is a signed integer, then
>> one way or another, it will be converted to (not reinterpreted as) an
>> unsigned integer type. This is due to the rules on the bitwise OR
>> operator and the outermost cast (thank you for reminding me of that).
>> Such conversions are fully defined in the C standard, they are not
>> implementation-defined.
>
> Thanks for noting. Would you still append the U-prefix for readability
> / making the intention clear?

I think I'd document the requirement that StatusCode have an unsigned
integer type, and then I'd update the ENCODE_ERROR() macro invocations
to conform.

>> Is your point that we shouldn't even assume INT32=="int" and
>> UINT32=="unsigned int"?
>
> Yes, pretty much. I don't think it does hurt either. While it is
> certainly a very ugly definition, that's my preferred use for macros
> anyway - hide ugly code behind a lovely name.
>
>>
>> I think that would be a good point for a highly portable C codebase
>> that was written from the ground up without type assumptions. But I
>> don't think it applies to edk2. If UINT32 mapped to "unsigned short"
>> and INT64 mapped to "int", edk2 would break in a thousand places
>> immediately.
>
> Of course it would, however I imagined a header file that is included
> in every file for every architecture to be as close to flawless as
> possible.
> While of course I would also fix such assumptions in executable code
> if I found, and hope the maintainers care enough to review and merge,
> I consider header files a greater danger as that 'flawed' (in an
> idealistic sense) macro is expanded into external code, which might
> cause trouble while debugging, as the consumer assumes the mistake is
> within his code.
> Headers >>> library code > module code, in my opinion, especially for
> the most central one.

I'll defer to the MdePkg maintainers on this.

>> Personally I think that's a laudable goal, and I agree that all
>> portable C code should be written like that. I also think however
>> that there's a large disconnect between that goal / style, and how
>> edk2 (and even the UEFI spec) exist today. In my opinion /
>> experience, edk2 was never meant to be portable to unknown, future
>> platforms. For one, it is tied to little endian byte order on the
>> spec level.
>
> Personally, I am fine to assert anything that the specification
> demands (that's what it's here for after all), however we are not
> aware of the compiler-specifics at spec-level.

It is interesting to contrast:

- the use of the standard (optional!) uint32_t type in portable, hosted
  C projects

- with the use of UINT32 in edk2.

* From C99:

  7.18.1.1 Exact-width integer types

    1 [...]

    2 The typedef name uintN_t designates an unsigned integer type with
      width N. Thus, uint24_t denotes an unsigned integer type with a
      width of exactly 24 bits.

    3 These types are optional. However, if an implementation provides
      integer types with widths of 8, 16, 32, or 64 bits, no padding
      bits, and (for the signed types) that have a two'\x19s complement
      representation, it shall define the corresponding typedef names.

In my experience, portable, hosted C code uses the standard types by
default (int, unsigned int, size_t, etc) and uses uint32_t and friends
only in contexts where the representation matters -- mostly for
implementing binary formats (files, network, ...). And going from one
set of types to the other is usually strongly protected with range
checks; there is usually some kind of (de)serialization that very
consciously handles such transitions.

This practice is very good because the integer promotions and the usual
arithmetic conversions, as defined by the standard, i.e. in terms of
standard integer types, are relatively easy to evalute for most of the
code. Even bit-fiddling can be done in just "unsigned", "unsigned long"
or "unsigned long long", because <limits.h> guarantees minimum
magnitudes for those types.

* In edk2 however (and in the UEFI spec), we have:

- exact-width integer types that are hard to map to standard integer
  types -- hence the promotions and conversions are hard to evaluate in
  a portability mindset (they are not hard to evaluate if you just take
  UINT32=="unsigned int" for guaranteed, but that's what you'd like to
  avoid),

- UINTN / INTN, which is defined as "native width". That might make some
  sense in assembly, but in C, "native width" makes no sense. The
  promotion and conversion rules can be interpreted for UINTN only very
  indirectly, by making some assumptions about UINT32 and UINT64. (And,
  about UINT128, theoretically :) )

Combine this with the fact that in edk2 source code, we're supposed to
*avoid* using the standard C types! This is the exact opposite of the
practice that I described for those portable, hosted C projects (i.e.,
stick with the standard types by default, go "exact width" only when
unavoidable). When I first encountered edk2, I was utterly confused by
this practice. I had seen it nowhere else. Reasoning about promotions
and conversions looked plain impossible.

I still think it was a bad choice (I don't know where it comes from,
maybe Windows?), but I've gotten used to it by now -- I do equate UINT32
to "unsigned int", "UINT64" to "unsigned long long", and this way I can
actually reason about the ISO C promotions and conversions.

If you take that away, I'm not sure what we'll be left with :) I just
think that the entire ISO C portability mindset, albeit *absolutely*
desirable, is entirely foreign to the edk2 codebase (and the UEFI spec
too).  I agree we should avoid even implementation-defined behavior, as
much as we can -- but not more than we can. If we start doubting UINT32
maps to "unsigned int", the entire code base falls apart in my eyes.

Now, I do understand why UEFI (the spec) defines such exact-width types,
and declares protocols in terms of those types. That's because UEFI is
an ABI spec, not an API spec. That makes sense. However:

- rather than introducing UINT32 etc, the spec should have used the
  standard C types uint32_t etc,

- regardless of the UEFI interfaces being declared with exact-width
  integer types, edk2 code should be allowed (or even *prefer*) to use
  standard C types, such as "int", "unsigned int" etc, unless
  necessitated otherwise.

Sorry about this long rant; I'm trying to explain why I felt that the
goal of your patches didn't match edk2 very well. It doesn't mean that
your patches are "wrong".

Thanks
Laszlo

next prev parent reply	other threads:[~2018-02-28 13:51 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-27 16:47 [PATCH 1/2] MdePkg/Base.h: Ensure safe bitwise operations Marvin Häuser
2018-02-27 19:54 ` Laszlo Ersek
2018-02-27 20:31   ` Marvin Häuser
2018-02-28 11:00     ` Laszlo Ersek
2018-02-28 11:43       ` Marvin Häuser
2018-02-28 13:57         ` Laszlo Ersek [this message]
2018-02-28 14:01           ` Laszlo Ersek
2018-02-28 14:21           ` Marvin Häuser
2018-02-28 18:37             ` Kinney, Michael D
2018-02-28 18:52               ` Marvin Häuser
2018-03-01  1:41                 ` Kinney, Michael D
2018-03-01 11:10                   ` Marvin Häuser
2018-03-01 17:18                     ` Kinney, Michael D
2018-03-01 17:28                       ` Marvin Häuser
2018-02-28 18:45             ` Marvin Häuser
2018-02-28 21:07               ` Marvin Häuser
2018-03-01 10:39                 ` Laszlo Ersek
2018-03-01 11:25                   ` Marvin Häuser

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2b22bfbd-24ce-e26c-9f1c-e5ba2816b48f@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox