From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: mx.groups.io; dkim=missing; spf=pass (domain: redhat.com, ip: 209.132.183.28, mailfrom: lersek@redhat.com) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by groups.io with SMTP; Tue, 24 Sep 2019 13:26:21 -0700 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 5DFC8309BDBA; Tue, 24 Sep 2019 20:26:20 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-120-118.rdu2.redhat.com [10.10.120.118]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5CD5760F80; Tue, 24 Sep 2019 20:26:14 +0000 (UTC) Subject: Re: [edk2-devel] [PATCH 00/35] edk2: clean up the usage of standardized (VOID*) typedefs To: =?UTF-8?Q?Marvin_H=c3=a4user?= , "devel@edk2.groups.io" Cc: Andrew Fish , Anthony Perard , Ard Biesheuvel , Benjamin You , Chao Zhang , Dandan Bi , David Woodhouse , Eric Dong , Guo Dong , Hao A Wu , Jaben Carsey , Jian J Wang , Jiaxin Wu , Jiewen Yao , Jordan Justen , Julien Grall , Leif Lindholm , Liming Gao , Maurice Ma , Michael D Kinney , Ray Ni , Siyuan Fu , Supreeth Venkatesh , Zhichao Gao References: <20190917194935.24322-1-lersek@redhat.com> From: "Laszlo Ersek" Message-ID: <06b8b728-e38e-cc64-4499-9733fc993e5d@redhat.com> Date: Tue, 24 Sep 2019 22:26:13 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.45]); Tue, 24 Sep 2019 20:26:21 +0000 (UTC) Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable On 09/23/19 18:27, Marvin H=C3=A4user wrote: > Good day, >=20 > Thank you, Laszlo, for your ambition to introduce stricter code style=20 > enforcements. Sorry to "hijack" the actual topic (I did not CC anyone o= n=20 > purpose, as this is mostly a separate topic and I'd like a quick commen= t=20 > first), but this seems like a good occasion to mention another few bad=20 > practices edk2 has been following. Mainly, I'd like to call *some*=20 > attention to quality problems in the code base while this has some=20 > traction, and cause a discussion on whether and how those are to be=20 > approached. >=20 > Thank you for your time. Sure, I can offer my personal opinion on these. > "inadequate type punning": > e.g.=20 > https://github.com/tianocore/edk2/blob/fcdedafd97c8f18c33a63d26b954e5db= aee81a2b/MdeModulePkg/Bus/Usb/UsbBusPei/HubPeim.c#L446 >=20 > This is mostly about the infamous "Strict Aliasing" rule, which is=20 > basically: > "An object shall have its stored value accessed only by an lvalue=20 > expression that has one of the > following types: > =E2=80=94 a type compatible with the effective type of the object, > =E2=80=94 a qualifed version of a type compatible with the effective ty= pe of the=20 > object, > =E2=80=94 a type that is the signed or unsigned type corresponding to t= he=20 > effective type of the object, > =E2=80=94 a type that is the signed or unsigned type corresponding to a= qualifed=20 > version of the effective > type of the object, > =E2=80=94 an aggregate or union type that includes one of the aforement= ioned=20 > types among its members > (including, recursively, a member of a subaggregate or contained union)= , or > =E2=80=94 a character type." > C18 (ISO/IEC 9899:2018), 6.5.7 (exists, though has been updated, since = C90) >=20 > Currently optimisations based on this are disabled. This is a bit nasty= =20 > to work around if *seriously* needed when sticking to C90, I can only=20 > think of memcpy right now. However, even though there are compilers tha= t=20 > do not fully support C99 (ahem, Microsoft :) ), type-punning by unions=20 > should be supported by them all, and has been legal as of C99, where th= e=20 > following part has been dropped from the standard: > "With one exception, if a member of a union object is accessed after a=20 > value has been stored in a different member of the object, the behavior= =20 > is implementation-defined." > C90 (ISO/IEC 9899:1990), 6.3.2.3 I'm opposed to enforcing the strict aliasing rules, even though in all code that I write, I either try to conform to them, or at least I seek to be fully conscious of breaking them. Here's the thing: IMO, the strict aliasing rules sacrifice flexibility for performance (optimization possibilities). Not to mention the amount of code in edk2 that would have to be identified and updated. BaseTools uses "-fno-strict-aliasing" everywhere, and I think that's a good choice. https://blog.regehr.org/archives/1180 This proposal for a "friendly dialect of C" intended to eliminate the strict aliasing rules altogether. (Item#10; possibly also item#13.) Also, as it points out, the Linux kernel is built with "-fno-strict-aliasing". I've checked now, with a *long* series of "git blame" commands, even digging into the "history" repository (which is at ). I can say that the current flag has been in place since *at least* Linux v2.5.0 (2002-02-04). QEMU has been built with "-fno-strict-aliasing" ever since commit b932caba32c6 ("new disk image layer", 2004-08-01), known originally as SVN rev 1030. Consider the following example. You have a dynamically allocated buffer. You read some data into it, from the network or the disk, using PCI DMA. Let's assume that, from the block read via PCI DMA, the library function(s) or protocol member(s) that you call, directly or indirectly, there is at least one that: - copies data from a source buffer to a target buffer, using UINT32 or UINT64 assignments (for speed), - and is implemented in C. Now, according to the effective type rules, your dynamic buffer's effective type is "array of UINT32" or "array of UINT64". That's because of C99 6.5 Expressions, p6: "If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value." Then if you try to parse this buffer as a UEFI device path (=3D a packed sequence of device path node structures), *IN PLACE*, you will break the effective type rules no matter what. Because, you will necessarily look at the next node in the blob as an EFI_DEVICE_PATH_PROTOCOL (because you'll want to learn the Type and the SubType fields, and the Length array too). Note that EFI_DEVICE_PATH_PROTOCOL is a structure with no UINT32 or UINT64 members. Boom. So you have to resort to one of the following things: 1. Define a union type, assign the *full union* first , and then check the union helper variable. (First you must make sure there was enough room in the buffer being parsed for a full union.) 2. Or, use memcpy() -- or something that the compiler is similarly enlightened about --, to the same helper variable. 3. Or, write a manual copy loop with character access, to the helper variable. 4. Or, use character access to read the fields. #1 through #3 add a separate copying step, while #4 is extremely uncomfortable to program. In a nutshell, the effective type rules require separate de-serialization routines for all data, and that's incredibly annoying. It kills the utility of packed C structures. I prefer packed C structures, size checks (!!!) against the buffers to parse, and then type punning of pointers. > "pointer unions": > e.g.=20 > https://github.com/tianocore/edk2/blob/fcdedafd97c8f18c33a63d26b954e5db= aee81a2b/MdePkg/Include/IndustryStandard/SmBios.h#L2592 >=20 > While the idea behind them is certainly style preference, using a union= =20 > of pointers prevents two important things over a union of structs. >=20 > 1) CONST declaration: When defining a variable of a union type=20 > containing pointers as CONST, speaking of its members, they are all=20 > going to be CONST pointers to arbitrary memory and not arbitrary=20 > pointers to CONST memory. With a union of structs, you can have either=20 > as required (e.g. CONST UNION_TYPE *union, or UNION_TYPE *COST union). Formally, you are right, but I'm doubtful of the utility of "pointer-to-union-of-structs". We cannot de-reference a pointer to the union unless the buffer has enough data for the complete union. This leaves the parsing of small structs unsolved. A union of pointers is just syntactic sugar, of course, but it's convenient. The member that is "pointer-to-smallest-header-substructure" can be used for determining the actual structure type. Then we can determine if there's enough data for that structure in the buffer. Then we can use the matching pointer member, for accessing that structure. Furthermore, CONST gets too complex really soon, and we have to start adding explicit casts. My favorite link is: http://c-faq.com/ansi/constmismatch.html I had never met "pointer unions" before edk2, but I've found them quite convenient. > 2) Well-defined header inspection: > "if a union contains several structures that share a common initial=20 > sequence [...], it is permitted to inspect the common initial part of=20 > any of them anywhere that a declaration of the completed type of the=20 > union is visible" > C18 (ISO/IEC 9899:2018), 6.5.2.3.6 (exists since at least C90) >=20 > This guarantee can be used to inspect the type defined in a common=20 > header (e.g. SMBIOS_STRUCTURE) and the process the type-specific data b= y=20 > accessing the appropiate member (e.g. SMBIOS_TABLE_TYPE0) legally. This only applies *after* you have populated the union for inspection. (Or at least enough bytes for the common header that you're going to inspect.) If you have a union collecting three structures: 5 bytes, 19 bytes, and 32 bytes, and you have a buffer with 24 bytes left (suggesting a 5 byte structure and a 19 byte structure in it, or vice versa), you cannot populate the full union -- you don't have 32 bytes left. So let's then assume that the common header is 2 bytes long (with 3 vs. 17 vs. 30 additional bytes required in the specific structures). Fine, then you read 2 bytes into the stand-alone union helper variable, for inspecting the common header. Based on the header inspection, you then decide to (attempt to) read 3 more bytes, or 17 bytes, continuing into the union, and then parse the specific (now completed) structure through the matching union member. Great? Not really. Notice that, in this process, you didn't need the union *at all*. You can simply use standalone structures, and you may not even need more stack space. Compare: (i) with a union: enum Type { type_1, type_2 } struct H { enum Type t; ... }; struct S1 { struct H h; int S1_i1; }; struct S2 { struct H h; char S2_c; double S2_d; }; union U { S1 s1; S2 s2; }; Code: union U u; char unsigned *src =3D buffer; char unsigned *dst =3D (void*)&u; size_t specific; if (room_left < sizeof(struct H)) { return BAD; } memcpy(dst, src, sizeof(struct H)); dst +=3D sizeof(struct H); src +=3D sizeof(struct H); room_left -=3D sizeof(struct H); switch (u.s1.h.t) { case type_1: specific =3D sizeof(struct S1) - sizeof(struct H); if (room_left < specific) { return BAD; } memcpy(dst, src, specific); dst +=3D specific; src +=3D specific; room_left -=3D specific; /* now access u.s1.S1_i1 */ return GOOD; case type_2: specific =3D sizeof(struct S2) - sizeof(struct H); if (room_left < specific) { return BAD; } memcpy(dst, src, specific); dst +=3D specific; src +=3D specific; room_left -=3D specific; /* now access u.s2.{S2_c,S2_d} */ return GOOD; default: return BAD; } (ii) without a union: enum Type { type_1, type_2 } struct H { enum Type t; ... }; /* note: header no longer embedded */ struct S1 { int S1_i1; }; /* note: header no longer embedded */ struct S2 { char S2_c; double S2_d; }; Code: struct H h; /* note: no union, just the common header */ char unsigned *src =3D buffer; /* note: no dst pointer into union */ size_t specific; if (room_left < sizeof h) { return BAD; } memcpy(&h, src, sizeof h); src +=3D sizeof h; room_left -=3D sizeof h; switch (h.t) { /* note: no awkward reference to "u.s1" */ case type_1: { struct S1 s1; /* note: never co-exists with "s2" on the stack */ specific =3D sizeof s1; /* note: no awkward subtraction */ if (room_left < specific) { return BAD; } memcpy(&s1, src, specific); src +=3D specific; room_left -=3D specific; /* now access s1.S1_i1 */ } return GOOD; case type_2: { struct S2 s2; /* note: never co-exists with "s1" on the stack */ specific =3D sizeof s2; /* note: no awkward subtraction */ if (room_left < specific) { return BAD; } memcpy(&s2, src, specific); src +=3D specific; room_left -=3D specific; /* now access s2.S2_c, s2.S2_d */ } return GOOD; default: return BAD; } To me, (ii) is much cleaner than (i); the union is not needed. Of course, I find the type punning approach even better than (ii) :) See below: (iii) no union, yes type punning: struct H *h; char unsigned *src =3D buffer; if (room_left < sizeof *h) { return BAD; } h =3D (struct H *)src; /* note: no copying */ src +=3D sizeof *h; room_left -=3D sizeof *h; switch (h->t) { case type_1: { struct S1 *s1; specific =3D sizeof *s1; if (room_left < specific) { return BAD; } s1 =3D (struct S1 *)src; /* note: no copying */ src +=3D specific; room_left -=3D specific; /* now access s1->S1_i1 */ } return GOOD; and so on. > Plain=20 > casts and "pointer union" accesses are illegal as per the "inadequate=20 > type punning" point above. >=20 >=20 >=20 > "casting away CONST": > e.g.=20 > https://github.com/tianocore/edk2/blob/fcdedafd97c8f18c33a63d26b954e5db= aee81a2b/MdePkg/Library/BaseIoLibIntrinsic/IoLibMmioBuffer.c#L236 In this case, the real problem is with the function prototype / specification, not the implementation. The implementation follows from the problematic function semantics -- if you take Buffer as (CONST UINT8 *), then why return the exact same buffer as (UINT8 *)? > This should be obvious as Undefined Behaviour because memory previously= =20 > guaranteed to be read-only is returned as a pointer to memory that=20 > allows writing, Note: this is *per se* not undefined behavior. Casting away CONST (explicitly) in itself is OK. Writing through the pointer is also OK *if* the pointed-to object was never defined as CONST. Otherwise, the behavior is undefined. See C99 6.7.3 Type qualifiers, p5: "If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined. If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type, the behavior is undefined." I've cited the part about "volatile" to highlight the difference between "modify" and "refer to". When casting away const, the behavior is only undefined if you actually try to modify an object that is actually const. int i =3D 2; int const ci =3D 3; int *pi; int const *pci; pci =3D &i; pi =3D (int *)pci; /* fine in itself, but now you're on your own */ *pi =3D 3; /* fine, as "i" is not defined const */ pci =3D &ci; pi =3D (int *)pci; /* fine in itself, but now you're on your own */ i =3D *pi; /* fine, not modifying "ci" */ *pi =3D 3; /* undefined, as "ci" is defined const */ But, yes, the pattern seen under the link is risky practice. > but for easier lookup, here's the related rule: > "the left operand has atomic, qualifed, or unqualifed pointer type, and= =20 > [...] the type pointed to by the left has all the qualifers of the type= =20 > pointed to by the right" > C18 (ISO/IEC 9899:2018), 6.5.16.1 (exists since at least C90) What you quote is from the Constraints of "Simple assignment". Look at "6.5.4 Cast operators" as well, paragraph 3: "Conversions that involve pointers, other than where permitted by the constraints of 6.5.16.1, shall be specified by means of an explicit cast.= " In brief, casting away const is not invalid in itself; it just throws away protections (diagnostics) that the compiler would otherwise be required to emit. Sometimes it's unavoidable. In most cases we should avoid it. That might require fixing a few function prototypes. > "structs with trailing 1-length array" > e.g.=20 > https://github.com/tianocore/edk2/blob/fcdedafd97c8f18c33a63d26b954e5db= aee81a2b/MdePkg/Include/Guid/FileInfo.h#L51 >=20 > This is undefined as per: > "The behavior is undefned in the following circumstances: > [...] > =E2=80=94 Addition or subtraction of a pointer into, or just beyond, an= array=20 > object and an integer type produces a result that does not point into,=20 > or just beyond, the same array object (6.5.6). > =E2=80=94 Addition or subtraction of a pointer into, or just beyond, an= array=20 > object and an integer type produces a result that points just beyond th= e=20 > array object and is used as the operand of a unary * operator that is=20 > evaluated (6.5.6). > =E2=80=94 Pointers that do not point into, or just beyond, the same arr= ay object=20 > are subtracted (6.5.6)." > C18 (ISO/IEC 9899:2018), J.2 (exists since at least C90) >=20 > Same as above, while not all compilers fully support C99, flexible=20 > arrays should be support by all reasonably new compilers and allow a=20 > legal declaration and usage where this hack is in place. At worst, a=20 > macro could be provided to declare a [1] vs a [] array on demand and a=20 > requirement be introduced to have a "SIZE_OF_" macro for each such=20 > struct, but my personal preference would be to just enforce flexible ar= rays. Yes, in C99, the flexible array member was introduced to replace the "struct hack", which had always been undefined. It would be nice to remove all toolchains that don't support the flexible array member, and then to replace all struct hacks with the flexible array member. I agree. Unfortunately, there's one extra difficulty (beyond the "expected" regressions in adjusting code for the fixed element at offset 0): the struct hack is used in several places in the UEFI 2.8 spec. So that would have to be updated too. > "Missing security checks for external data": > e.g.=20 > https://github.com/tianocore/edk2/blob/fcdedafd97c8f18c33a63d26b954e5db= aee81a2b/MdePkg/Library/BasePeCoffLib/BasePeCoff.c#L943 >=20 > The given example misses an alignment verification of the resulting=20 > pointer (which technically has to be verified *before* casting), as=20 > documented here: > "The behavior is undefined in the following circumstances: > [...] > =E2=80=94 Conversion between two pointer types produces a result that i= s=20 > incorrectly aligned (6.3.2.3)." > C18 (ISO/IEC 9899:2018), J.2 (exists since at least C90) In C99 anyway, "6.3.2.3 Pointers", paragraph 7 writes, "A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned [...] for the pointed-to type, the behavior is undefined. [...]" I find this very impractical and limiting, when accessing RAM (not MMIO). I prefer if unaligned pointers (into RAM) just work, albeit slow, perhaps. I believe AARCH64 can be configured to trip a fault vs. work (but more slowly). On Intel, it just works. I think we should be given the freedom to "define" the behavior that's left undefined in this case by the standard. > There are more such issues throughout the codebase, including missing=20 > overflow and (or flawed, see before) bounds checks, however I cannot=20 > find such quickly. >=20 >=20 >=20 > "signed int BIT definitions": > e.g.=20 > https://github.com/tianocore/edk2/blob/fcdedafd97c8f18c33a63d26b954e5db= aee81a2b/MdePkg/Include/Base.h#L348 >=20 > Fixing this would be prone to regressions, but I'd like to add it for=20 > tracking purposes. Related discussion can be found down the chain here:= =20 > https://lists.01.org/pipermail/edk2-devel/2018-February/021919.html I agree about this, in theory. I'm afraid it's impossible to fix in practice, given the huge amounts of dependent code (esp. out of tree code= ). More or less, I'd summarize my opinion as follows: - we should try to write such new code that conforms to the standard, - *except* when the standard doesn't give us enough guarantees (i.e. leaves the behavior undefined) that we need for convenient *in-place* parsing (from RAM). Integer range checks and buffer boundary checks are extremely important and we should implement those meticulously, but once we've made sure our accesses are in range, the compiler should just follow our pointers wherever they point. We should drag our toolchains kicking and screaming into a state where they build the source the way we need, for *in-place* parsing of RAM buffers, through packed structures. As long as the architectures that we target don't prevent us from in-place parsing, the toolchains should neither. Thanks Laszlo