Subject: Re: [edk2-devel] [PATCH 2/3] MdePkg/BaseLib: rewrite Base64Decode()
From: "Laszlo Ersek"
To: Marvin Häuser, "devel@edk2.groups.io"
Cc: Liming Gao, Michael D Kinney, Philippe Mathieu-Daudé, Zhichao Gao
Date: Tue, 16 Jul 2019 02:45:37 +0200
Message-ID: <364bcb01-e936-b828-d972-7d68ccbdc3ac@redhat.com>
References: <20190702102836.27589-1-lersek@redhat.com> <20190702102836.27589-3-lersek@redhat.com>

Hi Marvin,

On 07/15/19 20:44, Marvin Häuser wrote:
> I feel like my rushed message mentioning 'MAX_ADDRESS' was misleading
> a little - the point with that was a potential index overflow (I may
> actually have meant 'MAX_UINTN', I am not sure about the details
> anymore) in the original code, I personally see a lot of sense in
> *not* checking whether the buffer wraps around (similarly the overlap
> condition). For one consistency with similar code, where no such
> checks exist, and then a sanity-trust in the caller (which, as this is
> a library function, is interior as opposed to an external protocol
> caller which should naturally be trusted less).
>
> Generally, I am rather confused about the edk2 trust model for static
> calls. A bunch of libraries verify input parameter sanity via ASSERTs,
> another bunch by runtime checks and appropriate return statuses. Is
> there any kind of policy I am unaware of?

Good points, and you are right, the landscape is not consistent.

I think the edk2 practice can be characterized as follows (again, the picture is not uniform / consistent across the codebase):

(1) Functions are supposed to have detailed interface contracts, but many don't. In some cases, cutting corners appears at least moderately defensible, because a function can be really simple, and writing documentation could be the larger part of the effort. I still prefer if we document all new functions painstakingly.

(2) Caller responsibilities are frequently checked by callees. The checks vary between ASSERT()s and explicit conditions / return values.

The 2nd kind is better, of course, and that is actually exemplified by the UEFI spec. Namely, a large proportion of EFI_INVALID_PARAMETER return codes correspond to cases when the callee catches the caller breaking the interface contract. See for example EFI_BOOT_SERVICES.CreateEvent().

In some other cases, similar efforts / spec requirements look dubious (see EFI_BOOT_SERVICES.FreePool() -- "Buffer was invalid"). If the callee gets a pointer to freed storage, it can't even *evaluate* that pointer without depending on undefined behavior. So, in theory, there's no way to enforce the contract, beyond trusting the caller, and so there can be no *substitute* for a detailed contract; see (1).
I think the approach ("watch your caller") is still generally acceptable, if there is not a large runtime cost, because the environment is extremely unforgiving, and because the firmware can, in practice, even recover from *some* undefined behavior this way.

(3) Genuine failure conditions are occasionally checked with ASSERT(). This should never be done.

Now, considering "wrapped arrays" and such -- obviously such a thing is not even an object in C (no valid memory allocation would ever produce it), so normally I would consider checks against it pointless. However, in the present case, there are three arguments for including the MAX_ADDRESS checks:

- The original code included similar MAX_ADDRESS checks, and I didn't want to "weaken" the implementation in any way. (I simply didn't want to defend such choices -- so I didn't make them.)

- The conditions are not costly, and as long as the buffer pointers point into valid storage, the conditions do catch -- without invoking undefined behavior -- *sizes* that would cause wrap-around. This is in line with (2).

- This is MdePkg/BaseLib, so general distrust (with negligible runtime cost) cannot hurt. It is a widely used library (yes, not a protocol, but still statically linked into a bunch of 3rd party code), and (again) the environment is unforgiving.

I introduced the overlap check myself (IIRC). That was because:

- There is no language-level reason why decoding back into the same buffer couldn't work -- it's just that this implementation doesn't aim to support that. Hence the corresponding natural language restriction in the interface contract. (In edk2, we don't have the "restrict" keyword from C99.) Otherwise a programmer might cleverly say, "aha, base64 decoding never *inflates* data, so I'll just decode in-place".

- And then, although it's a caller responsibility, catch the overlap explicitly with RETURN_INVALID_PARAMETER, in line with the above: the checks are cheap, and BaseLib is quite central.
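For illustration, here is a minimal sketch of the wrap-around and overlap checks described above, reported through an explicit status code rather than ASSERT(). It is not the actual Base64Decode() code: CheckDecodeBuffers(), SKETCH_STATUS and SKETCH_MAX_ADDRESS are hypothetical stand-ins so that it builds with a hosted C compiler, and UINTPTR_MAX substitutes for MAX_ADDRESS on the assumption of a flat address space.

```c
#include <stddef.h>
#include <stdint.h>

/*
  Hypothetical stand-ins for edk2 definitions: uintptr_t plays the role of
  UINTN, SKETCH_MAX_ADDRESS the role of MAX_ADDRESS (assuming a flat address
  space), and SKETCH_STATUS the role of RETURN_STATUS.
*/
#define SKETCH_MAX_ADDRESS  UINTPTR_MAX

typedef enum {
  SKETCH_RETURN_SUCCESS,
  SKETCH_RETURN_INVALID_PARAMETER
} SKETCH_STATUS;

/*
  Hypothetical helper: validates a Source array and a Destination array for
  a decoder that does not support in-place operation. A NULL pointer is
  accepted only together with a zero size.
*/
static SKETCH_STATUS
CheckDecodeBuffers (
  const uint8_t  *Source,
  uintptr_t      SourceSize,
  const uint8_t  *Destination,
  uintptr_t      DestinationSize
  )
{
  if (Source == NULL) {
    if (SourceSize > 0) {
      /* At least one element at a NULL Source: caller error. */
      return SKETCH_RETURN_INVALID_PARAMETER;
    }
  } else if (SourceSize > SKETCH_MAX_ADDRESS - (uintptr_t)Source) {
    /*
      Source + SourceSize would wrap around. The comparison is done on
      integers, so no out-of-bounds pointer is ever formed.
    */
    return SKETCH_RETURN_INVALID_PARAMETER;
  }

  if (Destination == NULL) {
    if (DestinationSize > 0) {
      return SKETCH_RETURN_INVALID_PARAMETER;
    }
  } else if (DestinationSize > SKETCH_MAX_ADDRESS - (uintptr_t)Destination) {
    return SKETCH_RETURN_INVALID_PARAMETER;
  }

  if (Source != NULL && Destination != NULL) {
    /* Neither range wraps (checked above), so these sums cannot overflow. */
    if ((uintptr_t)Source + SourceSize <= (uintptr_t)Destination) {
      /* Source precedes Destination: OK. */
    } else if ((uintptr_t)Destination + DestinationSize <= (uintptr_t)Source) {
      /* Destination precedes Source: OK. */
    } else {
      /* Overlap, including "in-place" decoding: reject explicitly. */
      return SKETCH_RETURN_INVALID_PARAMETER;
    }
  }

  return SKETCH_RETURN_SUCCESS;
}
```

The whole check is a handful of integer comparisons. For example, with a 16-byte array Buf, an attempted in-place decode such as CheckDecodeBuffers (Buf, 8, Buf, 6) returns SKETCH_RETURN_INVALID_PARAMETER, while disjoint halves of the same array are accepted.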
Now I'm not trying to present any of this as "policy" -- just my 2 cents. I hope it makes sense.

Thanks
Laszlo