From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mout01.posteo.de (mout01.posteo.de [185.67.36.65])
 by mx.groups.io with SMTP id smtpd.web08.11454.1621203356473843269
 for <devel@edk2.groups.io>;
 Sun, 16 May 2021 15:15:57 -0700
Authentication-Results: mx.groups.io;
 dkim=fail reason="body hash did not verify" header.i=@posteo.de header.s=2017 header.b=J9EhWOSG;
 spf=pass (domain: posteo.de, ip: 185.67.36.65, mailfrom: mhaeuser@posteo.de)
Received: from submission (posteo.de [89.146.220.130]) 
	by mout01.posteo.de (Postfix) with ESMTPS id B4027240028
	for <devel@edk2.groups.io>; Mon, 17 May 2021 00:15:54 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.de; s=2017;
	t=1621203354; bh=ZcSN0jrYOloJrm4TKPzk7mPHcMeoakyy/FbGwBOv4Dw=;
	h=From:Subject:To:Cc:Date:From;
	b=J9EhWOSGobGbY08cj0mxs9nVykgPntiDoMYhU9ME+MlZYTM2UkauD1QNaXcWNV1VF
	 h6Lr9i9dSiaTEPY4tVvCAS3h7J7BfeT/RXWNr9k2yN6wDB14nCqf7ZzghjAP/H2I6W
	 H5g+zhmcar3PRq3fcFqKRRs2GvP9CkTFVj+tldsb3tU95JRfJwGzcfd4YrVpDgZuOR
	 mljbnc3w3hEPvsH8TAdb/GbsX4pS2WwG43GPIsQIkCTw7gobZ9pcP66C0IqiGm3kA9
	 rQHa/790Ng8evcKdXhETtxFNeWWmFDTnfafiYGHgNAj1Ap6oy3y3n0GUOfPHVxvPnJ
	 nWt9IXdfJVPyw==
Received: from customer (localhost [127.0.0.1])
	by submission (posteo.de) with ESMTPSA id 4FjxRY2cYsz6tmB;
	Mon, 17 May 2021 00:15:53 +0200 (CEST)
From: =?UTF-8?B?TWFydmluIEjDpHVzZXI=?= <mhaeuser@posteo.de>
Subject: Re: [edk2-devel] [PATCH] UefiCpuPkg/MpInitLib: Allocate a separate SEV-ES AP reset stack area
To: devel@edk2.groups.io, lersek@redhat.com, thomas.lendacky@amd.com
Cc: Brijesh Singh <brijesh.singh@amd.com>, Eric Dong <eric.dong@intel.com>,
 Ray Ni <ray.ni@intel.com>, Rahul Kumar <rahul1.kumar@intel.com>
References: <1162d1f4fe09048aaafba6f6ea3046bebd34fc2b.1620766224.git.thomas.lendacky@amd.com>
 <a9110fb7-473e-dada-b0c1-0b5f75504395@posteo.de>
 <bd4d5bfd-b9c4-79c4-21ff-4738ec2741a0@amd.com>
 <b4dc7e35-87d9-b598-4dec-5fcaf6a345d1@posteo.de>
 <414ed071-6bed-4e41-400d-7d958b27ae3b@redhat.com>
Message-ID: <19890a20-d5fb-f793-660f-f72ee610bb68@posteo.de>
Date: Sun, 16 May 2021 22:15:52 +0000
MIME-Version: 1.0
In-Reply-To: <414ed071-6bed-4e41-400d-7d958b27ae3b@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-GB
Content-Transfer-Encoding: quoted-printable

Am 16.05.2021 um 03:17 schrieb Laszlo Ersek:
> On 05/14/21 17:44, Marvin H=C3=A4user wrote:
>> On 14.05.21 17:23, Lendacky, Thomas wrote:
>>> On 5/14/21 10:04 AM, Marvin H=C3=A4user wrote:
>=20
>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 // Check to be sure that the "alloca=
te below" behavior hasn't
>>>>> changed.
>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 // This will also catch a failed all=
ocation, as "-1" is
>>>>> returned on
>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 // failure.
>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 //
>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (CpuMpData->SevEsAPResetStackStar=
t >=3D
>>>>> CpuMpData->WakeupBuffer) {
>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 DEBUG ((DEBUG_ERROR,
>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 "SEV-ES AP r=
eset stack is not below wakeup buffer\n"));
>>>>> +
>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ASSERT (FALSE);
>>>> Should the ASSERT not only catch the broken "allocate below" behaviou=
r,
>>>> i.e. not trigger on failed allocation?
>>> I think it's best to trigger on a failed allocation as well rather tha=
n
>>> continuing and allowing a page fault or some other problem to occur.
>>
>> Well, it should handle the error in a safe way, i.e. the deadloop below=
.
>> To not ASSERT on plausible conditions is a common design guideline in
>> most low-level projects including Linux kernel.
>>
>> Best regards,
>> Marvin
>>
>>> Thanks,
>>> Tom
>>>
>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 CpuDeadLoop ();
>=20
> "DEBUG + ASSERT(FALSE) + CpuDeadLoop()" is a pattern in edk2.
>=20
> In RELEASE builds, it will lead to a CpuDeadLoop(). That's the main goal
> -- don't continue execution if the condition controlling the whole block
> fired.
>=20
> In DEBUG and NOOPT builds, the pattern will lead to a debug message
> (usually at the "error" level), followed by an assertion failure. The
> error message of the assertion failure is irrelevant ("FALSE"). The
> point of adding ASSERT ahead of CpuDeadLoop() is that the way ASSERT
> hangs execution is customizable, via "PcdDebugPropertyMask", unlike
> CpuDeadLoop(). In many cases, ASSERT() uses CpuDeadLoop() itself, so the
> effect is the same -- the explicit CpuDeadLoop is not reached. In other
> configs, ASSERT() can raise a debug exception (CpuBreakpoint()).

I absolutely do not *expect* Tom to change this, it was just a slight=20
remark (as many places have this anyway). I'll still try to explain why=20
I made that remark, but for whom it is of no interest, I do not expect=20
it to be read. I'm fine with the patch as-is myself. Thank you a lot, Tom!


I know it, unfortunately, is a pattern in EDK II - taking this pattern=20
too far is what caused the 8-revision patch regarding untrusted inputs=20
we submitted previously. :)

There are many concerns about unconventional ASSERTs, though I must=20
admit none but one (and that one barely) really apply here, which is why=
=20
I have trouble explaining why I believe it should be changed. Here are=20
some reasons outside the context of this patch:

1) Consistency between DEBUG and RELEASE builds: I think one can justify=
=20
to have a breakpoint on a condition that may realistically occur. But a=20
deadloop can give a wrong impression about how production code works.=20
E.g. it also is a common pattern in EDK II to ASSERT on memory=20
allocation failure but *not* have a proper check after, so DEBUG builds=20
will nicely error or deadloop, while RELEASE goes ahead and causes a CPU=
=20
exception or memory corruption depending on the context. Thus,=20
real-world error handling cannot really be tested. This does not apply=20
because there *is* a RELEASE deadloop.

2) Static analysis: Some static analysers use ASSERT information for=20
their own analysis, and try to give hints about unsafe or unreachable=20
code based on own annotations. This kind of applies, but only when=20
substituting EDK II ASSERT with properly recognisable ASSERTs (e.g.=20
__builtin_unreachable).

2) Dynamic analysis: ASSERTs can be useful when fuzzing for example.=20
Enabled Sanitizers will only catch unsafe behaviour, but maybe you have=20
some extra code in place to sanity-check the results further. An ASSERT=20
yields an error dump (usually followed by the worker dying). However, as=
=20
allocation failures are perfectly expected, this can cause a dramatic=20
about of False Positives and testing interruption. This does not apply=20
because deadloop'd code cannot really be fuzz-tested anyway.

ASSERTs really are designed as unbreakable conditions, i.e. 1)=20
preconditions 2) invariants 3) postconditions. No allocator in early=20
kernel-space or lower can really guarantee allocation success, thus it=20
cannot be a postcondition of any such function. And while it might make=20
debugging look a bit easier (but you will see from the backtrace anyway=20
where you halted), it messes with all tools that assume proper usage.

Also, I just realised, you can of course see it from the address value=20
when debugging, but you cannot see it from the ASSERT or DEBUG message=20
*which* of the two logical error conditions failed (i.e. broken=20
allocator or OOM). Changing the ASSERT would fix that. :)

Best regards,
Marvin


>=20
> The required part of the pattern is CpuDeadLoop(); the DEBUG message
> makes it more debugging-friendly, and the ASSERT(), with the tweakable
> "hang method", makes it even more debugging-friendly.
>=20
> Thanks
> Laszlo
>=20
>=20
>=20
>=20
>=20
>=20