From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mout01.posteo.de (mout01.posteo.de [185.67.36.65]) by mx.groups.io with SMTP id smtpd.web08.11454.1621203356473843269 for ; Sun, 16 May 2021 15:15:57 -0700 Authentication-Results: mx.groups.io; dkim=fail reason="body hash did not verify" header.i=@posteo.de header.s=2017 header.b=J9EhWOSG; spf=pass (domain: posteo.de, ip: 185.67.36.65, mailfrom: mhaeuser@posteo.de) Received: from submission (posteo.de [89.146.220.130]) by mout01.posteo.de (Postfix) with ESMTPS id B4027240028 for ; Mon, 17 May 2021 00:15:54 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.de; s=2017; t=1621203354; bh=ZcSN0jrYOloJrm4TKPzk7mPHcMeoakyy/FbGwBOv4Dw=; h=From:Subject:To:Cc:Date:From; b=J9EhWOSGobGbY08cj0mxs9nVykgPntiDoMYhU9ME+MlZYTM2UkauD1QNaXcWNV1VF h6Lr9i9dSiaTEPY4tVvCAS3h7J7BfeT/RXWNr9k2yN6wDB14nCqf7ZzghjAP/H2I6W H5g+zhmcar3PRq3fcFqKRRs2GvP9CkTFVj+tldsb3tU95JRfJwGzcfd4YrVpDgZuOR mljbnc3w3hEPvsH8TAdb/GbsX4pS2WwG43GPIsQIkCTw7gobZ9pcP66C0IqiGm3kA9 rQHa/790Ng8evcKdXhETtxFNeWWmFDTnfafiYGHgNAj1Ap6oy3y3n0GUOfPHVxvPnJ nWt9IXdfJVPyw== Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4FjxRY2cYsz6tmB; Mon, 17 May 2021 00:15:53 +0200 (CEST) From: =?UTF-8?B?TWFydmluIEjDpHVzZXI=?= Subject: Re: [edk2-devel] [PATCH] UefiCpuPkg/MpInitLib: Allocate a separate SEV-ES AP reset stack area To: devel@edk2.groups.io, lersek@redhat.com, thomas.lendacky@amd.com Cc: Brijesh Singh , Eric Dong , Ray Ni , Rahul Kumar References: <1162d1f4fe09048aaafba6f6ea3046bebd34fc2b.1620766224.git.thomas.lendacky@amd.com> <414ed071-6bed-4e41-400d-7d958b27ae3b@redhat.com> Message-ID: <19890a20-d5fb-f793-660f-f72ee610bb68@posteo.de> Date: Sun, 16 May 2021 22:15:52 +0000 MIME-Version: 1.0 In-Reply-To: <414ed071-6bed-4e41-400d-7d958b27ae3b@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-GB Content-Transfer-Encoding: quoted-printable Am 16.05.2021 um 03:17 schrieb Laszlo Ersek: > On 05/14/21 17:44, Marvin H=C3=A4user wrote: >> On 14.05.21 17:23, Lendacky, Thomas wrote: >>> On 5/14/21 10:04 AM, Marvin H=C3=A4user wrote: >=20 >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 // Check to be sure that the "alloca= te below" behavior hasn't >>>>> changed. >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 // This will also catch a failed all= ocation, as "-1" is >>>>> returned on >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 // failure. >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 // >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (CpuMpData->SevEsAPResetStackStar= t >=3D >>>>> CpuMpData->WakeupBuffer) { >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 DEBUG ((DEBUG_ERROR, >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 "SEV-ES AP r= eset stack is not below wakeup buffer\n")); >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ASSERT (FALSE); >>>> Should the ASSERT not only catch the broken "allocate below" behaviou= r, >>>> i.e. not trigger on failed allocation? >>> I think it's best to trigger on a failed allocation as well rather tha= n >>> continuing and allowing a page fault or some other problem to occur. >> >> Well, it should handle the error in a safe way, i.e. the deadloop below= . >> To not ASSERT on plausible conditions is a common design guideline in >> most low-level projects including Linux kernel. >> >> Best regards, >> Marvin >> >>> Thanks, >>> Tom >>> >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 CpuDeadLoop (); >=20 > "DEBUG + ASSERT(FALSE) + CpuDeadLoop()" is a pattern in edk2. >=20 > In RELEASE builds, it will lead to a CpuDeadLoop(). That's the main goal > -- don't continue execution if the condition controlling the whole block > fired. >=20 > In DEBUG and NOOPT builds, the pattern will lead to a debug message > (usually at the "error" level), followed by an assertion failure. The > error message of the assertion failure is irrelevant ("FALSE"). The > point of adding ASSERT ahead of CpuDeadLoop() is that the way ASSERT > hangs execution is customizable, via "PcdDebugPropertyMask", unlike > CpuDeadLoop(). In many cases, ASSERT() uses CpuDeadLoop() itself, so the > effect is the same -- the explicit CpuDeadLoop is not reached. In other > configs, ASSERT() can raise a debug exception (CpuBreakpoint()). I absolutely do not *expect* Tom to change this, it was just a slight=20 remark (as many places have this anyway). I'll still try to explain why=20 I made that remark, but for whom it is of no interest, I do not expect=20 it to be read. I'm fine with the patch as-is myself. Thank you a lot, Tom! I know it, unfortunately, is a pattern in EDK II - taking this pattern=20 too far is what caused the 8-revision patch regarding untrusted inputs=20 we submitted previously. :) There are many concerns about unconventional ASSERTs, though I must=20 admit none but one (and that one barely) really apply here, which is why= =20 I have trouble explaining why I believe it should be changed. Here are=20 some reasons outside the context of this patch: 1) Consistency between DEBUG and RELEASE builds: I think one can justify= =20 to have a breakpoint on a condition that may realistically occur. But a=20 deadloop can give a wrong impression about how production code works.=20 E.g. it also is a common pattern in EDK II to ASSERT on memory=20 allocation failure but *not* have a proper check after, so DEBUG builds=20 will nicely error or deadloop, while RELEASE goes ahead and causes a CPU= =20 exception or memory corruption depending on the context. Thus,=20 real-world error handling cannot really be tested. This does not apply=20 because there *is* a RELEASE deadloop. 2) Static analysis: Some static analysers use ASSERT information for=20 their own analysis, and try to give hints about unsafe or unreachable=20 code based on own annotations. This kind of applies, but only when=20 substituting EDK II ASSERT with properly recognisable ASSERTs (e.g.=20 __builtin_unreachable). 2) Dynamic analysis: ASSERTs can be useful when fuzzing for example.=20 Enabled Sanitizers will only catch unsafe behaviour, but maybe you have=20 some extra code in place to sanity-check the results further. An ASSERT=20 yields an error dump (usually followed by the worker dying). However, as= =20 allocation failures are perfectly expected, this can cause a dramatic=20 about of False Positives and testing interruption. This does not apply=20 because deadloop'd code cannot really be fuzz-tested anyway. ASSERTs really are designed as unbreakable conditions, i.e. 1)=20 preconditions 2) invariants 3) postconditions. No allocator in early=20 kernel-space or lower can really guarantee allocation success, thus it=20 cannot be a postcondition of any such function. And while it might make=20 debugging look a bit easier (but you will see from the backtrace anyway=20 where you halted), it messes with all tools that assume proper usage. Also, I just realised, you can of course see it from the address value=20 when debugging, but you cannot see it from the ASSERT or DEBUG message=20 *which* of the two logical error conditions failed (i.e. broken=20 allocator or OOM). Changing the ASSERT would fix that. :) Best regards, Marvin >=20 > The required part of the pattern is CpuDeadLoop(); the DEBUG message > makes it more debugging-friendly, and the ASSERT(), with the tweakable > "hang method", makes it even more debugging-friendly. >=20 > Thanks > Laszlo >=20 >=20 >=20 >=20 >=20 >=20