Hi Laszlo, Thanks for your comments, I send out v2 change. I agree with you suggest adding PCD to control this check. Detail see v2 change. Add other comments inline below. From: Laszlo Ersek Sent: Thursday, September 3, 2020 3:59 PM To: Dong, Eric ; devel@edk2.groups.io Cc: Ni, Ray Subject: Re: [edk2-devel] [PATCH] UefiCpuPkg/MpInitLib: Add check for CR3/GDT/IDT. On 09/03/20 03:47, Dong, Eric wrote: > Hi Laszlo, > > Thanks for your detail review and good comments, add my reply in > below. > > This issue was reported by tester. They use a shell application to do > the test. In the test, it first moves the page table to above 4G range > then calls StartUpThisAp to let AP run the test procedure. The system > will hang during the test. > > Add more comments in below inline. Thanks. Let me summarize my understanding: (1) During normal boot, and normal MP services PPI usage, and normal MP services protocol usage, there is no issue. [Eric] Correct. (2) Right now, we have not identified the exact edk2 core modules and locations that are responsible for allocating the IDT / GDT / page tables. [Eric] I don’t check the IDT/GDT yet. For page table, it created by DxeIpl driver and used for later boot phase. Nonetheless, the allocations are all placed in the 32-bit address space, and so there is no problem with AP startup, during normal usage (see point (1)). [Eric] Correct. (3) There is a UEFI shell application that isn't more closely identified in this discussion. It first moves at least one of the IDT / GDT / page tables above 4GB, and then calls some member functions of the MP services protocol. This causes a hang, experienced during the execution of the UEFI shell application. [Eric] Correct. (4) The present patch recognizes the issue in FillExchangeInfoData(), and returns an error. In the relevant use case (UEFI shell application), this causes the called MP services protocol member function to fail, synchronously. This means the application still doesn't work, but there is no hang at least. [Eric] Correct. Is my understanding correct? If it is, then I have the following comments: - Please file a TianoCore BZ for this issue, and capture the use case in it (= UEFI shell application that changes the IDT / GDT / page tables base). Please feel free to copy parts of the above description, if you think that's useful. - I'm quite doubtful that the use case is valid, in the first place. A UEFI application is supposed to consume the services described in the UEFI specification. Messing with the page tables is something that a UEFI application should *not* do, in my opinion. Such page table manipulation is expected to interfere with various DXE drivers in the platform firmware. - Another comment on the UEFI shell application, and the present patch, is that their *combination* will still break ExitBootServices(). Assume that the patch is applied, and the UEFI shell application is invoked. The application now fails gracefully, and exits. Then we attempt to boot an OS (this is a valid thing to do). Because the application moved the IDT / GDT / page tables "out of range", MpInitChangeApLoopCallback() will do the wrong thing. - Most importantly: in MpInitLib, there is a *large* amount of call sites, and a large number of call paths that lead to WakeUpAP(), and ultimately to FillExchangeInfoData(). This means that changing the return type of FillExchangeInfoData() from VOID to EFI_STATUS has a "ripple effect" -- many call paths would have to deal with error checking, error propagation, and resource release (!!!) along those error paths. - For example, your current patch leaks resources in WakeUpAP(), when FillExchangeInfoData() fails -- see AllocateResetVector() and AllocateSevEsAPMemory(). - For another example, your current patch does not handle several WakeUpAP() call sites, such as the ones in ResetProcessorToIdleState() and CheckAllAPs(). Whereas in reality, from the applications point of view, we only need the MP services protocol member functions to fail cleanly. Therefore we should address this problem *early*; that is, *much less deep* in the call stack. I suggest the following: - introduce a feature PCD (default value FALSE) - modify the following functions in MpInitLib: - MpInitLibStartupAllAPs - MpInitLibStartupThisAP - MpInitLibSwitchBSP - MpInitLibEnableDisableAP [Eric] I found above 4 API need to wake up AP to do the task, so I add code to check the CR3/GDT/IDT. Below 3 API does not need to wake up AP, so I don’t add the check. - MpInitLibGetNumberOfProcessors - MpInitLibGetProcessorInfo - MpInitLibWhoAmI - each modified function should check the feature PCD *very early*. If the PCD is true, then the function should pre-emptively verify the IDT / GDT / root page table location. If any one of those objects is outside of the 32-bit address space, then the function should fail immediately. - We need to consider the event handlers in DxeMpInitLib: - CheckApsStatus() is not a problem, because the above short-circuits in the public MpInitLib functions will guarantee that "mStopCheckAllApsStatus" is always TRUE. - For exit-boot-services and legacy-boot, we need to modify MpInitChangeApLoopCallback() however. MpInitChangeApLoopCallback() should do the same as the above-listed functions (check the PCD, check the IDT / GDT / root page table), and if one of them is above 4GB, then MpInitChangeApLoopCallback() should hang *hard* -- call CpuDeadLoop() --, even in RELEASE builds. This is because the UEFI application in question makes OS boot impossible, so we should stop the system as early as we realize that. - Of course, I suggest a helper function to encapsulate the PCD check, and the GDT / IDT / CR3 checks. [Eric] good suggestion, I will follow up to provide v2 changes. This will let most platforms ignore the special UEFI shell application use case (which I do feel is invalid). In addition, platforms that do want to handle this use case, can set the PCD to TRUE for the CpuDxe module *only* (so that CpuMpPei is not penalized). Finally, this approach keeps the code clean; long call paths to FillExchangeInfoData() are not made more complex. Thanks Laszlo