On 4/14/23 16:39, Ard Biesheuvel wrote:
> On Fri, 14 Apr 2023 at 22:23, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>
>> I've been trying to debug a problem I've been seeing since I moved to the
>> GCC 12 compiler. Under SEV it results in the guest crashing.
>>
>> I narrowed the issue down to the call to TemporaryRamMigration() in
>> PeiCheckAndSwitchStack() of MdeModulePkg/Core/Pei/Dispatcher/Dispatcher.c.
>>
>> I get this output on GCC11:
>> Old Stack size 32768, New stack size 131072
>> Stack Hob: BaseAddress=0x3BF76000 Length=0x20000
>> Heap Offset = 0x3B786000 Stack Offset = 0x3B776000
>> *** DEBUG: PeiCheckAndSwitchStack:851 - SecCoreData=3BF95D20
>> TemporaryRamMigration(0x810000, 0x3BF8E000, 0x10000)
>> *** DEBUG: PeiCheckAndSwitchStack:871 - SecCoreData=3BF95D20
>>
>> and everything is good.
>>
>> However, I get this output on GCC12:
>> Old Stack size 32768, New stack size 131072
>> Stack Hob: BaseAddress=0x3BF76000 Length=0x20000
>> Heap Offset = 0x3B786000 Stack Offset = 0x3B776000
>> *** DEBUG: PeiCheckAndSwitchStack:851 - SecCoreData=3BF95D20
>> TemporaryRamMigration(0x810000, 0x3BF8E000, 0x10000)
>> *** DEBUG: PeiCheckAndSwitchStack:871 - SecCoreData=7770BD20
>> MMIO using encrypted memory: 7770BD48
>> !!!! X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID - 00000000 !!!!
>>
>> and the guest terminates because SecCoreData has been corrupted and points
>> to an address in an MMIO range (this is an SEV-ES/SEV-SNP example).
>>
>> As near as I can tell from looking at the object code, on GCC12 it looks
>> like the SecCoreData value is stored in the RBP register, which appears to
>> be getting corrupted when calling TemporaryRamMigration().
>>
>> Does anyone have any thoughts on this?
>>
>
> The stack switching logic in OvmfPkg/Sec/SecMain.c looks highly dubious to me.
>
> LongJump() can be used to do a long return, i.e., it allows returning
> from several levels deep in the call stack back up to where
> SetJump() was called. However, using LongJump() to return to the
> caller with a different stack is, quite frankly, insane, and I'm
> surprised it didn't break a lot sooner.
>
> In this particular case, RBP gets updated along with RSP, presumably
> because the code assumes it is being used as a frame pointer? Are you
> building with -fomit-frame-pointer perhaps?

Looks like our emails crossed paths... turns out I was on the wrong
branch for my testing and didn't have ff36b2550f94 ("OvmfPkg/Sec: fix
stack switch").

So you can disregard this, but thanks for taking a look.

Thanks,
Tom
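
For reference, the "long return" use of SetJump()/LongJump() described
above can be sketched with the standard C setjmp()/longjmp() pair. This is
only an illustration under assumptions: it uses the C library routines, not
the edk2 BaseLib ones, and every name in it (g_return_point,
g_deepest_level, descend, long_return_demo) is hypothetical. The jump
unwinds several call levels on the same stack, so the saved stack pointer
still refers to a live frame. It does not attempt a jump onto a relocated
stack.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical names throughout. This uses standard C setjmp()/longjmp(),
 * not the edk2 SetJump()/LongJump() implementation. */
static jmp_buf g_return_point;
static int g_deepest_level;

/* Recurse until max_level is reached, then long-return to the caller. */
static void descend(int level, int max_level)
{
    g_deepest_level = level;
    if (level == max_level) {
        /* Long return: unwind straight back to the setjmp() call site.
         * The saved context belongs to a frame that is still live on
         * this same stack. */
        longjmp(g_return_point, 1);
    }
    descend(level + 1, max_level);
}

/* Returns the depth that was reached before the long return. */
int long_return_demo(int max_level)
{
    assert(max_level > 0);

    /* setjmp() is used directly as a controlling expression, which is
     * one of the contexts the C standard permits. */
    if (setjmp(g_return_point) == 0) {
        descend(1, max_level);
        return -1; /* not reached: descend() always long-jumps */
    }

    /* Execution resumes here after longjmp(). */
    return g_deepest_level;
}
```

Calling long_return_demo(3) descends three levels and comes back through
the setjmp() site in a single step. The depth is kept in a file-scope
variable rather than a local because automatic variables modified between
setjmp() and longjmp() have indeterminate values unless they are declared
volatile.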