From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-lj1-f181.google.com (mail-lj1-f181.google.com [209.85.208.181]) by mx.groups.io with SMTP id smtpd.web08.34675.1658187522434229279 for ; Mon, 18 Jul 2022 16:38:42 -0700 Authentication-Results: mx.groups.io; dkim=pass header.i=@gmail.com header.s=20210112 header.b=f4K6G18r; spf=pass (domain: gmail.com, ip: 209.85.208.181, mailfrom: benjamin.doron00@gmail.com) Received: by mail-lj1-f181.google.com with SMTP id w2so15495744ljj.7 for ; Mon, 18 Jul 2022 16:38:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=bSQFgBxoWPoac/pkLerf/oc5OOG0l7wSx9W8CtueCtQ=; b=f4K6G18riTgzKxv43iOpgPN07BI6movySsTl7cyvppVhUg91PDZSds2cuBTyet3uJf +DlCQ7PWY4cbHCCl4nmt+u+43TL2BGxyk91ooN4EGr8C+meMm1ysgO6aeMko6LVrYDPO XLdkQDXGQHLhDwvrEkWRcTt6NyKNoRzJ+4TqdCQKa3vxrwkV+G62PE3zZ20Sv4UcS4VI 3WndxAoItAaTu9BdFjy76xoFo9ouwIY7kEksQH0o6PuJYoX8XzSmrfQ/DZAU0v+W9lLq EiZS36UF3xTPyQoFbcibUaf4emt0GYQPltYRI8IDveBtM/ryyNW34XKE6L07N+G0jVTO 4K8g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=bSQFgBxoWPoac/pkLerf/oc5OOG0l7wSx9W8CtueCtQ=; b=l13Lf5SDl2lnc+XV39tlfPkXgV4+5bx+ijWSTrtPgxxeoOPX7OzFOSCQBlU2Jth195 TAYGYf8qkTVi/rexlBINrVNt5rEDWlCuMnONdBoLj3MeFUrsAaje1G57savFo3NfecMV obf4v6DqvozdwIaY2rWttwcFTzZSWntjs1JxzAPDGluIstCti/tusScWF3rPfmABIzer dPgp157U1XAkCoQlpqqigtFCcn2VTOYyeik10tu5yblYnuAg7GUyzch+UmT7b+c3e59B Y9+lh8No7L0MQXek3NlZt4DToLyU/2riBs25wjUirhP+WGYhlbgAYJTDyV8Rivyy81ig guwA== X-Gm-Message-State: AJIora8Pw6SM3JNLZbFSXxilGNPUdF5Voy9fWivNwWyMYWAP1dqxH6qb CusSub3vqcFO5eSTbFNskkOXoDEl0/AH1VKUHBuMrqEgtrM= X-Google-Smtp-Source: AGRyM1serKoEEMLpXKvty4sNT4TBl4Rg+fh7LsksEO5oJVqQz+4gf+vCnv3aNX23+aeta875LzkXMOq7kWj/u6MFe2A= X-Received: by 2002:a2e:998d:0:b0:25d:b1e1:d19 with SMTP id w13-20020a2e998d000000b0025db1e10d19mr5023403lji.167.1658187520299; Mon, 18 Jul 2022 16:38:40 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: "Benjamin Doron" Date: Mon, 18 Jul 2022 19:38:29 -0400 Message-ID: Subject: Re: [GSoC 2022] Implementing S3 resume for MinPlatform To: devel@edk2.groups.io Cc: "Desimone, Nathaniel L" , "Sinha, Ankit" Content-Type: multipart/alternative; boundary="0000000000001e585c05e41cde2a" --0000000000001e585c05e41cde2a Content-Type: text/plain; charset="UTF-8" After thinking about it some more, I suspect this is an issue in the debug library stack specific to BootScriptExecutorDxe. Being after exit-BS, it is in some ways a runtime driver. The BootScriptExecutorDxe that is executed from the lockbox was copied there at ReadyToLock, before the library receives its event to flip the mEndOfBootServices variable inside the data section (as I understand). Therefore, on resume, the library will wrongly try to use boot services, despite this being impossible and seemingly protected by the variable. I suppose the only fix is another library instance. Best regards, Benjamin On Mon, 18 Jul 2022 at 16:33, Benjamin Doron wrote: > Hi all, > I've been working on implementing S3 resume support for MinPlatform during > the past few weeks. Presently, the last line of code that I know will > execute on resume flows is > https://github.com/tianocore/edk2/blob/master/UefiCpuPkg/Universal/Acpi/S3Resume2Pei/S3Resume.c#L878 > - right before transferring control to BootScriptExecutorDxe. > > I had added a debug print at > https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Universal/Acpi/BootScriptExecutorDxe/ScriptExecute.c#L47 > to ensure that control was successfully passed here, but it never executes > and the platform doesn't resume. I've considered that it may be a debug > library-specific issue, and I've been fixing some of those (but that > certainly may still require debugging). However, after addressing that, the > bug still too predictably occurs here. Therefore, what other assumptions > are made for the jump here to succeed? > > So far, I've considered: > - DxePcdLib could try calling the protocol after exit-BS, which is > guaranteed to fail (then page fault). However, I've checked the disassembly > and it's not used on resume flow. This is fine. > - DebugLibSerialPort is used for this module, because RSC's serial port > handler is unregistered at exit-BS. This should now be fine. > > Some (potentially) plausible architectural issues: > - Page tables are used in long mode. Maybe I could verify these are sane > by looking up the structure and printing each entry's fields, but they are > probably fine. > - Maybe BootScriptStackSize is too small? I sort of doubt it from looking > at the disassembly. Also, even if the stack overflows, I'd expect the > earlier debug prints to succeed. > - Maybe my added debug print in S3BootScriptExecutorEntryFunction() is a > problem? However, it's the IDTR that's written, not the GDTR. I'd expect > that to only be an issue if an interrupt is fired. Also, SmmRestoreCpu() > does the same. As I understand, normally there is an enormous difference > between DXE and SMM, because SMM has some resume state in some CPU MSRs > (etc), but I think here PiSmmCpuDxeSmm is being entered as if it were mere > 64-bit code, like DXE. > > Best regards, > Benjamin > --0000000000001e585c05e41cde2a Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
After thinking about it some more, I suspect this is = an issue in the debug library stack specific to BootScriptExecutorDxe. Bein= g after exit-BS, it is in some ways a runtime driver.

The BootScriptExecutorDxe that is executed from the lockbox was cop= ied there at ReadyToLock, before the library receives its event to flip the= mEndOfBootServices variable inside the data section (as I understand). The= refore, on resume, the library will wrongly try to use boot services, despi= te this being impossible and seemingly protected by the variable.
=

I suppose the only fix is another library instance.
=

Best regards,
Benjamin


On Mon, 18 Jul 2022 at 16:= 33, Benjamin Doron <benjam= in.doron00@gmail.com> wrote:
Hi all,
I've been w= orking on implementing S3 resume support for MinPlatform during the past fe= w weeks. Presently, the last line of code that I know will execute on resum= e flows is https://= github.com/tianocore/edk2/blob/master/UefiCpuPkg/Universal/Acpi/S3Resume2Pe= i/S3Resume.c#L878 - right before transferring control to BootScriptExec= utorDxe.

I had added a debug print at https://github.c= om/tianocore/edk2/blob/master/MdeModulePkg/Universal/Acpi/BootScriptExecuto= rDxe/ScriptExecute.c#L47 to ensure that control was successfully passed= here, but it never executes and the platform doesn't resume. I've = considered that it may be a debug library-specific issue, and I've been= fixing some of those (but that certainly may still require debugging). How= ever, after addressing that, the bug still too predictably occurs here. The= refore, what other assumptions are made for the jump here to succeed?
=

So far, I've considered:
- DxePcdLib coul= d try calling the protocol after exit-BS, which is guaranteed to fail (then= page fault). However, I've checked the disassembly and it's not us= ed on resume flow. This is fine.
- DebugLibSerialPort is used for= this module, because RSC's serial port handler is unregistered at exit= -BS. This should now be fine.

Some (potentially) p= lausible architectural issues:
- Page tables are used in long= mode. Maybe I could verify these are sane by looking up the structure and = printing each entry's fields, but they are probably fine.
- M= aybe BootScriptStackSize is too small? I sort of doubt it from looking at t= he disassembly. Also, even if the stack overflows, I'd expect the earli= er debug prints to succeed.
- Maybe my added debug print in S3BootScriptExecutorEntryFunction() is a problem? However, it's= the IDTR that's written, not the GDTR. I'd expect that to only be = an issue if an interrupt is fired. Also, SmmRestoreCpu() does the same. As = I understand, normally there is an enormous difference between DXE and=20 SMM, because SMM has some resume state in some CPU MSRs (etc), but I think= =20 here PiSmmCpuDxeSmm is being entered as if it were mere 64-bit code,=20 like DXE.

<= div>Best regards,
Benjamin
--0000000000001e585c05e41cde2a--