public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Andrew Fish <afish@apple.com>
Cc: Bill Paul <wpaul@windriver.com>,
	 Vladimir Olovyannikov <vladimir.olovyannikov@broadcom.com>,
	 "edk2-devel@lists.01.org" <edk2-devel@lists.01.org>
Subject: Re: Stack issue after warm UEFI reset and MMU enabling on an Armv8 platform
Date: Fri, 21 Sep 2018 17:14:24 -0700	[thread overview]
Message-ID: <CAKv+Gu8VfOWvQCRhhO8pf8TMaedzNx=CCjxE30B2PgYwRYL_dQ@mail.gmail.com> (raw)
In-Reply-To: <4F31CED5-DCB4-4834-A0F2-52D410C2F5BF@apple.com>

On Fri 21 Sep 2018 at 16:57, Andrew Fish <afish@apple.com> wrote:

>
>
> > On Sep 21, 2018, at 4:24 PM, Vladimir Olovyannikov <
> vladimir.olovyannikov@broadcom.com> wrote:
> >
> > On Thu, Sep 20, 2018 at 2:52 PM Vladimir Olovyannikov
> > <vladimir.olovyannikov@broadcom.com> wrote:
> >>
> >> On Wed, Sep 19, 2018 at 5:21 PM Bill Paul <wpaul@windriver.com> wrote:
> >>>
> >>> Of all the gin joints in all the towns in all the world, Vladimir
> >>> Olovyannikov
> >>> had to walk into mine at 16:58 on Wednesday 19 September 2018 and say:
> >>>
> >>>>> From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
> >>>>> Sent: Wednesday, September 19, 2018 4:38 PM
> >>>>> To: Vladimir Olovyannikov
> >>>>> Cc: edk2-devel@lists.01.org
> >>>>> Subject: Re: Stack issue after warm UEFI reset and MMU enabling on an
> >>>>> Armv8 platform
> >>>>>
> >>>>>
> >>>>> On 19 September 2018 at 15:55, Vladimir Olovyannikov
> >>>>>
> >>>>> <vladimir.olovyannikov@broadcom.com> wrote:
> >>>>>> Hi All,
> >>>>>>
> >>>>>> I need UEFI experts help on the problem with Armv8 board on warm
> UEFI
> >>>>>> reset.
> >>>>>> Cold reset works fine.
> >>>>>>
> >>>>>> Here is how I set up a warm reset:
> >>>>>>
> >>>>>> STATIC
> >>>>>> EFI_STATUS
> >>>>>> ShutdownUefiBootServices (
> >>>>>>
> >>>>>> VOID
> >>>>>> )
> >>>>>>
> >>>>>> {
> >>>>>>
> >>>>>> EFI_STATUS              Status;
> >>>>>> UINTN                   MemoryMapSize;
> >>>>>> EFI_MEMORY_DESCRIPTOR   *MemoryMap;
> >>>>>> UINTN                   MapKey;
> >>>>>> UINTN                   DescriptorSize;
> >>>>>> UINT32                  DescriptorVersion;
> >>>>>> UINTN                   Pages;
> >>>>>>
> >>>>>> MemoryMap = NULL;
> >>>>>> MemoryMapSize = 0;
> >>>>>> Pages = 0;
> >>>>>>
> >>>>>> do {
> >>>>>>
> >>>>>>   Status = gBS->GetMemoryMap (
> >>>>>>
> >>>>>>                   &MemoryMapSize,
> >>>>>>                   MemoryMap,
> >>>>>>                   &MapKey,
> >>>>>>                   &DescriptorSize,
> >>>>>>                   &DescriptorVersion
> >>>>>>                   );
> >>>>>>
> >>>>>>   if (Status == EFI_BUFFER_TOO_SMALL) {
> >>>>>>
> >>>>>>     Pages = EFI_SIZE_TO_PAGES (MemoryMapSize) + 1;
> >>>>>>     MemoryMap = AllocatePages (Pages);
> >>>>>>
> >>>>>>     //
> >>>>>>     // Get System MemoryMap
> >>>>>>     //
> >>>>>>     Status = gBS->GetMemoryMap (
> >>>>>>
> >>>>>>                     &MemoryMapSize,
> >>>>>>                     MemoryMap,
> >>>>>>                     &MapKey,
> >>>>>>                     &DescriptorSize,
> >>>>>>                     &DescriptorVersion
> >>>>>>                     );
> >>>>>>
> >>>>>>   }
> >>>>>>
> >>>>>>   // Don't do anything between the GetMemoryMap() and
> >>>>>>   ExitBootServices() if (!EFI_ERROR(Status)) {
> >>>>>>
> >>>>>>     Status = gBS->ExitBootServices (gImageHandle, MapKey);
> >>>>>>     if (EFI_ERROR(Status)) {
> >>>>>>
> >>>>>>       FreePages (MemoryMap, Pages);
> >>>>>>       MemoryMap = NULL;
> >>>>>>       MemoryMapSize = 0;
> >>>>>>
> >>>>>>     }
> >>>>>>
> >>>>>>   }
> >>>>>>
> >>>>>> } while (EFI_ERROR(Status));
> >>>>>>
> >>>>>> return Status;
> >>>>>>
> >>>>>> }
> >>>>>>
> >>>>>> Then perform
> >>>>>> ArmCleanDataCache ();
> >>>>>> ArmInvalidateDataCache ();
> >>>>>> ArmDisableInstructionCache ();
> >>>>>> ArmInvalidateInstructionCache ();
> >>>>>
> >>>>> These don't do anything useful on ARM. You can only reliably perform
> >>>>> cache
> >>>>> maintenance by virtual address.
> >>>>
> >>>> So, should I just remove them altogether?
> >>>>
> >>>>>> ArmDisableMmu ();
> >>>>>
> >>>>> ... so after this call returns, all bets are off with regards to
> >>>>> whether
> >>>>> what is popped from the stack is actually what we pushed when we
> >>>>> entered
> >>>>> the function.
> >>>>
> >>>> OK, thank you for explanation.
> >>>> But this call returns back into ResetLib implementation as it should,
> >>>> and
> >>>> then there is a direct jump to the start of FV.
> >>>> Am I doing anything wrong here?
> >>>> Then, up to the point of enabling of MMU the stack is OK. But right
> >>>> after
> >>>> enabling MMU it points at _ModuleEntryPoint end of function in
> >>>> DxeCoreEntryPoint.c
> >>>> Am I missing anything? Maybe some stack cleanup before jumping to the
> >>>> start
> >>>> of FV?
> >>>
> >>> When the MMU is enabled, does the mapping for the stack pages change?
> That
> >>> is,
> >>> could the stack now be mapped to different physical page now?
> >> Thanks for ideas Bill,
> >> No, the mapping stays the same.
> >> The issue is only with warm reset, and only on an A72 board.
> >> There is another platform on A53 sharing the same code, which has no
> issues
> >> with warm reset.
> >> I cannot explain why.
> >>>
> >>> Instead of showing a stack trace, can you dump the stack pages and
> compare
> >>> the
> >>> before and after contents?
> >> I can clearly see that before and after contents are different.
> >>>
> >>> Assuming the same physical memory pages are still being used, then
> there
> >>> could
> >>> be a cache flushing problem. What could happen is:
> >>>
> >>> - some stack memory has been touched recently and is now in the data
> cache
> >>> - changes are made, which are written to the cache, but not yet flushed
> >>> out to
> >>> RAM
> >>> - enabling the MMU causes a full invalidate of the cache
> >>>
> >>> Now when you look at the stack, you see the earlier contents that were
> in
> >>> RAM
> >>> -- the changes previously only written to the cache have been lost.
> >>>
> >>> Enabling/disabling caches and MMU is always tricky. I can't say for
> sure,
> >>> but
> >>> I wouldn't be surprised if there's some subtle bug that causes a flush
> >>> operation to be missed and things may just work by coincidence in the
> cold
> >>> start case.
> >> I might be missing something preparing for warm reset.
> >> Disabling interrupts does not help though.
> >> Ard, I switched off all DMA-capable devices, so am just booting into
> UEFI
> >> with no disks or network,
> >> disabled interrupts. The issue is here. Any ideas on how to debug it and
> >> fix?
> > Update: when I add this as an experiment:
> > UINTN StackBottom;
> >
> > __asm__ volatile ("mov %0,SP" : "=r"(StackBottom));
> > WriteBackInvalidateDataCacheRange((VOID *)StackBottom,
> >    PcdGet64(PcdSystemMemorySize) +
> >    PcdGet64(PcdSystemMemoryBase) - StackBottom);
> > then the stack data were not corrupted anymore on the next
> ArmEnableMmu() call,
> > and warm reset worked (though unreliably, can throw exception on
> > memcpy down the road on UEFI boot; probably because I invalidated only
> > the stack area in the experiment).
> > Considering that A53 board does not have issues, does this means that
> > ArmInvalidateDataCache() implementation is useless for A72?
> > Based on this, should the approach be "find all data regions and
> > invalidate them using InvalidateDataCacheRange()"?
>
> Vladimir,
>
> We hit an issue like this a while back on x86 and it turned out our
> sequence was dependent on the C compiler code generation not touching a
> specific register. It might be worth while to disassemble this code and
> take a look at the assembler. I'd also point out the __asm__ volatile is a
> serializing event to the C memory model so it could change how the C code
> was optimized. You could try  __asm__ volatile with a no-op instruction to
> see if it really is the SP read at this point that fixes the issue. But it
> is likely the bug will be more obvious if you look at the assembly.
>

Is your primary FV hosted in DRAM? Are you sure it has not been corrupted
by the time you attempt your warm reboot?

>



>
> >>>
> >>> -Bill
> >>>
> >>>>>> Then jump to start of FV:
> >>>>>>
> >>>>>> typedef
> >>>>>> VOID
> >>>>>>
> >>>>>> (EFIAPI *START_FV)(
> >>>>>>
> >>>>>> VOID
> >>>>>>
> >>>>>> );
> >>>>>> StartOfFv = (START_FV)(UINTN)PcdGet64(PcdFvBaseAddress);
> >>>>>> StartOfFv ();
> >>>>>>
> >>>>>> Now this is what happens on warm reset:
> >>>>>> reset -c warm
> >>>>>> 1. Until ArmEnableMmu() gets called, everything works as expected.
> >>>>>>
> >>>>>>   Here is the stack right before ArmEnableMmu() is called:
> >>>>>>    ArmConfigureMmu+0x4f8
> >>>>>>    InitMmu+0x24
> >>>>>>    MemoryPeim+0x440
> >>>>>>    PrePiMain+0x114
> >>>>>>    PrimaryMain+0x68
> >>>>>>    CEntryPoint+0xC4
> >>>>>>    EL2:0x00000000800008BC
> >>>>>>    -----  End of stack info -----
> >>>>>>
> >>>>>> 2. Here is the stack as soon as Mmu is enabled with ArmEnableMmu() :
> >>>>>>   ArmConfigureMmu+0x4fc <-- This one is correct, at line 745 in
> >>>>>>
> >>>>>> ArmConfigureMmu() in
> ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibCore.c
> >>>>>> (return EFI_SUCCESS)
> >>>>>>
> >>>>>>  _ModuleEntryPoint+0x24 <-- Wrong. This points directly to
> >>>>>>
> >>>>>> ASSERT(FALSE); and to CpuDeadLoop() in DxeCoreEntryPoint.c, lines
> >>>>>> 59-60.
> >>>>>>
> >>>>>>  El2:0x000000008E5E8300 <-- Absolutely bogus
> >>>>>>
> >>>>>>   --- End of stack info ---
> >>>>>>
> >>>>>> So, as soon as ArmEnableMmu() exits, execution jumps directly to
> >>>>>> CpuDeadLoop() in DxeCoreEntryPoint of _ModuleEntryPoint().
> >>>>>>
> >>>>>> Would be grateful for any advice.
> >>>>>>
> >>>>>> Thank you,
> >>>>>> Vladimir
> >>>>
> >>>> _______________________________________________
> >>>> edk2-devel mailing list
> >>>> edk2-devel@lists.01.org
> >>>> https://lists.01.org/mailman/listinfo/edk2-devel
> >>> --
> >>>
> =============================================================================
> >>> -Bill Paul            (510) 749-2329 | Senior Member of Technical
> Staff,
> >>>                 wpaul@windriver.com | Master of Unix-Fu - Wind River
> >>> Systems
> >>>
> =============================================================================
> >>>   "I put a dollar in a change machine. Nothing changed." - George
> Carlin
> >>>
> =============================================================================
> > _______________________________________________
> > edk2-devel mailing list
> > edk2-devel@lists.01.org
> > https://lists.01.org/mailman/listinfo/edk2-devel
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
>


  reply	other threads:[~2018-09-22  0:14 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-20 21:52 Stack issue after warm UEFI reset and MMU enabling on an Armv8 platform Vladimir Olovyannikov
2018-09-21 23:24 ` Vladimir Olovyannikov
2018-09-21 23:57   ` Andrew Fish
2018-09-22  0:14     ` Ard Biesheuvel [this message]
2018-09-24 17:44       ` Vladimir Olovyannikov
  -- strict thread matches above, loose matches on Subject: below --
2018-09-19 22:55 Vladimir Olovyannikov
2018-09-19 23:38 ` Ard Biesheuvel
2018-09-19 23:58   ` Vladimir Olovyannikov
2018-09-19 23:56     ` Bill Paul
2018-09-20  0:09     ` Ard Biesheuvel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKv+Gu8VfOWvQCRhhO8pf8TMaedzNx=CCjxE30B2PgYwRYL_dQ@mail.gmail.com' \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox