public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* Stack issue after warm UEFI reset and MMU enabling on an Armv8 platform
@ 2018-09-19 22:55 Vladimir Olovyannikov
  2018-09-19 23:38 ` Ard Biesheuvel
  0 siblings, 1 reply; 10+ messages in thread
From: Vladimir Olovyannikov @ 2018-09-19 22:55 UTC (permalink / raw)
  To: Ard Biesheuvel, edk2-devel@lists.01.org

Hi All,



I need UEFI experts help on the problem with Armv8 board on warm UEFI reset.

Cold reset works fine.



Here is how I set up a warm reset:



STATIC

EFI_STATUS

ShutdownUefiBootServices (

  VOID

  )

{

  EFI_STATUS              Status;

  UINTN                   MemoryMapSize;

  EFI_MEMORY_DESCRIPTOR   *MemoryMap;

  UINTN                   MapKey;

  UINTN                   DescriptorSize;

  UINT32                  DescriptorVersion;

  UINTN                   Pages;



  MemoryMap = NULL;

  MemoryMapSize = 0;

  Pages = 0;



  do {

    Status = gBS->GetMemoryMap (

                    &MemoryMapSize,

                    MemoryMap,

                    &MapKey,

                    &DescriptorSize,

                    &DescriptorVersion

                    );

    if (Status == EFI_BUFFER_TOO_SMALL) {



      Pages = EFI_SIZE_TO_PAGES (MemoryMapSize) + 1;

      MemoryMap = AllocatePages (Pages);



      //

      // Get System MemoryMap

      //

      Status = gBS->GetMemoryMap (

                      &MemoryMapSize,

                      MemoryMap,

                      &MapKey,

                      &DescriptorSize,

                      &DescriptorVersion

                      );

    }



    // Don't do anything between the GetMemoryMap() and ExitBootServices()

    if (!EFI_ERROR(Status)) {

      Status = gBS->ExitBootServices (gImageHandle, MapKey);

      if (EFI_ERROR(Status)) {

        FreePages (MemoryMap, Pages);

        MemoryMap = NULL;

        MemoryMapSize = 0;

      }

    }

  } while (EFI_ERROR(Status));



  return Status;

}



Then perform

ArmCleanDataCache ();

ArmInvalidateDataCache ();

ArmDisableInstructionCache ();

ArmInvalidateInstructionCache ();

ArmDisableMmu ();



Then jump to start of FV:



typedef

VOID

(EFIAPI *START_FV)(

  VOID

);

StartOfFv = (START_FV)(UINTN)PcdGet64(PcdFvBaseAddress);

StartOfFv ();



Now this is what happens on warm reset:

reset -c warm

1. Until ArmEnableMmu() gets called, everything works as expected.

    Here is the stack right before ArmEnableMmu() is called:

     ArmConfigureMmu+0x4f8

     InitMmu+0x24

     MemoryPeim+0x440

     PrePiMain+0x114

     PrimaryMain+0x68

     CEntryPoint+0xC4

     EL2:0x00000000800008BC

     -----  End of stack info -----



2. Here is the stack as soon as Mmu is enabled with ArmEnableMmu() :

    ArmConfigureMmu+0x4fc <-- This one is correct, at line 745 in
ArmConfigureMmu() in ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibCore.c
(return EFI_SUCCESS)

   _ModuleEntryPoint+0x24 <-- Wrong. This points directly to ASSERT(FALSE);
and to CpuDeadLoop() in DxeCoreEntryPoint.c, lines 59-60.

   El2:0x000000008E5E8300 <-- Absolutely bogus

    --- End of stack info ---



So, as soon as ArmEnableMmu() exits, execution jumps directly to
CpuDeadLoop() in DxeCoreEntryPoint of _ModuleEntryPoint().



Would be grateful for any advice.



Thank you,

Vladimir


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Stack issue after warm UEFI reset and MMU enabling on an Armv8 platform
  2018-09-19 22:55 Stack issue after warm UEFI reset and MMU enabling on an Armv8 platform Vladimir Olovyannikov
@ 2018-09-19 23:38 ` Ard Biesheuvel
  2018-09-19 23:58   ` Vladimir Olovyannikov
  0 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2018-09-19 23:38 UTC (permalink / raw)
  To: Vladimir Olovyannikov; +Cc: edk2-devel@lists.01.org

On 19 September 2018 at 15:55, Vladimir Olovyannikov <
vladimir.olovyannikov@broadcom.com> wrote:

> Hi All,
>
>
>
> I need UEFI experts help on the problem with Armv8 board on warm UEFI
> reset.
>
> Cold reset works fine.
>
>
>
> Here is how I set up a warm reset:
>
>
>
> STATIC
>
> EFI_STATUS
>
> ShutdownUefiBootServices (
>
>   VOID
>
>   )
>
> {
>
>   EFI_STATUS              Status;
>
>   UINTN                   MemoryMapSize;
>
>   EFI_MEMORY_DESCRIPTOR   *MemoryMap;
>
>   UINTN                   MapKey;
>
>   UINTN                   DescriptorSize;
>
>   UINT32                  DescriptorVersion;
>
>   UINTN                   Pages;
>
>
>
>   MemoryMap = NULL;
>
>   MemoryMapSize = 0;
>
>   Pages = 0;
>
>
>
>   do {
>
>     Status = gBS->GetMemoryMap (
>
>                     &MemoryMapSize,
>
>                     MemoryMap,
>
>                     &MapKey,
>
>                     &DescriptorSize,
>
>                     &DescriptorVersion
>
>                     );
>
>     if (Status == EFI_BUFFER_TOO_SMALL) {
>
>
>
>       Pages = EFI_SIZE_TO_PAGES (MemoryMapSize) + 1;
>
>       MemoryMap = AllocatePages (Pages);
>
>
>
>       //
>
>       // Get System MemoryMap
>
>       //
>
>       Status = gBS->GetMemoryMap (
>
>                       &MemoryMapSize,
>
>                       MemoryMap,
>
>                       &MapKey,
>
>                       &DescriptorSize,
>
>                       &DescriptorVersion
>
>                       );
>
>     }
>
>
>
>     // Don't do anything between the GetMemoryMap() and ExitBootServices()
>
>     if (!EFI_ERROR(Status)) {
>
>       Status = gBS->ExitBootServices (gImageHandle, MapKey);
>
>       if (EFI_ERROR(Status)) {
>
>         FreePages (MemoryMap, Pages);
>
>         MemoryMap = NULL;
>
>         MemoryMapSize = 0;
>
>       }
>
>     }
>
>   } while (EFI_ERROR(Status));
>
>
>
>   return Status;
>
> }
>
>
>
> Then perform
>
> ArmCleanDataCache ();
>
> ArmInvalidateDataCache ();
>
> ArmDisableInstructionCache ();
>
> ArmInvalidateInstructionCache ();
>

These don't do anything useful on ARM. You can only reliably perform cache
maintenance by virtual address.

ArmDisableMmu ();
>
>
>

... so after this call returns, all bets are off with regards to whether
what is popped from the stack is actually what we pushed when we entered
the function.

> Then jump to start of FV:
>
>
>
> typedef
>
> VOID
>
> (EFIAPI *START_FV)(
>
>   VOID
>
> );
>
> StartOfFv = (START_FV)(UINTN)PcdGet64(PcdFvBaseAddress);
>
> StartOfFv ();
>
>
>
> Now this is what happens on warm reset:
>
> reset -c warm
>
> 1. Until ArmEnableMmu() gets called, everything works as expected.
>
>     Here is the stack right before ArmEnableMmu() is called:
>
>      ArmConfigureMmu+0x4f8
>
>      InitMmu+0x24
>
>      MemoryPeim+0x440
>
>      PrePiMain+0x114
>
>      PrimaryMain+0x68
>
>      CEntryPoint+0xC4
>
>      EL2:0x00000000800008BC
>
>      -----  End of stack info -----
>
>
>
> 2. Here is the stack as soon as Mmu is enabled with ArmEnableMmu() :
>
>     ArmConfigureMmu+0x4fc <-- This one is correct, at line 745 in
> ArmConfigureMmu() in ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibCore.c
> (return EFI_SUCCESS)
>
>    _ModuleEntryPoint+0x24 <-- Wrong. This points directly to
> ASSERT(FALSE); and to CpuDeadLoop() in DxeCoreEntryPoint.c, lines 59-60.
>
>    El2:0x000000008E5E8300 <-- Absolutely bogus
>
>     --- End of stack info ---
>
>
>
> So, as soon as ArmEnableMmu() exits, execution jumps directly to
> CpuDeadLoop() in DxeCoreEntryPoint of _ModuleEntryPoint().
>
>
>
> Would be grateful for any advice.
>
>
>
> Thank you,
>
> Vladimir
>
>
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Stack issue after warm UEFI reset and MMU enabling on an Armv8 platform
  2018-09-19 23:58   ` Vladimir Olovyannikov
@ 2018-09-19 23:56     ` Bill Paul
  2018-09-20  0:09     ` Ard Biesheuvel
  1 sibling, 0 replies; 10+ messages in thread
From: Bill Paul @ 2018-09-19 23:56 UTC (permalink / raw)
  To: edk2-devel; +Cc: Vladimir Olovyannikov, Ard Biesheuvel

Of all the gin joints in all the towns in all the world, Vladimir Olovyannikov 
had to walk into mine at 16:58 on Wednesday 19 September 2018 and say:

> >From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
> >Sent: Wednesday, September 19, 2018 4:38 PM
> >To: Vladimir Olovyannikov
> >Cc: edk2-devel@lists.01.org
> >Subject: Re: Stack issue after warm UEFI reset and MMU enabling on an
> >Armv8 platform
> >
> >
> >On 19 September 2018 at 15:55, Vladimir Olovyannikov
> >
> ><vladimir.olovyannikov@broadcom.com> wrote:
> >>Hi All,
> >>
> >>I need UEFI experts help on the problem with Armv8 board on warm UEFI
> >>reset.
> >>Cold reset works fine.
> >>
> >>Here is how I set up a warm reset:
> >>
> >>STATIC
> >>EFI_STATUS
> >>ShutdownUefiBootServices (
> >>
> >>  VOID
> >>  )
> >>
> >>{
> >>
> >>  EFI_STATUS              Status;
> >>  UINTN                   MemoryMapSize;
> >>  EFI_MEMORY_DESCRIPTOR   *MemoryMap;
> >>  UINTN                   MapKey;
> >>  UINTN                   DescriptorSize;
> >>  UINT32                  DescriptorVersion;
> >>  UINTN                   Pages;
> >>  
> >>  MemoryMap = NULL;
> >>  MemoryMapSize = 0;
> >>  Pages = 0;
> >>  
> >>  do {
> >>  
> >>    Status = gBS->GetMemoryMap (
> >>    
> >>                    &MemoryMapSize,
> >>                    MemoryMap,
> >>                    &MapKey,
> >>                    &DescriptorSize,
> >>                    &DescriptorVersion
> >>                    );
> >>    
> >>    if (Status == EFI_BUFFER_TOO_SMALL) {
> >>    
> >>      Pages = EFI_SIZE_TO_PAGES (MemoryMapSize) + 1;
> >>      MemoryMap = AllocatePages (Pages);
> >>      
> >>      //
> >>      // Get System MemoryMap
> >>      //
> >>      Status = gBS->GetMemoryMap (
> >>      
> >>                      &MemoryMapSize,
> >>                      MemoryMap,
> >>                      &MapKey,
> >>                      &DescriptorSize,
> >>                      &DescriptorVersion
> >>                      );
> >>    
> >>    }
> >>    
> >>    // Don't do anything between the GetMemoryMap() and
> >>    ExitBootServices() if (!EFI_ERROR(Status)) {
> >>    
> >>      Status = gBS->ExitBootServices (gImageHandle, MapKey);
> >>      if (EFI_ERROR(Status)) {
> >>      
> >>        FreePages (MemoryMap, Pages);
> >>        MemoryMap = NULL;
> >>        MemoryMapSize = 0;
> >>      
> >>      }
> >>    
> >>    }
> >>  
> >>  } while (EFI_ERROR(Status));
> >>  
> >>  return Status;
> >>
> >>}
> >>
> >>Then perform
> >>ArmCleanDataCache ();
> >>ArmInvalidateDataCache ();
> >>ArmDisableInstructionCache ();
> >>ArmInvalidateInstructionCache ();
> >
> >These don't do anything useful on ARM. You can only reliably perform cache
> >maintenance by virtual address.
> 
> So, should I just remove them altogether?
> 
> >>ArmDisableMmu ();
> >
> >... so after this call returns, all bets are off with regards to whether
> >what is popped from the stack is actually what we pushed when we entered
> >the function.
> 
> OK, thank you for explanation.
> But this call returns back into ResetLib implementation as it should, and
> then there is a direct jump to the start of FV.
> Am I doing anything wrong here?
> Then, up to the point of enabling of MMU the stack is OK. But right after
> enabling MMU it points at _ModuleEntryPoint end of function in
> DxeCoreEntryPoint.c
> Am I missing anything? Maybe some stack cleanup before jumping to the start
> of FV?

When the MMU is enabled, does the mapping for the stack pages change? That is, 
could the stack now be mapped to different physical page now?

Instead of showing a stack trace, can you dump the stack pages and compare the 
before and after contents?

Assuming the same physical memory pages are still being used, then there could 
be a cache flushing problem. What could happen is:

- some stack memory has been touched recently and is now in the data cache
- changes are made, which are written to the cache, but not yet flushed out to 
RAM
- enabling the MMU causes a full invalidate of the cache

Now when you look at the stack, you see the earlier contents that were in RAM 
-- the changes previously only written to the cache have been lost.

Enabling/disabling caches and MMU is always tricky. I can't say for sure, but 
I wouldn't be surprised if there's some subtle bug that causes a flush 
operation to be missed and things may just work by coincidence in the cold 
start case.

-Bill
 
> >>Then jump to start of FV:
> >>
> >>typedef
> >>VOID
> >>
> >> (EFIAPI *START_FV)(
> >> 
> >>  VOID
> >>
> >>);
> >>StartOfFv = (START_FV)(UINTN)PcdGet64(PcdFvBaseAddress);
> >>StartOfFv ();
> >>
> >>Now this is what happens on warm reset:
> >>reset -c warm
> >>1. Until ArmEnableMmu() gets called, everything works as expected.
> >>
> >>    Here is the stack right before ArmEnableMmu() is called:
> >>     ArmConfigureMmu+0x4f8
> >>     InitMmu+0x24
> >>     MemoryPeim+0x440
> >>     PrePiMain+0x114
> >>     PrimaryMain+0x68
> >>     CEntryPoint+0xC4
> >>     EL2:0x00000000800008BC
> >>     -----  End of stack info -----
> >>
> >>2. Here is the stack as soon as Mmu is enabled with ArmEnableMmu() :
> >>    ArmConfigureMmu+0x4fc <-- This one is correct, at line 745 in
> >> 
> >> ArmConfigureMmu() in ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibCore.c
> >> (return EFI_SUCCESS)
> >> 
> >>   _ModuleEntryPoint+0x24 <-- Wrong. This points directly to
> >> 
> >> ASSERT(FALSE); and to CpuDeadLoop() in DxeCoreEntryPoint.c, lines 59-60.
> >> 
> >>   El2:0x000000008E5E8300 <-- Absolutely bogus
> >>   
> >>    --- End of stack info ---
> >>
> >>So, as soon as ArmEnableMmu() exits, execution jumps directly to
> >>CpuDeadLoop() in DxeCoreEntryPoint of _ModuleEntryPoint().
> >>
> >>Would be grateful for any advice.
> >>
> >>Thank you,
> >>Vladimir
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
-- 
=============================================================================
-Bill Paul            (510) 749-2329 | Senior Member of Technical Staff,
                 wpaul@windriver.com | Master of Unix-Fu - Wind River Systems
=============================================================================
   "I put a dollar in a change machine. Nothing changed." - George Carlin
=============================================================================


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Stack issue after warm UEFI reset and MMU enabling on an Armv8 platform
  2018-09-19 23:38 ` Ard Biesheuvel
@ 2018-09-19 23:58   ` Vladimir Olovyannikov
  2018-09-19 23:56     ` Bill Paul
  2018-09-20  0:09     ` Ard Biesheuvel
  0 siblings, 2 replies; 10+ messages in thread
From: Vladimir Olovyannikov @ 2018-09-19 23:58 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: edk2-devel

>From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
>Sent: Wednesday, September 19, 2018 4:38 PM
>To: Vladimir Olovyannikov
>Cc: edk2-devel@lists.01.org
>Subject: Re: Stack issue after warm UEFI reset and MMU enabling on an Armv8
>platform


>On 19 September 2018 at 15:55, Vladimir Olovyannikov
><vladimir.olovyannikov@broadcom.com> wrote:
>>Hi All,

>>I need UEFI experts help on the problem with Armv8 board on warm UEFI
>>reset.
>>Cold reset works fine.

>>Here is how I set up a warm reset:

>>STATIC
>>EFI_STATUS
>>ShutdownUefiBootServices (
>>  VOID
>>  )
>>{
>>  EFI_STATUS              Status;
>>  UINTN                   MemoryMapSize;
>>  EFI_MEMORY_DESCRIPTOR   *MemoryMap;
>>  UINTN                   MapKey;
>>  UINTN                   DescriptorSize;
>>  UINT32                  DescriptorVersion;
>>  UINTN                   Pages;

>>  MemoryMap = NULL;
>>  MemoryMapSize = 0;
>>  Pages = 0;
>>
>>  do {
>>    Status = gBS->GetMemoryMap (
>>                    &MemoryMapSize,
>>                    MemoryMap,
>>                    &MapKey,
>>                    &DescriptorSize,
>>                    &DescriptorVersion
>>                    );
>>    if (Status == EFI_BUFFER_TOO_SMALL) {
>>
>>      Pages = EFI_SIZE_TO_PAGES (MemoryMapSize) + 1;
>>      MemoryMap = AllocatePages (Pages);
>>
>>      //
>>      // Get System MemoryMap
>>      //
>>      Status = gBS->GetMemoryMap (
>>                      &MemoryMapSize,
>>                      MemoryMap,
>>                      &MapKey,
>>                      &DescriptorSize,
>>                      &DescriptorVersion
>>                      );
>>    }
>>
>>    // Don't do anything between the GetMemoryMap() and ExitBootServices()
>>    if (!EFI_ERROR(Status)) {
>>      Status = gBS->ExitBootServices (gImageHandle, MapKey);
>>      if (EFI_ERROR(Status)) {
>>        FreePages (MemoryMap, Pages);
>>        MemoryMap = NULL;
>>        MemoryMapSize = 0;
>>      }
>>    }
>>  } while (EFI_ERROR(Status));
>>
>>  return Status;
>>}
>>
>>Then perform
>>ArmCleanDataCache ();
>>ArmInvalidateDataCache ();
>>ArmDisableInstructionCache ();
>>ArmInvalidateInstructionCache ();

>These don't do anything useful on ARM. You can only reliably perform cache
>maintenance by virtual address.
So, should I just remove them altogether?

>>ArmDisableMmu ();

>... so after this call returns, all bets are off with regards to whether
>what is popped from the stack is actually what we pushed when we entered
>the function.
OK, thank you for explanation.
But this call returns back into ResetLib implementation as it should, and
then there is a direct jump to the start of FV.
Am I doing anything wrong here?
Then, up to the point of enabling of MMU the stack is OK. But right after
enabling MMU it points at _ModuleEntryPoint end of function in
DxeCoreEntryPoint.c
Am I missing anything? Maybe some stack cleanup before jumping to the start
of FV?

>>Then jump to start of FV:

>>typedef
>>VOID
>> (EFIAPI *START_FV)(
>>  VOID
>>);
>>StartOfFv = (START_FV)(UINTN)PcdGet64(PcdFvBaseAddress);
>>StartOfFv ();
>>
>>Now this is what happens on warm reset:
>>reset -c warm
>>1. Until ArmEnableMmu() gets called, everything works as expected.
>>    Here is the stack right before ArmEnableMmu() is called:
>>     ArmConfigureMmu+0x4f8
>>     InitMmu+0x24
>>     MemoryPeim+0x440
>>     PrePiMain+0x114
>>     PrimaryMain+0x68
>>     CEntryPoint+0xC4
>>     EL2:0x00000000800008BC
>>     -----  End of stack info -----
>>
>>2. Here is the stack as soon as Mmu is enabled with ArmEnableMmu() :
>>    ArmConfigureMmu+0x4fc <-- This one is correct, at line 745 in
>> ArmConfigureMmu() in ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibCore.c
>> (return EFI_SUCCESS)
>>   _ModuleEntryPoint+0x24 <-- Wrong. This points directly to
>> ASSERT(FALSE); and to CpuDeadLoop() in DxeCoreEntryPoint.c, lines 59-60.
>>   El2:0x000000008E5E8300 <-- Absolutely bogus
>>    --- End of stack info ---
>>
>>So, as soon as ArmEnableMmu() exits, execution jumps directly to
>>CpuDeadLoop() in DxeCoreEntryPoint of _ModuleEntryPoint().
>>
>>Would be grateful for any advice.
>>
>>Thank you,
>>Vladimir


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Stack issue after warm UEFI reset and MMU enabling on an Armv8 platform
  2018-09-19 23:58   ` Vladimir Olovyannikov
  2018-09-19 23:56     ` Bill Paul
@ 2018-09-20  0:09     ` Ard Biesheuvel
  1 sibling, 0 replies; 10+ messages in thread
From: Ard Biesheuvel @ 2018-09-20  0:09 UTC (permalink / raw)
  To: Vladimir Olovyannikov; +Cc: edk2-devel@lists.01.org

On 19 September 2018 at 16:58, Vladimir Olovyannikov <
vladimir.olovyannikov@broadcom.com> wrote:

> >From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
> >Sent: Wednesday, September 19, 2018 4:38 PM
> >To: Vladimir Olovyannikov
> >Cc: edk2-devel@lists.01.org
> >Subject: Re: Stack issue after warm UEFI reset and MMU enabling on an
> Armv8
> >platform
>
>
> >On 19 September 2018 at 15:55, Vladimir Olovyannikov
> ><vladimir.olovyannikov@broadcom.com> wrote:
> >>Hi All,
>
> >>I need UEFI experts help on the problem with Armv8 board on warm UEFI
> >>reset.
> >>Cold reset works fine.
>
> >>Here is how I set up a warm reset:
>
> >>STATIC
> >>EFI_STATUS
> >>ShutdownUefiBootServices (
> >>  VOID
> >>  )
> >>{
> >>  EFI_STATUS              Status;
> >>  UINTN                   MemoryMapSize;
> >>  EFI_MEMORY_DESCRIPTOR   *MemoryMap;
> >>  UINTN                   MapKey;
> >>  UINTN                   DescriptorSize;
> >>  UINT32                  DescriptorVersion;
> >>  UINTN                   Pages;
>
> >>  MemoryMap = NULL;
> >>  MemoryMapSize = 0;
> >>  Pages = 0;
> >>
> >>  do {
> >>    Status = gBS->GetMemoryMap (
> >>                    &MemoryMapSize,
> >>                    MemoryMap,
> >>                    &MapKey,
> >>                    &DescriptorSize,
> >>                    &DescriptorVersion
> >>                    );
> >>    if (Status == EFI_BUFFER_TOO_SMALL) {
> >>
> >>      Pages = EFI_SIZE_TO_PAGES (MemoryMapSize) + 1;
> >>      MemoryMap = AllocatePages (Pages);
> >>
> >>      //
> >>      // Get System MemoryMap
> >>      //
> >>      Status = gBS->GetMemoryMap (
> >>                      &MemoryMapSize,
> >>                      MemoryMap,
> >>                      &MapKey,
> >>                      &DescriptorSize,
> >>                      &DescriptorVersion
> >>                      );
> >>    }
> >>
> >>    // Don't do anything between the GetMemoryMap() and
> ExitBootServices()
> >>    if (!EFI_ERROR(Status)) {
> >>      Status = gBS->ExitBootServices (gImageHandle, MapKey);
> >>      if (EFI_ERROR(Status)) {
> >>        FreePages (MemoryMap, Pages);
> >>        MemoryMap = NULL;
> >>        MemoryMapSize = 0;
> >>      }
> >>    }
> >>  } while (EFI_ERROR(Status));
> >>
> >>  return Status;
> >>}
> >>
> >>Then perform
> >>ArmCleanDataCache ();
> >>ArmInvalidateDataCache ();
> >>ArmDisableInstructionCache ();
> >>ArmInvalidateInstructionCache ();
>
> >These don't do anything useful on ARM. You can only reliably perform cache
> >maintenance by virtual address.
> So, should I just remove them altogether?
>
> >>ArmDisableMmu ();
>
> >... so after this call returns, all bets are off with regards to whether
> >what is popped from the stack is actually what we pushed when we entered
> >the function.
> OK, thank you for explanation.
> But this call returns back into ResetLib implementation as it should, and
> then there is a direct jump to the start of FV.
> Am I doing anything wrong here?
> Then, up to the point of enabling of MMU the stack is OK. But right after
> enabling MMU it points at _ModuleEntryPoint end of function in
> DxeCoreEntryPoint.c
> Am I missing anything? Maybe some stack cleanup before jumping to the start
> of FV?
>
>
I guess you should be disabling interrupts as well. And quiesce all DMA
capable devices like network controllers that may corrupt your memory.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Stack issue after warm UEFI reset and MMU enabling on an Armv8 platform
@ 2018-09-20 21:52 Vladimir Olovyannikov
  2018-09-21 23:24 ` Vladimir Olovyannikov
  0 siblings, 1 reply; 10+ messages in thread
From: Vladimir Olovyannikov @ 2018-09-20 21:52 UTC (permalink / raw)
  To: Bill Paul; +Cc: edk2-devel, Ard Biesheuvel

On Wed, Sep 19, 2018 at 5:21 PM Bill Paul <wpaul@windriver.com> wrote:
>
> Of all the gin joints in all the towns in all the world, Vladimir
> Olovyannikov
> had to walk into mine at 16:58 on Wednesday 19 September 2018 and say:
>
> > >From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
> > >Sent: Wednesday, September 19, 2018 4:38 PM
> > >To: Vladimir Olovyannikov
> > >Cc: edk2-devel@lists.01.org
> > >Subject: Re: Stack issue after warm UEFI reset and MMU enabling on an
> > >Armv8 platform
> > >
> > >
> > >On 19 September 2018 at 15:55, Vladimir Olovyannikov
> > >
> > ><vladimir.olovyannikov@broadcom.com> wrote:
> > >>Hi All,
> > >>
> > >>I need UEFI experts help on the problem with Armv8 board on warm UEFI
> > >>reset.
> > >>Cold reset works fine.
> > >>
> > >>Here is how I set up a warm reset:
> > >>
> > >>STATIC
> > >>EFI_STATUS
> > >>ShutdownUefiBootServices (
> > >>
> > >>  VOID
> > >>  )
> > >>
> > >>{
> > >>
> > >>  EFI_STATUS              Status;
> > >>  UINTN                   MemoryMapSize;
> > >>  EFI_MEMORY_DESCRIPTOR   *MemoryMap;
> > >>  UINTN                   MapKey;
> > >>  UINTN                   DescriptorSize;
> > >>  UINT32                  DescriptorVersion;
> > >>  UINTN                   Pages;
> > >>
> > >>  MemoryMap = NULL;
> > >>  MemoryMapSize = 0;
> > >>  Pages = 0;
> > >>
> > >>  do {
> > >>
> > >>    Status = gBS->GetMemoryMap (
> > >>
> > >>                    &MemoryMapSize,
> > >>                    MemoryMap,
> > >>                    &MapKey,
> > >>                    &DescriptorSize,
> > >>                    &DescriptorVersion
> > >>                    );
> > >>
> > >>    if (Status == EFI_BUFFER_TOO_SMALL) {
> > >>
> > >>      Pages = EFI_SIZE_TO_PAGES (MemoryMapSize) + 1;
> > >>      MemoryMap = AllocatePages (Pages);
> > >>
> > >>      //
> > >>      // Get System MemoryMap
> > >>      //
> > >>      Status = gBS->GetMemoryMap (
> > >>
> > >>                      &MemoryMapSize,
> > >>                      MemoryMap,
> > >>                      &MapKey,
> > >>                      &DescriptorSize,
> > >>                      &DescriptorVersion
> > >>                      );
> > >>
> > >>    }
> > >>
> > >>    // Don't do anything between the GetMemoryMap() and
> > >>    ExitBootServices() if (!EFI_ERROR(Status)) {
> > >>
> > >>      Status = gBS->ExitBootServices (gImageHandle, MapKey);
> > >>      if (EFI_ERROR(Status)) {
> > >>
> > >>        FreePages (MemoryMap, Pages);
> > >>        MemoryMap = NULL;
> > >>        MemoryMapSize = 0;
> > >>
> > >>      }
> > >>
> > >>    }
> > >>
> > >>  } while (EFI_ERROR(Status));
> > >>
> > >>  return Status;
> > >>
> > >>}
> > >>
> > >>Then perform
> > >>ArmCleanDataCache ();
> > >>ArmInvalidateDataCache ();
> > >>ArmDisableInstructionCache ();
> > >>ArmInvalidateInstructionCache ();
> > >
> > >These don't do anything useful on ARM. You can only reliably perform
> > >cache
> > >maintenance by virtual address.
> >
> > So, should I just remove them altogether?
> >
> > >>ArmDisableMmu ();
> > >
> > >... so after this call returns, all bets are off with regards to
> > >whether
> > >what is popped from the stack is actually what we pushed when we
> > >entered
> > >the function.
> >
> > OK, thank you for explanation.
> > But this call returns back into ResetLib implementation as it should,
> > and
> > then there is a direct jump to the start of FV.
> > Am I doing anything wrong here?
> > Then, up to the point of enabling of MMU the stack is OK. But right
> > after
> > enabling MMU it points at _ModuleEntryPoint end of function in
> > DxeCoreEntryPoint.c
> > Am I missing anything? Maybe some stack cleanup before jumping to the
> > start
> > of FV?
>
> When the MMU is enabled, does the mapping for the stack pages change? That
> is,
> could the stack now be mapped to different physical page now?
Thanks for ideas Bill,
No, the mapping stays the same.
The issue is only with warm reset, and only on an A72 board.
There is another platform on A53 sharing the same code, which has no issues
with warm reset.
I cannot explain why.
>
> Instead of showing a stack trace, can you dump the stack pages and compare
> the
> before and after contents?
I can clearly see that before and after contents are different.
>
> Assuming the same physical memory pages are still being used, then there
> could
> be a cache flushing problem. What could happen is:
>
> - some stack memory has been touched recently and is now in the data cache
> - changes are made, which are written to the cache, but not yet flushed
> out to
> RAM
> - enabling the MMU causes a full invalidate of the cache
>
> Now when you look at the stack, you see the earlier contents that were in
> RAM
> -- the changes previously only written to the cache have been lost.
>
> Enabling/disabling caches and MMU is always tricky. I can't say for sure,
> but
> I wouldn't be surprised if there's some subtle bug that causes a flush
> operation to be missed and things may just work by coincidence in the cold
> start case.
I might be missing something preparing for warm reset.
Disabling interrupts does not help though.
Ard, I switched off all DMA-capable devices, so am just booting into UEFI
with no disks or network,
disabled interrupts. The issue is here. Any ideas on how to debug it and
fix?
>
> -Bill
>
> > >>Then jump to start of FV:
> > >>
> > >>typedef
> > >>VOID
> > >>
> > >> (EFIAPI *START_FV)(
> > >>
> > >>  VOID
> > >>
> > >>);
> > >>StartOfFv = (START_FV)(UINTN)PcdGet64(PcdFvBaseAddress);
> > >>StartOfFv ();
> > >>
> > >>Now this is what happens on warm reset:
> > >>reset -c warm
> > >>1. Until ArmEnableMmu() gets called, everything works as expected.
> > >>
> > >>    Here is the stack right before ArmEnableMmu() is called:
> > >>     ArmConfigureMmu+0x4f8
> > >>     InitMmu+0x24
> > >>     MemoryPeim+0x440
> > >>     PrePiMain+0x114
> > >>     PrimaryMain+0x68
> > >>     CEntryPoint+0xC4
> > >>     EL2:0x00000000800008BC
> > >>     -----  End of stack info -----
> > >>
> > >>2. Here is the stack as soon as Mmu is enabled with ArmEnableMmu() :
> > >>    ArmConfigureMmu+0x4fc <-- This one is correct, at line 745 in
> > >>
> > >> ArmConfigureMmu() in ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibCore.c
> > >> (return EFI_SUCCESS)
> > >>
> > >>   _ModuleEntryPoint+0x24 <-- Wrong. This points directly to
> > >>
> > >> ASSERT(FALSE); and to CpuDeadLoop() in DxeCoreEntryPoint.c, lines
> > >> 59-60.
> > >>
> > >>   El2:0x000000008E5E8300 <-- Absolutely bogus
> > >>
> > >>    --- End of stack info ---
> > >>
> > >>So, as soon as ArmEnableMmu() exits, execution jumps directly to
> > >>CpuDeadLoop() in DxeCoreEntryPoint of _ModuleEntryPoint().
> > >>
> > >>Would be grateful for any advice.
> > >>
> > >>Thank you,
> > >>Vladimir
> >
> > _______________________________________________
> > edk2-devel mailing list
> > edk2-devel@lists.01.org
> > https://lists.01.org/mailman/listinfo/edk2-devel
> --
> =============================================================================
> -Bill Paul            (510) 749-2329 | Senior Member of Technical Staff,
>                  wpaul@windriver.com | Master of Unix-Fu - Wind River
> Systems
> =============================================================================
>    "I put a dollar in a change machine. Nothing changed." - George Carlin
> =============================================================================


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Stack issue after warm UEFI reset and MMU enabling on an Armv8 platform
  2018-09-20 21:52 Vladimir Olovyannikov
@ 2018-09-21 23:24 ` Vladimir Olovyannikov
  2018-09-21 23:57   ` Andrew Fish
  0 siblings, 1 reply; 10+ messages in thread
From: Vladimir Olovyannikov @ 2018-09-21 23:24 UTC (permalink / raw)
  To: Bill Paul; +Cc: edk2-devel@lists.01.org, Ard Biesheuvel

On Thu, Sep 20, 2018 at 2:52 PM Vladimir Olovyannikov
<vladimir.olovyannikov@broadcom.com> wrote:
>
> On Wed, Sep 19, 2018 at 5:21 PM Bill Paul <wpaul@windriver.com> wrote:
> >
> > Of all the gin joints in all the towns in all the world, Vladimir
> > Olovyannikov
> > had to walk into mine at 16:58 on Wednesday 19 September 2018 and say:
> >
> > > >From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
> > > >Sent: Wednesday, September 19, 2018 4:38 PM
> > > >To: Vladimir Olovyannikov
> > > >Cc: edk2-devel@lists.01.org
> > > >Subject: Re: Stack issue after warm UEFI reset and MMU enabling on an
> > > >Armv8 platform
> > > >
> > > >
> > > >On 19 September 2018 at 15:55, Vladimir Olovyannikov
> > > >
> > > ><vladimir.olovyannikov@broadcom.com> wrote:
> > > >>Hi All,
> > > >>
> > > >>I need UEFI experts help on the problem with Armv8 board on warm UEFI
> > > >>reset.
> > > >>Cold reset works fine.
> > > >>
> > > >>Here is how I set up a warm reset:
> > > >>
> > > >>STATIC
> > > >>EFI_STATUS
> > > >>ShutdownUefiBootServices (
> > > >>
> > > >>  VOID
> > > >>  )
> > > >>
> > > >>{
> > > >>
> > > >>  EFI_STATUS              Status;
> > > >>  UINTN                   MemoryMapSize;
> > > >>  EFI_MEMORY_DESCRIPTOR   *MemoryMap;
> > > >>  UINTN                   MapKey;
> > > >>  UINTN                   DescriptorSize;
> > > >>  UINT32                  DescriptorVersion;
> > > >>  UINTN                   Pages;
> > > >>
> > > >>  MemoryMap = NULL;
> > > >>  MemoryMapSize = 0;
> > > >>  Pages = 0;
> > > >>
> > > >>  do {
> > > >>
> > > >>    Status = gBS->GetMemoryMap (
> > > >>
> > > >>                    &MemoryMapSize,
> > > >>                    MemoryMap,
> > > >>                    &MapKey,
> > > >>                    &DescriptorSize,
> > > >>                    &DescriptorVersion
> > > >>                    );
> > > >>
> > > >>    if (Status == EFI_BUFFER_TOO_SMALL) {
> > > >>
> > > >>      Pages = EFI_SIZE_TO_PAGES (MemoryMapSize) + 1;
> > > >>      MemoryMap = AllocatePages (Pages);
> > > >>
> > > >>      //
> > > >>      // Get System MemoryMap
> > > >>      //
> > > >>      Status = gBS->GetMemoryMap (
> > > >>
> > > >>                      &MemoryMapSize,
> > > >>                      MemoryMap,
> > > >>                      &MapKey,
> > > >>                      &DescriptorSize,
> > > >>                      &DescriptorVersion
> > > >>                      );
> > > >>
> > > >>    }
> > > >>
> > > >>    // Don't do anything between the GetMemoryMap() and
> > > >>    ExitBootServices() if (!EFI_ERROR(Status)) {
> > > >>
> > > >>      Status = gBS->ExitBootServices (gImageHandle, MapKey);
> > > >>      if (EFI_ERROR(Status)) {
> > > >>
> > > >>        FreePages (MemoryMap, Pages);
> > > >>        MemoryMap = NULL;
> > > >>        MemoryMapSize = 0;
> > > >>
> > > >>      }
> > > >>
> > > >>    }
> > > >>
> > > >>  } while (EFI_ERROR(Status));
> > > >>
> > > >>  return Status;
> > > >>
> > > >>}
> > > >>
> > > >>Then perform
> > > >>ArmCleanDataCache ();
> > > >>ArmInvalidateDataCache ();
> > > >>ArmDisableInstructionCache ();
> > > >>ArmInvalidateInstructionCache ();
> > > >
> > > >These don't do anything useful on ARM. You can only reliably perform
> > > >cache
> > > >maintenance by virtual address.
> > >
> > > So, should I just remove them altogether?
> > >
> > > >>ArmDisableMmu ();
> > > >
> > > >... so after this call returns, all bets are off with regards to
> > > >whether
> > > >what is popped from the stack is actually what we pushed when we
> > > >entered
> > > >the function.
> > >
> > > OK, thank you for explanation.
> > > But this call returns back into ResetLib implementation as it should,
> > > and
> > > then there is a direct jump to the start of FV.
> > > Am I doing anything wrong here?
> > > Then, up to the point of enabling of MMU the stack is OK. But right
> > > after
> > > enabling MMU it points at _ModuleEntryPoint end of function in
> > > DxeCoreEntryPoint.c
> > > Am I missing anything? Maybe some stack cleanup before jumping to the
> > > start
> > > of FV?
> >
> > When the MMU is enabled, does the mapping for the stack pages change? That
> > is,
> > could the stack now be mapped to different physical page now?
> Thanks for ideas Bill,
> No, the mapping stays the same.
> The issue is only with warm reset, and only on an A72 board.
> There is another platform on A53 sharing the same code, which has no issues
> with warm reset.
> I cannot explain why.
> >
> > Instead of showing a stack trace, can you dump the stack pages and compare
> > the
> > before and after contents?
> I can clearly see that before and after contents are different.
> >
> > Assuming the same physical memory pages are still being used, then there
> > could
> > be a cache flushing problem. What could happen is:
> >
> > - some stack memory has been touched recently and is now in the data cache
> > - changes are made, which are written to the cache, but not yet flushed
> > out to
> > RAM
> > - enabling the MMU causes a full invalidate of the cache
> >
> > Now when you look at the stack, you see the earlier contents that were in
> > RAM
> > -- the changes previously only written to the cache have been lost.
> >
> > Enabling/disabling caches and MMU is always tricky. I can't say for sure,
> > but
> > I wouldn't be surprised if there's some subtle bug that causes a flush
> > operation to be missed and things may just work by coincidence in the cold
> > start case.
> I might be missing something preparing for warm reset.
> Disabling interrupts does not help though.
> Ard, I switched off all DMA-capable devices, so am just booting into UEFI
> with no disks or network,
> disabled interrupts. The issue is here. Any ideas on how to debug it and
> fix?
Update: when I add this as an experiment:
UINTN StackBottom;

__asm__ volatile ("mov %0,SP" : "=r"(StackBottom));
WriteBackInvalidateDataCacheRange((VOID *)StackBottom,
    PcdGet64(PcdSystemMemorySize) +
    PcdGet64(PcdSystemMemoryBase) - StackBottom);
then the stack data were not corrupted anymore on the next ArmEnableMmu() call,
and warm reset worked (though unreliably, can throw exception on
memcpy down the road on UEFI boot; probably because I invalidated only
the stack area in the experiment).
Considering that A53 board does not have issues, does this means that
ArmInvalidateDataCache() implementation is useless for A72?
Based on this, should the approach be "find all data regions and
invalidate them using InvalidateDataCacheRange()"?
> >
> > -Bill
> >
> > > >>Then jump to start of FV:
> > > >>
> > > >>typedef
> > > >>VOID
> > > >>
> > > >> (EFIAPI *START_FV)(
> > > >>
> > > >>  VOID
> > > >>
> > > >>);
> > > >>StartOfFv = (START_FV)(UINTN)PcdGet64(PcdFvBaseAddress);
> > > >>StartOfFv ();
> > > >>
> > > >>Now this is what happens on warm reset:
> > > >>reset -c warm
> > > >>1. Until ArmEnableMmu() gets called, everything works as expected.
> > > >>
> > > >>    Here is the stack right before ArmEnableMmu() is called:
> > > >>     ArmConfigureMmu+0x4f8
> > > >>     InitMmu+0x24
> > > >>     MemoryPeim+0x440
> > > >>     PrePiMain+0x114
> > > >>     PrimaryMain+0x68
> > > >>     CEntryPoint+0xC4
> > > >>     EL2:0x00000000800008BC
> > > >>     -----  End of stack info -----
> > > >>
> > > >>2. Here is the stack as soon as Mmu is enabled with ArmEnableMmu() :
> > > >>    ArmConfigureMmu+0x4fc <-- This one is correct, at line 745 in
> > > >>
> > > >> ArmConfigureMmu() in ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibCore.c
> > > >> (return EFI_SUCCESS)
> > > >>
> > > >>   _ModuleEntryPoint+0x24 <-- Wrong. This points directly to
> > > >>
> > > >> ASSERT(FALSE); and to CpuDeadLoop() in DxeCoreEntryPoint.c, lines
> > > >> 59-60.
> > > >>
> > > >>   El2:0x000000008E5E8300 <-- Absolutely bogus
> > > >>
> > > >>    --- End of stack info ---
> > > >>
> > > >>So, as soon as ArmEnableMmu() exits, execution jumps directly to
> > > >>CpuDeadLoop() in DxeCoreEntryPoint of _ModuleEntryPoint().
> > > >>
> > > >>Would be grateful for any advice.
> > > >>
> > > >>Thank you,
> > > >>Vladimir
> > >
> > > _______________________________________________
> > > edk2-devel mailing list
> > > edk2-devel@lists.01.org
> > > https://lists.01.org/mailman/listinfo/edk2-devel
> > --
> > =============================================================================
> > -Bill Paul            (510) 749-2329 | Senior Member of Technical Staff,
> >                  wpaul@windriver.com | Master of Unix-Fu - Wind River
> > Systems
> > =============================================================================
> >    "I put a dollar in a change machine. Nothing changed." - George Carlin
> > =============================================================================


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Stack issue after warm UEFI reset and MMU enabling on an Armv8 platform
  2018-09-21 23:24 ` Vladimir Olovyannikov
@ 2018-09-21 23:57   ` Andrew Fish
  2018-09-22  0:14     ` Ard Biesheuvel
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Fish @ 2018-09-21 23:57 UTC (permalink / raw)
  To: Vladimir Olovyannikov; +Cc: Bill Paul, edk2-devel@lists.01.org



> On Sep 21, 2018, at 4:24 PM, Vladimir Olovyannikov <vladimir.olovyannikov@broadcom.com> wrote:
> 
> On Thu, Sep 20, 2018 at 2:52 PM Vladimir Olovyannikov
> <vladimir.olovyannikov@broadcom.com> wrote:
>> 
>> On Wed, Sep 19, 2018 at 5:21 PM Bill Paul <wpaul@windriver.com> wrote:
>>> 
>>> Of all the gin joints in all the towns in all the world, Vladimir
>>> Olovyannikov
>>> had to walk into mine at 16:58 on Wednesday 19 September 2018 and say:
>>> 
>>>>> From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
>>>>> Sent: Wednesday, September 19, 2018 4:38 PM
>>>>> To: Vladimir Olovyannikov
>>>>> Cc: edk2-devel@lists.01.org
>>>>> Subject: Re: Stack issue after warm UEFI reset and MMU enabling on an
>>>>> Armv8 platform
>>>>> 
>>>>> 
>>>>> On 19 September 2018 at 15:55, Vladimir Olovyannikov
>>>>> 
>>>>> <vladimir.olovyannikov@broadcom.com> wrote:
>>>>>> Hi All,
>>>>>> 
>>>>>> I need UEFI experts help on the problem with Armv8 board on warm UEFI
>>>>>> reset.
>>>>>> Cold reset works fine.
>>>>>> 
>>>>>> Here is how I set up a warm reset:
>>>>>> 
>>>>>> STATIC
>>>>>> EFI_STATUS
>>>>>> ShutdownUefiBootServices (
>>>>>> 
>>>>>> VOID
>>>>>> )
>>>>>> 
>>>>>> {
>>>>>> 
>>>>>> EFI_STATUS              Status;
>>>>>> UINTN                   MemoryMapSize;
>>>>>> EFI_MEMORY_DESCRIPTOR   *MemoryMap;
>>>>>> UINTN                   MapKey;
>>>>>> UINTN                   DescriptorSize;
>>>>>> UINT32                  DescriptorVersion;
>>>>>> UINTN                   Pages;
>>>>>> 
>>>>>> MemoryMap = NULL;
>>>>>> MemoryMapSize = 0;
>>>>>> Pages = 0;
>>>>>> 
>>>>>> do {
>>>>>> 
>>>>>>   Status = gBS->GetMemoryMap (
>>>>>> 
>>>>>>                   &MemoryMapSize,
>>>>>>                   MemoryMap,
>>>>>>                   &MapKey,
>>>>>>                   &DescriptorSize,
>>>>>>                   &DescriptorVersion
>>>>>>                   );
>>>>>> 
>>>>>>   if (Status == EFI_BUFFER_TOO_SMALL) {
>>>>>> 
>>>>>>     Pages = EFI_SIZE_TO_PAGES (MemoryMapSize) + 1;
>>>>>>     MemoryMap = AllocatePages (Pages);
>>>>>> 
>>>>>>     //
>>>>>>     // Get System MemoryMap
>>>>>>     //
>>>>>>     Status = gBS->GetMemoryMap (
>>>>>> 
>>>>>>                     &MemoryMapSize,
>>>>>>                     MemoryMap,
>>>>>>                     &MapKey,
>>>>>>                     &DescriptorSize,
>>>>>>                     &DescriptorVersion
>>>>>>                     );
>>>>>> 
>>>>>>   }
>>>>>> 
>>>>>>   // Don't do anything between the GetMemoryMap() and
>>>>>>   ExitBootServices() if (!EFI_ERROR(Status)) {
>>>>>> 
>>>>>>     Status = gBS->ExitBootServices (gImageHandle, MapKey);
>>>>>>     if (EFI_ERROR(Status)) {
>>>>>> 
>>>>>>       FreePages (MemoryMap, Pages);
>>>>>>       MemoryMap = NULL;
>>>>>>       MemoryMapSize = 0;
>>>>>> 
>>>>>>     }
>>>>>> 
>>>>>>   }
>>>>>> 
>>>>>> } while (EFI_ERROR(Status));
>>>>>> 
>>>>>> return Status;
>>>>>> 
>>>>>> }
>>>>>> 
>>>>>> Then perform
>>>>>> ArmCleanDataCache ();
>>>>>> ArmInvalidateDataCache ();
>>>>>> ArmDisableInstructionCache ();
>>>>>> ArmInvalidateInstructionCache ();
>>>>> 
>>>>> These don't do anything useful on ARM. You can only reliably perform
>>>>> cache
>>>>> maintenance by virtual address.
>>>> 
>>>> So, should I just remove them altogether?
>>>> 
>>>>>> ArmDisableMmu ();
>>>>> 
>>>>> ... so after this call returns, all bets are off with regards to
>>>>> whether
>>>>> what is popped from the stack is actually what we pushed when we
>>>>> entered
>>>>> the function.
>>>> 
>>>> OK, thank you for explanation.
>>>> But this call returns back into ResetLib implementation as it should,
>>>> and
>>>> then there is a direct jump to the start of FV.
>>>> Am I doing anything wrong here?
>>>> Then, up to the point of enabling of MMU the stack is OK. But right
>>>> after
>>>> enabling MMU it points at _ModuleEntryPoint end of function in
>>>> DxeCoreEntryPoint.c
>>>> Am I missing anything? Maybe some stack cleanup before jumping to the
>>>> start
>>>> of FV?
>>> 
>>> When the MMU is enabled, does the mapping for the stack pages change? That
>>> is,
>>> could the stack now be mapped to different physical page now?
>> Thanks for ideas Bill,
>> No, the mapping stays the same.
>> The issue is only with warm reset, and only on an A72 board.
>> There is another platform on A53 sharing the same code, which has no issues
>> with warm reset.
>> I cannot explain why.
>>> 
>>> Instead of showing a stack trace, can you dump the stack pages and compare
>>> the
>>> before and after contents?
>> I can clearly see that before and after contents are different.
>>> 
>>> Assuming the same physical memory pages are still being used, then there
>>> could
>>> be a cache flushing problem. What could happen is:
>>> 
>>> - some stack memory has been touched recently and is now in the data cache
>>> - changes are made, which are written to the cache, but not yet flushed
>>> out to
>>> RAM
>>> - enabling the MMU causes a full invalidate of the cache
>>> 
>>> Now when you look at the stack, you see the earlier contents that were in
>>> RAM
>>> -- the changes previously only written to the cache have been lost.
>>> 
>>> Enabling/disabling caches and MMU is always tricky. I can't say for sure,
>>> but
>>> I wouldn't be surprised if there's some subtle bug that causes a flush
>>> operation to be missed and things may just work by coincidence in the cold
>>> start case.
>> I might be missing something preparing for warm reset.
>> Disabling interrupts does not help though.
>> Ard, I switched off all DMA-capable devices, so am just booting into UEFI
>> with no disks or network,
>> disabled interrupts. The issue is here. Any ideas on how to debug it and
>> fix?
> Update: when I add this as an experiment:
> UINTN StackBottom;
> 
> __asm__ volatile ("mov %0,SP" : "=r"(StackBottom));
> WriteBackInvalidateDataCacheRange((VOID *)StackBottom,
>    PcdGet64(PcdSystemMemorySize) +
>    PcdGet64(PcdSystemMemoryBase) - StackBottom);
> then the stack data were not corrupted anymore on the next ArmEnableMmu() call,
> and warm reset worked (though unreliably, can throw exception on
> memcpy down the road on UEFI boot; probably because I invalidated only
> the stack area in the experiment).
> Considering that A53 board does not have issues, does this means that
> ArmInvalidateDataCache() implementation is useless for A72?
> Based on this, should the approach be "find all data regions and
> invalidate them using InvalidateDataCacheRange()"?

Vladimir,

We hit an issue like this a while back on x86 and it turned out our sequence was dependent on the C compiler code generation not touching a specific register. It might be worth while to disassemble this code and take a look at the assembler. I'd also point out the __asm__ volatile is a serializing event to the C memory model so it could change how the C code was optimized. You could try  __asm__ volatile with a no-op instruction to see if it really is the SP read at this point that fixes the issue. But it is likely the bug will be more obvious if you look at the assembly. 

Thanks,

Andrew Fish

>>> 
>>> -Bill
>>> 
>>>>>> Then jump to start of FV:
>>>>>> 
>>>>>> typedef
>>>>>> VOID
>>>>>> 
>>>>>> (EFIAPI *START_FV)(
>>>>>> 
>>>>>> VOID
>>>>>> 
>>>>>> );
>>>>>> StartOfFv = (START_FV)(UINTN)PcdGet64(PcdFvBaseAddress);
>>>>>> StartOfFv ();
>>>>>> 
>>>>>> Now this is what happens on warm reset:
>>>>>> reset -c warm
>>>>>> 1. Until ArmEnableMmu() gets called, everything works as expected.
>>>>>> 
>>>>>>   Here is the stack right before ArmEnableMmu() is called:
>>>>>>    ArmConfigureMmu+0x4f8
>>>>>>    InitMmu+0x24
>>>>>>    MemoryPeim+0x440
>>>>>>    PrePiMain+0x114
>>>>>>    PrimaryMain+0x68
>>>>>>    CEntryPoint+0xC4
>>>>>>    EL2:0x00000000800008BC
>>>>>>    -----  End of stack info -----
>>>>>> 
>>>>>> 2. Here is the stack as soon as Mmu is enabled with ArmEnableMmu() :
>>>>>>   ArmConfigureMmu+0x4fc <-- This one is correct, at line 745 in
>>>>>> 
>>>>>> ArmConfigureMmu() in ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibCore.c
>>>>>> (return EFI_SUCCESS)
>>>>>> 
>>>>>>  _ModuleEntryPoint+0x24 <-- Wrong. This points directly to
>>>>>> 
>>>>>> ASSERT(FALSE); and to CpuDeadLoop() in DxeCoreEntryPoint.c, lines
>>>>>> 59-60.
>>>>>> 
>>>>>>  El2:0x000000008E5E8300 <-- Absolutely bogus
>>>>>> 
>>>>>>   --- End of stack info ---
>>>>>> 
>>>>>> So, as soon as ArmEnableMmu() exits, execution jumps directly to
>>>>>> CpuDeadLoop() in DxeCoreEntryPoint of _ModuleEntryPoint().
>>>>>> 
>>>>>> Would be grateful for any advice.
>>>>>> 
>>>>>> Thank you,
>>>>>> Vladimir
>>>> 
>>>> _______________________________________________
>>>> edk2-devel mailing list
>>>> edk2-devel@lists.01.org
>>>> https://lists.01.org/mailman/listinfo/edk2-devel
>>> --
>>> =============================================================================
>>> -Bill Paul            (510) 749-2329 | Senior Member of Technical Staff,
>>>                 wpaul@windriver.com | Master of Unix-Fu - Wind River
>>> Systems
>>> =============================================================================
>>>   "I put a dollar in a change machine. Nothing changed." - George Carlin
>>> =============================================================================
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Stack issue after warm UEFI reset and MMU enabling on an Armv8 platform
  2018-09-21 23:57   ` Andrew Fish
@ 2018-09-22  0:14     ` Ard Biesheuvel
  2018-09-24 17:44       ` Vladimir Olovyannikov
  0 siblings, 1 reply; 10+ messages in thread
From: Ard Biesheuvel @ 2018-09-22  0:14 UTC (permalink / raw)
  To: Andrew Fish; +Cc: Bill Paul, Vladimir Olovyannikov, edk2-devel@lists.01.org

On Fri 21 Sep 2018 at 16:57, Andrew Fish <afish@apple.com> wrote:

>
>
> > On Sep 21, 2018, at 4:24 PM, Vladimir Olovyannikov <
> vladimir.olovyannikov@broadcom.com> wrote:
> >
> > On Thu, Sep 20, 2018 at 2:52 PM Vladimir Olovyannikov
> > <vladimir.olovyannikov@broadcom.com> wrote:
> >>
> >> On Wed, Sep 19, 2018 at 5:21 PM Bill Paul <wpaul@windriver.com> wrote:
> >>>
> >>> Of all the gin joints in all the towns in all the world, Vladimir
> >>> Olovyannikov
> >>> had to walk into mine at 16:58 on Wednesday 19 September 2018 and say:
> >>>
> >>>>> From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
> >>>>> Sent: Wednesday, September 19, 2018 4:38 PM
> >>>>> To: Vladimir Olovyannikov
> >>>>> Cc: edk2-devel@lists.01.org
> >>>>> Subject: Re: Stack issue after warm UEFI reset and MMU enabling on an
> >>>>> Armv8 platform
> >>>>>
> >>>>>
> >>>>> On 19 September 2018 at 15:55, Vladimir Olovyannikov
> >>>>>
> >>>>> <vladimir.olovyannikov@broadcom.com> wrote:
> >>>>>> Hi All,
> >>>>>>
> >>>>>> I need UEFI experts help on the problem with Armv8 board on warm
> UEFI
> >>>>>> reset.
> >>>>>> Cold reset works fine.
> >>>>>>
> >>>>>> Here is how I set up a warm reset:
> >>>>>>
> >>>>>> STATIC
> >>>>>> EFI_STATUS
> >>>>>> ShutdownUefiBootServices (
> >>>>>>
> >>>>>> VOID
> >>>>>> )
> >>>>>>
> >>>>>> {
> >>>>>>
> >>>>>> EFI_STATUS              Status;
> >>>>>> UINTN                   MemoryMapSize;
> >>>>>> EFI_MEMORY_DESCRIPTOR   *MemoryMap;
> >>>>>> UINTN                   MapKey;
> >>>>>> UINTN                   DescriptorSize;
> >>>>>> UINT32                  DescriptorVersion;
> >>>>>> UINTN                   Pages;
> >>>>>>
> >>>>>> MemoryMap = NULL;
> >>>>>> MemoryMapSize = 0;
> >>>>>> Pages = 0;
> >>>>>>
> >>>>>> do {
> >>>>>>
> >>>>>>   Status = gBS->GetMemoryMap (
> >>>>>>
> >>>>>>                   &MemoryMapSize,
> >>>>>>                   MemoryMap,
> >>>>>>                   &MapKey,
> >>>>>>                   &DescriptorSize,
> >>>>>>                   &DescriptorVersion
> >>>>>>                   );
> >>>>>>
> >>>>>>   if (Status == EFI_BUFFER_TOO_SMALL) {
> >>>>>>
> >>>>>>     Pages = EFI_SIZE_TO_PAGES (MemoryMapSize) + 1;
> >>>>>>     MemoryMap = AllocatePages (Pages);
> >>>>>>
> >>>>>>     //
> >>>>>>     // Get System MemoryMap
> >>>>>>     //
> >>>>>>     Status = gBS->GetMemoryMap (
> >>>>>>
> >>>>>>                     &MemoryMapSize,
> >>>>>>                     MemoryMap,
> >>>>>>                     &MapKey,
> >>>>>>                     &DescriptorSize,
> >>>>>>                     &DescriptorVersion
> >>>>>>                     );
> >>>>>>
> >>>>>>   }
> >>>>>>
> >>>>>>   // Don't do anything between the GetMemoryMap() and
> >>>>>>   ExitBootServices() if (!EFI_ERROR(Status)) {
> >>>>>>
> >>>>>>     Status = gBS->ExitBootServices (gImageHandle, MapKey);
> >>>>>>     if (EFI_ERROR(Status)) {
> >>>>>>
> >>>>>>       FreePages (MemoryMap, Pages);
> >>>>>>       MemoryMap = NULL;
> >>>>>>       MemoryMapSize = 0;
> >>>>>>
> >>>>>>     }
> >>>>>>
> >>>>>>   }
> >>>>>>
> >>>>>> } while (EFI_ERROR(Status));
> >>>>>>
> >>>>>> return Status;
> >>>>>>
> >>>>>> }
> >>>>>>
> >>>>>> Then perform
> >>>>>> ArmCleanDataCache ();
> >>>>>> ArmInvalidateDataCache ();
> >>>>>> ArmDisableInstructionCache ();
> >>>>>> ArmInvalidateInstructionCache ();
> >>>>>
> >>>>> These don't do anything useful on ARM. You can only reliably perform
> >>>>> cache
> >>>>> maintenance by virtual address.
> >>>>
> >>>> So, should I just remove them altogether?
> >>>>
> >>>>>> ArmDisableMmu ();
> >>>>>
> >>>>> ... so after this call returns, all bets are off with regards to
> >>>>> whether
> >>>>> what is popped from the stack is actually what we pushed when we
> >>>>> entered
> >>>>> the function.
> >>>>
> >>>> OK, thank you for explanation.
> >>>> But this call returns back into ResetLib implementation as it should,
> >>>> and
> >>>> then there is a direct jump to the start of FV.
> >>>> Am I doing anything wrong here?
> >>>> Then, up to the point of enabling of MMU the stack is OK. But right
> >>>> after
> >>>> enabling MMU it points at _ModuleEntryPoint end of function in
> >>>> DxeCoreEntryPoint.c
> >>>> Am I missing anything? Maybe some stack cleanup before jumping to the
> >>>> start
> >>>> of FV?
> >>>
> >>> When the MMU is enabled, does the mapping for the stack pages change?
> That
> >>> is,
> >>> could the stack now be mapped to different physical page now?
> >> Thanks for ideas Bill,
> >> No, the mapping stays the same.
> >> The issue is only with warm reset, and only on an A72 board.
> >> There is another platform on A53 sharing the same code, which has no
> issues
> >> with warm reset.
> >> I cannot explain why.
> >>>
> >>> Instead of showing a stack trace, can you dump the stack pages and
> compare
> >>> the
> >>> before and after contents?
> >> I can clearly see that before and after contents are different.
> >>>
> >>> Assuming the same physical memory pages are still being used, then
> there
> >>> could
> >>> be a cache flushing problem. What could happen is:
> >>>
> >>> - some stack memory has been touched recently and is now in the data
> cache
> >>> - changes are made, which are written to the cache, but not yet flushed
> >>> out to
> >>> RAM
> >>> - enabling the MMU causes a full invalidate of the cache
> >>>
> >>> Now when you look at the stack, you see the earlier contents that were
> in
> >>> RAM
> >>> -- the changes previously only written to the cache have been lost.
> >>>
> >>> Enabling/disabling caches and MMU is always tricky. I can't say for
> sure,
> >>> but
> >>> I wouldn't be surprised if there's some subtle bug that causes a flush
> >>> operation to be missed and things may just work by coincidence in the
> cold
> >>> start case.
> >> I might be missing something preparing for warm reset.
> >> Disabling interrupts does not help though.
> >> Ard, I switched off all DMA-capable devices, so am just booting into
> UEFI
> >> with no disks or network,
> >> disabled interrupts. The issue is here. Any ideas on how to debug it and
> >> fix?
> > Update: when I add this as an experiment:
> > UINTN StackBottom;
> >
> > __asm__ volatile ("mov %0,SP" : "=r"(StackBottom));
> > WriteBackInvalidateDataCacheRange((VOID *)StackBottom,
> >    PcdGet64(PcdSystemMemorySize) +
> >    PcdGet64(PcdSystemMemoryBase) - StackBottom);
> > then the stack data were not corrupted anymore on the next
> ArmEnableMmu() call,
> > and warm reset worked (though unreliably, can throw exception on
> > memcpy down the road on UEFI boot; probably because I invalidated only
> > the stack area in the experiment).
> > Considering that A53 board does not have issues, does this means that
> > ArmInvalidateDataCache() implementation is useless for A72?
> > Based on this, should the approach be "find all data regions and
> > invalidate them using InvalidateDataCacheRange()"?
>
> Vladimir,
>
> We hit an issue like this a while back on x86 and it turned out our
> sequence was dependent on the C compiler code generation not touching a
> specific register. It might be worth while to disassemble this code and
> take a look at the assembler. I'd also point out the __asm__ volatile is a
> serializing event to the C memory model so it could change how the C code
> was optimized. You could try  __asm__ volatile with a no-op instruction to
> see if it really is the SP read at this point that fixes the issue. But it
> is likely the bug will be more obvious if you look at the assembly.
>

Is your primary FV hosted in DRAM? Are you sure it has not been corrupted
by the time you attempt your warm reboot?

>



>
> >>>
> >>> -Bill
> >>>
> >>>>>> Then jump to start of FV:
> >>>>>>
> >>>>>> typedef
> >>>>>> VOID
> >>>>>>
> >>>>>> (EFIAPI *START_FV)(
> >>>>>>
> >>>>>> VOID
> >>>>>>
> >>>>>> );
> >>>>>> StartOfFv = (START_FV)(UINTN)PcdGet64(PcdFvBaseAddress);
> >>>>>> StartOfFv ();
> >>>>>>
> >>>>>> Now this is what happens on warm reset:
> >>>>>> reset -c warm
> >>>>>> 1. Until ArmEnableMmu() gets called, everything works as expected.
> >>>>>>
> >>>>>>   Here is the stack right before ArmEnableMmu() is called:
> >>>>>>    ArmConfigureMmu+0x4f8
> >>>>>>    InitMmu+0x24
> >>>>>>    MemoryPeim+0x440
> >>>>>>    PrePiMain+0x114
> >>>>>>    PrimaryMain+0x68
> >>>>>>    CEntryPoint+0xC4
> >>>>>>    EL2:0x00000000800008BC
> >>>>>>    -----  End of stack info -----
> >>>>>>
> >>>>>> 2. Here is the stack as soon as Mmu is enabled with ArmEnableMmu() :
> >>>>>>   ArmConfigureMmu+0x4fc <-- This one is correct, at line 745 in
> >>>>>>
> >>>>>> ArmConfigureMmu() in
> ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibCore.c
> >>>>>> (return EFI_SUCCESS)
> >>>>>>
> >>>>>>  _ModuleEntryPoint+0x24 <-- Wrong. This points directly to
> >>>>>>
> >>>>>> ASSERT(FALSE); and to CpuDeadLoop() in DxeCoreEntryPoint.c, lines
> >>>>>> 59-60.
> >>>>>>
> >>>>>>  El2:0x000000008E5E8300 <-- Absolutely bogus
> >>>>>>
> >>>>>>   --- End of stack info ---
> >>>>>>
> >>>>>> So, as soon as ArmEnableMmu() exits, execution jumps directly to
> >>>>>> CpuDeadLoop() in DxeCoreEntryPoint of _ModuleEntryPoint().
> >>>>>>
> >>>>>> Would be grateful for any advice.
> >>>>>>
> >>>>>> Thank you,
> >>>>>> Vladimir
> >>>>
> >>>> _______________________________________________
> >>>> edk2-devel mailing list
> >>>> edk2-devel@lists.01.org
> >>>> https://lists.01.org/mailman/listinfo/edk2-devel
> >>> --
> >>>
> =============================================================================
> >>> -Bill Paul            (510) 749-2329 | Senior Member of Technical
> Staff,
> >>>                 wpaul@windriver.com | Master of Unix-Fu - Wind River
> >>> Systems
> >>>
> =============================================================================
> >>>   "I put a dollar in a change machine. Nothing changed." - George
> Carlin
> >>>
> =============================================================================
> > _______________________________________________
> > edk2-devel mailing list
> > edk2-devel@lists.01.org
> > https://lists.01.org/mailman/listinfo/edk2-devel
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Stack issue after warm UEFI reset and MMU enabling on an Armv8 platform
  2018-09-22  0:14     ` Ard Biesheuvel
@ 2018-09-24 17:44       ` Vladimir Olovyannikov
  0 siblings, 0 replies; 10+ messages in thread
From: Vladimir Olovyannikov @ 2018-09-24 17:44 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: Andrew Fish, Bill Paul, edk2-devel@lists.01.org

On Fri, Sep 21, 2018 at 5:14 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>
>
>
> On Fri 21 Sep 2018 at 16:57, Andrew Fish <afish@apple.com> wrote:
>>
>>
>>
>> > On Sep 21, 2018, at 4:24 PM, Vladimir Olovyannikov <vladimir.olovyannikov@broadcom.com> wrote:
>> >
>> > On Thu, Sep 20, 2018 at 2:52 PM Vladimir Olovyannikov
>> > <vladimir.olovyannikov@broadcom.com> wrote:
>> >>
>> >> On Wed, Sep 19, 2018 at 5:21 PM Bill Paul <wpaul@windriver.com> wrote:
>> >>>
>> >>> Of all the gin joints in all the towns in all the world, Vladimir
>> >>> Olovyannikov
>> >>> had to walk into mine at 16:58 on Wednesday 19 September 2018 and say:
>> >>>
>> >>>>> From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
>> >>>>> Sent: Wednesday, September 19, 2018 4:38 PM
>> >>>>> To: Vladimir Olovyannikov
>> >>>>> Cc: edk2-devel@lists.01.org
>> >>>>> Subject: Re: Stack issue after warm UEFI reset and MMU enabling on an
>> >>>>> Armv8 platform
>> >>>>>
>> >>>>>
>> >>>>> On 19 September 2018 at 15:55, Vladimir Olovyannikov
>> >>>>>
>> >>>>> <vladimir.olovyannikov@broadcom.com> wrote:
>> >>>>>> Hi All,
>> >>>>>>
>> >>>>>> I need UEFI experts help on the problem with Armv8 board on warm UEFI
>> >>>>>> reset.
>> >>>>>> Cold reset works fine.
>> >>>>>>
>> >>>>>> Here is how I set up a warm reset:
>> >>>>>>
>> >>>>>> STATIC
>> >>>>>> EFI_STATUS
>> >>>>>> ShutdownUefiBootServices (
>> >>>>>>
>> >>>>>> VOID
>> >>>>>> )
>> >>>>>>
>> >>>>>> {
>> >>>>>>
>> >>>>>> EFI_STATUS              Status;
>> >>>>>> UINTN                   MemoryMapSize;
>> >>>>>> EFI_MEMORY_DESCRIPTOR   *MemoryMap;
>> >>>>>> UINTN                   MapKey;
>> >>>>>> UINTN                   DescriptorSize;
>> >>>>>> UINT32                  DescriptorVersion;
>> >>>>>> UINTN                   Pages;
>> >>>>>>
>> >>>>>> MemoryMap = NULL;
>> >>>>>> MemoryMapSize = 0;
>> >>>>>> Pages = 0;
>> >>>>>>
>> >>>>>> do {
>> >>>>>>
>> >>>>>>   Status = gBS->GetMemoryMap (
>> >>>>>>
>> >>>>>>                   &MemoryMapSize,
>> >>>>>>                   MemoryMap,
>> >>>>>>                   &MapKey,
>> >>>>>>                   &DescriptorSize,
>> >>>>>>                   &DescriptorVersion
>> >>>>>>                   );
>> >>>>>>
>> >>>>>>   if (Status == EFI_BUFFER_TOO_SMALL) {
>> >>>>>>
>> >>>>>>     Pages = EFI_SIZE_TO_PAGES (MemoryMapSize) + 1;
>> >>>>>>     MemoryMap = AllocatePages (Pages);
>> >>>>>>
>> >>>>>>     //
>> >>>>>>     // Get System MemoryMap
>> >>>>>>     //
>> >>>>>>     Status = gBS->GetMemoryMap (
>> >>>>>>
>> >>>>>>                     &MemoryMapSize,
>> >>>>>>                     MemoryMap,
>> >>>>>>                     &MapKey,
>> >>>>>>                     &DescriptorSize,
>> >>>>>>                     &DescriptorVersion
>> >>>>>>                     );
>> >>>>>>
>> >>>>>>   }
>> >>>>>>
>> >>>>>>   // Don't do anything between the GetMemoryMap() and
>> >>>>>>   ExitBootServices() if (!EFI_ERROR(Status)) {
>> >>>>>>
>> >>>>>>     Status = gBS->ExitBootServices (gImageHandle, MapKey);
>> >>>>>>     if (EFI_ERROR(Status)) {
>> >>>>>>
>> >>>>>>       FreePages (MemoryMap, Pages);
>> >>>>>>       MemoryMap = NULL;
>> >>>>>>       MemoryMapSize = 0;
>> >>>>>>
>> >>>>>>     }
>> >>>>>>
>> >>>>>>   }
>> >>>>>>
>> >>>>>> } while (EFI_ERROR(Status));
>> >>>>>>
>> >>>>>> return Status;
>> >>>>>>
>> >>>>>> }
>> >>>>>>
>> >>>>>> Then perform
>> >>>>>> ArmCleanDataCache ();
>> >>>>>> ArmInvalidateDataCache ();
>> >>>>>> ArmDisableInstructionCache ();
>> >>>>>> ArmInvalidateInstructionCache ();
>> >>>>>
>> >>>>> These don't do anything useful on ARM. You can only reliably perform
>> >>>>> cache
>> >>>>> maintenance by virtual address.
>> >>>>
>> >>>> So, should I just remove them altogether?
>> >>>>
>> >>>>>> ArmDisableMmu ();
>> >>>>>
>> >>>>> ... so after this call returns, all bets are off with regards to
>> >>>>> whether
>> >>>>> what is popped from the stack is actually what we pushed when we
>> >>>>> entered
>> >>>>> the function.
>> >>>>
>> >>>> OK, thank you for explanation.
>> >>>> But this call returns back into ResetLib implementation as it should,
>> >>>> and
>> >>>> then there is a direct jump to the start of FV.
>> >>>> Am I doing anything wrong here?
>> >>>> Then, up to the point of enabling of MMU the stack is OK. But right
>> >>>> after
>> >>>> enabling MMU it points at _ModuleEntryPoint end of function in
>> >>>> DxeCoreEntryPoint.c
>> >>>> Am I missing anything? Maybe some stack cleanup before jumping to the
>> >>>> start
>> >>>> of FV?
>> >>>
>> >>> When the MMU is enabled, does the mapping for the stack pages change? That
>> >>> is,
>> >>> could the stack now be mapped to different physical page now?
>> >> Thanks for ideas Bill,
>> >> No, the mapping stays the same.
>> >> The issue is only with warm reset, and only on an A72 board.
>> >> There is another platform on A53 sharing the same code, which has no issues
>> >> with warm reset.
>> >> I cannot explain why.
>> >>>
>> >>> Instead of showing a stack trace, can you dump the stack pages and compare
>> >>> the
>> >>> before and after contents?
>> >> I can clearly see that before and after contents are different.
>> >>>
>> >>> Assuming the same physical memory pages are still being used, then there
>> >>> could
>> >>> be a cache flushing problem. What could happen is:
>> >>>
>> >>> - some stack memory has been touched recently and is now in the data cache
>> >>> - changes are made, which are written to the cache, but not yet flushed
>> >>> out to
>> >>> RAM
>> >>> - enabling the MMU causes a full invalidate of the cache
>> >>>
>> >>> Now when you look at the stack, you see the earlier contents that were in
>> >>> RAM
>> >>> -- the changes previously only written to the cache have been lost.
>> >>>
>> >>> Enabling/disabling caches and MMU is always tricky. I can't say for sure,
>> >>> but
>> >>> I wouldn't be surprised if there's some subtle bug that causes a flush
>> >>> operation to be missed and things may just work by coincidence in the cold
>> >>> start case.
>> >> I might be missing something preparing for warm reset.
>> >> Disabling interrupts does not help though.
>> >> Ard, I switched off all DMA-capable devices, so am just booting into UEFI
>> >> with no disks or network,
>> >> disabled interrupts. The issue is here. Any ideas on how to debug it and
>> >> fix?
>> > Update: when I add this as an experiment:
>> > UINTN StackBottom;
>> >
>> > __asm__ volatile ("mov %0,SP" : "=r"(StackBottom));
>> > WriteBackInvalidateDataCacheRange((VOID *)StackBottom,
>> >    PcdGet64(PcdSystemMemorySize) +
>> >    PcdGet64(PcdSystemMemoryBase) - StackBottom);
>> > then the stack data were not corrupted anymore on the next ArmEnableMmu() call,
>> > and warm reset worked (though unreliably, can throw exception on
>> > memcpy down the road on UEFI boot; probably because I invalidated only
>> > the stack area in the experiment).
>> > Considering that A53 board does not have issues, does this means that
>> > ArmInvalidateDataCache() implementation is useless for A72?
>> > Based on this, should the approach be "find all data regions and
>> > invalidate them using InvalidateDataCacheRange()"?
>>
>> Vladimir,
>>
>> We hit an issue like this a while back on x86 and it turned out our sequence was dependent on the C compiler code generation not touching a specific register. It might be worth while to disassemble this code and take a look at the assembler. I'd also point out the __asm__ volatile is a serializing event to the C memory model so it could change how the C code was optimized. You could try  __asm__ volatile with a no-op instruction to see if it really is the SP read at this point that fixes the issue. But it is likely the bug will be more obvious if you look at the assembly.
Andrew,

Thanks for points. I used that __asm__ directive just as an experiment
to see if InvalidateDataCacheRange would make any difference and it
did.
I could replace it with InvalidateDataCacheRange(DEFAULT_STACK_BOTTOM,
0x260) and it would work the same way.
I verified assembly code and do not see any issues. The issue is fixed
not by SP read, but by using InvalidateDataCacheRange() API instead of
ArmInvalidateDataCache().
Now I can do say 4-10 successful warm resets in the row which
eventually crashes on the next warm reset.
Need to find the reason for that crash.
>
>
> Is your primary FV hosted in DRAM? Are you sure it has not been corrupted by the time you attempt your warm reboot?
Yes, it is hosted at the start of DRAM from 0x80000000.
SystemMemorySize Pcd is declared as 240M from 0x80000000.
Another memory is declared by MemoryPeiLib. (1.76G).
As I can see all modules are loaded into lower 240M space. DxeCore is
at 0x8ceaa000.
Other modules are loaded below it.
As I told, if I use InvalidateDataCacheRange API for the stack area, I
am able to get warm reset to work (for several consecutive times),
then eventually there is a crash on warm reset.
I will compare contents nevertheless, just to make sure.
>
>
>
>
>>
>>
>> >>>
>> >>> -Bill
>> >>>
>> >>>>>> Then jump to start of FV:
>> >>>>>>
>> >>>>>> typedef
>> >>>>>> VOID
>> >>>>>>
>> >>>>>> (EFIAPI *START_FV)(
>> >>>>>>
>> >>>>>> VOID
>> >>>>>>
>> >>>>>> );
>> >>>>>> StartOfFv = (START_FV)(UINTN)PcdGet64(PcdFvBaseAddress);
>> >>>>>> StartOfFv ();
>> >>>>>>
>> >>>>>> Now this is what happens on warm reset:
>> >>>>>> reset -c warm
>> >>>>>> 1. Until ArmEnableMmu() gets called, everything works as expected.
>> >>>>>>
>> >>>>>>   Here is the stack right before ArmEnableMmu() is called:
>> >>>>>>    ArmConfigureMmu+0x4f8
>> >>>>>>    InitMmu+0x24
>> >>>>>>    MemoryPeim+0x440
>> >>>>>>    PrePiMain+0x114
>> >>>>>>    PrimaryMain+0x68
>> >>>>>>    CEntryPoint+0xC4
>> >>>>>>    EL2:0x00000000800008BC
>> >>>>>>    -----  End of stack info -----
>> >>>>>>
>> >>>>>> 2. Here is the stack as soon as Mmu is enabled with ArmEnableMmu() :
>> >>>>>>   ArmConfigureMmu+0x4fc <-- This one is correct, at line 745 in
>> >>>>>>
>> >>>>>> ArmConfigureMmu() in ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibCore.c
>> >>>>>> (return EFI_SUCCESS)
>> >>>>>>
>> >>>>>>  _ModuleEntryPoint+0x24 <-- Wrong. This points directly to
>> >>>>>>
>> >>>>>> ASSERT(FALSE); and to CpuDeadLoop() in DxeCoreEntryPoint.c, lines
>> >>>>>> 59-60.
>> >>>>>>
>> >>>>>>  El2:0x000000008E5E8300 <-- Absolutely bogus
>> >>>>>>
>> >>>>>>   --- End of stack info ---
>> >>>>>>
>> >>>>>> So, as soon as ArmEnableMmu() exits, execution jumps directly to
>> >>>>>> CpuDeadLoop() in DxeCoreEntryPoint of _ModuleEntryPoint().
>> >>>>>>
>> >>>>>> Would be grateful for any advice.
>> >>>>>>
>> >>>>>> Thank you,
>> >>>>>> Vladimir
>> >>>>
>> >>>> _______________________________________________
>> >>>> edk2-devel mailing list
>> >>>> edk2-devel@lists.01.org
>> >>>> https://lists.01.org/mailman/listinfo/edk2-devel
>> >>> --
>> >>> =============================================================================
>> >>> -Bill Paul            (510) 749-2329 | Senior Member of Technical Staff,
>> >>>                 wpaul@windriver.com | Master of Unix-Fu - Wind River
>> >>> Systems
>> >>> =============================================================================
>> >>>   "I put a dollar in a change machine. Nothing changed." - George Carlin
>> >>> =============================================================================
>> > _______________________________________________
>> > edk2-devel mailing list
>> > edk2-devel@lists.01.org
>> > https://lists.01.org/mailman/listinfo/edk2-devel
>>
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-09-24 17:44 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-09-19 22:55 Stack issue after warm UEFI reset and MMU enabling on an Armv8 platform Vladimir Olovyannikov
2018-09-19 23:38 ` Ard Biesheuvel
2018-09-19 23:58   ` Vladimir Olovyannikov
2018-09-19 23:56     ` Bill Paul
2018-09-20  0:09     ` Ard Biesheuvel
  -- strict thread matches above, loose matches on Subject: below --
2018-09-20 21:52 Vladimir Olovyannikov
2018-09-21 23:24 ` Vladimir Olovyannikov
2018-09-21 23:57   ` Andrew Fish
2018-09-22  0:14     ` Ard Biesheuvel
2018-09-24 17:44       ` Vladimir Olovyannikov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox