public inbox for devel@edk2.groups.io
* [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core
@ 2017-11-03 11:33 Ard Biesheuvel
  2017-11-05  5:52 ` Leif Lindholm
  2017-11-06  4:25 ` Gao, Liming
  0 siblings, 2 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2017-11-03 11:33 UTC
  To: edk2-devel, leif.lindholm, lersek, liming.gao; +Cc: Ard Biesheuvel

DEBUG builds of PEI code will print a diagnostic message regarding
the utilization of temporary RAM before switching to permanent RAM.
For example,

  Total temporary memory:    16352 bytes.
    temporary memory stack ever used:       4820 bytes.
    temporary memory heap used for HobList: 4720 bytes.

Tracking stack utilization like this requires the stack to be seeded
with a known magic value, and this needs to occur before entering C
code, given that it uses the stack. Currently, only Nt32Pkg appears
to implement this feature, but it is useful nonetheless, so let's
wire it up for PrePeiCore as well.

Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=748
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
v2: switch to newly introduced PCD

 ArmPlatformPkg/PrePeiCore/AArch64/PrePeiCoreEntryPoint.S | 6 ++++++
 ArmPlatformPkg/PrePeiCore/Arm/PrePeiCoreEntryPoint.S     | 8 ++++++++
 ArmPlatformPkg/PrePeiCore/Arm/PrePeiCoreEntryPoint.asm   | 8 ++++++++
 ArmPlatformPkg/PrePeiCore/PrePeiCoreMPCore.inf           | 2 ++
 ArmPlatformPkg/PrePeiCore/PrePeiCoreUniCore.inf          | 2 ++
 5 files changed, 26 insertions(+)

diff --git a/ArmPlatformPkg/PrePeiCore/AArch64/PrePeiCoreEntryPoint.S b/ArmPlatformPkg/PrePeiCore/AArch64/PrePeiCoreEntryPoint.S
index aab5edab0c42..0950fd0c0cdb 100644
--- a/ArmPlatformPkg/PrePeiCore/AArch64/PrePeiCoreEntryPoint.S
+++ b/ArmPlatformPkg/PrePeiCore/AArch64/PrePeiCoreEntryPoint.S
@@ -84,4 +84,10 @@ _PrepareArguments:
 
 _SetupPrimaryCoreStack:
   mov   sp, x1
+  MOV64 (x8, FixedPcdGet64 (PcdCPUCoresStackBase))
+  MOV64 (x9, FixedPcdGet32 (PcdInitValueInTempStack) |\
+             FixedPcdGet32 (PcdInitValueInTempStack) << 32)
+0:stp   x9, x9, [x8], #16
+  cmp   x8, x1
+  b.lt  0b
   b     _PrepareArguments
diff --git a/ArmPlatformPkg/PrePeiCore/Arm/PrePeiCoreEntryPoint.S b/ArmPlatformPkg/PrePeiCore/Arm/PrePeiCoreEntryPoint.S
index 14344425ad4c..a491af30a048 100644
--- a/ArmPlatformPkg/PrePeiCore/Arm/PrePeiCoreEntryPoint.S
+++ b/ArmPlatformPkg/PrePeiCore/Arm/PrePeiCoreEntryPoint.S
@@ -65,6 +65,14 @@ _PrepareArguments:
 
 _SetupPrimaryCoreStack:
   mov   sp, r1
+  MOV32 (r8, FixedPcdGet64 (PcdCPUCoresStackBase))
+  MOV32 (r9, FixedPcdGet32 (PcdInitValueInTempStack))
+  mov   r10, r9
+  mov   r11, r9
+  mov   r12, r9
+0:stm   r8!, {r9-r12}
+  cmp   r8, r1
+  blt   0b
   b     _PrepareArguments
 
 _NeverReturn:
diff --git a/ArmPlatformPkg/PrePeiCore/Arm/PrePeiCoreEntryPoint.asm b/ArmPlatformPkg/PrePeiCore/Arm/PrePeiCoreEntryPoint.asm
index abea675828df..dc1ad8144492 100644
--- a/ArmPlatformPkg/PrePeiCore/Arm/PrePeiCoreEntryPoint.asm
+++ b/ArmPlatformPkg/PrePeiCore/Arm/PrePeiCoreEntryPoint.asm
@@ -79,6 +79,14 @@ _PrepareArguments
 
 _SetupPrimaryCoreStack
   mov   sp, r1
+  mov32 r8, FixedPcdGet64 (PcdCPUCoresStackBase)
+  mov32 r9, FixedPcdGet32 (PcdInitValueInTempStack)
+  mov   r10, r9
+  mov   r11, r9
+  mov   r12, r9
+0:stm   r8!, {r9-r12}
+  cmp   r8, r1
+  blt   0b
   b     _PrepareArguments
 
 _NeverReturn
diff --git a/ArmPlatformPkg/PrePeiCore/PrePeiCoreMPCore.inf b/ArmPlatformPkg/PrePeiCore/PrePeiCoreMPCore.inf
index ecdbccb8d620..8e0456f8dc2a 100644
--- a/ArmPlatformPkg/PrePeiCore/PrePeiCoreMPCore.inf
+++ b/ArmPlatformPkg/PrePeiCore/PrePeiCoreMPCore.inf
@@ -75,3 +75,5 @@ [FixedPcd]
   gArmTokenSpaceGuid.PcdGicDistributorBase
   gArmTokenSpaceGuid.PcdGicInterruptInterfaceBase
   gArmTokenSpaceGuid.PcdGicSgiIntId
+
+  gEfiMdeModulePkgTokenSpaceGuid.PcdInitValueInTempStack
diff --git a/ArmPlatformPkg/PrePeiCore/PrePeiCoreUniCore.inf b/ArmPlatformPkg/PrePeiCore/PrePeiCoreUniCore.inf
index b5d4e389b2a4..ec83cec2d879 100644
--- a/ArmPlatformPkg/PrePeiCore/PrePeiCoreUniCore.inf
+++ b/ArmPlatformPkg/PrePeiCore/PrePeiCoreUniCore.inf
@@ -69,3 +69,5 @@ [FixedPcd]
   gArmPlatformTokenSpaceGuid.PcdCPUCoresStackBase
   gArmPlatformTokenSpaceGuid.PcdCPUCorePrimaryStackSize
   gArmPlatformTokenSpaceGuid.PcdCPUCoreSecondaryStackSize
+
+  gEfiMdeModulePkgTokenSpaceGuid.PcdInitValueInTempStack
-- 
2.11.0




* Re: [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core
  2017-11-03 11:33 [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core Ard Biesheuvel
@ 2017-11-05  5:52 ` Leif Lindholm
  2017-11-05 16:27   ` Ard Biesheuvel
  2017-11-06  4:25 ` Gao, Liming
  1 sibling, 1 reply; 12+ messages in thread
From: Leif Lindholm @ 2017-11-05  5:52 UTC
  To: Ard Biesheuvel; +Cc: edk2-devel, lersek, liming.gao

On Fri, Nov 03, 2017 at 11:33:52AM +0000, Ard Biesheuvel wrote:
> DEBUG builds of PEI code will print a diagnostic message regarding
> the utilization of temporary RAM before switching to permanent RAM.
> For example,
> 
>   Total temporary memory:    16352 bytes.
>     temporary memory stack ever used:       4820 bytes.
>     temporary memory heap used for HobList: 4720 bytes.
> 
> Tracking stack utilization like this requires the stack to be seeded
> with a known magic value, and this needs to occur before entering C
> code, given that it uses the stack. Currently, only Nt32Pkg appears
> to implement this feature, but it is useful nonetheless, so let's
> wire it up for PrePeiCore as well.
> 
> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=748
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

OK, this may sound completely unreasonable, but seeing those
implementations overwrite callee-saved registers without saving them
makes my brain unhappy. (Yes, I know.)

Could they either:
- Have a comment prepended establishing the implicit ABI of which
  registers the caller cannot rely on reusing after return.
  Preferably somewhat echoed at the call site.
- Be rewritten to use only scratch registers?

/
    Leif




* Re: [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core
  2017-11-05  5:52 ` Leif Lindholm
@ 2017-11-05 16:27   ` Ard Biesheuvel
  2017-11-05 16:29     ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2017-11-05 16:27 UTC
  To: Leif Lindholm; +Cc: edk2-devel@lists.01.org, Laszlo Ersek, Gao, Liming

On 5 November 2017 at 05:52, Leif Lindholm <leif.lindholm@linaro.org> wrote:
> On Fri, Nov 03, 2017 at 11:33:52AM +0000, Ard Biesheuvel wrote:
>> DEBUG builds of PEI code will print a diagnostic message regarding
>> the utilization of temporary RAM before switching to permanent RAM.
>> For example,
>>
>>   Total temporary memory:    16352 bytes.
>>     temporary memory stack ever used:       4820 bytes.
>>     temporary memory heap used for HobList: 4720 bytes.
>>
>> Tracking stack utilization like this requires the stack to be seeded
>> with a known magic value, and this needs to occur before entering C
>> code, given that it uses the stack. Currently, only Nt32Pkg appears
>> to implement this feature, but it is useful nonetheless, so let's
>> wire it up for PrePeiCore as well.
>>
>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=748
>> Contributed-under: TianoCore Contribution Agreement 1.1
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> OK, this may sound completely unreasonable, but seeing those
> implementations overwrite callee-saved registers without saving them
> makes my brain unhappy. (Yes, I know.)
>
> Could they either:
> - Have a comment prepended establishing the implicit ABI of which
>   registers the caller cannot rely on reusing after return.
>   Preferably somewhat echoed at the call site.
> - Be rewritten to use only scratch registers?
>

I think it is implied that the startup code does not adhere to the
AAPCS. That code already uses r5 and r6 without stacking them, simply
because we're in the middle of preparing the stack and other execution
context, precisely so the C code we call into can rely on AAPCS
guarantees.



* Re: [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core
  2017-11-05 16:27   ` Ard Biesheuvel
@ 2017-11-05 16:29     ` Ard Biesheuvel
  2017-11-07 18:09       ` Laszlo Ersek
  2017-11-08 16:12       ` Leif Lindholm
  0 siblings, 2 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2017-11-05 16:29 UTC
  To: Leif Lindholm; +Cc: edk2-devel@lists.01.org, Laszlo Ersek, Gao, Liming

On 5 November 2017 at 16:27, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 5 November 2017 at 05:52, Leif Lindholm <leif.lindholm@linaro.org> wrote:
>> On Fri, Nov 03, 2017 at 11:33:52AM +0000, Ard Biesheuvel wrote:
>>> DEBUG builds of PEI code will print a diagnostic message regarding
>>> the utilization of temporary RAM before switching to permanent RAM.
>>> For example,
>>>
>>>   Total temporary memory:    16352 bytes.
>>>     temporary memory stack ever used:       4820 bytes.
>>>     temporary memory heap used for HobList: 4720 bytes.
>>>
>>> Tracking stack utilization like this requires the stack to be seeded
>>> with a known magic value, and this needs to occur before entering C
>>> code, given that it uses the stack. Currently, only Nt32Pkg appears
>>> to implement this feature, but it is useful nonetheless, so let's
>>> wire it up for PrePeiCore as well.
>>>
>>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=748
>>> Contributed-under: TianoCore Contribution Agreement 1.1
>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>
>> OK, this may sound completely unreasonable, but seeing those
>> implementations overwrite callee-saved registers without saving them
>> makes my brain unhappy. (Yes, I know.)
>>
>> Could they either:
>> - Have a comment prepended establishing the implicit ABI of which
>>   registers the caller cannot rely on reusing after return.
>>   Preferably somewhat echoed at the call site.
>> - Be rewritten to use only scratch registers?
>>
>
> I think it is implied that the startup code does not adhere to the
> AAPCS. That code already uses r5 and r6 without stacking them, simply
> because we're in the middle of preparing the stack and other execution
> context, precisely so the C code we call into can rely on AAPCS
> guarantees.


Ehm, hold on, what do you mean by 'call site'? This code just runs and
jumps back to a local label. There are no function calls here until
the point where we call into C (with the exception of the lovely
ArmPlatformPeiBootAction() we added so Juno can find out how much DRAM
it can use).



* Re: [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core
  2017-11-03 11:33 [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core Ard Biesheuvel
  2017-11-05  5:52 ` Leif Lindholm
@ 2017-11-06  4:25 ` Gao, Liming
  1 sibling, 0 replies; 12+ messages in thread
From: Gao, Liming @ 2017-11-06  4:25 UTC
  To: Ard Biesheuvel, edk2-devel@lists.01.org, leif.lindholm@linaro.org,
	lersek@redhat.com

Reviewed-by: Liming Gao <liming.gao@intel.com>

>-----Original Message-----
>From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
>Sent: Friday, November 03, 2017 7:34 PM
>To: edk2-devel@lists.01.org; leif.lindholm@linaro.org; lersek@redhat.com;
>Gao, Liming <liming.gao@intel.com>
>Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>Subject: [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack
>before entering PEI core




* Re: [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core
  2017-11-05 16:29     ` Ard Biesheuvel
@ 2017-11-07 18:09       ` Laszlo Ersek
  2017-11-07 18:13         ` Ard Biesheuvel
  2017-11-08 16:12       ` Leif Lindholm
  1 sibling, 1 reply; 12+ messages in thread
From: Laszlo Ersek @ 2017-11-07 18:09 UTC
  To: Ard Biesheuvel, Leif Lindholm; +Cc: edk2-devel@lists.01.org, Gao, Liming

On 11/05/17 17:29, Ard Biesheuvel wrote:
> On 5 November 2017 at 16:27, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> On 5 November 2017 at 05:52, Leif Lindholm <leif.lindholm@linaro.org> wrote:
>>> On Fri, Nov 03, 2017 at 11:33:52AM +0000, Ard Biesheuvel wrote:
>>>> DEBUG builds of PEI code will print a diagnostic message regarding
>>>> the utilization of temporary RAM before switching to permanent RAM.
>>>> For example,
>>>>
>>>>   Total temporary memory:    16352 bytes.
>>>>     temporary memory stack ever used:       4820 bytes.
>>>>     temporary memory heap used for HobList: 4720 bytes.
>>>>
>>>> Tracking stack utilization like this requires the stack to be seeded
>>>> with a known magic value, and this needs to occur before entering C
>>>> code, given that it uses the stack. Currently, only Nt32Pkg appears
>>>> to implement this feature, but it is useful nonetheless, so let's
>>>> wire it up for PrePeiCore as well.
>>>>
>>>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=748
>>>> Contributed-under: TianoCore Contribution Agreement 1.1
>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>
>>> OK, this may sound completely unreasonable, but seeing those
>>> implementations overwrite callee-saved registers without saving them
>>> makes my brain unhappy. (Yes, I know.)
>>>
>>> Could they either:
>>> - Have a comment prepended establishing the implicit ABI of which
>>>   registers the caller cannot rely on reusing after return.
>>>   Preferably somewhat echoed at the call site.
>>> - Be rewritten to use only scratch registers?
>>>
>>
>> I think it is implied that the startup code does not adhere to the
>> AAPCS. That code already uses r5 and r6 without stacking them, simply
>> because we're in the middle of preparing the stack and other execution
>> context, precisely so the C code we call into can rely on AAPCS
>> guarantees.
> 
> 
> Ehm, hold on, what do you mean by 'call site'? This code just runs and
> jumps back to a local label. There are no function calls here until
> the point where we call into C (with the exception of the lovely
> ArmPlatformPeiBootAction() we added so Juno can find out how much DRAM
> it can use)

Please continue the discussion with Leif on this; from my side, I'm
happy with the patch (I've sort of deduced what the assembly code does,
also relying on your v1 notes).

The only eyebrow-raising part was:

+  MOV64 (x9, FixedPcdGet32 (PcdInitValueInTempStack) |\
+             FixedPcdGet32 (PcdInitValueInTempStack) << 32)

where we left-shift a constant that is "in theory" UINT32 by 32 binary
places, using the << operator. In C that would be undefined behavior,
but this is assembly, so what do I know? ¯\_(ツ)_/¯

Acked-by: Laszlo Ersek <lersek@redhat.com>

(

By the way, just to see if I remember correctly, isn't STP:

+0:stp   x9, x9, [x8], #16

the kind of instruction that modifies multiple operands at once, and so
if it faults, it cannot be virtualized well? (Because the syndrome
register or whatever does not tell the VMM the whole picture about the
fault?)

Totally irrelevant here, I'm just curious.

)

Thanks!
Laszlo



* Re: [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core
  2017-11-07 18:09       ` Laszlo Ersek
@ 2017-11-07 18:13         ` Ard Biesheuvel
  2017-11-09 21:11           ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2017-11-07 18:13 UTC
  To: Laszlo Ersek; +Cc: Leif Lindholm, edk2-devel@lists.01.org, Gao, Liming

On 7 November 2017 at 18:09, Laszlo Ersek <lersek@redhat.com> wrote:
> On 11/05/17 17:29, Ard Biesheuvel wrote:
>> On 5 November 2017 at 16:27, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>> On 5 November 2017 at 05:52, Leif Lindholm <leif.lindholm@linaro.org> wrote:
>>>> On Fri, Nov 03, 2017 at 11:33:52AM +0000, Ard Biesheuvel wrote:
>>>>> DEBUG builds of PEI code will print a diagnostic message regarding
>>>>> the utilization of temporary RAM before switching to permanent RAM.
>>>>> For example,
>>>>>
>>>>>   Total temporary memory:    16352 bytes.
>>>>>     temporary memory stack ever used:       4820 bytes.
>>>>>     temporary memory heap used for HobList: 4720 bytes.
>>>>>
>>>>> Tracking stack utilization like this requires the stack to be seeded
>>>>> with a known magic value, and this needs to occur before entering C
>>>>> code, given that it uses the stack. Currently, only Nt32Pkg appears
>>>>> to implement this feature, but it is useful nonetheless, so let's
>>>>> wire it up for PrePeiCore as well.
>>>>>
>>>>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=748
>>>>> Contributed-under: TianoCore Contribution Agreement 1.1
>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>>
>>>> OK, this may sound completely unreasonable, but seeing those
>>>> implementations overwrite callee-saved registers without saving them
>>>> makes my brain unhappy. (Yes, I know.)
>>>>
>>>> Could they either:
>>>> - Have a comment prepended establishing the implicit ABI of which
>>>>   registers the caller cannot rely on reusing after return.
>>>>   Preferably somewhat echoed at the call site.
>>>> - Be rewritten to use only scratch registers?
>>>>
>>>
>>> I think it is implied that the startup code does not adhere to the
>>> AAPCS. That code already uses r5 and r6 without stacking them, simply
>>> because we're in the middle of preparing the stack and other execution
>>> context, precisely so the C code we call into can rely on AAPCS
>>> guarantees.
>>
>>
>> Ehm, hold on, what do you mean by 'call site'? This code just runs and
>> jumps back to a local label. There are no function calls here until
>> the point where we call into C (with the exception of the lovely
>> ArmPlatformPeiBootAction() we added so Juno can find out how much DRAM
>> it can use)
>
> Please continue the discussion with Leif on this; from my side, I'm
> happy with the patch (I've sort of deduced what the assembly code does,
> also relying on your v1 notes).
>
> The only eyebrow-raising part was:
>
> +  MOV64 (x9, FixedPcdGet32 (PcdInitValueInTempStack) |\
> +             FixedPcdGet32 (PcdInitValueInTempStack) << 32)
>
> where we left-shift a constant that is "in theory" UINT32 by 32 binary
> places, using the << operator. In C that would be undefined behavior,
> but this is assembly, so what do I know? ¯\_(ツ)_/¯
>
> Acked-by: Laszlo Ersek <lersek@redhat.com>
>

Thanks. And you're right, this is not C so no need to worry about that.

> (
>
> By the way, just to see if I remember correctly, isn't STP:
>
> +0:stp   x9, x9, [x8], #16
>
> the kind of instruction that modifies multiple operands at once, and so
> if it faults, it cannot be virtualized well? (Because the syndrome
> register or whatever does not tell the VMM the whole picture about the
> fault?)
>
> Totally irrelevant here, I'm just curious.
>

STP == STore Pair, and so it stores the values in the registers to
memory. The only register that gets modified here is x8, due to the
post-increment.

But its converse

LDP  <reg>, <reg>, [<reg>], #<const>

is indeed such an instruction, given that it modifies three registers
at once, and so the registers that encode the exception run out of
space. Note that this only affects virtualized MMIO.



* Re: [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core
  2017-11-05 16:29     ` Ard Biesheuvel
  2017-11-07 18:09       ` Laszlo Ersek
@ 2017-11-08 16:12       ` Leif Lindholm
  2017-11-09 21:09         ` Ard Biesheuvel
  1 sibling, 1 reply; 12+ messages in thread
From: Leif Lindholm @ 2017-11-08 16:12 UTC
  To: Ard Biesheuvel; +Cc: edk2-devel@lists.01.org, Laszlo Ersek, Gao, Liming

On Sun, Nov 05, 2017 at 04:29:15PM +0000, Ard Biesheuvel wrote:
> >> OK, this may sound completely unreasonable, but seeing those
> >> implementations overwrite callee-saved registers without saving them
> >> makes my brain unhappy. (Yes, I know.)
> >>
> >> Could they either:
> >> - Have a comment prepended establishing the implicit ABI of which
> >>   registers the caller cannot rely on reusing after return.
> >>   Preferably somewhat echoed at the call site.
> >> - Be rewritten to use only scratch registers?
> >>
> >
> > I think it is implied that the startup code does not adhere to the
> > AAPCS. That code already uses r5 and r6 without stacking them, simply
> > because we're in the middle of preparing the stack and other execution
> > context, precisely so the C code we call into can rely on AAPCS
> > guarantees.
> 
> Ehm, hold on, what do you mean by 'call site'? This code just runs and
> jumps back to a local label. There are no function calls here until
> the point where we call into C (with the exception of the lovely
> ArmPlatformPeiBootAction() we added so Juno can find out how much DRAM
> it can use)

Yeah, you're right, I was misreading the block as a subroutine.

Seems the only register that must be preserved across jumps is r5/x5,
and neither of these modifications touch those (or change that fact).

Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org>

/
    Leif



* Re: [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core
  2017-11-08 16:12       ` Leif Lindholm
@ 2017-11-09 21:09         ` Ard Biesheuvel
  0 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2017-11-09 21:09 UTC
  To: Leif Lindholm; +Cc: edk2-devel@lists.01.org, Laszlo Ersek, Gao, Liming

On 8 November 2017 at 16:12, Leif Lindholm <leif.lindholm@linaro.org> wrote:
> On Sun, Nov 05, 2017 at 04:29:15PM +0000, Ard Biesheuvel wrote:
>> >> OK, this may sound completely unreasonable, but seeing those
>> >> implementations overwrite callee-saved registers without saving them
>> >> makes my brain unhappy. (Yes, I know.)
>> >>
>> >> Could they either:
>> >> - Have a comment prepended establishing the implicit ABI of which
>> >>   registers the caller cannot rely on reusing after return.
>> >>   Preferably somewhat echoed at the call site.
>> >> - Be rewritten to use only scratch registers?
>> >>
>> >
>> > I think it is implied that the startup code does not adhere to the
>> > AAPCS. That code already uses r5 and r6 without stacking them, simply
>> > because we're in the middle of preparing the stack and other execution
>> > context, precisely so the C code we call into can rely on AAPCS
>> > guarantees.
>>
>> Ehm, hold on, what do you mean by 'call site'? This code just runs and
>> jumps back to a local label. There are no function calls here until
>> the point where we call into C (with the exception of the lovely
>> ArmPlatformPeiBootAction() we added so Juno can find out how much DRAM
>> it can use)
>
> Yeah, you're right, I was misreading the block as a subroutine.
>
> Seems the only register that must be preserved across jumps is r5/x5,
> and neither of these modifications touch those (or change that fact).
>
> Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org>
>

Thanks.

Pushed as 7e2a8dfe8a9a



* Re: [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core
  2017-11-07 18:13         ` Ard Biesheuvel
@ 2017-11-09 21:11           ` Ard Biesheuvel
  2017-11-10  9:29             ` Laszlo Ersek
  0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2017-11-09 21:11 UTC
  To: Laszlo Ersek; +Cc: Leif Lindholm, edk2-devel@lists.01.org, Gao, Liming

On 7 November 2017 at 18:13, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 7 November 2017 at 18:09, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 11/05/17 17:29, Ard Biesheuvel wrote:
>>> On 5 November 2017 at 16:27, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>>> On 5 November 2017 at 05:52, Leif Lindholm <leif.lindholm@linaro.org> wrote:
>>>>> On Fri, Nov 03, 2017 at 11:33:52AM +0000, Ard Biesheuvel wrote:
>>>>>> DEBUG builds of PEI code will print a diagnostic message regarding
>>>>>> the utilization of temporary RAM before switching to permanent RAM.
>>>>>> For example,
>>>>>>
>>>>>>   Total temporary memory:    16352 bytes.
>>>>>>     temporary memory stack ever used:       4820 bytes.
>>>>>>     temporary memory heap used for HobList: 4720 bytes.
>>>>>>
>>>>>> Tracking stack utilization like this requires the stack to be seeded
>>>>>> with a known magic value, and this needs to occur before entering C
>>>>>> code, given that it uses the stack. Currently, only Nt32Pkg appears
>>>>>> to implement this feature, but it is useful nonetheless, so let's
>>>>>> wire it up for PrePeiCore as well.
>>>>>>
>>>>>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=748
>>>>>> Contributed-under: TianoCore Contribution Agreement 1.1
>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>>>
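The seeding-and-measuring scheme described in the commit message can be modelled in C. This is a minimal sketch, not the edk2 implementation. The function names are hypothetical, and the seed 0x5AA55AA5 is an assumed stand-in for whatever PcdInitValueInTempStack is set to. The fill loop mirrors the patch's `stp x9, x9, [x8], #16`: two 64-bit stores, then a 16-byte post-increment. The scan assumes a descending stack, so words the code never reached still hold the seed at the low end of the region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed seed value; edk2 would take it from PcdInitValueInTempStack. */
#define SEED32 0x5AA55AA5u

/* Fill [base, base + size) with a 64-bit pattern, two words per iteration,
   modelling "stp x9, x9, [x8], #16". size must be a multiple of 16. */
static void seed_stack(void *base, size_t size, uint64_t pattern)
{
    unsigned char *p = base;
    unsigned char *end = p + size;

    assert(size % 16 == 0);
    while (p < end) {
        memcpy(p, &pattern, sizeof pattern);      /* first register of the pair */
        memcpy(p + 8, &pattern, sizeof pattern);  /* second register of the pair */
        p += 16;                                  /* post-increment of the base */
    }
}

/* The stack grows down, so untouched seed words remain at the low end.
   Scan upward in 32-bit steps until the first overwritten word; everything
   from there to the top of the region counts as used. */
static size_t stack_bytes_used(const void *base, size_t size, uint32_t seed)
{
    const unsigned char *p = base;
    size_t offset = 0;

    while (offset + sizeof(uint32_t) <= size) {
        uint32_t word;
        memcpy(&word, p + offset, sizeof word);
        if (word != seed)
            break;
        offset += sizeof word;
    }
    return size - offset;
}
```

For example, seeding a 256-byte buffer with the pattern built from SEED32 and then overwriting its top 40 bytes makes `stack_bytes_used` report 40. The result can undercount slightly if the deepest word the code wrote happens to equal the seed.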
>>>>> OK, this may sound completely unreasonable, but seeing those
>>>>> implementations overwrite callee-saved registers without saving them
>>>>> makes my brain unhappy. (Yes, I know.)
>>>>>
>>>>> Could they either:
>>>>> - Have a comment prepended establishing the implicit ABI of which
>>>>>   registers the caller cannot rely on reusing after return.
>>>>>   Preferably somewhat echoed at the call site.
>>>>> - Be rewritten to use only scratch registers?
>>>>>
>>>>
>>>> I think it is implied that the startup code does not adhere to the
>>>> AAPCS. That code already uses r5 and r6 without stacking them, simply
>>>> because we're in the middle of preparing the stack and other execution
>>>> context, precisely so the C code we call into can rely on AAPCS
>>>> guarantees.
>>>
>>>
>>> Ehm, hold on, what do you mean by 'call site'? This code just runs and
>>> jumps back to a local label. There are no function calls here until
>>> the point where we call into C (with the exception of the lovely
>>> ArmPlatformPeiBootAction() we added so Juno can find out how much DRAM
>>> it can use)
>>
>> Please continue the discussion with Leif on this; from my side, I'm
>> happy with the patch (I've sort of deduced what the assembly code does,
>> also relying on your v1 notes).
>>
>> The only eyebrow-raising part was:
>>
>> +  MOV64 (x9, FixedPcdGet32 (PcdInitValueInTempStack) |\
>> +             FixedPcdGet32 (PcdInitValueInTempStack) << 32)
>>
>> where we left-shift a constant that is "in theory" UINT32 by 32 binary
>> places, using the << operator. In C that would be undefined behavior,
>> but this is assembly, so what do I know? ¯\_(ツ)_/¯
>>
>> Acked-by: Laszlo Ersek <lersek@redhat.com>
>>
>
> Thanks. And you're right, this is not C so no need to worry about that.
>
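In C, the same 64-bit fill pattern would have to be built by widening the 32-bit value before shifting. Otherwise the shift by 32 is exactly the undefined behaviour mentioned above. This is a minimal sketch; `replicate32` is a hypothetical helper, not an edk2 function.

```c
#include <assert.h>
#include <stdint.h>

/* Duplicate a 32-bit seed into both halves of a 64-bit word. The cast to
   uint64_t happens BEFORE "<< 32", so the shift is well defined; shifting
   a 32-bit operand by 32 would be undefined behaviour in C. */
static uint64_t replicate32(uint32_t v)
{
    return (uint64_t)v | ((uint64_t)v << 32);
}
```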
>> (
>>
>> By the way, just to see if I remember correctly, isn't STP:
>>
>> +0:stp   x9, x9, [x8], #16
>>
>> the kind of instruction that modifies multiple operands at once, and so
>> if it faults, it cannot be virtualized well? (Because the syndrome
>> register or whatever does not tell the VMM the whole picture about the
>> fault?)
>>
>> Totally irrelevant here, I'm just curious.
>>
>
> STP == STore Pair, and so it stores the values in the registers to
> memory. The only register that gets modified here is x8, due to the
> post-increment.
>

... which actually doesn't mean it is not affected by the same issue.

The reason such instructions are more difficult to virtualize is that
they require KVM to decode the instruction itself, rather than read the
syndrome registers that tell it which register we intended to read
from or write to. So it is in fact perfectly feasible to virtualize
them; the KVM authors just haven't bothered yet.

> But its converse
>
> LDP  <reg>, <reg>, [<reg>], #<const>
>
> is indeed such an instruction, given that it modifies three registers
> at once (both destination registers plus the base register), and so
> the registers that encode the exception run out of space. Note that
> this only affects virtualized MMIO.
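To make the STP/LDP distinction concrete, here is a minimal C sketch. It decodes only the 64-bit, post-indexed load/store-pair form and reports how many general-purpose registers the instruction writes. The bit layout is assumed from the Arm A64 encoding tables: opc=10, V=0, bits 25:23 = 001, L at bit 22, imm7 at 21:15, Rt2 at 14:10, Rn at 9:5, Rt at 4:0. The struct and function names are hypothetical. Decoding the instruction word like this is the extra work a hypervisor has to do when the syndrome cannot describe the access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 64-bit, post-indexed STP/LDP (other forms are rejected). */
struct pair_insn {
    bool     is_load;  /* L bit: 1 = LDP, 0 = STP */
    unsigned rt, rt2;  /* transfer registers */
    unsigned rn;       /* base register (31 means SP here) */
    int64_t  offset;   /* imm7 scaled by 8 bytes */
};

static bool decode_pair_post64(uint32_t insn, struct pair_insn *out)
{
    /* Bits 31:23 must be 1010 1000 1: opc=10, 101, V=0, 001 (post-index). */
    if ((insn & 0xFF800000u) != 0xA8800000u)
        return false;
    out->is_load = (insn >> 22) & 1;
    out->rt  = insn & 0x1F;
    out->rn  = (insn >> 5) & 0x1F;
    out->rt2 = (insn >> 10) & 0x1F;

    int32_t imm7 = (int32_t)((insn >> 15) & 0x7F);
    if (imm7 & 0x40)
        imm7 -= 0x80;                  /* sign-extend the 7-bit field */
    out->offset = (int64_t)imm7 * 8;
    return true;
}

/* Writeback always updates the base register; a load additionally writes
   Rt and Rt2 (assumed distinct, as the architecture requires for LDP). */
static unsigned gprs_written(const struct pair_insn *p)
{
    return p->is_load ? 3u : 1u;
}
```

For example, 0xA8812509 (`stp x9, x9, [x8], #16`) decodes to a single written register (x8), while 0xA8C17BFD (`ldp x29, x30, [sp], #16`) decodes to three.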



* Re: [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core
  2017-11-09 21:11           ` Ard Biesheuvel
@ 2017-11-10  9:29             ` Laszlo Ersek
  2017-11-10 11:01               ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread
From: Laszlo Ersek @ 2017-11-10  9:29 UTC
  To: Ard Biesheuvel; +Cc: Leif Lindholm, edk2-devel@lists.01.org, Gao, Liming

On 11/09/17 22:11, Ard Biesheuvel wrote:
> On 7 November 2017 at 18:13, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> On 7 November 2017 at 18:09, Laszlo Ersek <lersek@redhat.com> wrote:
>>> On 11/05/17 17:29, Ard Biesheuvel wrote:
>>>> On 5 November 2017 at 16:27, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>>>> On 5 November 2017 at 05:52, Leif Lindholm <leif.lindholm@linaro.org> wrote:
>>>>>> On Fri, Nov 03, 2017 at 11:33:52AM +0000, Ard Biesheuvel wrote:
>>>>>>> DEBUG builds of PEI code will print a diagnostic message regarding
>>>>>>> the utilization of temporary RAM before switching to permanent RAM.
>>>>>>> For example,
>>>>>>>
>>>>>>>   Total temporary memory:    16352 bytes.
>>>>>>>     temporary memory stack ever used:       4820 bytes.
>>>>>>>     temporary memory heap used for HobList: 4720 bytes.
>>>>>>>
>>>>>>> Tracking stack utilization like this requires the stack to be seeded
>>>>>>> with a known magic value, and this needs to occur before entering C
>>>>>>> code, given that it uses the stack. Currently, only Nt32Pkg appears
>>>>>>> to implement this feature, but it is useful nonetheless, so let's
>>>>>>> wire it up for PrePeiCore as well.
>>>>>>>
>>>>>>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=748
>>>>>>> Contributed-under: TianoCore Contribution Agreement 1.1
>>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>>>>
>>>>>> OK, this may sound completely unreasonable, but seeing those
>>>>>> implementations overwrite callee-saved registers without saving them
>>>>>> makes my brain unhappy. (Yes, I know.)
>>>>>>
>>>>>> Could they either:
>>>>>> - Have a comment prepended establishing the implicit ABI of which
>>>>>>   registers the caller cannot rely on reusing after return.
>>>>>>   Preferably somewhat echoed at the call site.
>>>>>> - Be rewritten to use only scratch registers?
>>>>>>
>>>>>
>>>>> I think it is implied that the startup code does not adhere to the
>>>>> AAPCS. That code already uses r5 and r6 without stacking them, simply
>>>>> because we're in the middle of preparing the stack and other execution
>>>>> context, precisely so the C code we call into can rely on AAPCS
>>>>> guarantees.
>>>>
>>>>
>>>> Ehm, hold on, what do you mean by 'call site'? This code just runs and
>>>> jumps back to a local label. There are no function calls here until
>>>> the point where we call into C (with the exception of the lovely
>>>> ArmPlatformPeiBootAction() we added so Juno can find out how much DRAM
>>>> it can use)
>>>
>>> Please continue the discussion with Leif on this; from my side, I'm
>>> happy with the patch (I've sort of deduced what the assembly code does,
>>> also relying on your v1 notes).
>>>
>>> The only eyebrow-raising part was:
>>>
>>> +  MOV64 (x9, FixedPcdGet32 (PcdInitValueInTempStack) |\
>>> +             FixedPcdGet32 (PcdInitValueInTempStack) << 32)
>>>
>>> where we left-shift a constant that is "in theory" UINT32 by 32 binary
>>> places, using the << operator. In C that would be undefined behavior,
>>> but this is assembly, so what do I know? ¯\_(ツ)_/¯
>>>
>>> Acked-by: Laszlo Ersek <lersek@redhat.com>
>>>
>>
>> Thanks. And you're right, this is not C so no need to worry about that.
>>
>>> (
>>>
>>> By the way, just to see if I remember correctly, isn't STP:
>>>
>>> +0:stp   x9, x9, [x8], #16
>>>
>>> the kind of instruction that modifies multiple operands at once, and so
>>> if it faults, it cannot be virtualized well? (Because the syndrome
>>> register or whatever does not tell the VMM the whole picture about the
>>> fault?)
>>>
>>> Totally irrelevant here, I'm just curious.
>>>
>>
>> STP == STore Pair, and so it stores the values in the registers to
>> memory. The only register that gets modified here is x8, due to the
>> post-increment.
>>
> 
> ... which actually doesn't mean it is not affected by the same issue.
> 
> The reason such instructions are more difficult to virtualize is that
> they require KVM to decode the instruction itself, rather than read the
> syndrome registers that tell it which register we intended to read
> from or write to. So it is in fact perfectly feasible to virtualize
> them; the KVM authors just haven't bothered yet.

Hm, I'm slightly curious if and how this differs from x86 KVM :) In x86
KVM there are huge instruction tables for emulation etc.

Anyway I'm happy this patch is now committed!

Thanks!
Laszlo

> 
>> But its converse
>>
>> LDP  <reg>, <reg>, [<reg>], #<const>
>>
>> is indeed such an instruction, given that it modifies three registers
>> at once (both destination registers plus the base register), and so
>> the registers that encode the exception run out of space. Note that
>> this only affects virtualized MMIO.




* Re: [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core
  2017-11-10  9:29             ` Laszlo Ersek
@ 2017-11-10 11:01               ` Ard Biesheuvel
  0 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2017-11-10 11:01 UTC
  To: Laszlo Ersek; +Cc: Leif Lindholm, edk2-devel@lists.01.org, Gao, Liming

On 10 November 2017 at 09:29, Laszlo Ersek <lersek@redhat.com> wrote:
> On 11/09/17 22:11, Ard Biesheuvel wrote:
>> On 7 November 2017 at 18:13, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>> On 7 November 2017 at 18:09, Laszlo Ersek <lersek@redhat.com> wrote:
>>>> On 11/05/17 17:29, Ard Biesheuvel wrote:
>>>>> On 5 November 2017 at 16:27, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>>>>> On 5 November 2017 at 05:52, Leif Lindholm <leif.lindholm@linaro.org> wrote:
>>>>>>> On Fri, Nov 03, 2017 at 11:33:52AM +0000, Ard Biesheuvel wrote:
>>>>>>>> DEBUG builds of PEI code will print a diagnostic message regarding
>>>>>>>> the utilization of temporary RAM before switching to permanent RAM.
>>>>>>>> For example,
>>>>>>>>
>>>>>>>>   Total temporary memory:    16352 bytes.
>>>>>>>>     temporary memory stack ever used:       4820 bytes.
>>>>>>>>     temporary memory heap used for HobList: 4720 bytes.
>>>>>>>>
>>>>>>>> Tracking stack utilization like this requires the stack to be seeded
>>>>>>>> with a known magic value, and this needs to occur before entering C
>>>>>>>> code, given that it uses the stack. Currently, only Nt32Pkg appears
>>>>>>>> to implement this feature, but it is useful nonetheless, so let's
>>>>>>>> wire it up for PrePeiCore as well.
>>>>>>>>
>>>>>>>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=748
>>>>>>>> Contributed-under: TianoCore Contribution Agreement 1.1
>>>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>>>>>
>>>>>>> OK, this may sound completely unreasonable, but seeing those
>>>>>>> implementations overwrite callee-saved registers without saving them
>>>>>>> makes my brain unhappy. (Yes, I know.)
>>>>>>>
>>>>>>> Could they either:
>>>>>>> - Have a comment prepended establishing the implicit ABI of which
>>>>>>>   registers the caller cannot rely on reusing after return.
>>>>>>>   Preferably somewhat echoed at the call site.
>>>>>>> - Be rewritten to use only scratch registers?
>>>>>>>
>>>>>>
>>>>>> I think it is implied that the startup code does not adhere to the
>>>>>> AAPCS. That code already uses r5 and r6 without stacking them, simply
>>>>>> because we're in the middle of preparing the stack and other execution
>>>>>> context, precisely so the C code we call into can rely on AAPCS
>>>>>> guarantees.
>>>>>
>>>>>
>>>>> Ehm, hold on, what do you mean by 'call site'? This code just runs and
>>>>> jumps back to a local label. There are no function calls here until
>>>>> the point where we call into C (with the exception of the lovely
>>>>> ArmPlatformPeiBootAction() we added so Juno can find out how much DRAM
>>>>> it can use)
>>>>
>>>> Please continue the discussion with Leif on this; from my side, I'm
>>>> happy with the patch (I've sort of deduced what the assembly code does,
>>>> also relying on your v1 notes).
>>>>
>>>> The only eyebrow-raising part was:
>>>>
>>>> +  MOV64 (x9, FixedPcdGet32 (PcdInitValueInTempStack) |\
>>>> +             FixedPcdGet32 (PcdInitValueInTempStack) << 32)
>>>>
>>>> where we left-shift a constant that is "in theory" UINT32 by 32 binary
>>>> places, using the << operator. In C that would be undefined behavior,
>>>> but this is assembly, so what do I know? ¯\_(ツ)_/¯
>>>>
>>>> Acked-by: Laszlo Ersek <lersek@redhat.com>
>>>>
>>>
>>> Thanks. And you're right, this is not C so no need to worry about that.
>>>
>>>> (
>>>>
>>>> By the way, just to see if I remember correctly, isn't STP:
>>>>
>>>> +0:stp   x9, x9, [x8], #16
>>>>
>>>> the kind of instruction that modifies multiple operands at once, and so
>>>> if it faults, it cannot be virtualized well? (Because the syndrome
>>>> register or whatever does not tell the VMM the whole picture about the
>>>> fault?)
>>>>
>>>> Totally irrelevant here, I'm just curious.
>>>>
>>>
>>> STP == STore Pair, and so it stores the values in the registers to
>>> memory. The only register that gets modified here is x8, due to the
>>> post-increment.
>>>
>>
>> ... which actually doesn't mean it is not affected by the same issue.
>>
>> The reason such instructions are more difficult to virtualize is that
>> they require KVM to decode the instruction itself, rather than read the
>> syndrome registers that tell it which register we intended to read
>> from or write to. So it is in fact perfectly feasible to virtualize
>> them; the KVM authors just haven't bothered yet.
>
> Hm, I'm slightly curious if and how this differs from x86 KVM :) In x86
> KVM there are huge instruction tables for emulation etc.
>

It does differ from x86: on ARM, you can derive most of the information
you need to emulate an instruction from the CPU registers that describe
the fault condition (i.e., the syndrome register and the fault address
register). However, those registers can only describe a single
general-purpose register, so anything that uses more is difficult to
emulate.

It is essentially laziness on the part of the KVM/ARM authors, because
they have been able to get away with it up to this point :-)
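The syndrome-only path can be sketched as follows. This is not KVM's code, and the names are hypothetical. The field positions are assumed from the ESR_ELx data-abort layout in the Arm ARM: EC at bits 31:26, where 0x24 means a data abort from a lower exception level; ISV at bit 24; SAS at 23:22; SRT at 20:16; WnR at bit 6. The fault address register (FAR/HPFAR) is not modelled. Per the architecture, ISV is set only for loads and stores of a single general-purpose register without writeback. Pair instructions such as STP/LDP therefore arrive with ISV clear, and so do post-indexed single-register accesses. In those cases the hypervisor has to fall back to decoding the instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESR_EC(esr)    (((esr) >> 26) & 0x3Fu)
#define EC_DABT_LOWER  0x24u          /* data abort from a lower EL */
#define ESR_ISV        (1ull << 24)   /* syndrome describes the access */

struct mmio_access {
    unsigned size;      /* access size in bytes: 1 << SAS */
    unsigned reg;       /* SRT: the one transfer register */
    bool     is_write;  /* WnR */
};

/* Returns true if the syndrome alone describes the access. A false return
   means the hypervisor must fetch and decode the instruction itself. */
static bool decode_dabt_syndrome(uint64_t esr, struct mmio_access *out)
{
    if (ESR_EC(esr) != EC_DABT_LOWER || !(esr & ESR_ISV))
        return false;
    out->size = 1u << ((esr >> 22) & 0x3);
    out->reg = (unsigned)((esr >> 16) & 0x1F);
    out->is_write = ((esr >> 6) & 1) != 0;
    return true;
}
```

A single 32-bit store of w3 to MMIO yields ISV=1, so the syndrome is enough. A faulting STP or LDP yields ISV=0, and the only way forward is the instruction decode.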



end of thread

Thread overview: 12+ messages
2017-11-03 11:33 [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core Ard Biesheuvel
2017-11-05  5:52 ` Leif Lindholm
2017-11-05 16:27   ` Ard Biesheuvel
2017-11-05 16:29     ` Ard Biesheuvel
2017-11-07 18:09       ` Laszlo Ersek
2017-11-07 18:13         ` Ard Biesheuvel
2017-11-09 21:11           ` Ard Biesheuvel
2017-11-10  9:29             ` Laszlo Ersek
2017-11-10 11:01               ` Ard Biesheuvel
2017-11-08 16:12       ` Leif Lindholm
2017-11-09 21:09         ` Ard Biesheuvel
2017-11-06  4:25 ` Gao, Liming
