public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* VariablePolicy: Final Changes Thread 1 - TPL Ordering
@ 2020-10-07  0:23 Bret Barkelew
  2020-10-07  1:54 ` Michael D Kinney
  2020-10-07 13:04 ` Laszlo Ersek
  0 siblings, 2 replies; 19+ messages in thread
From: Bret Barkelew @ 2020-10-07  0:23 UTC (permalink / raw)
  To: devel@edk2.groups.io

[-- Attachment #1: Type: text/plain, Size: 3473 bytes --]

As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.

A quick synopsis of the problem:

  *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
  *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
  *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
  *   I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.
     *   Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com)<https://github.com/corthon/edk2/commit/e7d0164c8263b1fbfb8b4e289851fbedaa8997f1>

However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.


Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.



We’re suggesting something like the below:



//

// Task priority level

//

#define TPL_APPLICATION       4

#define TPL_CALLBACK_LOW      7

#define TPL_CALLBACK          8

#define TPL_CALLBACK_HIGH     9

#define TPL_NOTIFY_LOW        15

#define TPL_NOTIFY            16

#define TPL_NOTIFY_HIGH       17

#define TPL_HIGH_LEVEL        31



There’s already a long-in-the-tooth bug tracking a similar issue:

https://bugzilla.tianocore.org/show_bug.cgi?id=1676<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&data=02%7C01%7CBret.Barkelew%40microsoft.com%7C9f3aa3e42bee4dd3a39208d8669a00e2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637372158932158036&sdata=vlC%2FB5f%2BDAMMp%2F3ECfBPukWiinHAmQj%2Flz1EPbSDKEA%3D&reserved=0>



This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).



Thoughts?



If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.

- Bret

[-- Attachment #2: Type: text/html, Size: 9549 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-07  0:23 VariablePolicy: Final Changes Thread 1 - TPL Ordering Bret Barkelew
@ 2020-10-07  1:54 ` Michael D Kinney
  2020-10-07  2:17   ` [edk2-devel] " Andrew Fish
  2020-10-07 13:04 ` Laszlo Ersek
  1 sibling, 1 reply; 19+ messages in thread
From: Michael D Kinney @ 2020-10-07  1:54 UTC (permalink / raw)
  To: devel@edk2.groups.io, bret.barkelew@microsoft.com,
	Kinney, Michael D

[-- Attachment #1: Type: text/plain, Size: 4922 bytes --]

Bret,

It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.

TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.

I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.

Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just moves the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that need something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.

Best regards,

Mike

From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Bret Barkelew via groups.io
Sent: Tuesday, October 6, 2020 5:24 PM
To: devel@edk2.groups.io
Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.

A quick synopsis of the problem:

  *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
  *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
  *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
  *   I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.
     *   Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com)<https://github.com/corthon/edk2/commit/e7d0164c8263b1fbfb8b4e289851fbedaa8997f1>

However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.


Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.



We’re suggesting something like the below:



//

// Task priority level

//

#define TPL_APPLICATION       4

#define TPL_CALLBACK_LOW      7

#define TPL_CALLBACK          8

#define TPL_CALLBACK_HIGH     9

#define TPL_NOTIFY_LOW        15

#define TPL_NOTIFY            16

#define TPL_NOTIFY_HIGH       17

#define TPL_HIGH_LEVEL        31



There’s already a long-in-the-tooth bug tracking a similar issue:

https://bugzilla.tianocore.org/show_bug.cgi?id=1676<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&data=02%7C01%7CBret.Barkelew%40microsoft.com%7C9f3aa3e42bee4dd3a39208d8669a00e2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637372158932158036&sdata=vlC%2FB5f%2BDAMMp%2F3ECfBPukWiinHAmQj%2Flz1EPbSDKEA%3D&reserved=0>



This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).



Thoughts?



If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.

- Bret


[-- Attachment #2: Type: text/html, Size: 53685 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-07  1:54 ` Michael D Kinney
@ 2020-10-07  2:17   ` Andrew Fish
  2020-10-07 16:21     ` Michael D Kinney
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Fish @ 2020-10-07  2:17 UTC (permalink / raw)
  To: edk2-devel-groups-io, Mike Kinney; +Cc: bret.barkelew@microsoft.com

[-- Attachment #1: Type: text/plain, Size: 5555 bytes --]

Mike,

When I’ve done things like this in the past I think of making them configurable like via a PCD. 

In terms of the #defines I think it makes more sense to just do math on the spec defined values. So TPL_CALLBACK + 1 or TPL_CALLBACK - 1 etc.  I’ve got an lldb type formatter for TPL and it prints out <UEFI Spec TPL> [+ <extra>] as I think this is the clearest way to do it. 

Thanks,

Andrew Fish

> On Oct 6, 2020, at 6:54 PM, Michael D Kinney <michael.d.kinney@intel.com> wrote:
> 
> Bret,
>
> It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.
>
> TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.
>
> I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.
>
> Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just moves the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that need something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.
>
> Best regards,
>
> Mike
>
> From: devel@edk2.groups.io <mailto:devel@edk2.groups.io> <devel@edk2.groups.io <mailto:devel@edk2.groups.io>> On Behalf Of Bret Barkelew via groups.io <http://groups.io/>
> Sent: Tuesday, October 6, 2020 5:24 PM
> To: devel@edk2.groups.io <mailto:devel@edk2.groups.io>
> Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>
> As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.
>
> A quick synopsis of the problem:
> Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
> VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
> However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
> I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.
> Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com) <https://github.com/corthon/edk2/commit/e7d0164c8263b1fbfb8b4e289851fbedaa8997f1>
>
> However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.
>
> Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.
>
> We’re suggesting something like the below:
>
> //
> // Task priority level
> //
> #define TPL_APPLICATION       4
> #define TPL_CALLBACK_LOW      7
> #define TPL_CALLBACK          8
> #define TPL_CALLBACK_HIGH     9
> #define TPL_NOTIFY_LOW        15
> #define TPL_NOTIFY            16
> #define TPL_NOTIFY_HIGH       17
> #define TPL_HIGH_LEVEL        31
>
> There’s already a long-in-the-tooth bug tracking a similar issue:
> https://bugzilla.tianocore.org/show_bug.cgi?id=1676 <https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&data=02%7C01%7CBret.Barkelew%40microsoft.com%7C9f3aa3e42bee4dd3a39208d8669a00e2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637372158932158036&sdata=vlC%2FB5f%2BDAMMp%2F3ECfBPukWiinHAmQj%2Flz1EPbSDKEA%3D&reserved=0>
>
> This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).
>
> Thoughts?
>
> If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.
>
> - Bret
> 


[-- Attachment #2: Type: text/html, Size: 17133 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-07  0:23 VariablePolicy: Final Changes Thread 1 - TPL Ordering Bret Barkelew
  2020-10-07  1:54 ` Michael D Kinney
@ 2020-10-07 13:04 ` Laszlo Ersek
  2020-10-07 17:48   ` [EXTERNAL] " Bret Barkelew
  1 sibling, 1 reply; 19+ messages in thread
From: Laszlo Ersek @ 2020-10-07 13:04 UTC (permalink / raw)
  To: bret.barkelew; +Cc: devel, Michael Kinney, Andrew Fish

Hi Bret,

On 10/07/20 02:23, Bret Barkelew via groups.io wrote:
> As many will be aware, I\x19m in the final stages of having Variable
> Policy ready for commit. However, after moving the \x1cLock\x1d event
> back to EndOfDxe, this exposed a race condition in variable services.
>
> A quick synopsis of the problem:
>
>   *   Previously, MorLock abused a privileged position by being
>       tightly coupled to Variable Services, and its lock callback was
>       called directly so that it could be strongly ordered with the
>       internal property lock that disables future RequestToLock calls.
>
>   *   VariablePolicy attempted to decouple this (without realizing it
>       was a problem) and go through formalized interfaces that could
>       ultimately be broken out of Variable Services altogether.
>
>   *   However, doing so triggered the race condition, causing an
>       ASSERT when MorLock attempted to lock its variables.
>
>   *   I current have a reimplementation of the strong ordering
>       workaround that can be previewed at the link below. I have
>       tested that it works the same as the OLD system.
>
>      *   Take a stab at solving the lock ordering problem ·
>          corthon/edk2@e7d0164
>          (github.com)<https://github.com/corthon/edk2/commit/e7d0164c8263b1fbfb8b4e289851fbedaa8997f1>
>
> However, replacing one bad design with another is not what this
> community is about (when we can help it), so we\x19d like to take a
> moment to revisit a conversation that has come up before: expanding
> the number of supported TPL levels.

Have you considered my suggestion at
<https://edk2.groups.io/g/devel/message/65576> (alternate link:
<http://mid.mail-archive.com/645bb333-c3f4-0acd-f0e8-b783149bec18@redhat.com>)?

The idea needs no services or guarantees other than what are already
specified in UEFI.

Originally, I attempted to write up my proposal in generic terms. But
here it is again, with the specifics substituted:

(1) The driver that provides the Variable Policy Engine -- that is, the
    Variable Services Driver, AIUI -- defines (standardizes) a new,
    well-known protocol GUID.

    If the protocol is present in the protocol database, that means that
    LockVariablePolicy() must not be performed by the Variable Services
    Driver in its EndOfDxe handler.

    Let's call this protocol
    EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL.

    There is no need for an actual protocol structure; it can be a null
    protocol.

(2) The driver that intends to lock down the
    "MemoryOverwriteRequestControlLock" variable at EndOfDxe, through
    the Variable Policy Engine, produces an instance of
    EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL, in its entry point
    function.

    (Note that this "MorLock owner" driver may be different from, or
    identical to, the driver that implements the Variable Policy Engine
    itself, i.e., the Variable Services Driver.)

    The protocol instance can be the sole protocol on a new handle, or
    it can be installed on a long-term handle (such as "gImageHandle" of
    the MorLock owner).

(3) In its EndOfDxe handler, the MorLock owner uninstalls
    EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL (potentially destroying
    the handle as well).

    This uninstallation is safe because the highest TPL that said
    EndOfDxe handler may run at is TPL_NOTIFY, and uninstalling
    protocols is safe at <=TPL_NOTIFY. Memory allocation services are
    also explicitly safe at <=TPL_NOTIFY.

(4) The Variable Services Driver factors the LockVariablePolicy()
    invocation out of its EndOfDxe handler to a new function.

    The new function -- consisting of locking the variable policy only
    -- has the function signature of a typical UEFI event notification
    function.

    Let's call this function LockVariablePolicyWhenSignaled().

(5) In its entry point function, the Variable Services Driver creates a
    new event (outside of any event group), for which
    LockVariablePolicyWhenSignaled() is the notification function. The
    TPL is TPL_CALLBACK. Let's call the event LockVariablePolicyEvent.

(6) In the Variable Services Driver's EndOfDxe handler, the original,
    open-coded LockVariablePolicy() call is replaced with the following
    logic:

    - If EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL exists in the
      protocol database (according to gBS->LocateProtocol()), then the
      Variable Services Driver's EndOfDxe handler *signals*
      LockVariablePolicyEvent.

      (This is safe because gBS->LocateProtocol() is safe to call at up
      to and including TPL_NOTIFY, and gBS->SignalEvent() is safe even
      at TPL_HIGH_LEVEL.)

    - Otherwise, the Variable Services Driver's EndOfDxe *calls*
      LockVariablePolicyWhenSignaled() with a normal function call.


It's supposed to work like this:

- If the MorLock owner is not built into the firmware at all, it won't
  install EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL. Thus, the
  Variable Services Driver directly calls LockVariablePolicy() from its
  original EndOfDxe handler. That is, no change in behavior.

- If the MorLock owner's EndOfDxe handler completes first (locking down
  the MorLock variable through the variable policy, and then
  uninstalling EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL), then the
  Variable Services Driver's EndOfDxe handler will again not see
  EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL. So goto previous point.

- Multiple drivers similar to the MorLock owner may coexist; multiple
  EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL instances may coexist
  (all NULL interfaces, but on different handles). If there is at least
  one such protocol instance, gBS->LocateProtocol() in the Variable
  Services Driver's EndOfDxe handler will find the first one, and
  postpone the LockVariablePolicy() call -- it will signal
  LockVariablePolicyEvent.

- When the Variable Services Driver's EndOfDxe handler signals
  LockVariablePolicyEvent, LockVariablePolicyWhenSignaled() will be
  enqueued at TPL_CALLBACK. Note that, given that the Variable Services
  Driver's EndOfDxe handler is running at the moment, the EndOfDxe event
  group has been signaled previously. And that means that *all* events
  in the EndOfDxe event group have been signaled, and *all* their
  notification functions have been enqueued too (in some unspecified
  order, except for the specified dispatch ordering between TPL_NOTIFY
  and TPL_CALLBACK).

  So when LockVariablePolicyEvent is signaled,
  LockVariablePolicyWhenSignaled() is enqueued *after* all the other --
  already enqueued -- EndOfDxe notification functions, *including* the
  EndOfDxe handler of the MorLock owner.

  That's because

  (a) LockVariablePolicyEvent uses TPL_CALLBACK, so the already pending
      TPL_NOTIFY notifications will be dispatched before it of course,
      and

  (b) regarding notify functions enqueued previously at the same TPL
      (=TPL_CALLBACK), functions in any given TPL queue are queued /
      dispatched in FIFO order.

  This means that LockVariablePolicyWhenSignaled() will be dispatched
  *after* the MorLock owner's EndOfDxe handler, *regardless* of the
  relative dispatch order between the Variable Services Driver's
  EndOfDxe handler and the MorLock owner's EndOfDxe handler.

- By the time LockVariablePolicyWhenSignaled() runs, it could (and
  should) ASSERT() that EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL
  does not exist.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-07  2:17   ` [edk2-devel] " Andrew Fish
@ 2020-10-07 16:21     ` Michael D Kinney
  2020-10-07 16:39       ` Bret Barkelew
  0 siblings, 1 reply; 19+ messages in thread
From: Michael D Kinney @ 2020-10-07 16:21 UTC (permalink / raw)
  To: Andrew Fish, edk2-devel-groups-io, Kinney, Michael D
  Cc: bret.barkelew@microsoft.com

[-- Attachment #1: Type: text/plain, Size: 6669 bytes --]

Hi Andrew,

I agree DXE drivers could use a PCD to make it configurable and prevent collisions with UEFI defined TPL levels.

Bret’s suggestion of adding a DXE scoped services to create events using non-UEFI defined TPL levels could be used with these TPL levels from PCDs.  Would also allow DXE drivers to use TPL levels associated with the firmware interrupts in the range 17..30.  Perhaps extensions to the DXE Services Table?

Still does not address my concern that many DXE drivers using these extra TPL levels may run into race conditions if more than one DXE driver selects the same TPL level.  Platform integrators will need to understand the relative priorities of all DXE drivers that use extra TPL levels so they can assign values that both avoid collisions with future UEFI specs and prevent race conditions among DXE drivers.

Mike

From: Andrew Fish <afish@apple.com>
Sent: Tuesday, October 6, 2020 7:18 PM
To: edk2-devel-groups-io <devel@edk2.groups.io>; Kinney, Michael D <michael.d.kinney@intel.com>
Cc: bret.barkelew@microsoft.com
Subject: Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Mike,

When I’ve done things like this in the past I think of making them configurable like via a PCD.

In terms of the #defines I think it makes more sense to just do math on the spec defined values. So TPL_CALLBACK + 1 or TPL_CALLBACK - 1 etc.  I’ve got an lldb type formatter for TPL and it prints out <UEFI Spec TPL> [+ <extra>] as I think this is the clearest way to do it.

Thanks,

Andrew Fish


On Oct 6, 2020, at 6:54 PM, Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>> wrote:

Bret,

It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.

TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.

I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.

Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just moves the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that need something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.

Best regards,

Mike

From: devel@edk2.groups.io<mailto:devel@edk2.groups.io> <devel@edk2.groups.io<mailto:devel@edk2.groups.io>> On Behalf Of Bret Barkelew via groups.io<http://groups.io/>
Sent: Tuesday, October 6, 2020 5:24 PM
To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>
Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.

A quick synopsis of the problem:

  *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
  *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
  *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
  *   I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.

     *   Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com)<https://github.com/corthon/edk2/commit/e7d0164c8263b1fbfb8b4e289851fbedaa8997f1>

However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.

Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.

We’re suggesting something like the below:

//
// Task priority level
//
#define TPL_APPLICATION       4
#define TPL_CALLBACK_LOW      7
#define TPL_CALLBACK          8
#define TPL_CALLBACK_HIGH     9
#define TPL_NOTIFY_LOW        15
#define TPL_NOTIFY            16
#define TPL_NOTIFY_HIGH       17
#define TPL_HIGH_LEVEL        31

There’s already a long-in-the-tooth bug tracking a similar issue:
https://bugzilla.tianocore.org/show_bug.cgi?id=1676<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&data=02%7C01%7CBret.Barkelew%40microsoft.com%7C9f3aa3e42bee4dd3a39208d8669a00e2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637372158932158036&sdata=vlC%2FB5f%2BDAMMp%2F3ECfBPukWiinHAmQj%2Flz1EPbSDKEA%3D&reserved=0>

This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).

Thoughts?

If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.

- Bret



[-- Attachment #2: Type: text/html, Size: 61556 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-07 16:21     ` Michael D Kinney
@ 2020-10-07 16:39       ` Bret Barkelew
  2020-10-12 16:44         ` Michael D Kinney
  0 siblings, 1 reply; 19+ messages in thread
From: Bret Barkelew @ 2020-10-07 16:39 UTC (permalink / raw)
  To: Kinney, Michael D, Andrew Fish, edk2-devel-groups-io


[-- Attachment #1.1: Type: text/plain, Size: 8090 bytes --]

Agreed with your concern, Mike. This mechanism (and we can document it as such) should NOT be used to accomplish an explicit ordering (a la the “apriori list”). It’s just to provide a little separation for two patterns that we’ve seen time and again in our code: data/device manipulators and data/device protectors. It does not eliminate the necessity for platform creators to understand things like driver ordering if they have one driver that requires a protocol be installed or a bus connected.

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Wednesday, October 7, 2020 9:21 AM
To: Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Cc: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Andrew,

I agree DXE drivers could use a PCD to make it configurable and prevent collisions with UEFI defined TPL levels.

Bret’s suggestion of adding a DXE scoped services to create events using non-UEFI defined TPL levels could be used with these TPL levels from PCDs.  Would also allow DXE drivers to use TPL levels associated with the firmware interrupts in the range 17..30.  Perhaps extensions to the DXE Services Table?

Still does not address my concern that many DXE drivers using these extra TPL levels may run into race conditions if more than one DXE driver selects the same TPL level.  Platform integrators will need to understand the relative priorities of all DXE drivers that use extra TPL levels so they can assign values that both avoid collisions with future UEFI specs and prevent race conditions among DXE drivers.

Mike

From: Andrew Fish <afish@apple.com>
Sent: Tuesday, October 6, 2020 7:18 PM
To: edk2-devel-groups-io <devel@edk2.groups.io>; Kinney, Michael D <michael.d.kinney@intel.com>
Cc: bret.barkelew@microsoft.com
Subject: Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Mike,

When I’ve done things like this in the past I think of making them configurable like via a PCD.

In terms of the #defines I think it makes more sense to just do math on the spec defined values. So TPL_CALLBACK + 1 or TPL_CALLBACK - 1 etc.  I’ve got an lldb type formatter for TPL and it prints out <UEFI Spec TPL> [+ <extra>] as I think this is the clearest way to do it.

Thanks,

Andrew Fish


On Oct 6, 2020, at 6:54 PM, Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>> wrote:

Bret,

It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.

TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.

I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.

Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just moves the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that need something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.

Best regards,

Mike

From: devel@edk2.groups.io<mailto:devel@edk2.groups.io> <devel@edk2.groups.io<mailto:devel@edk2.groups.io>> On Behalf Of Bret Barkelew via groups.io<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgroups.io%2F&data=02%7C01%7Cbret.barkelew%40microsoft.com%7C648157d1494c4150193808d86add058c%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637376844812012299&sdata=QPFFE2qiERYshhdMKPeBP8qsaXhTpppj79LPtTOytI4%3D&reserved=0>
Sent: Tuesday, October 6, 2020 5:24 PM
To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>
Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.

A quick synopsis of the problem:

  *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
  *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
  *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
  *   I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.

     *   Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com)<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcorthon%2Fedk2%2Fcommit%2Fe7d0164c8263b1fbfb8b4e289851fbedaa8997f1&data=02%7C01%7Cbret.barkelew%40microsoft.com%7C648157d1494c4150193808d86add058c%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637376844812022290&sdata=Fi7G%2FiN25f5EIi0SkgmeXbPnh4OU%2BH1lGKk6dujzDwc%3D&reserved=0>

However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.

Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.

We’re suggesting something like the below:

//
// Task priority level
//
#define TPL_APPLICATION       4
#define TPL_CALLBACK_LOW      7
#define TPL_CALLBACK          8
#define TPL_CALLBACK_HIGH     9
#define TPL_NOTIFY_LOW        15
#define TPL_NOTIFY            16
#define TPL_NOTIFY_HIGH       17
#define TPL_HIGH_LEVEL        31

There’s already a long-in-the-tooth bug tracking a similar issue:
https://bugzilla.tianocore.org/show_bug.cgi?id=1676<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&data=02%7C01%7Cbret.barkelew%40microsoft.com%7C648157d1494c4150193808d86add058c%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637376844812022290&sdata=RzWFMtDFkI7WKJyewO2LI%2BmHZX2nZupFloDMSJlS1rM%3D&reserved=0>

This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).

Thoughts?

If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.

- Bret




[-- Attachment #1.2: Type: text/html, Size: 20896 bytes --]

[-- Attachment #2: 7A42A8CF9BBC4E589B6D9D129C3881BD.png --]
[-- Type: image/png, Size: 151 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-07 13:04 ` Laszlo Ersek
@ 2020-10-07 17:48   ` Bret Barkelew
  0 siblings, 0 replies; 19+ messages in thread
From: Bret Barkelew @ 2020-10-07 17:48 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: devel@edk2.groups.io, Kinney, Michael D, Andrew Fish

[-- Attachment #1: Type: text/plain, Size: 9693 bytes --]

Laszlo,
Thanks! We did see your proposal and discussed it internal, but concluded it was too complex a pattern to replicate throughout the code and would rather settle on a consistent solution. We’d rather keep pursuing this TPL discussion and it sounds like there’s some appetite outside of our team, and a possible path forward with PCDs.

I do really appreciate the detailed proposal and write-up, though!


Mike,
Regarding your PCD approach, is that something I could help you write up a PoC for?

- Bret

From: Laszlo Ersek<mailto:lersek@redhat.com>
Sent: Wednesday, October 7, 2020 6:05 AM
To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>
Cc: devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>; Andrew Fish<mailto:afish@apple.com>
Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Bret,

On 10/07/20 02:23, Bret Barkelew via groups.io wrote:
> As many will be aware, Im in the final stages of having Variable
> Policy ready for commit. However, after moving the \x1cLock event
> back to EndOfDxe, this exposed a race condition in variable services.
>
> A quick synopsis of the problem:
>
>   *   Previously, MorLock abused a privileged position by being
>       tightly coupled to Variable Services, and its lock callback was
>       called directly so that it could be strongly ordered with the
>       internal property lock that disables future RequestToLock calls.
>
>   *   VariablePolicy attempted to decouple this (without realizing it
>       was a problem) and go through formalized interfaces that could
>       ultimately be broken out of Variable Services altogether.
>
>   *   However, doing so triggered the race condition, causing an
>       ASSERT when MorLock attempted to lock its variables.
>
>   *   I current have a reimplementation of the strong ordering
>       workaround that can be previewed at the link below. I have
>       tested that it works the same as the OLD system.
>
>      *   Take a stab at solving the lock ordering problem ·
>          corthon/edk2@e7d0164
>          (github.com)<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcorthon%2Fedk2%2Fcommit%2Fe7d0164c8263b1fbfb8b4e289851fbedaa8997f1&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C3ab943e1b943407df03308d86ac1a467%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637376727204151506%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=oslzgrrWjmtYotlmSnTgiB74JVogjttS3Rvmn0s7%2BGY%3D&amp;reserved=0>
>
> However, replacing one bad design with another is not what this
> community is about (when we can help it), so wed like to take a
> moment to revisit a conversation that has come up before: expanding
> the number of supported TPL levels.

Have you considered my suggestion at
<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fedk2.groups.io%2Fg%2Fdevel%2Fmessage%2F65576&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C3ab943e1b943407df03308d86ac1a467%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637376727204151506%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=%2Fhgd6pDv69ME2QFzqECVwU6utTzVbqXaVrWs2DOZJTw%3D&amp;reserved=0> (alternate link:
<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmid.mail-archive.com%2F645bb333-c3f4-0acd-f0e8-b783149bec18%40redhat.com&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C3ab943e1b943407df03308d86ac1a467%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637376727204151506%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=RuymdfGtLkMcp%2FGq31VFRg9IePLTlrB4ZC3AAzMlUfo%3D&amp;reserved=0>)?

The idea needs no services or guarantees other than what are already
specified in UEFI.

Originally, I attempted to write up my proposal in generic terms. But
here it is again, with the specifics substituted:

(1) The driver that provides the Variable Policy Engine -- that is, the
    Variable Services Driver, AIUI -- defines (standardizes) a new,
    well-known protocol GUID.

    If the protocol is present in the protocol database, that means that
    LockVariablePolicy() must not be performed by the Variable Services
    Driver in its EndOfDxe handler.

    Let's call this protocol
    EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL.

    There is no need for an actual protocol structure; it can be a null
    protocol.

(2) The driver that intends to lock down the
    "MemoryOverwriteRequestControlLock" variable at EndOfDxe, through
    the Variable Policy Engine, produces an instance of
    EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL, in its entry point
    function.

    (Note that this "MorLock owner" driver may be different from, or
    identical to, the driver that implements the Variable Policy Engine
    itself, i.e., the Variable Services Driver.)

    The protocol instance can be the sole protocol on a new handle, or
    it can be installed on a long-term handle (such as "gImageHandle" of
    the MorLock owner).

(3) In its EndOfDxe handler, the MorLock owner uninstalls
    EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL (potentially destroying
    the handle as well).

    This uninstallation is safe because the highest TPL that said
    EndOfDxe handler may run at is TPL_NOTIFY, and uninstalling
    protocols is safe at <=TPL_NOTIFY. Memory allocation services are
    also explicitly safe at <=TPL_NOTIFY.

(4) The Variable Services Driver factors the LockVariablePolicy()
    invocation out of its EndOfDxe handler to a new function.

    The new function -- consisting of locking the variable policy only
    -- has the function signature of a typical UEFI event notification
    function.

    Let's call this function LockVariablePolicyWhenSignaled().

(5) In its entry point function, the Variable Services Driver creates a
    new event (outside of any event group), for which
    LockVariablePolicyWhenSignaled() is the notification function. The
    TPL is TPL_CALLBACK. Let's call the event LockVariablePolicyEvent.

(6) In the Variable Services Driver's EndOfDxe handler, the original,
    open-coded LockVariablePolicy() call is replaced with the following
    logic:

    - If EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL exists in the
      protocol database (according to gBS->LocateProtocol()), then the
      Variable Services Driver's EndOfDxe handler *signals*
      LockVariablePolicyEvent.

      (This is safe because gBS->LocateProtocol() is safe to call at up
      to and including TPL_NOTIFY, and gBS->SignalEvent() is safe even
      at TPL_HIGH_LEVEL.)

    - Otherwise, the Variable Services Driver's EndOfDxe *calls*
      LockVariablePolicyWhenSignaled() with a normal function call.


It's supposed to work like this:

- If the MorLock owner is not built into the firmware at all, it won't
  install EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL. Thus, the
  Variable Services Driver directly calls LockVariablePolicy() from its
  original EndOfDxe handler. That is, no change in behavior.

- If the MorLock owner's EndOfDxe handler completes first (locking down
  the MorLock variable through the variable policy, and then
  uninstalling EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL), then the
  Variable Services Driver's EndOfDxe handler will again not see
  EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL. So goto previous point.

- Multiple drivers similar to the MorLock owner may coexist; multiple
  EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL instances may coexist
  (all NULL interfaces, but on different handles). If there is at least
  one such protocol instance, gBS->LocateProtocol() in the Variable
  Services Driver's EndOfDxe handler will find the first one, and
  postpone the LockVariablePolicy() call -- it will signal
  LockVariablePolicyEvent.

- When the Variable Services Driver's EndOfDxe handler signals
  LockVariablePolicyEvent, LockVariablePolicyWhenSignaled() will be
  enqueued at TPL_CALLBACK. Note that, given that the Variable Services
  Driver's EndOfDxe handler is running at the moment, the EndOfDxe event
  group has been signaled previously. And that means that *all* events
  in the EndOfDxe event group have been signaled, and *all* their
  notification functions have been enqueued too (in some unspecified
  order, except for the specified dispatch ordering between TPL_NOTIFY
  and TPL_CALLBACK).

  So when LockVariablePolicyEvent is signaled,
  LockVariablePolicyWhenSignaled() is enqueued *after* all the other --
  already enqueued -- EndOfDxe notification functions, *including* the
  EndOfDxe handler of the MorLock owner.

  That's because

  (a) LockVariablePolicyEvent uses TPL_CALLBACK, so the already pending
      TPL_NOTIFY notifications will be dispatched before it of course,
      and

  (b) regarding notify functions enqueued previously at the same TPL
      (=TPL_CALLBACK), functions in any given TPL queue are queued /
      dispatched in FIFO order.

  This means that LockVariablePolicyWhenSignaled() will be dispatched
  *after* the MorLock owner's EndOfDxe handler, *regardless* of the
  relative dispatch order between the Variable Services Driver's
  EndOfDxe handler and the MorLock owner's EndOfDxe handler.

- By the time LockVariablePolicyWhenSignaled() runs, it could (and
  should) ASSERT() that EDKII_KEEP_VARIABLE_POLICY_UNLOCKED_PROTOCOL
  does not exist.

Thanks
Laszlo


[-- Attachment #2: Type: text/html, Size: 15010 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-07 16:39       ` Bret Barkelew
@ 2020-10-12 16:44         ` Michael D Kinney
  2020-10-12 22:14           ` Bret Barkelew
  0 siblings, 1 reply; 19+ messages in thread
From: Michael D Kinney @ 2020-10-12 16:44 UTC (permalink / raw)
  To: Bret Barkelew, Andrew Fish, edk2-devel-groups-io,
	Kinney, Michael D


[-- Attachment #1.1: Type: text/plain, Size: 9425 bytes --]

Bret,

How to platform creators know for the complete set of drivers if there is anything then need to worry about and why and what they need to address the concern?  This is about order that events are signaled for a given event trigger.  When a platform adds more driver that may use the same event triggers, how do they know if there is a potential for a race condition or not?  If event notification functions are design to be independent of signaling order, then there is no issue.  As soon as there are requirements for event notification functions to be executed in a specific order at a specific event trigger, we have to make sure the platform creator knows and preferably, the FW can tell them if they got it wrong.

Can your data/device manipulators and data/device protectors use case generate an ASSERT() if they are signaled in the wrong order?

Mike

From: Bret Barkelew <Bret.Barkelew@microsoft.com>
Sent: Wednesday, October 7, 2020 9:39 AM
To: Kinney, Michael D <michael.d.kinney@intel.com>; Andrew Fish <afish@apple.com>; edk2-devel-groups-io <devel@edk2.groups.io>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Agreed with your concern, Mike. This mechanism (and we can document it as such) should NOT be used to accomplish an explicit ordering (a la the “apriori list”). It’s just to provide a little separation for two patterns that we’ve seen time and again in our code: data/device manipulators and data/device protectors. It does not eliminate the necessity for platform creators to understand things like driver ordering if they have one driver that requires a protocol be installed or a bus connected.

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Wednesday, October 7, 2020 9:21 AM
To: Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Cc: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Andrew,

I agree DXE drivers could use a PCD to make it configurable and prevent collisions with UEFI defined TPL levels.

Bret’s suggestion of adding a DXE scoped services to create events using non-UEFI defined TPL levels could be used with these TPL levels from PCDs.  Would also allow DXE drivers to use TPL levels associated with the firmware interrupts in the range 17..30.  Perhaps extensions to the DXE Services Table?

Still does not address my concern that many DXE drivers using these extra TPL levels may run into race conditions if more than one DXE driver selects the same TPL level.  Platform integrators will need to understand the relative priorities of all DXE drivers that use extra TPL levels so they can assign values that both avoid collisions with future UEFI specs and prevent race conditions among DXE drivers.

Mike

From: Andrew Fish <afish@apple.com<mailto:afish@apple.com>>
Sent: Tuesday, October 6, 2020 7:18 PM
To: edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: bret.barkelew@microsoft.com<mailto:bret.barkelew@microsoft.com>
Subject: Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
Mike,

When I’ve done things like this in the past I think of making them configurable like via a PCD.

In terms of the #defines I think it makes more sense to just do math on the spec defined values. So TPL_CALLBACK + 1 or TPL_CALLBACK - 1 etc.  I’ve got an lldb type formatter for TPL and it prints out <UEFI Spec TPL> [+ <extra>] as I think this is the clearest way to do it.

Thanks,

Andrew Fish

On Oct 6, 2020, at 6:54 PM, Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>> wrote:

Bret,

It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.

TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.

I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.

Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just moves the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that need something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.

Best regards,

Mike

From: devel@edk2.groups.io<mailto:devel@edk2.groups.io> <devel@edk2.groups.io<mailto:devel@edk2.groups.io>> On Behalf Of Bret Barkelew via groups.io<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgroups.io%2F&data=02%7C01%7Cbret.barkelew%40microsoft.com%7C648157d1494c4150193808d86add058c%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637376844812012299&sdata=QPFFE2qiERYshhdMKPeBP8qsaXhTpppj79LPtTOytI4%3D&reserved=0>
Sent: Tuesday, October 6, 2020 5:24 PM
To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>
Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.

A quick synopsis of the problem:

  *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
  *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
  *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
  *   I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.

     *   Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com)<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcorthon%2Fedk2%2Fcommit%2Fe7d0164c8263b1fbfb8b4e289851fbedaa8997f1&data=02%7C01%7Cbret.barkelew%40microsoft.com%7C648157d1494c4150193808d86add058c%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637376844812022290&sdata=Fi7G%2FiN25f5EIi0SkgmeXbPnh4OU%2BH1lGKk6dujzDwc%3D&reserved=0>

However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.

Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.

We’re suggesting something like the below:

//
// Task priority level
//
#define TPL_APPLICATION       4
#define TPL_CALLBACK_LOW      7
#define TPL_CALLBACK          8
#define TPL_CALLBACK_HIGH     9
#define TPL_NOTIFY_LOW        15
#define TPL_NOTIFY            16
#define TPL_NOTIFY_HIGH       17
#define TPL_HIGH_LEVEL        31

There’s already a long-in-the-tooth bug tracking a similar issue:
https://bugzilla.tianocore.org/show_bug.cgi?id=1676<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&data=02%7C01%7Cbret.barkelew%40microsoft.com%7C648157d1494c4150193808d86add058c%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637376844812022290&sdata=RzWFMtDFkI7WKJyewO2LI%2BmHZX2nZupFloDMSJlS1rM%3D&reserved=0>

This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).

Thoughts?

If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.

- Bret




[-- Attachment #1.2: Type: text/html, Size: 65109 bytes --]

[-- Attachment #2: image002.png --]
[-- Type: image/png, Size: 230 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-12 16:44         ` Michael D Kinney
@ 2020-10-12 22:14           ` Bret Barkelew
  2020-10-13 15:04             ` Michael D Kinney
  0 siblings, 1 reply; 19+ messages in thread
From: Bret Barkelew @ 2020-10-12 22:14 UTC (permalink / raw)
  To: Kinney, Michael D, Andrew Fish, edk2-devel-groups-io


[-- Attachment #1.1: Type: text/plain, Size: 10770 bytes --]

Like I said, I’m also happy to go with the lesser solution of replacing the hack that was already in the code. The last person didn’t care to solve this problem, and I’m good to not solve it, too. I mean, I think it’s turtles all the way down no matter what.

It was actually the ASSERT in the code that highlighted this problem to being with, so I would say that it’s doing its job. It’s incumbent upon the code author to determine what resource they’re trying to access and whether they’ve accessed it successfully, and I agree that it seems like an appropriate use of ASSERTs so long as it’s backed up with some OTHER appropriate action (even if that action is ignoring it).

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Monday, October 12, 2020 9:44 AM
To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Bret,

How to platform creators know for the complete set of drivers if there is anything then need to worry about and why and what they need to address the concern?  This is about order that events are signaled for a given event trigger.  When a platform adds more driver that may use the same event triggers, how do they know if there is a potential for a race condition or not?  If event notification functions are design to be independent of signaling order, then there is no issue.  As soon as there are requirements for event notification functions to be executed in a specific order at a specific event trigger, we have to make sure the platform creator knows and preferably, the FW can tell them if they got it wrong.

Can your data/device manipulators and data/device protectors use case generate an ASSERT() if they are signaled in the wrong order?

Mike

From: Bret Barkelew <Bret.Barkelew@microsoft.com>
Sent: Wednesday, October 7, 2020 9:39 AM
To: Kinney, Michael D <michael.d.kinney@intel.com>; Andrew Fish <afish@apple.com>; edk2-devel-groups-io <devel@edk2.groups.io>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Agreed with your concern, Mike. This mechanism (and we can document it as such) should NOT be used to accomplish an explicit ordering (a la the “apriori list”). It’s just to provide a little separation for two patterns that we’ve seen time and again in our code: data/device manipulators and data/device protectors. It does not eliminate the necessity for platform creators to understand things like driver ordering if they have one driver that requires a protocol be installed or a bus connected.

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Wednesday, October 7, 2020 9:21 AM
To: Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Cc: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Andrew,

I agree DXE drivers could use a PCD to make it configurable and prevent collisions with UEFI defined TPL levels.

Bret’s suggestion of adding a DXE scoped services to create events using non-UEFI defined TPL levels could be used with these TPL levels from PCDs.  Would also allow DXE drivers to use TPL levels associated with the firmware interrupts in the range 17..30.  Perhaps extensions to the DXE Services Table?

Still does not address my concern that many DXE drivers using these extra TPL levels may run into race conditions if more than one DXE driver selects the same TPL level.  Platform integrators will need to understand the relative priorities of all DXE drivers that use extra TPL levels so they can assign values that both avoid collisions with future UEFI specs and prevent race conditions among DXE drivers.

Mike

From: Andrew Fish <afish@apple.com<mailto:afish@apple.com>>
Sent: Tuesday, October 6, 2020 7:18 PM
To: edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: bret.barkelew@microsoft.com<mailto:bret.barkelew@microsoft.com>
Subject: Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
Mike,

When I’ve done things like this in the past I think of making them configurable like via a PCD.

In terms of the #defines I think it makes more sense to just do math on the spec defined values. So TPL_CALLBACK + 1 or TPL_CALLBACK - 1 etc.  I’ve got an lldb type formatter for TPL and it prints out <UEFI Spec TPL> [+ <extra>] as I think this is the clearest way to do it.

Thanks,

Andrew Fish

On Oct 6, 2020, at 6:54 PM, Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>> wrote:

Bret,

It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.

TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.

I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.

Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just moves the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that need something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.

Best regards,

Mike

From: devel@edk2.groups.io<mailto:devel@edk2.groups.io> <devel@edk2.groups.io<mailto:devel@edk2.groups.io>> On Behalf Of Bret Barkelew via groups.io<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgroups.io%2F&data=04%7C01%7CBret.Barkelew%40microsoft.com%7C3630fdf48d3349fedde408d86ece23e3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637381178939940223%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=B%2FSkE6r5bdZUiS4DvdcIiYEV4sATOssRs1CyDKJeLyQ%3D&reserved=0>
Sent: Tuesday, October 6, 2020 5:24 PM
To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>
Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.

A quick synopsis of the problem:

  *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
  *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
  *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
  *   I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.

     *   Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com)<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcorthon%2Fedk2%2Fcommit%2Fe7d0164c8263b1fbfb8b4e289851fbedaa8997f1&data=04%7C01%7CBret.Barkelew%40microsoft.com%7C3630fdf48d3349fedde408d86ece23e3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637381178939950217%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=YAITWECmthe%2Bc%2F7wYkqCuQTWp0%2FMrYFbe4DLGkRJD7M%3D&reserved=0>

However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.

Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.

We’re suggesting something like the below:

//
// Task priority level
//
#define TPL_APPLICATION       4
#define TPL_CALLBACK_LOW      7
#define TPL_CALLBACK          8
#define TPL_CALLBACK_HIGH     9
#define TPL_NOTIFY_LOW        15
#define TPL_NOTIFY            16
#define TPL_NOTIFY_HIGH       17
#define TPL_HIGH_LEVEL        31

There’s already a long-in-the-tooth bug tracking a similar issue:
https://bugzilla.tianocore.org/show_bug.cgi?id=1676<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&data=04%7C01%7CBret.Barkelew%40microsoft.com%7C3630fdf48d3349fedde408d86ece23e3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637381178939950217%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=jqOpc94XUQPPHpuUFSYRGzr23UTtMBVM4ppdor66rtg%3D&reserved=0>

This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).

Thoughts?

If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.

- Bret





[-- Attachment #1.2: Type: text/html, Size: 24890 bytes --]

[-- Attachment #2: 3EC3C307D51E4C48BFCC86763C0E0E74.png --]
[-- Type: image/png, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-12 22:14           ` Bret Barkelew
@ 2020-10-13 15:04             ` Michael D Kinney
  2020-10-13 17:56               ` Bret Barkelew
  0 siblings, 1 reply; 19+ messages in thread
From: Michael D Kinney @ 2020-10-13 15:04 UTC (permalink / raw)
  To: Bret Barkelew, Andrew Fish, edk2-devel-groups-io,
	Kinney, Michael D


[-- Attachment #1.1: Type: text/plain, Size: 11997 bytes --]

Hi Bret,

If the other solution more directly expresses the dependency and guarantees ordering, then it may not be considered a hack.

I prefer solutions that minimize the chances of failures and the need for platform developers to do extra work to integrate a feature or do extra work when they add an unrelated feature.

I see Laszlo’s proposal to add a new tag GUID protocol to detect and enforce ordering for this specific use case.

The network stack added a DPC (Deferred Procedure Call) Protocol to help with ordering issues and remove the introduction of extra TPL levels in an earlier version of the network stack.  I am wondering of there is a more generic solution to this specific problem through the use of DPC.

Mike

From: Bret Barkelew <Bret.Barkelew@microsoft.com>
Sent: Monday, October 12, 2020 3:14 PM
To: Kinney, Michael D <michael.d.kinney@intel.com>; Andrew Fish <afish@apple.com>; edk2-devel-groups-io <devel@edk2.groups.io>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Like I said, I’m also happy to go with the lesser solution of replacing the hack that was already in the code. The last person didn’t care to solve this problem, and I’m good to not solve it, too. I mean, I think it’s turtles all the way down no matter what.

It was actually the ASSERT in the code that highlighted this problem to being with, so I would say that it’s doing its job. It’s incumbent upon the code author to determine what resource they’re trying to access and whether they’ve accessed it successfully, and I agree that it seems like an appropriate use of ASSERTs so long as it’s backed up with some OTHER appropriate action (even if that action is ignoring it).

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Monday, October 12, 2020 9:44 AM
To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Bret,

How to platform creators know for the complete set of drivers if there is anything then need to worry about and why and what they need to address the concern?  This is about order that events are signaled for a given event trigger.  When a platform adds more driver that may use the same event triggers, how do they know if there is a potential for a race condition or not?  If event notification functions are design to be independent of signaling order, then there is no issue.  As soon as there are requirements for event notification functions to be executed in a specific order at a specific event trigger, we have to make sure the platform creator knows and preferably, the FW can tell them if they got it wrong.

Can your data/device manipulators and data/device protectors use case generate an ASSERT() if they are signaled in the wrong order?

Mike

From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
Sent: Wednesday, October 7, 2020 9:39 AM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Agreed with your concern, Mike. This mechanism (and we can document it as such) should NOT be used to accomplish an explicit ordering (a la the “apriori list”). It’s just to provide a little separation for two patterns that we’ve seen time and again in our code: data/device manipulators and data/device protectors. It does not eliminate the necessity for platform creators to understand things like driver ordering if they have one driver that requires a protocol be installed or a bus connected.

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Wednesday, October 7, 2020 9:21 AM
To: Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Cc: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Andrew,

I agree DXE drivers could use a PCD to make it configurable and prevent collisions with UEFI defined TPL levels.

Bret’s suggestion of adding a DXE scoped services to create events using non-UEFI defined TPL levels could be used with these TPL levels from PCDs.  Would also allow DXE drivers to use TPL levels associated with the firmware interrupts in the range 17..30.  Perhaps extensions to the DXE Services Table?

Still does not address my concern that many DXE drivers using these extra TPL levels may run into race conditions if more than one DXE driver selects the same TPL level.  Platform integrators will need to understand the relative priorities of all DXE drivers that use extra TPL levels so they can assign values that both avoid collisions with future UEFI specs and prevent race conditions among DXE drivers.

Mike

From: Andrew Fish <afish@apple.com<mailto:afish@apple.com>>
Sent: Tuesday, October 6, 2020 7:18 PM
To: edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: bret.barkelew@microsoft.com<mailto:bret.barkelew@microsoft.com>
Subject: Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
Mike,

When I’ve done things like this in the past I think of making them configurable like via a PCD.

In terms of the #defines I think it makes more sense to just do math on the spec defined values. So TPL_CALLBACK + 1 or TPL_CALLBACK - 1 etc.  I’ve got an lldb type formatter for TPL and it prints out <UEFI Spec TPL> [+ <extra>] as I think this is the clearest way to do it.

Thanks,

Andrew Fish

On Oct 6, 2020, at 6:54 PM, Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>> wrote:

Bret,

It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.

TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.

I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.

Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just moves the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that need something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.

Best regards,

Mike

From: devel@edk2.groups.io<mailto:devel@edk2.groups.io> <devel@edk2.groups.io<mailto:devel@edk2.groups.io>> On Behalf Of Bret Barkelew via groups.io<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgroups.io%2F&data=04%7C01%7CBret.Barkelew%40microsoft.com%7C3630fdf48d3349fedde408d86ece23e3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637381178939940223%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=B%2FSkE6r5bdZUiS4DvdcIiYEV4sATOssRs1CyDKJeLyQ%3D&reserved=0>
Sent: Tuesday, October 6, 2020 5:24 PM
To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>
Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.

A quick synopsis of the problem:

  *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
  *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
  *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
  *   I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.

     *   Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com)<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcorthon%2Fedk2%2Fcommit%2Fe7d0164c8263b1fbfb8b4e289851fbedaa8997f1&data=04%7C01%7CBret.Barkelew%40microsoft.com%7C3630fdf48d3349fedde408d86ece23e3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637381178939950217%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=YAITWECmthe%2Bc%2F7wYkqCuQTWp0%2FMrYFbe4DLGkRJD7M%3D&reserved=0>

However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.

Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.

We’re suggesting something like the below:

//
// Task priority level
//
#define TPL_APPLICATION       4
#define TPL_CALLBACK_LOW      7
#define TPL_CALLBACK          8
#define TPL_CALLBACK_HIGH     9
#define TPL_NOTIFY_LOW        15
#define TPL_NOTIFY            16
#define TPL_NOTIFY_HIGH       17
#define TPL_HIGH_LEVEL        31

There’s already a long-in-the-tooth bug tracking a similar issue:
https://bugzilla.tianocore.org/show_bug.cgi?id=1676<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&data=04%7C01%7CBret.Barkelew%40microsoft.com%7C3630fdf48d3349fedde408d86ece23e3%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637381178939950217%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=jqOpc94XUQPPHpuUFSYRGzr23UTtMBVM4ppdor66rtg%3D&reserved=0>

This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).

Thoughts?

If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.

- Bret





[-- Attachment #1.2: Type: text/html, Size: 69680 bytes --]

[-- Attachment #2: image002.png --]
[-- Type: image/png, Size: 256 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-13 15:04             ` Michael D Kinney
@ 2020-10-13 17:56               ` Bret Barkelew
  2020-10-13 18:22                 ` Michael D Kinney
  0 siblings, 1 reply; 19+ messages in thread
From: Bret Barkelew @ 2020-10-13 17:56 UTC (permalink / raw)
  To: devel@edk2.groups.io, Kinney, Michael D, Andrew Fish


[-- Attachment #1.1: Type: text/plain, Size: 12947 bytes --]

The problem with the other solution is that it doesn’t give the platform any flexibility to move which event the lock is attached to. For EDK2 default, we’re making it EndOfDxe, because that matches most platforms’ threat models. For Mu, we use ReadyToBoot.

The current “hack” makes that more difficult without multiple patches that we have to carry to remove the “fixed” ordering of the events.

That said, I do think I’ll clean that up and submit it, unless Sean reads this and stops me. 😉

- Bret

From: Michael D Kinney via groups.io<mailto:michael.d.kinney=intel.com@groups.io>
Sent: Tuesday, October 13, 2020 8:05 AM
To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Bret,

If the other solution more directly expresses the dependency and guarantees ordering, then it may not be considered a hack.

I prefer solutions that minimize the chances of failures and the need for platform developers to do extra work to integrate a feature or do extra work when they add an unrelated feature.

I see Laszlo’s proposal to add a new tag GUID protocol to detect and enforce ordering for this specific use case.

The network stack added a DPC (Deferred Procedure Call) Protocol to help with ordering issues and remove the introduction of extra TPL levels in an earlier version of the network stack.  I am wondering of there is a more generic solution to this specific problem through the use of DPC.

Mike

From: Bret Barkelew <Bret.Barkelew@microsoft.com>
Sent: Monday, October 12, 2020 3:14 PM
To: Kinney, Michael D <michael.d.kinney@intel.com>; Andrew Fish <afish@apple.com>; edk2-devel-groups-io <devel@edk2.groups.io>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Like I said, I’m also happy to go with the lesser solution of replacing the hack that was already in the code. The last person didn’t care to solve this problem, and I’m good to not solve it, too. I mean, I think it’s turtles all the way down no matter what.

It was actually the ASSERT in the code that highlighted this problem to being with, so I would say that it’s doing its job. It’s incumbent upon the code author to determine what resource they’re trying to access and whether they’ve accessed it successfully, and I agree that it seems like an appropriate use of ASSERTs so long as it’s backed up with some OTHER appropriate action (even if that action is ignoring it).

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Monday, October 12, 2020 9:44 AM
To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Bret,

How to platform creators know for the complete set of drivers if there is anything then need to worry about and why and what they need to address the concern?  This is about order that events are signaled for a given event trigger.  When a platform adds more driver that may use the same event triggers, how do they know if there is a potential for a race condition or not?  If event notification functions are design to be independent of signaling order, then there is no issue.  As soon as there are requirements for event notification functions to be executed in a specific order at a specific event trigger, we have to make sure the platform creator knows and preferably, the FW can tell them if they got it wrong.

Can your data/device manipulators and data/device protectors use case generate an ASSERT() if they are signaled in the wrong order?

Mike

From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
Sent: Wednesday, October 7, 2020 9:39 AM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Agreed with your concern, Mike. This mechanism (and we can document it as such) should NOT be used to accomplish an explicit ordering (a la the “apriori list”). It’s just to provide a little separation for two patterns that we’ve seen time and again in our code: data/device manipulators and data/device protectors. It does not eliminate the necessity for platform creators to understand things like driver ordering if they have one driver that requires a protocol be installed or a bus connected.

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Wednesday, October 7, 2020 9:21 AM
To: Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Cc: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Andrew,

I agree DXE drivers could use a PCD to make it configurable and prevent collisions with UEFI defined TPL levels.

Bret’s suggestion of adding a DXE scoped services to create events using non-UEFI defined TPL levels could be used with these TPL levels from PCDs.  Would also allow DXE drivers to use TPL levels associated with the firmware interrupts in the range 17..30.  Perhaps extensions to the DXE Services Table?

Still does not address my concern that many DXE drivers using these extra TPL levels may run into race conditions if more than one DXE driver selects the same TPL level.  Platform integrators will need to understand the relative priorities of all DXE drivers that use extra TPL levels so they can assign values that both avoid collisions with future UEFI specs and prevent race conditions among DXE drivers.

Mike

From: Andrew Fish <afish@apple.com<mailto:afish@apple.com>>
Sent: Tuesday, October 6, 2020 7:18 PM
To: edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: bret.barkelew@microsoft.com<mailto:bret.barkelew@microsoft.com>
Subject: Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
Mike,

When I’ve done things like this in the past I think of making them configurable like via a PCD.

In terms of the #defines I think it makes more sense to just do math on the spec defined values. So TPL_CALLBACK + 1 or TPL_CALLBACK - 1 etc.  I’ve got an lldb type formatter for TPL and it prints out <UEFI Spec TPL> [+ <extra>] as I think this is the clearest way to do it.

Thanks,

Andrew Fish

On Oct 6, 2020, at 6:54 PM, Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>> wrote:

Bret,

It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.

TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.

I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.

Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just moves the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that need something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.

Best regards,

Mike

From: devel@edk2.groups.io<mailto:devel@edk2.groups.io> <devel@edk2.groups.io<mailto:devel@edk2.groups.io>> On Behalf Of Bret Barkelew via groups.io<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgroups.io%2F&data=04%7C01%7CBret.Barkelew%40microsoft.com%7Cd4b3fd22deb24d7f1d8b08d86f896007%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637381983104766230%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=kdTijFccFTPsaf1dk3VScTBFOpl%2BmV02J%2BpAfgLpUeM%3D&reserved=0>
Sent: Tuesday, October 6, 2020 5:24 PM
To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>
Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.

A quick synopsis of the problem:

  *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
  *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
  *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
  *   I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.

     *   Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com)<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcorthon%2Fedk2%2Fcommit%2Fe7d0164c8263b1fbfb8b4e289851fbedaa8997f1&data=04%7C01%7CBret.Barkelew%40microsoft.com%7Cd4b3fd22deb24d7f1d8b08d86f896007%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637381983104776188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=cru%2BpM5Y8dhq7bgruSm2lmPrllS1jz57R5I4ZVFWSZk%3D&reserved=0>

However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.

Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.

We’re suggesting something like the below:

//
// Task priority level
//
#define TPL_APPLICATION       4
#define TPL_CALLBACK_LOW      7
#define TPL_CALLBACK          8
#define TPL_CALLBACK_HIGH     9
#define TPL_NOTIFY_LOW        15
#define TPL_NOTIFY            16
#define TPL_NOTIFY_HIGH       17
#define TPL_HIGH_LEVEL        31

There’s already a long-in-the-tooth bug tracking a similar issue:
https://bugzilla.tianocore.org/show_bug.cgi?id=1676<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&data=04%7C01%7CBret.Barkelew%40microsoft.com%7Cd4b3fd22deb24d7f1d8b08d86f896007%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637381983104776188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=V8%2BK18%2B7Rimfs6BouBgI6qINT6TU6%2B5TG8xSqNnS2ks%3D&reserved=0>

This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).

Thoughts?

If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.

- Bret






[-- Attachment #1.2: Type: text/html, Size: 28524 bytes --]

[-- Attachment #2: BCBA48177298481DB9FCE39134D77DDD.png --]
[-- Type: image/png, Size: 140 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-13 17:56               ` Bret Barkelew
@ 2020-10-13 18:22                 ` Michael D Kinney
  2020-10-13 18:37                   ` Bret Barkelew
       [not found]                   ` <163DA129E4AEAEEE.24865@groups.io>
  0 siblings, 2 replies; 19+ messages in thread
From: Michael D Kinney @ 2020-10-13 18:22 UTC (permalink / raw)
  To: Bret Barkelew, devel@edk2.groups.io, Andrew Fish,
	Kinney, Michael D


[-- Attachment #1.1: Type: text/plain, Size: 13754 bytes --]

Hi Bret,

The concept of making the platform lock point configurable seems like a good idea from a platform porting perspective.  If we had that feature would that make this better?

Have you studied the features DPC provides?  It is functionally similar to having NOTIFY-1 and CALLBACK-1 TPL levels.

https://github.com/tianocore/edk2/blob/master/NetworkPkg/DpcDxe/Dpc.c

Thanks,

Mike


From: Bret Barkelew <Bret.Barkelew@microsoft.com>
Sent: Tuesday, October 13, 2020 10:57 AM
To: devel@edk2.groups.io; Kinney, Michael D <michael.d.kinney@intel.com>; Andrew Fish <afish@apple.com>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

The problem with the other solution is that it doesn’t give the platform any flexibility to move which event the lock is attached to. For EDK2 default, we’re making it EndOfDxe, because that matches most platforms’ threat models. For Mu, we use ReadyToBoot.

The current “hack” makes that more difficult without multiple patches that we have to carry to remove the “fixed” ordering of the events.

That said, I do think I’ll clean that up and submit it, unless Sean reads this and stops me. 😉

- Bret

From: Michael D Kinney via groups.io<mailto:michael.d.kinney=intel.com@groups.io>
Sent: Tuesday, October 13, 2020 8:05 AM
To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Bret,

If the other solution more directly expresses the dependency and guarantees ordering, then it may not be considered a hack.

I prefer solutions that minimize the chances of failures and the need for platform developers to do extra work to integrate a feature or do extra work when they add an unrelated feature.

I see Laszlo’s proposal to add a new tag GUID protocol to detect and enforce ordering for this specific use case.

The network stack added a DPC (Deferred Procedure Call) Protocol to help with ordering issues and remove the introduction of extra TPL levels in an earlier version of the network stack.  I am wondering of there is a more generic solution to this specific problem through the use of DPC.

Mike

From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
Sent: Monday, October 12, 2020 3:14 PM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Like I said, I’m also happy to go with the lesser solution of replacing the hack that was already in the code. The last person didn’t care to solve this problem, and I’m good to not solve it, too. I mean, I think it’s turtles all the way down no matter what.

It was actually the ASSERT in the code that highlighted this problem to being with, so I would say that it’s doing its job. It’s incumbent upon the code author to determine what resource they’re trying to access and whether they’ve accessed it successfully, and I agree that it seems like an appropriate use of ASSERTs so long as it’s backed up with some OTHER appropriate action (even if that action is ignoring it).

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Monday, October 12, 2020 9:44 AM
To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Bret,

How to platform creators know for the complete set of drivers if there is anything then need to worry about and why and what they need to address the concern?  This is about order that events are signaled for a given event trigger.  When a platform adds more driver that may use the same event triggers, how do they know if there is a potential for a race condition or not?  If event notification functions are design to be independent of signaling order, then there is no issue.  As soon as there are requirements for event notification functions to be executed in a specific order at a specific event trigger, we have to make sure the platform creator knows and preferably, the FW can tell them if they got it wrong.

Can your data/device manipulators and data/device protectors use case generate an ASSERT() if they are signaled in the wrong order?

Mike

From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
Sent: Wednesday, October 7, 2020 9:39 AM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Agreed with your concern, Mike. This mechanism (and we can document it as such) should NOT be used to accomplish an explicit ordering (a la the “apriori list”). It’s just to provide a little separation for two patterns that we’ve seen time and again in our code: data/device manipulators and data/device protectors. It does not eliminate the necessity for platform creators to understand things like driver ordering if they have one driver that requires a protocol be installed or a bus connected.

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Wednesday, October 7, 2020 9:21 AM
To: Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Cc: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Andrew,

I agree DXE drivers could use a PCD to make it configurable and prevent collisions with UEFI defined TPL levels.

Bret’s suggestion of adding a DXE scoped services to create events using non-UEFI defined TPL levels could be used with these TPL levels from PCDs.  Would also allow DXE drivers to use TPL levels associated with the firmware interrupts in the range 17..30.  Perhaps extensions to the DXE Services Table?

Still does not address my concern that many DXE drivers using these extra TPL levels may run into race conditions if more than one DXE driver selects the same TPL level.  Platform integrators will need to understand the relative priorities of all DXE drivers that use extra TPL levels so they can assign values that both avoid collisions with future UEFI specs and prevent race conditions among DXE drivers.

Mike

From: Andrew Fish <afish@apple.com<mailto:afish@apple.com>>
Sent: Tuesday, October 6, 2020 7:18 PM
To: edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: bret.barkelew@microsoft.com<mailto:bret.barkelew@microsoft.com>
Subject: Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
Mike,

When I’ve done things like this in the past I think of making them configurable like via a PCD.

In terms of the #defines I think it makes more sense to just do math on the spec defined values. So TPL_CALLBACK + 1 or TPL_CALLBACK - 1 etc.  I’ve got an lldb type formatter for TPL and it prints out <UEFI Spec TPL> [+ <extra>] as I think this is the clearest way to do it.

Thanks,

Andrew Fish

On Oct 6, 2020, at 6:54 PM, Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>> wrote:

Bret,

It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.

TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.

I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.

Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just moves the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that need something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.

Best regards,

Mike

From: devel@edk2.groups.io<mailto:devel@edk2.groups.io> <devel@edk2.groups.io<mailto:devel@edk2.groups.io>> On Behalf Of Bret Barkelew via groups.io<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgroups.io%2F&data=04%7C01%7CBret.Barkelew%40microsoft.com%7Cd4b3fd22deb24d7f1d8b08d86f896007%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637381983104766230%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=kdTijFccFTPsaf1dk3VScTBFOpl%2BmV02J%2BpAfgLpUeM%3D&reserved=0>
Sent: Tuesday, October 6, 2020 5:24 PM
To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>
Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.

A quick synopsis of the problem:

  *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
  *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
  *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
  *   I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.

     *   Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com)<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcorthon%2Fedk2%2Fcommit%2Fe7d0164c8263b1fbfb8b4e289851fbedaa8997f1&data=04%7C01%7CBret.Barkelew%40microsoft.com%7Cd4b3fd22deb24d7f1d8b08d86f896007%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637381983104776188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=cru%2BpM5Y8dhq7bgruSm2lmPrllS1jz57R5I4ZVFWSZk%3D&reserved=0>

However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.

Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.

We’re suggesting something like the below:

//
// Task priority level
//
#define TPL_APPLICATION       4
#define TPL_CALLBACK_LOW      7
#define TPL_CALLBACK          8
#define TPL_CALLBACK_HIGH     9
#define TPL_NOTIFY_LOW        15
#define TPL_NOTIFY            16
#define TPL_NOTIFY_HIGH       17
#define TPL_HIGH_LEVEL        31

There’s already a long-in-the-tooth bug tracking a similar issue:
https://bugzilla.tianocore.org/show_bug.cgi?id=1676<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&data=04%7C01%7CBret.Barkelew%40microsoft.com%7Cd4b3fd22deb24d7f1d8b08d86f896007%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637381983104776188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=V8%2BK18%2B7Rimfs6BouBgI6qINT6TU6%2B5TG8xSqNnS2ks%3D&reserved=0>

This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).

Thoughts?

If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.

- Bret






[-- Attachment #1.2: Type: text/html, Size: 73421 bytes --]

[-- Attachment #2: image002.png --]
[-- Type: image/png, Size: 162 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-13 18:22                 ` Michael D Kinney
@ 2020-10-13 18:37                   ` Bret Barkelew
       [not found]                   ` <163DA129E4AEAEEE.24865@groups.io>
  1 sibling, 0 replies; 19+ messages in thread
From: Bret Barkelew @ 2020-10-13 18:37 UTC (permalink / raw)
  To: Kinney, Michael D, devel@edk2.groups.io, Andrew Fish


[-- Attachment #1.1: Type: text/plain, Size: 14660 bytes --]

I’ll try to take a look at that today. Thanks!

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Tuesday, October 13, 2020 11:22 AM
To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Andrew Fish<mailto:afish@apple.com>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Bret,

The concept of making the platform lock point configurable seems like a good idea from a platform porting perspective.  If we had that feature would that make this better?

Have you studied the features DPC provides?  It is functionally similar to having NOTIFY-1 and CALLBACK-1 TPL levels.

https://github.com/tianocore/edk2/blob/master/NetworkPkg/DpcDxe/Dpc.c<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&data=04%7C01%7CBret.Barkelew%40microsoft.com%7C2cf0ec72467b461c38fc08d86fa4ed18%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637382101443215576%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=00oFUUF75fDy%2Bc0ztQqEnKAIXNaOTSebdf6ceKAIyI0%3D&reserved=0>

Thanks,

Mike


From: Bret Barkelew <Bret.Barkelew@microsoft.com>
Sent: Tuesday, October 13, 2020 10:57 AM
To: devel@edk2.groups.io; Kinney, Michael D <michael.d.kinney@intel.com>; Andrew Fish <afish@apple.com>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

The problem with the other solution is that it doesn’t give the platform any flexibility to move which event the lock is attached to. For EDK2 default, we’re making it EndOfDxe, because that matches most platforms’ threat models. For Mu, we use ReadyToBoot.

The current “hack” makes that more difficult without multiple patches that we have to carry to remove the “fixed” ordering of the events.

That said, I do think I’ll clean that up and submit it, unless Sean reads this and stops me. 😉

- Bret

From: Michael D Kinney via groups.io<mailto:michael.d.kinney=intel.com@groups.io>
Sent: Tuesday, October 13, 2020 8:05 AM
To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Bret,

If the other solution more directly expresses the dependency and guarantees ordering, then it may not be considered a hack.

I prefer solutions that minimize the chances of failures and the need for platform developers to do extra work to integrate a feature or do extra work when they add an unrelated feature.

I see Laszlo’s proposal to add a new tag GUID protocol to detect and enforce ordering for this specific use case.

The network stack added a DPC (Deferred Procedure Call) Protocol to help with ordering issues and remove the introduction of extra TPL levels in an earlier version of the network stack.  I am wondering of there is a more generic solution to this specific problem through the use of DPC.

Mike

From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
Sent: Monday, October 12, 2020 3:14 PM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Like I said, I’m also happy to go with the lesser solution of replacing the hack that was already in the code. The last person didn’t care to solve this problem, and I’m good to not solve it, too. I mean, I think it’s turtles all the way down no matter what.

It was actually the ASSERT in the code that highlighted this problem to being with, so I would say that it’s doing its job. It’s incumbent upon the code author to determine what resource they’re trying to access and whether they’ve accessed it successfully, and I agree that it seems like an appropriate use of ASSERTs so long as it’s backed up with some OTHER appropriate action (even if that action is ignoring it).

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Monday, October 12, 2020 9:44 AM
To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Bret,

How to platform creators know for the complete set of drivers if there is anything then need to worry about and why and what they need to address the concern?  This is about order that events are signaled for a given event trigger.  When a platform adds more driver that may use the same event triggers, how do they know if there is a potential for a race condition or not?  If event notification functions are design to be independent of signaling order, then there is no issue.  As soon as there are requirements for event notification functions to be executed in a specific order at a specific event trigger, we have to make sure the platform creator knows and preferably, the FW can tell them if they got it wrong.

Can your data/device manipulators and data/device protectors use case generate an ASSERT() if they are signaled in the wrong order?

Mike

From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
Sent: Wednesday, October 7, 2020 9:39 AM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Agreed with your concern, Mike. This mechanism (and we can document it as such) should NOT be used to accomplish an explicit ordering (a la the “apriori list”). It’s just to provide a little separation for two patterns that we’ve seen time and again in our code: data/device manipulators and data/device protectors. It does not eliminate the necessity for platform creators to understand things like driver ordering if they have one driver that requires a protocol be installed or a bus connected.

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Wednesday, October 7, 2020 9:21 AM
To: Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Cc: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Andrew,

I agree DXE drivers could use a PCD to make it configurable and prevent collisions with UEFI defined TPL levels.

Bret’s suggestion of adding a DXE scoped services to create events using non-UEFI defined TPL levels could be used with these TPL levels from PCDs.  Would also allow DXE drivers to use TPL levels associated with the firmware interrupts in the range 17..30.  Perhaps extensions to the DXE Services Table?

Still does not address my concern that many DXE drivers using these extra TPL levels may run into race conditions if more than one DXE driver selects the same TPL level.  Platform integrators will need to understand the relative priorities of all DXE drivers that use extra TPL levels so they can assign values that both avoid collisions with future UEFI specs and prevent race conditions among DXE drivers.

Mike

From: Andrew Fish <afish@apple.com<mailto:afish@apple.com>>
Sent: Tuesday, October 6, 2020 7:18 PM
To: edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: bret.barkelew@microsoft.com<mailto:bret.barkelew@microsoft.com>
Subject: Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
Mike,

When I’ve done things like this in the past I think of making them configurable like via a PCD.

In terms of the #defines I think it makes more sense to just do math on the spec defined values. So TPL_CALLBACK + 1 or TPL_CALLBACK - 1 etc.  I’ve got an lldb type formatter for TPL and it prints out <UEFI Spec TPL> [+ <extra>] as I think this is the clearest way to do it.

Thanks,

Andrew Fish

On Oct 6, 2020, at 6:54 PM, Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>> wrote:

Bret,

It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.

TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.

I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.

Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just moves the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that need something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.

Best regards,

Mike

From: devel@edk2.groups.io<mailto:devel@edk2.groups.io> <devel@edk2.groups.io<mailto:devel@edk2.groups.io>> On Behalf Of Bret Barkelew via groups.io<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgroups.io%2F&data=04%7C01%7CBret.Barkelew%40microsoft.com%7C2cf0ec72467b461c38fc08d86fa4ed18%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637382101443220566%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=V860%2BNqgnILm5502JDboE%2F5JMEUpkFCq1GkV70lqNQA%3D&reserved=0>
Sent: Tuesday, October 6, 2020 5:24 PM
To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>
Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.

A quick synopsis of the problem:

  *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
  *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
  *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
  *   I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.

     *   Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com)<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcorthon%2Fedk2%2Fcommit%2Fe7d0164c8263b1fbfb8b4e289851fbedaa8997f1&data=04%7C01%7CBret.Barkelew%40microsoft.com%7C2cf0ec72467b461c38fc08d86fa4ed18%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637382101443225556%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=HTYL1RRxDnM14ElcxhjmO9t5CMU2Gfpy68KTwsMQrlk%3D&reserved=0>

However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.

Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.

We’re suggesting something like the below:

//
// Task priority level
//
#define TPL_APPLICATION       4
#define TPL_CALLBACK_LOW      7
#define TPL_CALLBACK          8
#define TPL_CALLBACK_HIGH     9
#define TPL_NOTIFY_LOW        15
#define TPL_NOTIFY            16
#define TPL_NOTIFY_HIGH       17
#define TPL_HIGH_LEVEL        31

There’s already a long-in-the-tooth bug tracking a similar issue:
https://bugzilla.tianocore.org/show_bug.cgi?id=1676<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&data=04%7C01%7CBret.Barkelew%40microsoft.com%7C2cf0ec72467b461c38fc08d86fa4ed18%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637382101443235539%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=jFApBnmowZc62f2Shj3StbMB%2FTXMPiUauS%2Fz5r%2FNt40%3D&reserved=0>

This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).

Thoughts?

If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.

- Bret







[-- Attachment #1.2: Type: text/html, Size: 31773 bytes --]

[-- Attachment #2: 3A8F868D150A4F3892D8C71AE381FA58.png --]
[-- Type: image/png, Size: 140 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
       [not found]                   ` <163DA129E4AEAEEE.24865@groups.io>
@ 2020-10-15 17:03                     ` Bret Barkelew
  2020-10-19 18:44                       ` Laszlo Ersek
  0 siblings, 1 reply; 19+ messages in thread
From: Bret Barkelew @ 2020-10-15 17:03 UTC (permalink / raw)
  To: devel@edk2.groups.io, Kinney, Michael D, Andrew Fish


[-- Attachment #1.1: Type: text/plain, Size: 16143 bytes --]

I looked over it and I think that DPC solves the mirror image to the problem we’re trying to solve (and behaves in a fashion very similar to DelayedDispatch for PEI).

However, while studying it, I thought of a proposal that may be a balance between the complexity of Laszlo’s and the current “fixed” ordering.

What if we register (and encourage anyone who needs it to register) the MorLock callback on both EndOfDxe AND a new VarPolReadyToLock event. VarPolReadyToLock is signaled in the EndOfDxe callback which locks the VarPolicy. In the MorLock callback, we close both even registrations whenever the first one completes. In this pattern, we may have to register the MorLock as TPL_NOTIFY for the VarPolReadyToLock callback.

Or, we could formalize the registration with an official protocol interface and even pass a pointer to the VarPol interface in the callback when invoked.

Upon further review, I could also be open to Laszlo’s approach if the whole design could be reduced to a library for reuse. It’s not something I would want to be copying and pasting around the code.

- Bret

From: Bret Barkelew via groups.io<mailto:bret.barkelew=microsoft.com@groups.io>
Sent: Tuesday, October 13, 2020 11:37 AM
To: Kinney, Michael D<mailto:michael.d.kinney@intel.com>; devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Andrew Fish<mailto:afish@apple.com>
Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

I’ll try to take a look at that today. Thanks!

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Tuesday, October 13, 2020 11:22 AM
To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Andrew Fish<mailto:afish@apple.com>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Bret,

The concept of making the platform lock point configurable seems like a good idea from a platform porting perspective.  If we had that feature would that make this better?

Have you studied the features DPC provides?  It is functionally similar to having NOTIFY-1 and CALLBACK-1 TPL levels.

https://github.com/tianocore/edk2/blob/master/NetworkPkg/DpcDxe/Dpc.c<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&data=04%7C01%7Cbret.barkelew%40microsoft.com%7C22b77b20547945404a0208d86fa707f6%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637382110480463333%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=CZHMkcqc81%2BnNovzWaS3uiUiv5i7HVWUDt3dc88%2Fkbw%3D&reserved=0>

Thanks,

Mike


From: Bret Barkelew <Bret.Barkelew@microsoft.com>
Sent: Tuesday, October 13, 2020 10:57 AM
To: devel@edk2.groups.io; Kinney, Michael D <michael.d.kinney@intel.com>; Andrew Fish <afish@apple.com>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

The problem with the other solution is that it doesn’t give the platform any flexibility to move which event the lock is attached to. For EDK2 default, we’re making it EndOfDxe, because that matches most platforms’ threat models. For Mu, we use ReadyToBoot.

The current “hack” makes that more difficult without multiple patches that we have to carry to remove the “fixed” ordering of the events.

That said, I do think I’ll clean that up and submit it, unless Sean reads this and stops me. 😉

- Bret

From: Michael D Kinney via groups.io<mailto:michael.d.kinney=intel.com@groups.io>
Sent: Tuesday, October 13, 2020 8:05 AM
To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Bret,

If the other solution more directly expresses the dependency and guarantees ordering, then it may not be considered a hack.

I prefer solutions that minimize the chances of failures and the need for platform developers to do extra work to integrate a feature or do extra work when they add an unrelated feature.

I see Laszlo’s proposal to add a new tag GUID protocol to detect and enforce ordering for this specific use case.

The network stack added a DPC (Deferred Procedure Call) Protocol to help with ordering issues and remove the introduction of extra TPL levels in an earlier version of the network stack.  I am wondering of there is a more generic solution to this specific problem through the use of DPC.

Mike

From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
Sent: Monday, October 12, 2020 3:14 PM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Like I said, I’m also happy to go with the lesser solution of replacing the hack that was already in the code. The last person didn’t care to solve this problem, and I’m good to not solve it, too. I mean, I think it’s turtles all the way down no matter what.

It was actually the ASSERT in the code that highlighted this problem to being with, so I would say that it’s doing its job. It’s incumbent upon the code author to determine what resource they’re trying to access and whether they’ve accessed it successfully, and I agree that it seems like an appropriate use of ASSERTs so long as it’s backed up with some OTHER appropriate action (even if that action is ignoring it).

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Monday, October 12, 2020 9:44 AM
To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Bret,

How to platform creators know for the complete set of drivers if there is anything then need to worry about and why and what they need to address the concern?  This is about order that events are signaled for a given event trigger.  When a platform adds more driver that may use the same event triggers, how do they know if there is a potential for a race condition or not?  If event notification functions are design to be independent of signaling order, then there is no issue.  As soon as there are requirements for event notification functions to be executed in a specific order at a specific event trigger, we have to make sure the platform creator knows and preferably, the FW can tell them if they got it wrong.

Can your data/device manipulators and data/device protectors use case generate an ASSERT() if they are signaled in the wrong order?

Mike

From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
Sent: Wednesday, October 7, 2020 9:39 AM
To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Agreed with your concern, Mike. This mechanism (and we can document it as such) should NOT be used to accomplish an explicit ordering (a la the “apriori list”). It’s just to provide a little separation for two patterns that we’ve seen time and again in our code: data/device manipulators and data/device protectors. It does not eliminate the necessity for platform creators to understand things like driver ordering if they have one driver that requires a protocol be installed or a bus connected.

- Bret

From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Sent: Wednesday, October 7, 2020 9:21 AM
To: Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
Cc: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>
Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

Hi Andrew,

I agree DXE drivers could use a PCD to make it configurable and prevent collisions with UEFI defined TPL levels.

Bret’s suggestion of adding a DXE scoped services to create events using non-UEFI defined TPL levels could be used with these TPL levels from PCDs.  Would also allow DXE drivers to use TPL levels associated with the firmware interrupts in the range 17..30.  Perhaps extensions to the DXE Services Table?

Still does not address my concern that many DXE drivers using these extra TPL levels may run into race conditions if more than one DXE driver selects the same TPL level.  Platform integrators will need to understand the relative priorities of all DXE drivers that use extra TPL levels so they can assign values that both avoid collisions with future UEFI specs and prevent race conditions among DXE drivers.

Mike

From: Andrew Fish <afish@apple.com<mailto:afish@apple.com>>
Sent: Tuesday, October 6, 2020 7:18 PM
To: edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Cc: bret.barkelew@microsoft.com<mailto:bret.barkelew@microsoft.com>
Subject: Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
Mike,

When I’ve done things like this in the past I think of making them configurable like via a PCD.

In terms of the #defines I think it makes more sense to just do math on the spec defined values. So TPL_CALLBACK + 1 or TPL_CALLBACK - 1 etc.  I’ve got an lldb type formatter for TPL and it prints out <UEFI Spec TPL> [+ <extra>] as I think this is the clearest way to do it.

Thanks,

Andrew Fish

On Oct 6, 2020, at 6:54 PM, Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>> wrote:

Bret,

It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.

TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.

I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.

Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just moves the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that need something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.

Best regards,

Mike

From: devel@edk2.groups.io<mailto:devel@edk2.groups.io> <devel@edk2.groups.io<mailto:devel@edk2.groups.io>> On Behalf Of Bret Barkelew via groups.io<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgroups.io%2F&data=04%7C01%7Cbret.barkelew%40microsoft.com%7C22b77b20547945404a0208d86fa707f6%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637382110480473324%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=K5uUtMvoqgI3j61HWeYnmLXre3%2FcTnfIoN0rnivaRVI%3D&reserved=0>
Sent: Tuesday, October 6, 2020 5:24 PM
To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>
Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.

A quick synopsis of the problem:

  *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
  *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
  *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
  *   I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.

     *   Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com)<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcorthon%2Fedk2%2Fcommit%2Fe7d0164c8263b1fbfb8b4e289851fbedaa8997f1&data=04%7C01%7Cbret.barkelew%40microsoft.com%7C22b77b20547945404a0208d86fa707f6%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637382110480483324%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=WZeCatbRVhpZZWx%2FuACtpUes00OjXWFEoQZsmSDXNAc%3D&reserved=0>

However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.

Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.

We’re suggesting something like the below:

//
// Task priority level
//
#define TPL_APPLICATION       4
#define TPL_CALLBACK_LOW      7
#define TPL_CALLBACK          8
#define TPL_CALLBACK_HIGH     9
#define TPL_NOTIFY_LOW        15
#define TPL_NOTIFY            16
#define TPL_NOTIFY_HIGH       17
#define TPL_HIGH_LEVEL        31

There’s already a long-in-the-tooth bug tracking a similar issue:
https://bugzilla.tianocore.org/show_bug.cgi?id=1676<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&data=04%7C01%7Cbret.barkelew%40microsoft.com%7C22b77b20547945404a0208d86fa707f6%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637382110480483324%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=746gH1KiXtZNbqmvfS3ZDfIa1m7q67dBfv%2F345F%2FH6c%3D&reserved=0>

This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).

Thoughts?

If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.

- Bret








[-- Attachment #1.2: Type: text/html, Size: 33990 bytes --]

[-- Attachment #2: 905ACECE328E4A09BC5A9D7A7DF04CDB.png --]
[-- Type: image/png, Size: 149 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-15 17:03                     ` Bret Barkelew
@ 2020-10-19 18:44                       ` Laszlo Ersek
  2020-10-19 18:48                         ` [EXTERNAL] " Bret Barkelew
  0 siblings, 1 reply; 19+ messages in thread
From: Laszlo Ersek @ 2020-10-19 18:44 UTC (permalink / raw)
  To: devel, bret.barkelew, Kinney, Michael D, Andrew Fish

On 10/15/20 19:03, Bret Barkelew via groups.io wrote:
> I looked over it and I think that DPC solves the mirror image to the problem we’re trying to solve (and behaves in a fashion very similar to DelayedDispatch for PEI).
> 
> However, while studying it, I thought of a proposal that may be a balance between the complexity of Laszlo’s and the current “fixed” ordering.
> 
> What if we register (and encourage anyone who needs it to register) the MorLock callback on both EndOfDxe AND a new VarPolReadyToLock event. VarPolReadyToLock is signaled in the EndOfDxe callback which locks the VarPolicy. In the MorLock callback, we close both even registrations whenever the first one completes. In this pattern, we may have to register the MorLock as TPL_NOTIFY for the VarPolReadyToLock callback.

Right; unconditionally deferring the policy locking (rather than
conditionally, per my proposal) does the same job, and it's simpler to code.

You shouldn't even need to register MorLock for VarPolReadyToLock:
- At EndOfDxe, lock the MorLock variable and signal VarPolReadyToLock
(should be queued at TPL_CALLBACK),
- in the VarPolReadyToLock callback, only lock the policy.

AIUI, this is basically a special case / simplification of my proposal,
with the "delay the locking" protocol instance always *assumed* present
at EndOfDxe (and therefore never actually *produced* in the protocol
database).

Thanks
Laszlo


> 
> Or, we could formalize the registration with an official protocol interface and even pass a pointer to the VarPol interface in the callback when invoked.
> 
> Upon further review, I could also be open to Laszlo’s approach if the whole design could be reduced to a library for reuse. It’s not something I would want to be copying and pasting around the code.
> 
> - Bret
> 
> From: Bret Barkelew via groups.io<mailto:bret.barkelew=microsoft.com@groups.io>
> Sent: Tuesday, October 13, 2020 11:37 AM
> To: Kinney, Michael D<mailto:michael.d.kinney@intel.com>; devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Andrew Fish<mailto:afish@apple.com>
> Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
> 
> I’ll try to take a look at that today. Thanks!
> 
> - Bret
> 
> From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
> Sent: Tuesday, October 13, 2020 11:22 AM
> To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Andrew Fish<mailto:afish@apple.com>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
> Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
> 
> Hi Bret,
> 
> The concept of making the platform lock point configurable seems like a good idea from a platform porting perspective.  If we had that feature would that make this better?
> 
> Have you studied the features DPC provides?  It is functionally similar to having NOTIFY-1 and CALLBACK-1 TPL levels.
> 
> https://github.com/tianocore/edk2/blob/master/NetworkPkg/DpcDxe/Dpc.c<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&data=04%7C01%7Cbret.barkelew%40microsoft.com%7C22b77b20547945404a0208d86fa707f6%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637382110480463333%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=CZHMkcqc81%2BnNovzWaS3uiUiv5i7HVWUDt3dc88%2Fkbw%3D&reserved=0>
> 
> Thanks,
> 
> Mike
> 
> 
> From: Bret Barkelew <Bret.Barkelew@microsoft.com>
> Sent: Tuesday, October 13, 2020 10:57 AM
> To: devel@edk2.groups.io; Kinney, Michael D <michael.d.kinney@intel.com>; Andrew Fish <afish@apple.com>
> Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
> 
> The problem with the other solution is that it doesn’t give the platform any flexibility to move which event the lock is attached to. For EDK2 default, we’re making it EndOfDxe, because that matches most platforms’ threat models. For Mu, we use ReadyToBoot.
> 
> The current “hack” makes that more difficult without multiple patches that we have to carry to remove the “fixed” ordering of the events.
> 
> That said, I do think I’ll clean that up and submit it, unless Sean reads this and stops me. 😉
> 
> - Bret
> 
> From: Michael D Kinney via groups.io<mailto:michael.d.kinney=intel.com@groups.io>
> Sent: Tuesday, October 13, 2020 8:05 AM
> To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
> Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
> 
> Hi Bret,
> 
> If the other solution more directly expresses the dependency and guarantees ordering, then it may not be considered a hack.
> 
> I prefer solutions that minimize the chances of failures and the need for platform developers to do extra work to integrate a feature or do extra work when they add an unrelated feature.
> 
> I see Laszlo’s proposal to add a new tag GUID protocol to detect and enforce ordering for this specific use case.
> 
> The network stack added a DPC (Deferred Procedure Call) Protocol to help with ordering issues and remove the introduction of extra TPL levels in an earlier version of the network stack.  I am wondering of there is a more generic solution to this specific problem through the use of DPC.
> 
> Mike
> 
> From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
> Sent: Monday, October 12, 2020 3:14 PM
> To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
> Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
> 
> Like I said, I’m also happy to go with the lesser solution of replacing the hack that was already in the code. The last person didn’t care to solve this problem, and I’m good to not solve it, too. I mean, I think it’s turtles all the way down no matter what.
> 
> It was actually the ASSERT in the code that highlighted this problem to being with, so I would say that it’s doing its job. It’s incumbent upon the code author to determine what resource they’re trying to access and whether they’ve accessed it successfully, and I agree that it seems like an appropriate use of ASSERTs so long as it’s backed up with some OTHER appropriate action (even if that action is ignoring it).
> 
> - Bret
> 
> From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
> Sent: Monday, October 12, 2020 9:44 AM
> To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
> Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
> 
> Bret,
> 
> How to platform creators know for the complete set of drivers if there is anything then need to worry about and why and what they need to address the concern?  This is about order that events are signaled for a given event trigger.  When a platform adds more driver that may use the same event triggers, how do they know if there is a potential for a race condition or not?  If event notification functions are design to be independent of signaling order, then there is no issue.  As soon as there are requirements for event notification functions to be executed in a specific order at a specific event trigger, we have to make sure the platform creator knows and preferably, the FW can tell them if they got it wrong.
> 
> Can your data/device manipulators and data/device protectors use case generate an ASSERT() if they are signaled in the wrong order?
> 
> Mike
> 
> From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
> Sent: Wednesday, October 7, 2020 9:39 AM
> To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
> Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
> 
> Agreed with your concern, Mike. This mechanism (and we can document it as such) should NOT be used to accomplish an explicit ordering (a la the “apriori list”). It’s just to provide a little separation for two patterns that we’ve seen time and again in our code: data/device manipulators and data/device protectors. It does not eliminate the necessity for platform creators to understand things like driver ordering if they have one driver that requires a protocol be installed or a bus connected.
> 
> - Bret
> 
> From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
> Sent: Wednesday, October 7, 2020 9:21 AM
> To: Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
> Cc: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>
> Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
> 
> Hi Andrew,
> 
> I agree DXE drivers could use a PCD to make it configurable and prevent collisions with UEFI defined TPL levels.
> 
> Bret’s suggestion of adding a DXE scoped services to create events using non-UEFI defined TPL levels could be used with these TPL levels from PCDs.  Would also allow DXE drivers to use TPL levels associated with the firmware interrupts in the range 17..30.  Perhaps extensions to the DXE Services Table?
> 
> Still does not address my concern that many DXE drivers using these extra TPL levels may run into race conditions if more than one DXE driver selects the same TPL level.  Platform integrators will need to understand the relative priorities of all DXE drivers that use extra TPL levels so they can assign values that both avoid collisions with future UEFI specs and prevent race conditions among DXE drivers.
> 
> Mike
> 
> From: Andrew Fish <afish@apple.com<mailto:afish@apple.com>>
> Sent: Tuesday, October 6, 2020 7:18 PM
> To: edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
> Cc: bret.barkelew@microsoft.com<mailto:bret.barkelew@microsoft.com>
> Subject: Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
> Mike,
> 
> When I’ve done things like this in the past I think of making them configurable like via a PCD.
> 
> In terms of the #defines I think it makes more sense to just do math on the spec defined values. So TPL_CALLBACK + 1 or TPL_CALLBACK - 1 etc.  I’ve got an lldb type formatter for TPL and it prints out <UEFI Spec TPL> [+ <extra>] as I think this is the clearest way to do it.
> 
> Thanks,
> 
> Andrew Fish
> 
> On Oct 6, 2020, at 6:54 PM, Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>> wrote:
> 
> Bret,
> 
> It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.
> 
> TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.
> 
> I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.
> 
> Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just moves the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that need something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.
> 
> Best regards,
> 
> Mike
> 
> From: devel@edk2.groups.io<mailto:devel@edk2.groups.io> <devel@edk2.groups.io<mailto:devel@edk2.groups.io>> On Behalf Of Bret Barkelew via groups.io<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgroups.io%2F&data=04%7C01%7Cbret.barkelew%40microsoft.com%7C22b77b20547945404a0208d86fa707f6%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637382110480473324%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=K5uUtMvoqgI3j61HWeYnmLXre3%2FcTnfIoN0rnivaRVI%3D&reserved=0>
> Sent: Tuesday, October 6, 2020 5:24 PM
> To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>
> Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
> 
> As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.
> 
> A quick synopsis of the problem:
> 
>   *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
>   *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
>   *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
>   *   I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.
> 
>      *   Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com)<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcorthon%2Fedk2%2Fcommit%2Fe7d0164c8263b1fbfb8b4e289851fbedaa8997f1&data=04%7C01%7Cbret.barkelew%40microsoft.com%7C22b77b20547945404a0208d86fa707f6%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637382110480483324%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=WZeCatbRVhpZZWx%2FuACtpUes00OjXWFEoQZsmSDXNAc%3D&reserved=0>
> 
> However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.
> 
> Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.
> 
> We’re suggesting something like the below:
> 
> //
> // Task priority level
> //
> #define TPL_APPLICATION       4
> #define TPL_CALLBACK_LOW      7
> #define TPL_CALLBACK          8
> #define TPL_CALLBACK_HIGH     9
> #define TPL_NOTIFY_LOW        15
> #define TPL_NOTIFY            16
> #define TPL_NOTIFY_HIGH       17
> #define TPL_HIGH_LEVEL        31
> 
> There’s already a long-in-the-tooth bug tracking a similar issue:
> https://bugzilla.tianocore.org/show_bug.cgi?id=1676<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&data=04%7C01%7Cbret.barkelew%40microsoft.com%7C22b77b20547945404a0208d86fa707f6%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637382110480483324%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=746gH1KiXtZNbqmvfS3ZDfIa1m7q67dBfv%2F345F%2FH6c%3D&reserved=0>
> 
> This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).
> 
> Thoughts?
> 
> If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.
> 
> - Bret
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-19 18:44                       ` Laszlo Ersek
@ 2020-10-19 18:48                         ` Bret Barkelew
  2020-10-20  9:23                           ` Laszlo Ersek
  0 siblings, 1 reply; 19+ messages in thread
From: Bret Barkelew @ 2020-10-19 18:48 UTC (permalink / raw)
  To: Laszlo Ersek, devel@edk2.groups.io, Kinney, Michael D,
	Andrew Fish

[-- Attachment #1: Type: text/plain, Size: 20871 bytes --]

Oh, sorry. Got ahead of myself. Can’t do that. We don’t know that MorLock is the ONLY one. We’d have to have everyone wait for the ReadyToLock signal rather than EndOfDxe. Then VarPol could finally lock after all the ReadyToLock callbacks are processed. If that processing can’t happen immediately (if it has to be requeued) then we’d also need an event in VarPol itself that queues a NEW event called ReallyReadyToLockNoForRealThisTime that finally locks VarPol.

It’s the same number of steps in a different order, and I dislike it just as much. Do you think your version could be packed into a library? How do you feel about parameter currying?

- Bret

From: Laszlo Ersek<mailto:lersek@redhat.com>
Sent: Monday, October 19, 2020 11:44 AM
To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>; Andrew Fish<mailto:afish@apple.com>
Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

On 10/15/20 19:03, Bret Barkelew via groups.io wrote:
> I looked over it and I think that DPC solves the mirror image to the problem we’re trying to solve (and behaves in a fashion very similar to DelayedDispatch for PEI).
>
> However, while studying it, I thought of a proposal that may be a balance between the complexity of Laszlo’s and the current “fixed” ordering.
>
> What if we register (and encourage anyone who needs it to register) the MorLock callback on both EndOfDxe AND a new VarPolReadyToLock event. VarPolReadyToLock is signaled in the EndOfDxe callback which locks the VarPolicy. In the MorLock callback, we close both even registrations whenever the first one completes. In this pattern, we may have to register the MorLock as TPL_NOTIFY for the VarPolReadyToLock callback.

Right; unconditionally deferring the policy locking (rather than
conditionally, per my proposal) does the same job, and it's simpler to code.

You shouldn't even need to register MorLock for VarPolReadyToLock:
- At EndOfDxe, lock the MorLock variable and signal VarPolReadyToLock
(should be queued at TPL_CALLBACK),
- in the VarPolReadyToLock callback, only lock the policy.

AIUI, this is basically a special case / simplification of my proposal,
with the "delay the locking" protocol instance always *assumed* present
at EndOfDxe (and therefore never actually *produced* in the protocol
database).

Thanks
Laszlo


>
> Or, we could formalize the registration with an official protocol interface and even pass a pointer to the VarPol interface in the callback when invoked.
>
> Upon further review, I could also be open to Laszlo’s approach if the whole design could be reduced to a library for reuse. It’s not something I would want to be copying and pasting around the code.
>
> - Bret
>
> From: Bret Barkelew via groups.io<mailto:bret.barkelew=microsoft.com@groups.io>
> Sent: Tuesday, October 13, 2020 11:37 AM
> To: Kinney, Michael D<mailto:michael.d.kinney@intel.com>; devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Andrew Fish<mailto:afish@apple.com>
> Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>
> I’ll try to take a look at that today. Thanks!
>
> - Bret
>
> From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
> Sent: Tuesday, October 13, 2020 11:22 AM
> To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Andrew Fish<mailto:afish@apple.com>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
> Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>
> Hi Bret,
>
> The concept of making the platform lock point configurable seems like a good idea from a platform porting perspective.  If we had that feature would that make this better?
>
> Have you studied the features DPC provides?  It is functionally similar to having NOTIFY-1 and CALLBACK-1 TPL levels.
>
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833816832%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=5Ae1U%2Fl6c8ZWpKxJokKeHVmjTZ3KgZqktJ2Kmr9gZn8%3D&amp;reserved=0<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833816832%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=5Ae1U%2Fl6c8ZWpKxJokKeHVmjTZ3KgZqktJ2Kmr9gZn8%3D&amp;reserved=0<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833816832%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=5Ae1U%2Fl6c8ZWpKxJokKeHVmjTZ3KgZqktJ2Kmr9gZn8%3D&amp;reserved=0%3chttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833816832%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=5Ae1U%2Fl6c8ZWpKxJokKeHVmjTZ3KgZqktJ2Kmr9gZn8%3D&amp;reserved=0>>
>
> Thanks,
>
> Mike
>
>
> From: Bret Barkelew <Bret.Barkelew@microsoft.com>
> Sent: Tuesday, October 13, 2020 10:57 AM
> To: devel@edk2.groups.io; Kinney, Michael D <michael.d.kinney@intel.com>; Andrew Fish <afish@apple.com>
> Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>
> The problem with the other solution is that it doesn’t give the platform any flexibility to move which event the lock is attached to. For EDK2 default, we’re making it EndOfDxe, because that matches most platforms’ threat models. For Mu, we use ReadyToBoot.
>
> The current “hack” makes that more difficult without multiple patches that we have to carry to remove the “fixed” ordering of the events.
>
> That said, I do think I’ll clean that up and submit it, unless Sean reads this and stops me. 😉
>
> - Bret
>
> From: Michael D Kinney via groups.io<mailto:michael.d.kinney=intel.com@groups.io>
> Sent: Tuesday, October 13, 2020 8:05 AM
> To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
> Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>
> Hi Bret,
>
> If the other solution more directly expresses the dependency and guarantees ordering, then it may not be considered a hack.
>
> I prefer solutions that minimize the chances of failures and the need for platform developers to do extra work to integrate a feature or do extra work when they add an unrelated feature.
>
> I see Laszlo’s proposal to add a new tag GUID protocol to detect and enforce ordering for this specific use case.
>
> The network stack added a DPC (Deferred Procedure Call) Protocol to help with ordering issues and remove the introduction of extra TPL levels in an earlier version of the network stack.  I am wondering of there is a more generic solution to this specific problem through the use of DPC.
>
> Mike
>
> From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
> Sent: Monday, October 12, 2020 3:14 PM
> To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
> Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>
> Like I said, I’m also happy to go with the lesser solution of replacing the hack that was already in the code. The last person didn’t care to solve this problem, and I’m good to not solve it, too. I mean, I think it’s turtles all the way down no matter what.
>
> It was actually the ASSERT in the code that highlighted this problem to being with, so I would say that it’s doing its job. It’s incumbent upon the code author to determine what resource they’re trying to access and whether they’ve accessed it successfully, and I agree that it seems like an appropriate use of ASSERTs so long as it’s backed up with some OTHER appropriate action (even if that action is ignoring it).
>
> - Bret
>
> From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
> Sent: Monday, October 12, 2020 9:44 AM
> To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
> Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>
> Bret,
>
> How to platform creators know for the complete set of drivers if there is anything then need to worry about and why and what they need to address the concern?  This is about order that events are signaled for a given event trigger.  When a platform adds more driver that may use the same event triggers, how do they know if there is a potential for a race condition or not?  If event notification functions are design to be independent of signaling order, then there is no issue.  As soon as there are requirements for event notification functions to be executed in a specific order at a specific event trigger, we have to make sure the platform creator knows and preferably, the FW can tell them if they got it wrong.
>
> Can your data/device manipulators and data/device protectors use case generate an ASSERT() if they are signaled in the wrong order?
>
> Mike
>
> From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
> Sent: Wednesday, October 7, 2020 9:39 AM
> To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
> Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>
> Agreed with your concern, Mike. This mechanism (and we can document it as such) should NOT be used to accomplish an explicit ordering (a la the “apriori list”). It’s just to provide a little separation for two patterns that we’ve seen time and again in our code: data/device manipulators and data/device protectors. It does not eliminate the necessity for platform creators to understand things like driver ordering if they have one driver that requires a protocol be installed or a bus connected.
>
> - Bret
>
> From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
> Sent: Wednesday, October 7, 2020 9:21 AM
> To: Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
> Cc: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>
> Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>
> Hi Andrew,
>
> I agree DXE drivers could use a PCD to make it configurable and prevent collisions with UEFI defined TPL levels.
>
> Bret’s suggestion of adding a DXE scoped services to create events using non-UEFI defined TPL levels could be used with these TPL levels from PCDs.  Would also allow DXE drivers to use TPL levels associated with the firmware interrupts in the range 17..30.  Perhaps extensions to the DXE Services Table?
>
> Still does not address my concern that many DXE drivers using these extra TPL levels may run into race conditions if more than one DXE driver selects the same TPL level.  Platform integrators will need to understand the relative priorities of all DXE drivers that use extra TPL levels so they can assign values that both avoid collisions with future UEFI specs and prevent race conditions among DXE drivers.
>
> Mike
>
> From: Andrew Fish <afish@apple.com<mailto:afish@apple.com>>
> Sent: Tuesday, October 6, 2020 7:18 PM
> To: edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
> Cc: bret.barkelew@microsoft.com<mailto:bret.barkelew@microsoft.com>
> Subject: Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
> Mike,
>
> When I’ve done things like this in the past I think of making them configurable like via a PCD.
>
> In terms of the #defines I think it makes more sense to just do math on the spec defined values. So TPL_CALLBACK + 1 or TPL_CALLBACK - 1 etc.  I’ve got an lldb type formatter for TPL and it prints out <UEFI Spec TPL> [+ <extra>] as I think this is the clearest way to do it.
>
> Thanks,
>
> Andrew Fish
>
> On Oct 6, 2020, at 6:54 PM, Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>> wrote:
>
> Bret,
>
> It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.
>
> TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.
>
> I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.
>
> Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just moves the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that need something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.
>
> Best regards,
>
> Mike
>
> From: devel@edk2.groups.io<mailto:devel@edk2.groups.io> <devel@edk2.groups.io<mailto:devel@edk2.groups.io>> On Behalf Of Bret Barkelew via groups.io<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgroups.io%2F&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833816832%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=muAasLFiX0ZAFY0%2B9VYE3t8IRXS9PZlN90onw4PmNto%3D&amp;reserved=0>
> Sent: Tuesday, October 6, 2020 5:24 PM
> To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>
> Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>
> As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.
>
> A quick synopsis of the problem:
>
>   *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
>   *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
>   *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
>   *   I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.
>
>      *   Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com)<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcorthon%2Fedk2%2Fcommit%2Fe7d0164c8263b1fbfb8b4e289851fbedaa8997f1&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833826824%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=wX6ujRguD%2FnEy0ekqyYcnGTmJ5OCnsXbPbLZqcW1NF8%3D&amp;reserved=0>
>
> However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.
>
> Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.
>
> We’re suggesting something like the below:
>
> //
> // Task priority level
> //
> #define TPL_APPLICATION       4
> #define TPL_CALLBACK_LOW      7
> #define TPL_CALLBACK          8
> #define TPL_CALLBACK_HIGH     9
> #define TPL_NOTIFY_LOW        15
> #define TPL_NOTIFY            16
> #define TPL_NOTIFY_HIGH       17
> #define TPL_HIGH_LEVEL        31
>
> There’s already a long-in-the-tooth bug tracking a similar issue:
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833826824%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=7sdZtpL8l2MQjUBJhiZLj8YhiT%2Fgay%2FMF%2Fuw%2BHIQmSA%3D&amp;reserved=0<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833826824%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=7sdZtpL8l2MQjUBJhiZLj8YhiT%2Fgay%2FMF%2Fuw%2BHIQmSA%3D&amp;reserved=0<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833826824%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=7sdZtpL8l2MQjUBJhiZLj8YhiT%2Fgay%2FMF%2Fuw%2BHIQmSA%3D&amp;reserved=0%3chttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833826824%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=7sdZtpL8l2MQjUBJhiZLj8YhiT%2Fgay%2FMF%2Fuw%2BHIQmSA%3D&amp;reserved=0>>
>
> This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).
>
> Thoughts?
>
> If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.
>
> - Bret
>
>
>
>
>
>
>
>
>
> 
>
>


[-- Attachment #2: Type: text/html, Size: 26854 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-19 18:48                         ` [EXTERNAL] " Bret Barkelew
@ 2020-10-20  9:23                           ` Laszlo Ersek
  2020-10-22 21:54                             ` Bret Barkelew
  0 siblings, 1 reply; 19+ messages in thread
From: Laszlo Ersek @ 2020-10-20  9:23 UTC (permalink / raw)
  To: Bret Barkelew, devel@edk2.groups.io, Kinney, Michael D,
	Andrew Fish

On 10/19/20 20:48, Bret Barkelew wrote:
> Oh, sorry. Got ahead of myself. Can’t do that. We don’t know that MorLock is the ONLY one. We’d have to have everyone wait for the ReadyToLock signal rather than EndOfDxe. Then VarPol could finally lock after all the ReadyToLock callbacks are processed. If that processing can’t happen immediately (if it has to be requeued) then we’d also need an event in VarPol itself that queues a NEW event called ReallyReadyToLockNoForRealThisTime that finally locks VarPol.
> 
> It’s the same number of steps in a different order, and I dislike it just as much. Do you think your version could be packed into a library? How do you feel about parameter currying?

Because the MorLock variable's locking works only by chance even in
current master, I would propose to set the VariablePolicy feature
briefly aside. Rework the current MorLock use case with the new protocol
/ event / signaling (with open-coded boot service calls at first).
Extract that into some new library functions. (I think DxeServicesLib
would be a good candidate. EndOfDxe is a PI concept, so UefiLib is not
the right choice; plus the one DxeServicesLib instance we have already
consumes UefiBootServicesTableLib, which is all we'd depen on.) Finally,
rebase the VariablePolicy feature on top of the new APIs in DxeServicesLib.

(

The reason I suggest open-coding the boot service calls at first, and
then extracting them into a library (possibly even in direct successor
patches), is two-fold:

- New APIs mean new abstractions; boot services are old (well-known)
abstractions. Extending existent variable driver code with calls to the
latter (at first) is easier to review -- no new concepts needed.

- Future-proof API design requires either seeing the future, or immense
experience with interfaces that have *failed* in the past. I don't excel
at either of those, so I'm a big fan of bottom-up library design --
first code up what you actually need, and then *maybe* attempt to
abstract it. (I prefer to delay the generalization / extraction until
the 3rd use case.)

)

Anyway, from the coding / design perspective, I'd be OK to work on the
above in the form of a patch series (except rebasing VariablePolicy on
top). My problem is that contributions to MdePkg, MdeModulePkg etc have
always taken *forever* -- and those memories of mine date back to an era
when maintainers would still provide feedback in a *somewhat* timely
manner on the list. I'm very much not looking forward to a review
experience that you too have gone through with the VariablePolicy series.

(In some cases in the past, when a utility library appeared useful in
OvmfPkg, but was really generic enough for addition to e.g. MdeModulePkg
(e.g. because it would apply to physical platforms just the same), we
*still* ended up adding them to OvmfPkg because MdeModulePkg owners
didn't want to spend the effort of reviewing the code, and/or they
didn't want the increased maintenance burden. And since then, maintainer
feedback has only become less responsive.)

Regarding currying -- not sure what you mean by that in this context,
but AIUI, it's generally available in C, with structures that contain
function pointers and context pointers ("parameter blocks").

Thanks
Laszlo


> 
> - Bret
> 
> From: Laszlo Ersek<mailto:lersek@redhat.com>
> Sent: Monday, October 19, 2020 11:44 AM
> To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>; Andrew Fish<mailto:afish@apple.com>
> Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
> 
> On 10/15/20 19:03, Bret Barkelew via groups.io wrote:
>> I looked over it and I think that DPC solves the mirror image to the problem we’re trying to solve (and behaves in a fashion very similar to DelayedDispatch for PEI).
>>
>> However, while studying it, I thought of a proposal that may be a balance between the complexity of Laszlo’s and the current “fixed” ordering.
>>
>> What if we register (and encourage anyone who needs it to register) the MorLock callback on both EndOfDxe AND a new VarPolReadyToLock event. VarPolReadyToLock is signaled in the EndOfDxe callback which locks the VarPolicy. In the MorLock callback, we close both even registrations whenever the first one completes. In this pattern, we may have to register the MorLock as TPL_NOTIFY for the VarPolReadyToLock callback.
> 
> Right; unconditionally deferring the policy locking (rather than
> conditionally, per my proposal) does the same job, and it's simpler to code.
> 
> You shouldn't even need to register MorLock for VarPolReadyToLock:
> - At EndOfDxe, lock the MorLock variable and signal VarPolReadyToLock
> (should be queued at TPL_CALLBACK),
> - in the VarPolReadyToLock callback, only lock the policy.
> 
> AIUI, this is basically a special case / simplification of my proposal,
> with the "delay the locking" protocol instance always *assumed* present
> at EndOfDxe (and therefore never actually *produced* in the protocol
> database).
> 
> Thanks
> Laszlo
> 
> 
>>
>> Or, we could formalize the registration with an official protocol interface and even pass a pointer to the VarPol interface in the callback when invoked.
>>
>> Upon further review, I could also be open to Laszlo’s approach if the whole design could be reduced to a library for reuse. It’s not something I would want to be copying and pasting around the code.
>>
>> - Bret
>>
>> From: Bret Barkelew via groups.io<mailto:bret.barkelew=microsoft.com@groups.io>
>> Sent: Tuesday, October 13, 2020 11:37 AM
>> To: Kinney, Michael D<mailto:michael.d.kinney@intel.com>; devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Andrew Fish<mailto:afish@apple.com>
>> Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> I’ll try to take a look at that today. Thanks!
>>
>> - Bret
>>
>> From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
>> Sent: Tuesday, October 13, 2020 11:22 AM
>> To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Andrew Fish<mailto:afish@apple.com>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
>> Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> Hi Bret,
>>
>> The concept of making the platform lock point configurable seems like a good idea from a platform porting perspective.  If we had that feature would that make this better?
>>
>> Have you studied the features DPC provides?  It is functionally similar to having NOTIFY-1 and CALLBACK-1 TPL levels.
>>
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833816832%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=5Ae1U%2Fl6c8ZWpKxJokKeHVmjTZ3KgZqktJ2Kmr9gZn8%3D&amp;reserved=0<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833816832%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=5Ae1U%2Fl6c8ZWpKxJokKeHVmjTZ3KgZqktJ2Kmr9gZn8%3D&amp;reserved=0<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833816832%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=5Ae1U%2Fl6c8ZWpKxJokKeHVmjTZ3KgZqktJ2Kmr9gZn8%3D&amp;reserved=0%3chttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833816832%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=5Ae1U%2Fl6c8ZWpKxJokKeHVmjTZ3KgZqktJ2Kmr9gZn8%3D&amp;reserved=0>>
>>
>> Thanks,
>>
>> Mike
>>
>>
>> From: Bret Barkelew <Bret.Barkelew@microsoft.com>
>> Sent: Tuesday, October 13, 2020 10:57 AM
>> To: devel@edk2.groups.io; Kinney, Michael D <michael.d.kinney@intel.com>; Andrew Fish <afish@apple.com>
>> Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> The problem with the other solution is that it doesn’t give the platform any flexibility to move which event the lock is attached to. For EDK2 default, we’re making it EndOfDxe, because that matches most platforms’ threat models. For Mu, we use ReadyToBoot.
>>
>> The current “hack” makes that more difficult without multiple patches that we have to carry to remove the “fixed” ordering of the events.
>>
>> That said, I do think I’ll clean that up and submit it, unless Sean reads this and stops me. 😉
>>
>> - Bret
>>
>> From: Michael D Kinney via groups.io<mailto:michael.d.kinney=intel.com@groups.io>
>> Sent: Tuesday, October 13, 2020 8:05 AM
>> To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
>> Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> Hi Bret,
>>
>> If the other solution more directly expresses the dependency and guarantees ordering, then it may not be considered a hack.
>>
>> I prefer solutions that minimize the chances of failures and the need for platform developers to do extra work to integrate a feature or do extra work when they add an unrelated feature.
>>
>> I see Laszlo’s proposal to add a new tag GUID protocol to detect and enforce ordering for this specific use case.
>>
>> The network stack added a DPC (Deferred Procedure Call) Protocol to help with ordering issues and remove the introduction of extra TPL levels in an earlier version of the network stack.  I am wondering of there is a more generic solution to this specific problem through the use of DPC.
>>
>> Mike
>>
>> From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
>> Sent: Monday, October 12, 2020 3:14 PM
>> To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
>> Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> Like I said, I’m also happy to go with the lesser solution of replacing the hack that was already in the code. The last person didn’t care to solve this problem, and I’m good to not solve it, too. I mean, I think it’s turtles all the way down no matter what.
>>
>> It was actually the ASSERT in the code that highlighted this problem to being with, so I would say that it’s doing its job. It’s incumbent upon the code author to determine what resource they’re trying to access and whether they’ve accessed it successfully, and I agree that it seems like an appropriate use of ASSERTs so long as it’s backed up with some OTHER appropriate action (even if that action is ignoring it).
>>
>> - Bret
>>
>> From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
>> Sent: Monday, October 12, 2020 9:44 AM
>> To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
>> Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> Bret,
>>
>> How to platform creators know for the complete set of drivers if there is anything then need to worry about and why and what they need to address the concern?  This is about order that events are signaled for a given event trigger.  When a platform adds more driver that may use the same event triggers, how do they know if there is a potential for a race condition or not?  If event notification functions are design to be independent of signaling order, then there is no issue.  As soon as there are requirements for event notification functions to be executed in a specific order at a specific event trigger, we have to make sure the platform creator knows and preferably, the FW can tell them if they got it wrong.
>>
>> Can your data/device manipulators and data/device protectors use case generate an ASSERT() if they are signaled in the wrong order?
>>
>> Mike
>>
>> From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
>> Sent: Wednesday, October 7, 2020 9:39 AM
>> To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
>> Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> Agreed with your concern, Mike. This mechanism (and we can document it as such) should NOT be used to accomplish an explicit ordering (a la the “apriori list”). It’s just to provide a little separation for two patterns that we’ve seen time and again in our code: data/device manipulators and data/device protectors. It does not eliminate the necessity for platform creators to understand things like driver ordering if they have one driver that requires a protocol be installed or a bus connected.
>>
>> - Bret
>>
>> From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
>> Sent: Wednesday, October 7, 2020 9:21 AM
>> To: Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
>> Cc: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>
>> Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> Hi Andrew,
>>
>> I agree DXE drivers could use a PCD to make it configurable and prevent collisions with UEFI defined TPL levels.
>>
>> Bret’s suggestion of adding a DXE scoped services to create events using non-UEFI defined TPL levels could be used with these TPL levels from PCDs.  Would also allow DXE drivers to use TPL levels associated with the firmware interrupts in the range 17..30.  Perhaps extensions to the DXE Services Table?
>>
>> Still does not address my concern that many DXE drivers using these extra TPL levels may run into race conditions if more than one DXE driver selects the same TPL level.  Platform integrators will need to understand the relative priorities of all DXE drivers that use extra TPL levels so they can assign values that both avoid collisions with future UEFI specs and prevent race conditions among DXE drivers.
>>
>> Mike
>>
>> From: Andrew Fish <afish@apple.com<mailto:afish@apple.com>>
>> Sent: Tuesday, October 6, 2020 7:18 PM
>> To: edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
>> Cc: bret.barkelew@microsoft.com<mailto:bret.barkelew@microsoft.com>
>> Subject: Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>> Mike,
>>
>> When I’ve done things like this in the past I think of making them configurable like via a PCD.
>>
>> In terms of the #defines I think it makes more sense to just do math on the spec defined values. So TPL_CALLBACK + 1 or TPL_CALLBACK - 1 etc.  I’ve got an lldb type formatter for TPL and it prints out <UEFI Spec TPL> [+ <extra>] as I think this is the clearest way to do it.
>>
>> Thanks,
>>
>> Andrew Fish
>>
>> On Oct 6, 2020, at 6:54 PM, Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>> wrote:
>>
>> Bret,
>>
>> It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.
>>
>> TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.
>>
>> I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.
>>
>> Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just moves the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that need something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.
>>
>> Best regards,
>>
>> Mike
>>
>> From: devel@edk2.groups.io<mailto:devel@edk2.groups.io> <devel@edk2.groups.io<mailto:devel@edk2.groups.io>> On Behalf Of Bret Barkelew via groups.io<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgroups.io%2F&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833816832%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=muAasLFiX0ZAFY0%2B9VYE3t8IRXS9PZlN90onw4PmNto%3D&amp;reserved=0>
>> Sent: Tuesday, October 6, 2020 5:24 PM
>> To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>
>> Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.
>>
>> A quick synopsis of the problem:
>>
>>   *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
>>   *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
>>   *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
>>   *   I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.
>>
>>      *   Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com)<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcorthon%2Fedk2%2Fcommit%2Fe7d0164c8263b1fbfb8b4e289851fbedaa8997f1&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833826824%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=wX6ujRguD%2FnEy0ekqyYcnGTmJ5OCnsXbPbLZqcW1NF8%3D&amp;reserved=0>
>>
>> However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.
>>
>> Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.
>>
>> We’re suggesting something like the below:
>>
>> //
>> // Task priority level
>> //
>> #define TPL_APPLICATION       4
>> #define TPL_CALLBACK_LOW      7
>> #define TPL_CALLBACK          8
>> #define TPL_CALLBACK_HIGH     9
>> #define TPL_NOTIFY_LOW        15
>> #define TPL_NOTIFY            16
>> #define TPL_NOTIFY_HIGH       17
>> #define TPL_HIGH_LEVEL        31
>>
>> There’s already a long-in-the-tooth bug tracking a similar issue:
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833826824%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=7sdZtpL8l2MQjUBJhiZLj8YhiT%2Fgay%2FMF%2Fuw%2BHIQmSA%3D&amp;reserved=0<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833826824%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=7sdZtpL8l2MQjUBJhiZLj8YhiT%2Fgay%2FMF%2Fuw%2BHIQmSA%3D&amp;reserved=0<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833826824%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=7sdZtpL8l2MQjUBJhiZLj8YhiT%2Fgay%2FMF%2Fuw%2BHIQmSA%3D&amp;reserved=0%3chttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C485b54216f8b459f2e1908d8745f09ba%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387298833826824%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=7sdZtpL8l2MQjUBJhiZLj8YhiT%2Fgay%2FMF%2Fuw%2BHIQmSA%3D&amp;reserved=0>>
>>
>> This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).
>>
>> Thoughts?
>>
>> If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.
>>
>> - Bret
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> 
>>
>>
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-20  9:23                           ` Laszlo Ersek
@ 2020-10-22 21:54                             ` Bret Barkelew
  2020-10-27 14:04                               ` Laszlo Ersek
  0 siblings, 1 reply; 19+ messages in thread
From: Bret Barkelew @ 2020-10-22 21:54 UTC (permalink / raw)
  To: Laszlo Ersek, devel@edk2.groups.io, Kinney, Michael D,
	Andrew Fish

[-- Attachment #1: Type: text/plain, Size: 26667 bytes --]

We discussed this in our team this week. For the time being, I think the solution we’d be interested in pursuing (since the extra TPLs are a non-starter) would be to divide the EndOfDxe event into two events: EndOfDxeForPeopleWhoWantToAccessAllDrivers and EndOfDxeForPeopleWhoWantToProtectThingsBeforeBdsAndOpRoms. This would clarify the two groups that I mentioned earlier: resource manipulators and resource protectors.

Is there any appetite for making this change? If so, I can write up a patch for it and submit it as part of this series (or separately). If not, I think I’ll drop this entire discussion and stick with the status quo – leaving the general solution for another day.

Thoughts?

- Bret

From: Laszlo Ersek<mailto:lersek@redhat.com>
Sent: Tuesday, October 20, 2020 2:23 AM
To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>; Andrew Fish<mailto:afish@apple.com>
Subject: Re: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering

On 10/19/20 20:48, Bret Barkelew wrote:
> Oh, sorry. Got ahead of myself. Can’t do that. We don’t know that MorLock is the ONLY one. We’d have to have everyone wait for the ReadyToLock signal rather than EndOfDxe. Then VarPol could finally lock after all the ReadyToLock callbacks are processed. If that processing can’t happen immediately (if it has to be requeued) then we’d also need an event in VarPol itself that queues a NEW event called ReallyReadyToLockNoForRealThisTime that finally locks VarPol.
>
> It’s the same number of steps in a different order, and I dislike it just as much. Do you think your version could be packed into a library? How do you feel about parameter currying?

Because the MorLock variable's locking works only by chance even in
current master, I would propose to set the VariablePolicy feature
briefly aside. Rework the current MorLock use case with the new protocol
/ event / signaling (with open-coded boot service calls at first).
Extract that into some new library functions. (I think DxeServicesLib
would be a good candidate. EndOfDxe is a PI concept, so UefiLib is not
the right choice; plus the one DxeServicesLib instance we have already
consumes UefiBootServicesTableLib, which is all we'd depen on.) Finally,
rebase the VariablePolicy feature on top of the new APIs in DxeServicesLib.

(

The reason I suggest open-coding the boot service calls at first, and
then extracting them into a library (possibly even in direct successor
patches), is two-fold:

- New APIs mean new abstractions; boot services are old (well-known)
abstractions. Extending existent variable driver code with calls to the
latter (at first) is easier to review -- no new concepts needed.

- Future-proof API design requires either seeing the future, or immense
experience with interfaces that have *failed* in the past. I don't excel
at either of those, so I'm a big fan of bottom-up library design --
first code up what you actually need, and then *maybe* attempt to
abstract it. (I prefer to delay the generalization / extraction until
the 3rd use case.)

)

Anyway, from the coding / design perspective, I'd be OK to work on the
above in the form of a patch series (except rebasing VariablePolicy on
top). My problem is that contributions to MdePkg, MdeModulePkg etc have
always taken *forever* -- and those memories of mine date back to an era
when maintainers would still provide feedback in a *somewhat* timely
manner on the list. I'm very much not looking forward to a review
experience that you too have gone through with the VariablePolicy series.

(In some cases in the past, when a utility library appeared useful in
OvmfPkg, but was really generic enough for addition to e.g. MdeModulePkg
(e.g. because it would apply to physical platforms just the same), we
*still* ended up adding them to OvmfPkg because MdeModulePkg owners
didn't want to spend the effort of reviewing the code, and/or they
didn't want the increased maintenance burden. And since then, maintainer
feedback has only become less responsive.)

Regarding currying -- not sure what you mean by that in this context,
but AIUI, it's generally available in C, with structures that contain
function pointers and context pointers ("parameter blocks").

Thanks
Laszlo


>
> - Bret
>
> From: Laszlo Ersek<mailto:lersek@redhat.com>
> Sent: Monday, October 19, 2020 11:44 AM
> To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>; Andrew Fish<mailto:afish@apple.com>
> Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>
> On 10/15/20 19:03, Bret Barkelew via groups.io wrote:
>> I looked over it and I think that DPC solves the mirror image to the problem we’re trying to solve (and behaves in a fashion very similar to DelayedDispatch for PEI).
>>
>> However, while studying it, I thought of a proposal that may be a balance between the complexity of Laszlo’s and the current “fixed” ordering.
>>
>> What if we register (and encourage anyone who needs it to register) the MorLock callback on both EndOfDxe AND a new VarPolReadyToLock event. VarPolReadyToLock is signaled in the EndOfDxe callback which locks the VarPolicy. In the MorLock callback, we close both even registrations whenever the first one completes. In this pattern, we may have to register the MorLock as TPL_NOTIFY for the VarPolReadyToLock callback.
>
> Right; unconditionally deferring the policy locking (rather than
> conditionally, per my proposal) does the same job, and it's simpler to code.
>
> You shouldn't even need to register MorLock for VarPolReadyToLock:
> - At EndOfDxe, lock the MorLock variable and signal VarPolReadyToLock
> (should be queued at TPL_CALLBACK),
> - in the VarPolReadyToLock callback, only lock the policy.
>
> AIUI, this is basically a special case / simplification of my proposal,
> with the "delay the locking" protocol instance always *assumed* present
> at EndOfDxe (and therefore never actually *produced* in the protocol
> database).
>
> Thanks
> Laszlo
>
>
>>
>> Or, we could formalize the registration with an official protocol interface and even pass a pointer to the VarPol interface in the callback when invoked.
>>
>> Upon further review, I could also be open to Laszlo’s approach if the whole design could be reduced to a library for reuse. It’s not something I would want to be copying and pasting around the code.
>>
>> - Bret
>>
>> From: Bret Barkelew via groups.io<mailto:bret.barkelew=microsoft.com@groups.io>
>> Sent: Tuesday, October 13, 2020 11:37 AM
>> To: Kinney, Michael D<mailto:michael.d.kinney@intel.com>; devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Andrew Fish<mailto:afish@apple.com>
>> Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> I’ll try to take a look at that today. Thanks!
>>
>> - Bret
>>
>> From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
>> Sent: Tuesday, October 13, 2020 11:22 AM
>> To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; devel@edk2.groups.io<mailto:devel@edk2.groups.io>; Andrew Fish<mailto:afish@apple.com>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
>> Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> Hi Bret,
>>
>> The concept of making the platform lock point configurable seems like a good idea from a platform porting perspective.  If we had that feature would that make this better?
>>
>> Have you studied the features DPC provides?  It is functionally similar to having NOTIFY-1 and CALLBACK-1 TPL levels.
>>
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C2f662e883dbb49dccda808d874d9c654%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387825978640309%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=LpfNwdqTvCaGSYemlKYsmuKpkIS0S%2Fwo86bV6KbB0F4%3D&amp;reserved=0<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C2f662e883dbb49dccda808d874d9c654%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387825978650310%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=WICrMv0O5RkGaWmei7Y507iHpyYEE0JtClehrDSnWXQ%3D&amp;reserved=0<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C2f662e883dbb49dccda808d874d9c654%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387825978650310%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=WICrMv0O5RkGaWmei7Y507iHpyYEE0JtClehrDSnWXQ%3D&amp;reserved=0<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C2f662e883dbb49dccda808d874d9c654%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387825978640309%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=LpfNwdqTvCaGSYemlKYsmuKpkIS0S%2Fwo86bV6KbB0F4%3D&amp;reserved=0%3chttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C2f662e883dbb49dccda808d874d9c654%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387825978650310%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=WICrMv0O5RkGaWmei7Y507iHpyYEE0JtClehrDSnWXQ%3D&amp;reserved=0%3chttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Fedk2%2Fblob%2Fmaster%2FNetworkPkg%2FDpcDxe%2FDpc.c&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C2f662e883dbb49dccda808d874d9c654%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387825978650310%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=WICrMv0O5RkGaWmei7Y507iHpyYEE0JtClehrDSnWXQ%3D&amp;reserved=0>>>
>>
>> Thanks,
>>
>> Mike
>>
>>
>> From: Bret Barkelew <Bret.Barkelew@microsoft.com>
>> Sent: Tuesday, October 13, 2020 10:57 AM
>> To: devel@edk2.groups.io; Kinney, Michael D <michael.d.kinney@intel.com>; Andrew Fish <afish@apple.com>
>> Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> The problem with the other solution is that it doesn’t give the platform any flexibility to move which event the lock is attached to. For EDK2 default, we’re making it EndOfDxe, because that matches most platforms’ threat models. For Mu, we use ReadyToBoot.
>>
>> The current “hack” makes that more difficult without multiple patches that we have to carry to remove the “fixed” ordering of the events.
>>
>> That said, I do think I’ll clean that up and submit it, unless Sean reads this and stops me. 😉
>>
>> - Bret
>>
>> From: Michael D Kinney via groups.io<mailto:michael.d.kinney=intel.com@groups.io>
>> Sent: Tuesday, October 13, 2020 8:05 AM
>> To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
>> Subject: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> Hi Bret,
>>
>> If the other solution more directly expresses the dependency and guarantees ordering, then it may not be considered a hack.
>>
>> I prefer solutions that minimize the chances of failures and the need for platform developers to do extra work to integrate a feature or do extra work when they add an unrelated feature.
>>
>> I see Laszlo’s proposal to add a new tag GUID protocol to detect and enforce ordering for this specific use case.
>>
>> The network stack added a DPC (Deferred Procedure Call) Protocol to help with ordering issues and remove the introduction of extra TPL levels in an earlier version of the network stack.  I am wondering of there is a more generic solution to this specific problem through the use of DPC.
>>
>> Mike
>>
>> From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
>> Sent: Monday, October 12, 2020 3:14 PM
>> To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
>> Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> Like I said, I’m also happy to go with the lesser solution of replacing the hack that was already in the code. The last person didn’t care to solve this problem, and I’m good to not solve it, too. I mean, I think it’s turtles all the way down no matter what.
>>
>> It was actually the ASSERT in the code that highlighted this problem to being with, so I would say that it’s doing its job. It’s incumbent upon the code author to determine what resource they’re trying to access and whether they’ve accessed it successfully, and I agree that it seems like an appropriate use of ASSERTs so long as it’s backed up with some OTHER appropriate action (even if that action is ignoring it).
>>
>> - Bret
>>
>> From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
>> Sent: Monday, October 12, 2020 9:44 AM
>> To: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>; Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
>> Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> Bret,
>>
>> How to platform creators know for the complete set of drivers if there is anything then need to worry about and why and what they need to address the concern?  This is about order that events are signaled for a given event trigger.  When a platform adds more driver that may use the same event triggers, how do they know if there is a potential for a race condition or not?  If event notification functions are design to be independent of signaling order, then there is no issue.  As soon as there are requirements for event notification functions to be executed in a specific order at a specific event trigger, we have to make sure the platform creator knows and preferably, the FW can tell them if they got it wrong.
>>
>> Can your data/device manipulators and data/device protectors use case generate an ASSERT() if they are signaled in the wrong order?
>>
>> Mike
>>
>> From: Bret Barkelew <Bret.Barkelew@microsoft.com<mailto:Bret.Barkelew@microsoft.com>>
>> Sent: Wednesday, October 7, 2020 9:39 AM
>> To: Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>>; edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>
>> Subject: RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> Agreed with your concern, Mike. This mechanism (and we can document it as such) should NOT be used to accomplish an explicit ordering (a la the “apriori list”). It’s just to provide a little separation for two patterns that we’ve seen time and again in our code: data/device manipulators and data/device protectors. It does not eliminate the necessity for platform creators to understand things like driver ordering if they have one driver that requires a protocol be installed or a bus connected.
>>
>> - Bret
>>
>> From: Kinney, Michael D<mailto:michael.d.kinney@intel.com>
>> Sent: Wednesday, October 7, 2020 9:21 AM
>> To: Andrew Fish<mailto:afish@apple.com>; edk2-devel-groups-io<mailto:devel@edk2.groups.io>; Kinney, Michael D<mailto:michael.d.kinney@intel.com>
>> Cc: Bret Barkelew<mailto:Bret.Barkelew@microsoft.com>
>> Subject: [EXTERNAL] RE: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> Hi Andrew,
>>
>> I agree DXE drivers could use a PCD to make it configurable and prevent collisions with UEFI defined TPL levels.
>>
>> Bret’s suggestion of adding a DXE scoped services to create events using non-UEFI defined TPL levels could be used with these TPL levels from PCDs.  Would also allow DXE drivers to use TPL levels associated with the firmware interrupts in the range 17..30.  Perhaps extensions to the DXE Services Table?
>>
>> Still does not address my concern that many DXE drivers using these extra TPL levels may run into race conditions if more than one DXE driver selects the same TPL level.  Platform integrators will need to understand the relative priorities of all DXE drivers that use extra TPL levels so they can assign values that both avoid collisions with future UEFI specs and prevent race conditions among DXE drivers.
>>
>> Mike
>>
>> From: Andrew Fish <afish@apple.com<mailto:afish@apple.com>>
>> Sent: Tuesday, October 6, 2020 7:18 PM
>> To: edk2-devel-groups-io <devel@edk2.groups.io<mailto:devel@edk2.groups.io>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
>> Cc: bret.barkelew@microsoft.com<mailto:bret.barkelew@microsoft.com>
>> Subject: Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>> Mike,
>>
>> When I’ve done things like this in the past I think of making them configurable like via a PCD.
>>
>> In terms of the #defines I think it makes more sense to just do math on the spec defined values. So TPL_CALLBACK + 1 or TPL_CALLBACK - 1 etc.  I’ve got an lldb type formatter for TPL and it prints out <UEFI Spec TPL> [+ <extra>] as I think this is the clearest way to do it.
>>
>> Thanks,
>>
>> Andrew Fish
>>
>> On Oct 6, 2020, at 6:54 PM, Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>> wrote:
>>
>> Bret,
>>
>> It is likely best to go with the first approach.  The discussion on TPL levels can continue and you could adopt it in the future if a general solution is identified.
>>
>> TPL 17..30 are reserved by the UEFI Spec for firmware interrupts.  So TPL_NOTIFY_HIGH as defined would not be allowed.
>>
>> I agree that the use of TPL values other than those defined by the UEFI Spec can potentially be used by DXE.  However, that DXE usage must be flexible enough to handle a future extension to the UEFI Spec for new TPL levels without a collision.
>>
>> Instead of defining specific TPL values, you could add a DXE scoped service to allocate the use of a new TPL level that is not being used by UEFI or other DXE drivers.  I will point out that these approaches (defining new TPL levels or allocating unused TPL levels) just moves the same problem.  You can solve it for the first driver that needs something special.  As soon as there is more than one driver that need something special at the same TPL level, the potential for a race condition for ordering will show up again.  So I do not consider adding TPL levels to be a good general solution to this problem.
>>
>> Best regards,
>>
>> Mike
>>
>> From: devel@edk2.groups.io<mailto:devel@edk2.groups.io> <devel@edk2.groups.io<mailto:devel@edk2.groups.io>> On Behalf Of Bret Barkelew via groups.io<https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgroups.io%2F&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C2f662e883dbb49dccda808d874d9c654%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387825978650310%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=jVwYhGnMhEUyInKtp2ENeRgWVyNAxvHZFsmaflf5HzU%3D&amp;reserved=0>
>> Sent: Tuesday, October 6, 2020 5:24 PM
>> To: devel@edk2.groups.io<mailto:devel@edk2.groups.io>
>> Subject: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
>>
>> As many will be aware, I’m in the final stages of having Variable Policy ready for commit. However, after moving the “Lock” event back to EndOfDxe, this exposed a race condition in variable services.
>>
>> A quick synopsis of the problem:
>>
>>   *   Previously, MorLock abused a privileged position by being tightly coupled to Variable Services, and its lock callback was called directly so that it could be strongly ordered with the internal property lock that disables future RequestToLock calls.
>>   *   VariablePolicy attempted to decouple this (without realizing it was a problem) and go through formalized interfaces that could ultimately be broken out of Variable Services altogether.
>>   *   However, doing so triggered the race condition, causing an ASSERT when MorLock attempted to lock its variables.
>>   *   I current have a reimplementation of the strong ordering workaround that can be previewed at the link below. I have tested that it works the same as the OLD system.
>>
>>      *   Take a stab at solving the lock ordering problem · corthon/edk2@e7d0164 (github.com)<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fcorthon%2Fedk2%2Fcommit%2Fe7d0164c8263b1fbfb8b4e289851fbedaa8997f1&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C2f662e883dbb49dccda808d874d9c654%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387825978650310%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=Zr7esiwrzlHtbtI5ngQWgKEG742x4%2FkSmTz1zAtouMo%3D&amp;reserved=0>
>>
>> However, replacing one bad design with another is not what this community is about (when we can help it), so we’d like to take a moment to revisit a conversation that has come up before: expanding the number of supported TPL levels.
>>
>> Now, I know that the current TPL levels are defined in the UEFI spec and we don’t want to have to change those, but there’s no reason that we can come up with not to add some more granularity in the PI spec, dedicated to platform and implementation ordering (the UEFI spec events will have to remain on UEFI spec TPLs). Basically there would be a set of DXE Services that allow WaitForEvent, CheckEvent, Event registration at TPLs other than notify/callback.  The UEFI system table versions of the functions would still have this restriction but code built with the platform could use the DXE Services. Right now, any attempt to use a non-UEFI TPL will ASSERT, so we can keep that in place on the SystemTable interface, but allow the platform to go around it with DXE Services. Similar functionality would have to be provided by the Mmst, but that’s already platform-specific and can probably allow it in all cases.
>>
>> We’re suggesting something like the below:
>>
>> //
>> // Task priority level
>> //
>> #define TPL_APPLICATION       4
>> #define TPL_CALLBACK_LOW      7
>> #define TPL_CALLBACK          8
>> #define TPL_CALLBACK_HIGH     9
>> #define TPL_NOTIFY_LOW        15
>> #define TPL_NOTIFY            16
>> #define TPL_NOTIFY_HIGH       17
>> #define TPL_HIGH_LEVEL        31
>>
>> There’s already a long-in-the-tooth bug tracking a similar issue:
>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C2f662e883dbb49dccda808d874d9c654%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387825978650310%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=yAXdROHkvpxrb%2F4CI%2FdFncuApiW8ai0sX%2B9U8vPUZ2c%3D&amp;reserved=0<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C2f662e883dbb49dccda808d874d9c654%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387825978660297%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=6aX%2BaZGdaALQIn01iPl2hHB5RgWCIozPyqmaaBS9FK8%3D&amp;reserved=0<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C2f662e883dbb49dccda808d874d9c654%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387825978660297%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=6aX%2BaZGdaALQIn01iPl2hHB5RgWCIozPyqmaaBS9FK8%3D&amp;reserved=0<https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C2f662e883dbb49dccda808d874d9c654%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387825978650310%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=yAXdROHkvpxrb%2F4CI%2FdFncuApiW8ai0sX%2B9U8vPUZ2c%3D&amp;reserved=0%3chttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C2f662e883dbb49dccda808d874d9c654%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387825978660297%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=6aX%2BaZGdaALQIn01iPl2hHB5RgWCIozPyqmaaBS9FK8%3D&amp;reserved=0%3chttps://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1676&amp;data=04%7C01%7Cbret.barkelew%40microsoft.com%7C2f662e883dbb49dccda808d874d9c654%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637387825978660297%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=6aX%2BaZGdaALQIn01iPl2hHB5RgWCIozPyqmaaBS9FK8%3D&amp;reserved=0>>>
>>
>> This proposal is simpler than what’s in that bug, and would greatly simplify some of our event ordering (and code).
>>
>> Thoughts?
>>
>> If this conversation takes too long, I will publish a set of patches that just go with the lesser solution posted above, but I’d much rather work the problem this way.
>>
>> - Bret
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> 
>>
>>
>


[-- Attachment #2: Type: text/html, Size: 33897 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [EXTERNAL] Re: [edk2-devel] VariablePolicy: Final Changes Thread 1 - TPL Ordering
  2020-10-22 21:54                             ` Bret Barkelew
@ 2020-10-27 14:04                               ` Laszlo Ersek
  0 siblings, 0 replies; 19+ messages in thread
From: Laszlo Ersek @ 2020-10-27 14:04 UTC (permalink / raw)
  To: devel, bret.barkelew, Kinney, Michael D, Andrew Fish

Hi Bret,

On 10/22/20 23:54, Bret Barkelew via groups.io wrote:
> We discussed this in our team this week. For the time being, I think the solution we’d be interested in pursuing (since the extra TPLs are a non-starter) would be to divide the EndOfDxe event into two events: EndOfDxeForPeopleWhoWantToAccessAllDrivers and EndOfDxeForPeopleWhoWantToProtectThingsBeforeBdsAndOpRoms. This would clarify the two groups that I mentioned earlier: resource manipulators and resource protectors.
> 
> Is there any appetite for making this change? If so, I can write up a patch for it and submit it as part of this series (or separately). If not, I think I’ll drop this entire discussion and stick with the status quo – leaving the general solution for another day.

If I understand correctly, dropping the whole topic is not helpful for
your purposes. Albeit dormant, the bug is present in edk2. It gets
tickled by your VariablePolicy work -- so in the end, the problem is
*blocking* your work. We should find some solution for it.

I think there is precedent for notifying the two groups of agents that
you identify ("resource manipulators" and "resource protectors").

Namely, the relationship between signaling EndOfDxe, and installing
SmmReadyToLock seems very similar.

(With SmmReadyToLock being actually two protocols, one in the DXE
protocol database, and another (automatically replicated one) in the SMM
protocol database -- but that split is not really relevant here.)

This distinction is in fact so important that BDS implementations are
required to explicitly ensure their order on every single platform
Multiple Intel Whitepapers refer to this relationship between EndOfDxe
and SmmReadyToLock, and (as I recall) EndOfDxe is called "the last
chance for chipset drivers to make changes" (or some such, please excuse
the sloppy wording), before SmmReadyToLock finalizes and hides things.

This seems to suggest that a new protocol (or event group) should be
introduced, to be signaled (or installed) after the current EndOfDxe
event group is signaled. It should likely come between EndOfDxe and
SmmReadyToLock, and its notification function would be responsible for
locking the variable policy.

As a catch-all, SmmReadyToLock could perhaps also lock the variable
policy, in case a given platform BDS omitted kicking the new event /
protocol between EndOfDxe and SmmReadyToLock.

(Note that the edk2 specific "LockBox" abstraction is also locked in
response to SmmReadyToLock.)

UEFI variables are a unique and high-profile aspect of UEFI, so I think
it's justifiable if every platform BDS needs to be updated with a small
snippet, for explicitly locking down the variable policy, at the exactly
right spot.

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-10-27 14:04 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-10-07  0:23 VariablePolicy: Final Changes Thread 1 - TPL Ordering Bret Barkelew
2020-10-07  1:54 ` Michael D Kinney
2020-10-07  2:17   ` [edk2-devel] " Andrew Fish
2020-10-07 16:21     ` Michael D Kinney
2020-10-07 16:39       ` Bret Barkelew
2020-10-12 16:44         ` Michael D Kinney
2020-10-12 22:14           ` Bret Barkelew
2020-10-13 15:04             ` Michael D Kinney
2020-10-13 17:56               ` Bret Barkelew
2020-10-13 18:22                 ` Michael D Kinney
2020-10-13 18:37                   ` Bret Barkelew
     [not found]                   ` <163DA129E4AEAEEE.24865@groups.io>
2020-10-15 17:03                     ` Bret Barkelew
2020-10-19 18:44                       ` Laszlo Ersek
2020-10-19 18:48                         ` [EXTERNAL] " Bret Barkelew
2020-10-20  9:23                           ` Laszlo Ersek
2020-10-22 21:54                             ` Bret Barkelew
2020-10-27 14:04                               ` Laszlo Ersek
2020-10-07 13:04 ` Laszlo Ersek
2020-10-07 17:48   ` [EXTERNAL] " Bret Barkelew

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox