* [edk2-devel] Memory Attribute for depex section @ 2024-01-08 10:11 Nhi Pham via groups.io 2024-01-10 0:48 ` Nhi Pham via groups.io 2024-01-10 13:09 ` Laszlo Ersek 0 siblings, 2 replies; 24+ messages in thread From: Nhi Pham via groups.io @ 2024-01-08 10:11 UTC (permalink / raw) To: ardb+tianocore@kernel.org, devel@edk2.groups.io Hi Ard, all, Could you please help explain how the depex section in an image is mapped in terms of memory attribute? As my observation, dispatcher locates[1] the depex section inside the module image and write[2] an evaluated data to the depex if necessary for scheduled boot order. The problem that the depex section is now in RO+X memory due to a part of the module image, so a writing to depex would cause data abort. I'm unsure whether this issue is generic in EDK2 or not. I think of two approaches: #1 Relocate the depex section to heap memory for dependency evaluation? #2 EDK2 build tool to support granting write permission for depex section. [1] StandaloneMmPkg/Core/FwVol.c:236 [2] StandaloneMmPkg/Core/Dependency.c:256 Thanks, Nhi -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113373): https://edk2.groups.io/g/devel/message/113373 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-08 10:11 [edk2-devel] Memory Attribute for depex section Nhi Pham via groups.io @ 2024-01-10 0:48 ` Nhi Pham via groups.io 2024-01-10 13:09 ` Laszlo Ersek 1 sibling, 0 replies; 24+ messages in thread From: Nhi Pham via groups.io @ 2024-01-10 0:48 UTC (permalink / raw) To: ardb+tianocore@kernel.org, devel@edk2.groups.io, discuss ++ discuss@edk2.groups.io On 1/8/2024 5:11 PM, Nhi Pham wrote: > Hi Ard, all, > > Could you please help explain how the depex section in an image is > mapped in terms of memory attribute? > > As my observation, dispatcher locates[1] the depex section inside the > module image and write[2] an evaluated data to the depex if necessary > for scheduled boot order. The problem that the depex section is now in > RO+X memory due to a part of the module image, so a writing to depex > would cause data abort. I'm unsure whether this issue is generic in EDK2 > or not. > > I think of two approaches: > > #1 Relocate the depex section to heap memory for dependency evaluation? > > #2 EDK2 build tool to support granting write permission for depex section. > > [1] StandaloneMmPkg/Core/FwVol.c:236 > [2] StandaloneMmPkg/Core/Dependency.c:256 > > Thanks, > Nhi -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113497): https://edk2.groups.io/g/devel/message/113497 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-08 10:11 [edk2-devel] Memory Attribute for depex section Nhi Pham via groups.io 2024-01-10 0:48 ` Nhi Pham via groups.io @ 2024-01-10 13:09 ` Laszlo Ersek 2024-01-10 13:45 ` Laszlo Ersek 1 sibling, 1 reply; 24+ messages in thread From: Laszlo Ersek @ 2024-01-10 13:09 UTC (permalink / raw) To: devel, nhi, ardb+tianocore@kernel.org Hi, On 1/8/24 11:11, Nhi Pham via groups.io wrote: > Hi Ard, all, > > Could you please help explain how the depex section in an image is > mapped in terms of memory attribute? > > As my observation, dispatcher locates[1] the depex section inside the > module image and write[2] an evaluated data to the depex if necessary > for scheduled boot order. The problem that the depex section is now in > RO+X memory due to a part of the module image, so a writing to depex > would cause data abort. I'm unsure whether this issue is generic in > EDK2 or not. > > I think of two approaches: > > #1 Relocate the depex section to heap memory for dependency > #evaluation? > > #2 EDK2 build tool to support granting write permission for depex > #section. > > [1] StandaloneMmPkg/Core/FwVol.c:236 > [2] StandaloneMmPkg/Core/Dependency.c:256 here's my understanding of the issue. In Platform Init 1.8, section II-10.7 describes the depex instruction set for the DXE DISPATCHER. In particular, section II-10.7.3 describes the PUSH opcode as follows: PUSH <Protocol GUID> Pushes a Boolean value onto the stack. If the GUID is present in the handle database, then a TRUE is pushed onto the stack. If the GUID is not present in the handle database, then a FALSE is pushed onto the stack. The test for the GUID in the handle database may be performed with the Boot Service LocateProtocol(). This basically says that every time the dispatcher sees a PUSH opcode in a depex, it has to look up the protocol GUID in the protocol database. Now, as far as I can tell, edk2's DXE Core implementation wanted to optimize this. (And this goes back to ancient commit 28a00297189c, from 2007.) Here's the idea AIUI: - Assume the protocol is not found, i.e. we push a FALSE. This will likely mean that the driver whose DEPEX contains this PUSH opcode cannot be dispatched just yet. In other words, we're going to have to re-evaluate this PUSH at least once more, possibly multiple times. We cannot optimize anything here, because the needed protocol is not present yet, but it may become available later. When we next evaluate the DEPEX for this driver, we'll check again if the protocol has become available by then. - Assume the protocol is found, i.e. we push a TRUE. The optimization here is that we assume the protocol will not *disappear*, once available. The evaluation of the driver's *entire* DEPEX may still result in FALSE, so it's not guaranteed that the driver can be dispatched right now. However, given our assumption that the protocol we've just found will not disappear during DXE driver dispatch, we might want to "cache" the current successful protocol lookup, and when we *next* evaluate the DEPEX for this same driver (assuming we cannot dispatch it right now due to something else missing from its DEPEX), we don't want to perform the *same* protocol lookup again -- we'll just want to remember it was already there last time. And the way the DXE core seems to implement this optimization (i.e., how it "remembers success") is that it patches the DEPEX in-place: it replaces the PUSH opcode with a special (edk2-specific) opcode called EFI_DEP_REPLACE_TRUE (in addition to pushing the TRUE of course, to the eval stack). This opcode is functionally identical to the plain (and standard) TRUE opcode, so the next time the dispatcher evaluates the same depex and sees EFI_DEP_REPLACE_TRUE it will just push TRUE, the difference with the TRUE opcode is that EFI_DEP_REPLACE_TRUE also provides "room" for the -- otherwise useless -- original protocol GUID that used to be the argument of the PUSH opcode. The dispatcher ignores the protocol GUID on EFI_DEP_REPLACE_TRUE, beyond logging it. EFI_DEP_REPLACE_TRUE is documented in the code as follows ("MdeModulePkg/Core/Dxe/DxeMain.h"): /// /// EFI_DEP_REPLACE_TRUE - Used to dynamically patch the dependency expression /// to save time. A EFI_DEP_PUSH is evaluated one an /// replaced with EFI_DEP_REPLACE_TRUE. If PI spec's Vol 2 /// Driver Execution Environment Core Interface use 0xff /// as new DEPEX opcode. EFI_DEP_REPLACE_TRUE should be /// defined to a new value that is not conflicting with PI spec. /// #define EFI_DEP_REPLACE_TRUE 0xff This documentation is hard to understand due to English grammar errors. It means to say: "Used to dynamically patch the dependency expression to save time. An EFI_DEP_PUSH opcode is evaluated *once*, *and* replaced with EFI_DEP_REPLACE_TRUE. If PI spec's Vol 2 Driver Execution Environment Core Interface *starts using* 0xff as new DEPEX opcode *in the future*, *then* EFI_DEP_REPLACE_TRUE should be *redefined* to a new value that is not conflicting with *said new* PI spec." Over time, this optimization (hack) has made its way into the traditional PiSmmCore, and finally into the standalone SMM core, apparently. In summary, this seems like a performance optimization, and should make no functional difference. If we remove it, then driver dispatch could become a bit slower (because we'd re-evaluate such PUSH opcodes too that had previously succeeded in locating the given protocol GUID), but then we'd not try to overwrite the depex section of the driver image in memory. I think that keeping the depex section read-only is valuable, so I'd rule out #2. I'd also not start with option #1 -- copying the depex to heap memory, just so this optimization can succeed. I'd simply start by removing the optimization, and measuring how much driver dispatch slows down in practice, on various platforms. Can you try this? (I have only build-tested and "uncrustified" it.) The patch removes the EFI_DEP_REPLACE_TRUE handling altogether, plus it CONST-ifies the Iterator pointer (which points into the DEPEX section), so that the compiler catch any possible accesses at *build time* that would write to the write-protected DEPEX memory area. ----------------------------- diff --git a/StandaloneMmPkg/Core/Dependency.c b/StandaloneMmPkg/Core/Dependency.c index 440fe3e45238..2bcb07d34666 100644 --- a/StandaloneMmPkg/Core/Dependency.c +++ b/StandaloneMmPkg/Core/Dependency.c @@ -13,16 +13,6 @@ #include "StandaloneMmCore.h" -/// -/// EFI_DEP_REPLACE_TRUE - Used to dynamically patch the dependency expression -/// to save time. A EFI_DEP_PUSH is evaluated one an -/// replaced with EFI_DEP_REPLACE_TRUE. If PI spec's Vol 2 -/// Driver Execution Environment Core Interface use 0xff -/// as new DEPEX opcode. EFI_DEP_REPLACE_TRUE should be -/// defined to a new value that is not conflicting with PI spec. -/// -#define EFI_DEP_REPLACE_TRUE 0xff - /// /// Define the initial size of the dependency expression evaluation stack /// @@ -170,12 +160,12 @@ MmIsSchedulable ( IN EFI_MM_DRIVER_ENTRY *DriverEntry ) { - EFI_STATUS Status; - UINT8 *Iterator; - BOOLEAN Operator; - BOOLEAN Operator2; - EFI_GUID DriverGuid; - VOID *Interface; + EFI_STATUS Status; + CONST UINT8 *Iterator; + BOOLEAN Operator; + BOOLEAN Operator2; + EFI_GUID DriverGuid; + VOID *Interface; Operator = FALSE; Operator2 = FALSE; @@ -253,8 +243,7 @@ MmIsSchedulable ( Status = PushBool (FALSE); } else { DEBUG ((DEBUG_DISPATCH, " PUSH GUID(%g) = TRUE\n", &DriverGuid)); - *Iterator = EFI_DEP_REPLACE_TRUE; - Status = PushBool (TRUE); + Status = PushBool (TRUE); } if (EFI_ERROR (Status)) { @@ -356,18 +345,6 @@ MmIsSchedulable ( DEBUG ((DEBUG_DISPATCH, " RESULT = %a\n", Operator ? "TRUE" : "FALSE")); return Operator; - case EFI_DEP_REPLACE_TRUE: - CopyMem (&DriverGuid, Iterator + 1, sizeof (EFI_GUID)); - DEBUG ((DEBUG_DISPATCH, " PUSH GUID(%g) = TRUE\n", &DriverGuid)); - Status = PushBool (TRUE); - if (EFI_ERROR (Status)) { - DEBUG ((DEBUG_DISPATCH, " RESULT = FALSE (Unexpected error)\n")); - return FALSE; - } - - Iterator += sizeof (EFI_GUID); - break; - default: DEBUG ((DEBUG_DISPATCH, " RESULT = FALSE (Unknown opcode)\n")); goto Done; ----------------------------- Thanks Laszlo -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113531): https://edk2.groups.io/g/devel/message/113531 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-10 13:09 ` Laszlo Ersek @ 2024-01-10 13:45 ` Laszlo Ersek 2024-01-10 21:50 ` Pedro Falcato 2024-01-12 2:44 ` Andrew Fish via groups.io 0 siblings, 2 replies; 24+ messages in thread From: Laszlo Ersek @ 2024-01-10 13:45 UTC (permalink / raw) To: devel, nhi, ardb+tianocore@kernel.org, Andrew Fish (+ Andrew) On 1/10/24 14:09, Laszlo Ersek wrote: > I think that keeping the depex section read-only is valuable, so I'd > rule out #2. I'd also not start with option #1 -- copying the depex to > heap memory, just so this optimization can succeed. I'd simply start by > removing the optimization, and measuring how much driver dispatch slows > down in practice, on various platforms. > > Can you try this? (I have only build-tested and "uncrustified" it.) > > The patch removes the EFI_DEP_REPLACE_TRUE handling altogether, plus it > CONST-ifies the Iterator pointer (which points into the DEPEX section), > so that the compiler catch any possible accesses at *build time* that > would write to the write-protected DEPEX memory area. On a tangent: the optimization in question highlights a more general problem, namely that the DXE (and possibly MM/SMM) protocol databases are considered slow, for some access patterns. Edk2 implements those protocol databases with linked lists, where lookup costs O(n) operations (average and worst cases). And protocol lookups are quite frequent (for example, in depex evaluations, they could be considered "particularly frequent"). (1) The "Tasks" wiki page mentions something *similar* (but not the same); see https://github.com/tianocore/tianocore.github.io/wiki/Tasks#datahub--gcd-scalability The description is: "The DXE core's DataHub and GCD (Global Coherency Domain) layers don't scale well as the number of data items gets large, since they are based on simple linked lists. Find better data structures." The same might apply more or less to the protocol database implementation. (2) More to the point, Andrew Fish reported earlier that at Apple, they had rewritten the DXE protocol database, using the red-black tree OrderedCollectionLib that I had contributed previously to edk2 -- and they saw significant performance improvements. So upstreaming that feature to edk2 could be very valuable. (Red-black trees have O(log(n)) time cost (worst case) for lookup, insertion and deletion, and O(n) cost for iterating through the whole data structure.) Let me see if I can find the bugzilla ticket... Ah, I got it. Apologies, I misremembered: Andrew's comment was not about the protocol database, but about the handle database. Here it is: https://bugzilla.tianocore.org/show_bug.cgi?id=988#c7 (the BZ is still CONFIRMED btw...) Still, I think it must be related in a way. Namely, an EFI handle exists if and only if at least one protocol interface is installed on it. If you uninstall the last protocol interface from a handle, then the handle is destroyed -- in fact that's the *only* way to destroy a handle, to my understanding. See EFI_BOOT_SERVICES.UninstallProtocolInterface() in the UEFI spec: "If the last protocol interface is removed from a handle, the handle is freed and is no longer valid". Furthermore, calling InstallProtocolInterface() and InstallMultipleProtocolInterfaces() is how one *creates* new handles. So given how handles and protocol interfaces are conceptually interlinked, an rbtree-based protocol DB might have to maintain multiple rbtrees internally (for the ability to search the database quickly with different types of "keys"). I don't have a design ready in my mind at all (I'm not that familiar with the *current*, list-based implementation to begin with!). Upstreaming Apple's (experimental?) code could prove very helpful. Laszlo -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113532): https://edk2.groups.io/g/devel/message/113532 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-10 13:45 ` Laszlo Ersek @ 2024-01-10 21:50 ` Pedro Falcato 2024-01-11 8:46 ` Laszlo Ersek 2024-01-12 2:44 ` Andrew Fish via groups.io 1 sibling, 1 reply; 24+ messages in thread From: Pedro Falcato @ 2024-01-10 21:50 UTC (permalink / raw) To: devel, lersek; +Cc: nhi, ardb+tianocore@kernel.org, Andrew Fish On Wed, Jan 10, 2024 at 1:45 PM Laszlo Ersek <lersek@redhat.com> wrote: > > (+ Andrew) > > On 1/10/24 14:09, Laszlo Ersek wrote: > > > I think that keeping the depex section read-only is valuable, so I'd > > rule out #2. I'd also not start with option #1 -- copying the depex to > > heap memory, just so this optimization can succeed. I'd simply start by > > removing the optimization, and measuring how much driver dispatch slows > > down in practice, on various platforms. > > > > Can you try this? (I have only build-tested and "uncrustified" it.) > > > > The patch removes the EFI_DEP_REPLACE_TRUE handling altogether, plus it > > CONST-ifies the Iterator pointer (which points into the DEPEX section), > > so that the compiler catch any possible accesses at *build time* that > > would write to the write-protected DEPEX memory area. > > On a tangent: the optimization in question highlights a more general > problem, namely that the DXE (and possibly MM/SMM) protocol databases > are considered slow, for some access patterns. > > Edk2 implements those protocol databases with linked lists, where lookup > costs O(n) operations (average and worst cases). And protocol lookups > are quite frequent (for example, in depex evaluations, they could be > considered "particularly frequent"). > > (1) The "Tasks" wiki page mentions something *similar* (but not the > same); see > > https://github.com/tianocore/tianocore.github.io/wiki/Tasks#datahub--gcd-scalability > > The description is: "The DXE core's DataHub and GCD (Global Coherency > Domain) layers don't scale well as the number of data items gets large, > since they are based on simple linked lists. Find better data structures." How large do they usually get? What's the worst case? > > The same might apply more or less to the protocol database implementation. > > (2) More to the point, Andrew Fish reported earlier that at Apple, they > had rewritten the DXE protocol database, using the red-black tree > OrderedCollectionLib that I had contributed previously to edk2 -- and > they saw significant performance improvements. > > So upstreaming that feature to edk2 could be very valuable. (Red-black > trees have O(log(n)) time cost (worst case) for lookup, insertion and > deletion, and O(n) cost for iterating through the whole data structure.) FWIW, can we do better than an RB tree? They're notoriously cache unfriendly... > > Let me see if I can find the bugzilla ticket... > > Ah, I got it. Apologies, I misremembered: Andrew's comment was not about > the protocol database, but about the handle database. Here it is: > > https://bugzilla.tianocore.org/show_bug.cgi?id=988#c7 > > (the BZ is still CONFIRMED btw...) > > Still, I think it must be related in a way. Namely, an EFI handle exists > if and only if at least one protocol interface is installed on it. If > you uninstall the last protocol interface from a handle, then the handle > is destroyed -- in fact that's the *only* way to destroy a handle, to my > understanding. See EFI_BOOT_SERVICES.UninstallProtocolInterface() in the > UEFI spec: "If the last protocol interface is removed from a handle, the > handle is freed and is no longer valid". Furthermore, calling > InstallProtocolInterface() and InstallMultipleProtocolInterfaces() is > how one *creates* new handles. > > So given how handles and protocol interfaces are conceptually > interlinked, an rbtree-based protocol DB might have to maintain multiple > rbtrees internally (for the ability to search the database quickly with > different types of "keys"). I don't have a design ready in my mind at > all (I'm not that familiar with the *current*, list-based implementation > to begin with!). Upstreaming Apple's (experimental?) code could prove > very helpful. Ok, some thoughts: For the handle database, if we made EFI_HANDLE an integer instead, we could very easily use something similar to a radix tree (see Linux's xarray). This would provide O(log(n)) lookups and insertion with a much higher fan-out, without needing CoreValidateHandle's awful O(n) lookup every time it gets an EFI_HANDLE. You'd just look up the handle in the tree and if NULL, it's invalid. [1] For the protocol database, you'd replace the linked list with a simple hashtable, hashed by protocol. Something as simple as LIST_ENTRY mProtocolHashtable[64]; would probably be enough to fan out most of the problems (I think? How many different protocols are there?) -- Pedro [1] One could be wary of doing this lookup for every EFI_HANDLE instead of having the pointer directly. But this is much more efficient than needing to iterate a linked list to validate a pointer. Considering an xarray-like radix tree as previously described, two levels with a fan-out of 64 could describe indices 0 - 4096 (64 * 64), which is much more efficient than chasing through pointers (worst case) 4096 times until we find the handle. -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113541): https://edk2.groups.io/g/devel/message/113541 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-10 21:50 ` Pedro Falcato @ 2024-01-11 8:46 ` Laszlo Ersek 2024-01-11 11:03 ` Laszlo Ersek 2024-01-12 2:10 ` Pedro Falcato 0 siblings, 2 replies; 24+ messages in thread From: Laszlo Ersek @ 2024-01-11 8:46 UTC (permalink / raw) To: Pedro Falcato, devel; +Cc: nhi, ardb+tianocore@kernel.org, Andrew Fish On 1/10/24 22:50, Pedro Falcato wrote: > On Wed, Jan 10, 2024 at 1:45 PM Laszlo Ersek <lersek@redhat.com> > wrote: >> >> (+ Andrew) >> >> On 1/10/24 14:09, Laszlo Ersek wrote: >> >>> I think that keeping the depex section read-only is valuable, so I'd >>> rule out #2. I'd also not start with option #1 -- copying the depex >>> to heap memory, just so this optimization can succeed. I'd simply >>> start by removing the optimization, and measuring how much driver >>> dispatch slows down in practice, on various platforms. >>> >>> Can you try this? (I have only build-tested and "uncrustified" it.) >>> >>> The patch removes the EFI_DEP_REPLACE_TRUE handling altogether, plus >>> it CONST-ifies the Iterator pointer (which points into the DEPEX >>> section), so that the compiler catch any possible accesses at *build >>> time* that would write to the write-protected DEPEX memory area. >> >> On a tangent: the optimization in question highlights a more general >> problem, namely that the DXE (and possibly MM/SMM) protocol databases >> are considered slow, for some access patterns. >> >> Edk2 implements those protocol databases with linked lists, where >> lookup costs O(n) operations (average and worst cases). And protocol >> lookups are quite frequent (for example, in depex evaluations, they >> could be considered "particularly frequent"). >> >> (1) The "Tasks" wiki page mentions something *similar* (but not the >> same); see >> >> https://github.com/tianocore/tianocore.github.io/wiki/Tasks#datahub--gcd-scalability >> >> The description is: "The DXE core's DataHub and GCD (Global Coherency >> Domain) layers don't scale well as the number of data items gets >> large, since they are based on simple linked lists. Find better data >> structures." > > How large do they usually get? What's the worst case? No idea. > >> >> The same might apply more or less to the protocol database >> implementation. >> >> (2) More to the point, Andrew Fish reported earlier that at Apple, >> they had rewritten the DXE protocol database, using the red-black >> tree OrderedCollectionLib that I had contributed previously to edk2 >> -- and they saw significant performance improvements. >> >> So upstreaming that feature to edk2 could be very valuable. >> (Red-black trees have O(log(n)) time cost (worst case) for lookup, >> insertion and deletion, and O(n) cost for iterating through the whole >> data structure.) > > FWIW, can we do better than an RB tree? They're notoriously cache > unfriendly... Sure, if someone contributes a different data structure that is suitable for the task :) RB trees may be cache unfriendly, but the functionality they provide with O(log(n)) worst case performance is amazing. You can use them as read-write associate arrays, sorted lists supporting forward and backward traversal (in fact tools for sorting), priority queues, etc. When I contributed the code, edk2 didn't have any associative array type, so something generic that would address the widest range of use cases looked like a good idea. (And indeed the library has been well applied in several of those use cases since, in various parts of edk2 -- for sorting, uniqueness-checking, async interface token tracking & lookups.) This is not an argument against a more suitable data structure of course. Just pointing out that the RB tree has worked well thus far. E.g., under the BZ link below, Andrew mentions a diagnostic tool that creates 3000 handles. Looking up an element in a 3000-element list would cost 1500 iterations on average; using a balanced binary search tree it might cost ~11 iterations. Assuming that linked lists and linked binary search trees are similarly cache-unfriendly, that's a ~136 factor of improvement. > >> >> Let me see if I can find the bugzilla ticket... >> >> Ah, I got it. Apologies, I misremembered: Andrew's comment was not >> about the protocol database, but about the handle database. Here it >> is: >> >> https://bugzilla.tianocore.org/show_bug.cgi?id=988#c7 >> >> (the BZ is still CONFIRMED btw...) >> >> Still, I think it must be related in a way. Namely, an EFI handle >> exists if and only if at least one protocol interface is installed on >> it. If you uninstall the last protocol interface from a handle, then >> the handle is destroyed -- in fact that's the *only* way to destroy a >> handle, to my understanding. See >> EFI_BOOT_SERVICES.UninstallProtocolInterface() in the UEFI spec: "If >> the last protocol interface is removed from a handle, the handle is >> freed and is no longer valid". Furthermore, calling >> InstallProtocolInterface() and InstallMultipleProtocolInterfaces() is >> how one *creates* new handles. >> >> So given how handles and protocol interfaces are conceptually >> interlinked, an rbtree-based protocol DB might have to maintain >> multiple rbtrees internally (for the ability to search the database >> quickly with different types of "keys"). I don't have a design ready >> in my mind at all (I'm not that familiar with the *current*, >> list-based implementation to begin with!). Upstreaming Apple's >> (experimental?) code could prove very helpful. > > Ok, some thoughts: > > For the handle database, if we made EFI_HANDLE an integer instead, we > could very easily use something similar to a radix tree (see Linux's > xarray). This would provide O(log(n)) lookups and insertion with a > much higher fan-out, without needing CoreValidateHandle's awful O(n) > lookup every time it gets an EFI_HANDLE. You'd just look up the handle > in the tree and if NULL, it's invalid. [1] EFI_HANDLE is a typedef to (VOID*) per UEFI spec -- section 2.3.1 "Data Types" says: "A collection of related interfaces. Type VOID *" --, so EFI_HANDLE can't be redefined, it would break compatibility with everything. Internally, it could be cast to UINTN, and then those 32- or 64-bit long bit strings could be considered as keys for a radix tree. But I'm not sure if this would still be as useful as the two-level radix tree (with a fanout of 64) that you describe. (If we wanted to map the EFI_HANDLEs to the index space [0..4095], then we'd just land back at the original problem: storing the EFI_HANDLEs in an associative data structure, for example in a hash map. We could find a good hash function, but because it would mathematically reduce the key space from 64-bit pointers to 12-bit indices, collisions would be theoretically inevitable, and the data structure couldn't avoid *storing* the mapping. This is not a general argument against using a hash table, it just means that we'd have to provide a good initial table size, and then deal with resizing cost whenever necessary.) > For the protocol database, you'd replace the linked list with a simple > hashtable, hashed by protocol. Something as simple as LIST_ENTRY > mProtocolHashtable[64]; would probably be enough to fan out most of > the problems (I think? How many different protocols are there?) I can answer this question reasonably well, I think. I have a script that collects symbolic names of GUIDs from the edk2 tree (plus hardcodes a number of well-known but not edk2 GUIDs), and creates a "sed" script out of them. Then another script uses this "sed" script for filtering edk2 logs -- the idea being to replace the whole bunch of logged GUIDs with their symbolic names. That makes logs much easier to read. The generator script is written such a way that the generated "sed" script only grows over time; put differently, this "dictionary" of name<->GUID associations never forgets, it only picks up new GUIDs. The "sed" script (= the dictionary file) consists of entries like s:FFB19961-79CC-4684-84A8-C31B0A2BBE82:[EarlyFdt16550SerialPortHookLib]:ig s:FFB56F09-65E3-4462-A799-2F0D1930D38C:[DxeContextBufferModuleConfigLib]:ig s:FFE06BDD-6107-46A6-7BB2-5A9C7EC5275C:[EfiAcpiTableProtocol]:ig s:FFF12B8D-7696-4C8B-A985-2747075B4F50:[EfiSystemNvDataFv]:ig it's sorted uniquely by GUID. Right now, it has 3074 entries. (People like generating GUIDs! :) In PI/UEFI/edk2, *everything* is a GUID, not just protocols!) So using 64 buckets would hash ~50 different protocol GUIDs to any given bucket (although, again, not all of those three thousand GUIDs are *protocol* GUIDs). 1024 buckets might work better. Laszlo -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113595): https://edk2.groups.io/g/devel/message/113595 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-11 8:46 ` Laszlo Ersek @ 2024-01-11 11:03 ` Laszlo Ersek 2024-01-12 2:10 ` Pedro Falcato 1 sibling, 0 replies; 24+ messages in thread From: Laszlo Ersek @ 2024-01-11 11:03 UTC (permalink / raw) To: Pedro Falcato, devel; +Cc: nhi, ardb+tianocore@kernel.org, Andrew Fish On 1/11/24 09:46, Laszlo Ersek wrote: > On 1/10/24 22:50, Pedro Falcato wrote: >> For the protocol database, you'd replace the linked list with a simple >> hashtable, hashed by protocol. Something as simple as LIST_ENTRY >> mProtocolHashtable[64]; would probably be enough to fan out most of >> the problems (I think? How many different protocols are there?) > > I can answer this question reasonably well, I think. I have a script > that collects symbolic names of GUIDs from the edk2 tree (plus hardcodes > a number of well-known but not edk2 GUIDs), and creates a "sed" script > out of them. Then another script uses this "sed" script for filtering > edk2 logs -- the idea being to replace the whole bunch of logged GUIDs > with their symbolic names. That makes logs much easier to read. > > The generator script is written such a way that the generated "sed" > script only grows over time; put differently, this "dictionary" of > name<->GUID associations never forgets, it only picks up new GUIDs. The > "sed" script (= the dictionary file) consists of entries like > > s:FFB19961-79CC-4684-84A8-C31B0A2BBE82:[EarlyFdt16550SerialPortHookLib]:ig > s:FFB56F09-65E3-4462-A799-2F0D1930D38C:[DxeContextBufferModuleConfigLib]:ig > s:FFE06BDD-6107-46A6-7BB2-5A9C7EC5275C:[EfiAcpiTableProtocol]:ig > s:FFF12B8D-7696-4C8B-A985-2747075B4F50:[EfiSystemNvDataFv]:ig > > it's sorted uniquely by GUID. > > Right now, it has 3074 entries. (People like generating GUIDs! :) In > PI/UEFI/edk2, *everything* is a GUID, not just protocols!) If I grep the dictionary for "Protocol", I get 515 hits. -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113615): https://edk2.groups.io/g/devel/message/113615 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-11 8:46 ` Laszlo Ersek 2024-01-11 11:03 ` Laszlo Ersek @ 2024-01-12 2:10 ` Pedro Falcato 2024-01-12 9:35 ` Laszlo Ersek 1 sibling, 1 reply; 24+ messages in thread From: Pedro Falcato @ 2024-01-12 2:10 UTC (permalink / raw) To: Laszlo Ersek; +Cc: devel, nhi, ardb+tianocore@kernel.org, Andrew Fish On Thu, Jan 11, 2024 at 8:46 AM Laszlo Ersek <lersek@redhat.com> wrote: > > On 1/10/24 22:50, Pedro Falcato wrote: > > On Wed, Jan 10, 2024 at 1:45 PM Laszlo Ersek <lersek@redhat.com> > > wrote: > >> > >> (+ Andrew) > >> > >> On 1/10/24 14:09, Laszlo Ersek wrote: > >> > >>> I think that keeping the depex section read-only is valuable, so I'd > >>> rule out #2. I'd also not start with option #1 -- copying the depex > >>> to heap memory, just so this optimization can succeed. I'd simply > >>> start by removing the optimization, and measuring how much driver > >>> dispatch slows down in practice, on various platforms. > >>> > >>> Can you try this? (I have only build-tested and "uncrustified" it.) > >>> > >>> The patch removes the EFI_DEP_REPLACE_TRUE handling altogether, plus > >>> it CONST-ifies the Iterator pointer (which points into the DEPEX > >>> section), so that the compiler catch any possible accesses at *build > >>> time* that would write to the write-protected DEPEX memory area. > >> > >> On a tangent: the optimization in question highlights a more general > >> problem, namely that the DXE (and possibly MM/SMM) protocol databases > >> are considered slow, for some access patterns. > >> > >> Edk2 implements those protocol databases with linked lists, where > >> lookup costs O(n) operations (average and worst cases). And protocol > >> lookups are quite frequent (for example, in depex evaluations, they > >> could be considered "particularly frequent"). > >> > >> (1) The "Tasks" wiki page mentions something *similar* (but not the > >> same); see > >> > >> https://github.com/tianocore/tianocore.github.io/wiki/Tasks#datahub--gcd-scalability > >> > >> The description is: "The DXE core's DataHub and GCD (Global Coherency > >> Domain) layers don't scale well as the number of data items gets > >> large, since they are based on simple linked lists. Find better data > >> structures." > > > > How large do they usually get? What's the worst case? > > No idea. > > > > >> > >> The same might apply more or less to the protocol database > >> implementation. > >> > >> (2) More to the point, Andrew Fish reported earlier that at Apple, > >> they had rewritten the DXE protocol database, using the red-black > >> tree OrderedCollectionLib that I had contributed previously to edk2 > >> -- and they saw significant performance improvements. > >> > >> So upstreaming that feature to edk2 could be very valuable. > >> (Red-black trees have O(log(n)) time cost (worst case) for lookup, > >> insertion and deletion, and O(n) cost for iterating through the whole > >> data structure.) > > > > FWIW, can we do better than an RB tree? They're notoriously cache > > unfriendly... > > Sure, if someone contributes a different data structure that is suitable > for the task :) Hey, I happen to have written one! https://github.com/heatd/Onyx/blob/master/kernel/kernel/radix.cpp It just needs some C'ifying, then some EFI'ing on top, but I'm fairly confident in its stability. > > RB trees may be cache unfriendly, but the functionality they provide > with O(log(n)) worst case performance is amazing. You can use them as > read-write associate arrays, sorted lists supporting forward and > backward traversal (in fact tools for sorting), priority queues, etc. > > When I contributed the code, edk2 didn't have any associative array > type, so something generic that would address the widest range of use > cases looked like a good idea. (And indeed the library has been well > applied in several of those use cases since, in various parts of edk2 -- > for sorting, uniqueness-checking, async interface token tracking & > lookups.) Definitely, I use it in Ext4Dxe too, it's great! (Although I do have a small nit in that it allocates nodes itself, and there's no way around it, but oh well...) > > This is not an argument against a more suitable data structure of > course. Just pointing out that the RB tree has worked well thus far. > E.g., under the BZ link below, Andrew mentions a diagnostic tool that > creates 3000 handles. Looking up an element in a 3000-element list would > cost 1500 iterations on average; using a balanced binary search tree it > might cost ~11 iterations. Assuming that linked lists and linked binary > search trees are similarly cache-unfriendly, that's a ~136 factor of > improvement. I would guess that binary trees are more cache-and-speculation unfriendly as you need to decide which branch you're going down. But yes, even a RB tree would be a great improvement over the linked list. > > > > >> > >> Let me see if I can find the bugzilla ticket... > >> > >> Ah, I got it. Apologies, I misremembered: Andrew's comment was not > >> about the protocol database, but about the handle database. Here it > >> is: > >> > >> https://bugzilla.tianocore.org/show_bug.cgi?id=988#c7 > >> > >> (the BZ is still CONFIRMED btw...) > >> > >> Still, I think it must be related in a way. Namely, an EFI handle > >> exists if and only if at least one protocol interface is installed on > >> it. If you uninstall the last protocol interface from a handle, then > >> the handle is destroyed -- in fact that's the *only* way to destroy a > >> handle, to my understanding. See > >> EFI_BOOT_SERVICES.UninstallProtocolInterface() in the UEFI spec: "If > >> the last protocol interface is removed from a handle, the handle is > >> freed and is no longer valid". Furthermore, calling > >> InstallProtocolInterface() and InstallMultipleProtocolInterfaces() is > >> how one *creates* new handles. > >> > >> So given how handles and protocol interfaces are conceptually > >> interlinked, an rbtree-based protocol DB might have to maintain > >> multiple rbtrees internally (for the ability to search the database > >> quickly with different types of "keys"). I don't have a design ready > >> in my mind at all (I'm not that familiar with the *current*, > >> list-based implementation to begin with!). Upstreaming Apple's > >> (experimental?) code could prove very helpful. > > > > Ok, some thoughts: > > > > For the handle database, if we made EFI_HANDLE an integer instead, we > > could very easily use something similar to a radix tree (see Linux's > > xarray). This would provide O(log(n)) lookups and insertion with a > > much higher fan-out, without needing CoreValidateHandle's awful O(n) > > lookup every time it gets an EFI_HANDLE. You'd just look up the handle > > in the tree and if NULL, it's invalid. [1] > > EFI_HANDLE is a typedef to (VOID*) per UEFI spec -- section 2.3.1 "Data > Types" says: "A collection of related interfaces. Type VOID *" --, so > EFI_HANDLE can't be redefined, it would break compatibility with > everything. > > Internally, it could be cast to UINTN, and then those 32- or 64-bit long > bit strings could be considered as keys for a radix tree. But I'm not > sure if this would still be as useful as the two-level radix tree (with > a fanout of 64) that you describe. > > (If we wanted to map the EFI_HANDLEs to the index space [0..4095], then > we'd just land back at the original problem: storing the EFI_HANDLEs in > an associative data structure, for example in a hash map. We could find > a good hash function, but because it would mathematically reduce the key > space from 64-bit pointers to 12-bit indices, collisions would be > theoretically inevitable, and the data structure couldn't avoid > *storing* the mapping. This is not a general argument against using a > hash table, it just means that we'd have to provide a good initial table > size, and then deal with resizing cost whenever necessary.) My idea was to make each handle an index - like a file descriptor - AFAIK there's no reason why it *needs* to be an actual pointer. We'd allocate indices when creating a handle, and return that (casted to VOID*). Say, for a naive allocation scheme and the data structure I linked above: /* 0 (NULL) must be reserved, as it's a bad EFI_HANDLE result. We could also return HandleId + 1, and - 1 when indexing back to the radix tree. Either option works */ STATIC UINTN mNextHandleId = 1; STATIC radix_tree mHandleTree; STATIC IHANDLE * CreateHandle ( OUT UINTN *OutHandleId ) { IHANDLE *Handle = AllocatePool(sizeof(IHANDLE)); *OutHandleId = mNextHandleId++; mHandleTree.insert(*OutHandleId, Handle); return Handle; } /* And then transform CoreValidateHandle into */ STATIC EFI_STATUS CoreGetHandle ( IN EFI_HANDLE Handle, OUT IHANDLE **OutHandle ) { IHANDLE *Result; if (Handle == NULL) return EFI_INVALID_PARAMETER; expected<unsigned long, int> result = mHandleTree.get((UINTN) Handle); if (result.has_error()) return EFI_INVALID_PARAMETER; /* Handle not in the tree */ *OutHandle = (IHANDLE *) result.value(); return EFI_SUCCESS; } -- Hopefully the snippet gets my point across. Insertion, searching and iteration should all be super fast (O(logFanOut(N))). Note that one could also just use a standard resizable vector (a-la std::vector), but I don't know if that maps well to EFI handles and memory allocation. > > > For the protocol database, you'd replace the linked list with a simple > > hashtable, hashed by protocol. Something as simple as LIST_ENTRY > > mProtocolHashtable[64]; would probably be enough to fan out most of > > the problems (I think? How many different protocols are there?) > > I can answer this question reasonably well, I think. I have a script > that collects symbolic names of GUIDs from the edk2 tree (plus hardcodes > a number of well-known but not edk2 GUIDs), and creates a "sed" script > out of them. Then another script uses this "sed" script for filtering > edk2 logs -- the idea being to replace the whole bunch of logged GUIDs > with their symbolic names. That makes logs much easier to read. > > The generator script is written such a way that the generated "sed" > script only grows over time; put differently, this "dictionary" of > name<->GUID associations never forgets, it only picks up new GUIDs. The > "sed" script (= the dictionary file) consists of entries like > > s:FFB19961-79CC-4684-84A8-C31B0A2BBE82:[EarlyFdt16550SerialPortHookLib]:ig > s:FFB56F09-65E3-4462-A799-2F0D1930D38C:[DxeContextBufferModuleConfigLib]:ig > s:FFE06BDD-6107-46A6-7BB2-5A9C7EC5275C:[EfiAcpiTableProtocol]:ig > s:FFF12B8D-7696-4C8B-A985-2747075B4F50:[EfiSystemNvDataFv]:ig > > it's sorted uniquely by GUID. > > Right now, it has 3074 entries. (People like generating GUIDs! :) In > PI/UEFI/edk2, *everything* is a GUID, not just protocols!) > > So using 64 buckets would hash ~50 different protocol GUIDs to any given > bucket (although, again, not all of those three thousand GUIDs are > *protocol* GUIDs). 1024 buckets might work better. Oh! That's nice! Taking your protocol figure of 515 into account, 512 or even 1024 buckets should probably take care of that. I should note that I find it super hard to get a concrete idea on performance for EFI firmware without adequate tooling - I meant to write a standalone flamegraph tool a few weeks back (even posted in edk2-devel), but, as far as I could tell, the architectural timer protocol couldn't get me the interrupt frame[1]. Until then, whether any of this radix tree vs RB tree vs flat array stuff really matters... I find it hard to say. [1] x86 has 3 timers (PIT, LAPIC timer, HPET) and performance monitoring interrupts, and I couldn't freely use any of them :^) -- Pedro -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113638): https://edk2.groups.io/g/devel/message/113638 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-12 2:10 ` Pedro Falcato @ 2024-01-12 9:35 ` Laszlo Ersek 2024-01-12 14:46 ` Pedro Falcato 0 siblings, 1 reply; 24+ messages in thread From: Laszlo Ersek @ 2024-01-12 9:35 UTC (permalink / raw) To: Pedro Falcato; +Cc: devel, nhi, ardb+tianocore@kernel.org, Andrew Fish On 1/12/24 03:10, Pedro Falcato wrote: > On Thu, Jan 11, 2024 at 8:46 AM Laszlo Ersek <lersek@redhat.com> wrote: >> >> On 1/10/24 22:50, Pedro Falcato wrote: >>> FWIW, can we do better than an RB tree? They're notoriously cache >>> unfriendly... >> >> Sure, if someone contributes a different data structure that is suitable >> for the task :) > > Hey, I happen to have written one! > https://github.com/heatd/Onyx/blob/master/kernel/kernel/radix.cpp > It just needs some C'ifying, then some EFI'ing on top, but I'm fairly > confident in its stability. Yes, this is absolutely viable and welcome. You seem to hold the copyright on it, too, so you could even relicense (although MIT should just be fine for edk2). (I had done the same with the rbtree code -- I had written that code much earlier, not for edk2. I re-verified it and ported it to edk2, and relicensed it.) >> When I contributed the code, edk2 didn't have any associative array >> type, so something generic that would address the widest range of use >> cases looked like a good idea. (And indeed the library has been well >> applied in several of those use cases since, in various parts of edk2 -- >> for sorting, uniqueness-checking, async interface token tracking & >> lookups.) > > Definitely, I use it in Ext4Dxe too, it's great! Heh, I didn't know that. Thanks! > (Although I do have a > small nit in that it allocates nodes itself, and there's no way around > it, but oh well...) Right, that's the usual intrusive vs. non-intrusive containers debate :) I didn't mind more allocations (and hence perhaps more fragmentation), but I wanted to (a) hide the node internals, (b) the possibility to link any given piece of user structure into multiple trees (and other containers) without having to modify the user structure itself (i.e. without embedding further node / link structs into the user struct). I figure each of intrusive and non-intrusive has its advantages over the other; I just went with my preferences. > My idea was to make each handle an index - like a file descriptor - > AFAIK there's no reason why it *needs* to be an actual pointer. > We'd allocate indices when creating a handle, and return that (casted to VOID*). Huh, sneaky. I see two potential problems with this. First, per C std, these "pointers" would be invalid (they wouldn't actually point to valid memory), so even just evaluating them (not dereferencing them!) would invoke undefined behavior. May or may not matter in practice. With compilers getting smarter about optimization (and stricter about std conformance), there could be issues, perhaps. The other concern is a bit contrived, but I *guess* there could be code out there that actually dereferences EFI_HANDLEs. That code would crash. May or may not matter, because such code is arguably already semantically invalid. (It would not necessarily be invalid at the language level -- cf. my previous paragraph --, because passing an otherwise valid EFI_HANDLE to CopyMem, for copying just 1 byte out of the underlying opaque data structure, would not violate the language.) > I should note that I find it super hard to get a concrete idea on > performance for EFI firmware without adequate tooling - I meant to > write a standalone flamegraph tool a few weeks back (even posted in > edk2-devel), but, as far as I could tell, the architectural timer > protocol couldn't get me the interrupt frame[1]. Until then, whether > any of this radix tree vs RB tree vs flat array stuff really > matters... I find it hard to say. > > [1] x86 has 3 timers (PIT, LAPIC timer, HPET) and performance > monitoring interrupts, and I couldn't freely use any of them :^) Edk2 has some form of profiling already (see "MdePkg/Include/Library/PerformanceLib.h"). Usually one sees core code peppered with PERF_CODE_BEGIN and PERF_CODE_END macros. I *think* there is something like a "display performance" (dp) shell application too, that can show the collected stats. But I've never used these facilities. The wiki seems to have two related articles: https://github.com/tianocore/tianocore.github.io/wiki/Edk2-Performance-Infrastructure https://github.com/tianocore/tianocore.github.io/wiki/PerformancePkg The former looks quite comprehensive, at a quick skim. Laszlo -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113703): https://edk2.groups.io/g/devel/message/113703 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-12 9:35 ` Laszlo Ersek @ 2024-01-12 14:46 ` Pedro Falcato 2024-01-12 16:37 ` Michael D Kinney 2024-01-12 18:50 ` Andrew Fish via groups.io 0 siblings, 2 replies; 24+ messages in thread From: Pedro Falcato @ 2024-01-12 14:46 UTC (permalink / raw) To: Laszlo Ersek; +Cc: devel, nhi, ardb+tianocore@kernel.org, Andrew Fish On Fri, Jan 12, 2024 at 9:35 AM Laszlo Ersek <lersek@redhat.com> wrote: > > On 1/12/24 03:10, Pedro Falcato wrote: > > My idea was to make each handle an index - like a file descriptor - > > AFAIK there's no reason why it *needs* to be an actual pointer. > > We'd allocate indices when creating a handle, and return that (casted to VOID*). > > Huh, sneaky. > > I see two potential problems with this. > > First, per C std, these "pointers" would be invalid (they wouldn't > actually point to valid memory), so even just evaluating them (not > dereferencing them!) would invoke undefined behavior. May or may not > matter in practice. With compilers getting smarter about optimization > (and stricter about std conformance), there could be issues, perhaps. This is true. Stashing random integers in pointers is implementation-defined. But it's also super common. Win32 HANDLEs are exactly this, just integers (stashed in VOID*). The Linux kernel world also has a bunch of fun tricks with stashing flags in a pointer's bottom bits, magic pointer values, etc. I severely doubt we can run into issues with this. EDK2 will not exactly run on the C standard's abstract machine anyway ;) > > The other concern is a bit contrived, but I *guess* there could be code > out there that actually dereferences EFI_HANDLEs. That code would crash. > May or may not matter, because such code is arguably already > semantically invalid. (It would not necessarily be invalid at the > language level -- cf. my previous paragraph --, because passing an > otherwise valid EFI_HANDLE to CopyMem, for copying just 1 byte out of > the underlying opaque data structure, would not violate the language.) This is a feature, not a bug! :P Seriously though, IHANDLE is not even exposed in semi-public headers, so any code that's derefing an EFI_HANDLE will need to do something like typedef struct { /* ... */ } IHANDLE; EFI_HANDLE Handle = /* ... */; IHANDLE *HandleImpl = (IHANDLE *) Handle; and I'm a strong believer in "play stupid games, win stupid prizes". You can definitely make an argument for "this should definitely crash" instead of just "maybe crashing" (for instance, platforms that still map the NULL page (like OVMF!), or handles > 4096), so I'm inclined to think that if we indeed go this route, we should set one or two upper bits (on 64-bit platforms!) to make handles non-canonical addresses and therefore necessarily crash on dereference. > > > I should note that I find it super hard to get a concrete idea on > > performance for EFI firmware without adequate tooling - I meant to > > write a standalone flamegraph tool a few weeks back (even posted in > > edk2-devel), but, as far as I could tell, the architectural timer > > protocol couldn't get me the interrupt frame[1]. Until then, whether > > any of this radix tree vs RB tree vs flat array stuff really > > matters... I find it hard to say. > > > > [1] x86 has 3 timers (PIT, LAPIC timer, HPET) and performance > > monitoring interrupts, and I couldn't freely use any of them :^) > > Edk2 has some form of profiling already (see > "MdePkg/Include/Library/PerformanceLib.h"). Usually one sees core code > peppered with PERF_CODE_BEGIN and PERF_CODE_END macros. I *think* there > is something like a "display performance" (dp) shell application too, > that can show the collected stats. But I've never used these facilities. > > The wiki seems to have two related articles: > > https://github.com/tianocore/tianocore.github.io/wiki/Edk2-Performance-Infrastructure > > https://github.com/tianocore/tianocore.github.io/wiki/PerformancePkg > > The former looks quite comprehensive, at a quick skim. /me nods I've seen those macros around, but I've never used them. In any case, this problem has piqued my interest, I'll see if I can find some free time this weekend to hack on a test benchmark and a PoC :) -- Pedro -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113741): https://edk2.groups.io/g/devel/message/113741 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-12 14:46 ` Pedro Falcato @ 2024-01-12 16:37 ` Michael D Kinney 2024-01-12 18:56 ` Andrew Fish via groups.io 2024-01-12 18:50 ` Andrew Fish via groups.io 1 sibling, 1 reply; 24+ messages in thread From: Michael D Kinney @ 2024-01-12 16:37 UTC (permalink / raw) To: devel@edk2.groups.io, pedro.falcato@gmail.com, Laszlo Ersek Cc: nhi@os.amperecomputing.com, ardb+tianocore@kernel.org, Andrew Fish, Kinney, Michael D Hi Pedro, Thank you for evaluating this idea change from linked list to improve performance of the handle database. The concept of using integers for an EFI_HANDLE has been considered before. One advantage over pointers is that a guarantee can be made that an EFI_HANDLE value can be guaranteed to be unique. In the current implementation with EFI_HANDLE being a pointer to an allocated buffer, an EFI_HANDLE value could potentially be reused if an EFI_HANDLE is freed and later allocated for a different EFI_HANDLE. If you continue to explore this path I agree that EFI_HANDLE value of 0 should be reserved and never used. I also recommend that new EFI_HANDLE values are always assigned a new unique value that freed EFI_HANDLE values are never reused. The overall linked list performance of the handle database has also been noted before, but has rarely raised to the level where it impacts the overall boot performance relative to other optimization opportunities. I look forward to the performance data. The PERF_X() macros are the right service to use. On x86 it is based on the time stamp counter (TSC) which has very small granularity. Mike > -----Original Message----- > From: devel@edk2.groups.io <devel@edk2.groups.io> On Behalf Of Pedro Falcato > Sent: Friday, January 12, 2024 6:47 AM > To: Laszlo Ersek <lersek@redhat.com> > Cc: devel@edk2.groups.io; nhi@os.amperecomputing.com; > ardb+tianocore@kernel.org; Andrew Fish <afish@apple.com> > Subject: Re: [edk2-devel] Memory Attribute for depex section > > On Fri, Jan 12, 2024 at 9:35 AM Laszlo Ersek <lersek@redhat.com> wrote: > > > > On 1/12/24 03:10, Pedro Falcato wrote: > > > My idea was to make each handle an index - like a file descriptor - > > > AFAIK there's no reason why it *needs* to be an actual pointer. > > > We'd allocate indices when creating a handle, and return that (casted to > VOID*). > > > > Huh, sneaky. > > > > I see two potential problems with this. > > > > First, per C std, these "pointers" would be invalid (they wouldn't > > actually point to valid memory), so even just evaluating them (not > > dereferencing them!) would invoke undefined behavior. May or may not > > matter in practice. With compilers getting smarter about optimization > > (and stricter about std conformance), there could be issues, perhaps. > > This is true. Stashing random integers in pointers is > implementation-defined. But it's also super common. Win32 HANDLEs are > exactly this, just integers (stashed in VOID*). The Linux kernel world > also has a bunch of fun tricks with stashing flags in a pointer's > bottom bits, magic pointer values, etc. I severely doubt we can run > into issues with this. EDK2 will not exactly run on the C standard's > abstract machine anyway ;) > > > > > The other concern is a bit contrived, but I *guess* there could be code > > out there that actually dereferences EFI_HANDLEs. That code would crash. > > May or may not matter, because such code is arguably already > > semantically invalid. (It would not necessarily be invalid at the > > language level -- cf. my previous paragraph --, because passing an > > otherwise valid EFI_HANDLE to CopyMem, for copying just 1 byte out of > > the underlying opaque data structure, would not violate the language.) > > This is a feature, not a bug! :P > > Seriously though, IHANDLE is not even exposed in semi-public headers, > so any code that's derefing an EFI_HANDLE will need to do something > like > > typedef struct { > /* ... */ > } IHANDLE; > > EFI_HANDLE Handle = /* ... */; > > IHANDLE *HandleImpl = (IHANDLE *) Handle; > > and I'm a strong believer in "play stupid games, win stupid prizes". > You can definitely make an argument for "this should definitely crash" > instead of just "maybe crashing" (for instance, platforms that still > map the NULL page (like OVMF!), or handles > 4096), so I'm inclined to > think that if we indeed go this route, we should set one or two upper > bits (on 64-bit platforms!) to make handles non-canonical addresses > and therefore necessarily crash on dereference. > > > > > > I should note that I find it super hard to get a concrete idea on > > > performance for EFI firmware without adequate tooling - I meant to > > > write a standalone flamegraph tool a few weeks back (even posted in > > > edk2-devel), but, as far as I could tell, the architectural timer > > > protocol couldn't get me the interrupt frame[1]. Until then, whether > > > any of this radix tree vs RB tree vs flat array stuff really > > > matters... I find it hard to say. > > > > > > [1] x86 has 3 timers (PIT, LAPIC timer, HPET) and performance > > > monitoring interrupts, and I couldn't freely use any of them :^) > > > > Edk2 has some form of profiling already (see > > "MdePkg/Include/Library/PerformanceLib.h"). Usually one sees core code > > peppered with PERF_CODE_BEGIN and PERF_CODE_END macros. I *think* there > > is something like a "display performance" (dp) shell application too, > > that can show the collected stats. But I've never used these facilities. > > > > The wiki seems to have two related articles: > > > > https://github.com/tianocore/tianocore.github.io/wiki/Edk2-Performance- > Infrastructure > > > > https://github.com/tianocore/tianocore.github.io/wiki/PerformancePkg > > > > The former looks quite comprehensive, at a quick skim. > > /me nods > I've seen those macros around, but I've never used them. > In any case, this problem has piqued my interest, I'll see if I can > find some free time this weekend to hack on a test benchmark and a PoC > :) > > -- > Pedro > > > > -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113751): https://edk2.groups.io/g/devel/message/113751 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-12 16:37 ` Michael D Kinney @ 2024-01-12 18:56 ` Andrew Fish via groups.io 2024-01-12 19:04 ` Michael D Kinney 0 siblings, 1 reply; 24+ messages in thread From: Andrew Fish via groups.io @ 2024-01-12 18:56 UTC (permalink / raw) To: edk2-devel-groups-io, Mike Kinney Cc: pedro.falcato@gmail.com, Laszlo Ersek, nhi@os.amperecomputing.com, ardb+tianocore@kernel.org [-- Attachment #1: Type: text/plain, Size: 6970 bytes --] > On Jan 12, 2024, at 8:37 AM, Michael D Kinney <michael.d.kinney@intel.com> wrote: > > Hi Pedro, > > Thank you for evaluating this idea change from linked list to improve > performance of the handle database. > > The concept of using integers for an EFI_HANDLE has been considered before. > One advantage over pointers is that a guarantee can be made that an EFI_HANDLE > value can be guaranteed to be unique. In the current implementation with > EFI_HANDLE being a pointer to an allocated buffer, an EFI_HANDLE value could > potentially be reused if an EFI_HANDLE is freed and later allocated for a > different EFI_HANDLE. > > If you continue to explore this path I agree that EFI_HANDLE value of 0 should > be reserved and never used. I also recommend that new EFI_HANDLE values are > always assigned a new unique value that freed EFI_HANDLE values are never reused. > > The overall linked list performance of the handle database has also been noted > before, but has rarely raised to the level where it impacts the overall boot > performance relative to other optimization opportunities. I look forward to > the performance data. The PERF_X() macros are the right service to use. On > x86 it is based on the time stamp counter (TSC) which has very small granularity. > Mike, We tracked this a while back with the PERF macros when we had some performance issues running diagnostics on a system with 3,000+ handles. The hot path was CoreValidateHandle(). I think the number of calls to CoreValidateHandle() explodes if you have more handles so it is not just the raw performance of CoreValidateHandle(), but also how many times it gets called. Thanks, Andrew Fish > Mike > > >> -----Original Message----- >> From: devel@edk2.groups.io <mailto:devel@edk2.groups.io> <devel@edk2.groups.io <mailto:devel@edk2.groups.io>> On Behalf Of Pedro Falcato >> Sent: Friday, January 12, 2024 6:47 AM >> To: Laszlo Ersek <lersek@redhat.com <mailto:lersek@redhat.com>> >> Cc: devel@edk2.groups.io <mailto:devel@edk2.groups.io>; nhi@os.amperecomputing.com <mailto:nhi@os.amperecomputing.com>; >> ardb+tianocore@kernel.org <mailto:ardb+tianocore@kernel.org>; Andrew Fish <afish@apple.com <mailto:afish@apple.com>> >> Subject: Re: [edk2-devel] Memory Attribute for depex section >> >> On Fri, Jan 12, 2024 at 9:35 AM Laszlo Ersek <lersek@redhat.com> wrote: >>> >>> On 1/12/24 03:10, Pedro Falcato wrote: >>>> My idea was to make each handle an index - like a file descriptor - >>>> AFAIK there's no reason why it *needs* to be an actual pointer. >>>> We'd allocate indices when creating a handle, and return that (casted to >> VOID*). >>> >>> Huh, sneaky. >>> >>> I see two potential problems with this. >>> >>> First, per C std, these "pointers" would be invalid (they wouldn't >>> actually point to valid memory), so even just evaluating them (not >>> dereferencing them!) would invoke undefined behavior. May or may not >>> matter in practice. With compilers getting smarter about optimization >>> (and stricter about std conformance), there could be issues, perhaps. >> >> This is true. Stashing random integers in pointers is >> implementation-defined. But it's also super common. Win32 HANDLEs are >> exactly this, just integers (stashed in VOID*). The Linux kernel world >> also has a bunch of fun tricks with stashing flags in a pointer's >> bottom bits, magic pointer values, etc. I severely doubt we can run >> into issues with this. EDK2 will not exactly run on the C standard's >> abstract machine anyway ;) >> >>> >>> The other concern is a bit contrived, but I *guess* there could be code >>> out there that actually dereferences EFI_HANDLEs. That code would crash. >>> May or may not matter, because such code is arguably already >>> semantically invalid. (It would not necessarily be invalid at the >>> language level -- cf. my previous paragraph --, because passing an >>> otherwise valid EFI_HANDLE to CopyMem, for copying just 1 byte out of >>> the underlying opaque data structure, would not violate the language.) >> >> This is a feature, not a bug! :P >> >> Seriously though, IHANDLE is not even exposed in semi-public headers, >> so any code that's derefing an EFI_HANDLE will need to do something >> like >> >> typedef struct { >> /* ... */ >> } IHANDLE; >> >> EFI_HANDLE Handle = /* ... */; >> >> IHANDLE *HandleImpl = (IHANDLE *) Handle; >> >> and I'm a strong believer in "play stupid games, win stupid prizes". >> You can definitely make an argument for "this should definitely crash" >> instead of just "maybe crashing" (for instance, platforms that still >> map the NULL page (like OVMF!), or handles > 4096), so I'm inclined to >> think that if we indeed go this route, we should set one or two upper >> bits (on 64-bit platforms!) to make handles non-canonical addresses >> and therefore necessarily crash on dereference. >> >>> >>>> I should note that I find it super hard to get a concrete idea on >>>> performance for EFI firmware without adequate tooling - I meant to >>>> write a standalone flamegraph tool a few weeks back (even posted in >>>> edk2-devel), but, as far as I could tell, the architectural timer >>>> protocol couldn't get me the interrupt frame[1]. Until then, whether >>>> any of this radix tree vs RB tree vs flat array stuff really >>>> matters... I find it hard to say. >>>> >>>> [1] x86 has 3 timers (PIT, LAPIC timer, HPET) and performance >>>> monitoring interrupts, and I couldn't freely use any of them :^) >>> >>> Edk2 has some form of profiling already (see >>> "MdePkg/Include/Library/PerformanceLib.h"). Usually one sees core code >>> peppered with PERF_CODE_BEGIN and PERF_CODE_END macros. I *think* there >>> is something like a "display performance" (dp) shell application too, >>> that can show the collected stats. But I've never used these facilities. >>> >>> The wiki seems to have two related articles: >>> >>> https://github.com/tianocore/tianocore.github.io/wiki/Edk2-Performance- >> Infrastructure >>> >>> https://github.com/tianocore/tianocore.github.io/wiki/PerformancePkg >>> >>> The former looks quite comprehensive, at a quick skim. >> >> /me nods >> I've seen those macros around, but I've never used them. >> In any case, this problem has piqued my interest, I'll see if I can >> find some free time this weekend to hack on a test benchmark and a PoC >> :) >> >> -- >> Pedro >> >> >> >> > > > > -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113754): https://edk2.groups.io/g/devel/message/113754 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- [-- Attachment #2: Type: text/html, Size: 24742 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-12 18:56 ` Andrew Fish via groups.io @ 2024-01-12 19:04 ` Michael D Kinney 2024-01-12 19:26 ` Andrew Fish via groups.io 0 siblings, 1 reply; 24+ messages in thread From: Michael D Kinney @ 2024-01-12 19:04 UTC (permalink / raw) To: Andrew (EFI) Fish, edk2-devel-groups-io Cc: pedro.falcato@gmail.com, Laszlo Ersek, nhi@os.amperecomputing.com, ardb+tianocore@kernel.org, Kinney, Michael D [-- Attachment #1: Type: text/plain, Size: 7456 bytes --] Agreed. Basically every API that takes an EF_HANDLE as input calls that API to make sure it is a valid handle. The first question is if we get value from making sure the EFI_HANDLE is a member of the active set of handles. A simple signature check in the EFI_HANDLE may be enough as long as all freed handles clear the signature. Then, the only way that the linked list walk adds value is if there it a call with an invalid handle that happens to have the matching signature. The From: Andrew (EFI) Fish <afish@apple.com> Sent: Friday, January 12, 2024 10:57 AM To: edk2-devel-groups-io <devel@edk2.groups.io>; Kinney, Michael D <michael.d.kinney@intel.com> Cc: pedro.falcato@gmail.com; Laszlo Ersek <lersek@redhat.com>; nhi@os.amperecomputing.com; ardb+tianocore@kernel.org Subject: Re: [edk2-devel] Memory Attribute for depex section On Jan 12, 2024, at 8:37 AM, Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>> wrote: Hi Pedro, Thank you for evaluating this idea change from linked list to improve performance of the handle database. The concept of using integers for an EFI_HANDLE has been considered before. One advantage over pointers is that a guarantee can be made that an EFI_HANDLE value can be guaranteed to be unique. In the current implementation with EFI_HANDLE being a pointer to an allocated buffer, an EFI_HANDLE value could potentially be reused if an EFI_HANDLE is freed and later allocated for a different EFI_HANDLE. If you continue to explore this path I agree that EFI_HANDLE value of 0 should be reserved and never used. I also recommend that new EFI_HANDLE values are always assigned a new unique value that freed EFI_HANDLE values are never reused. The overall linked list performance of the handle database has also been noted before, but has rarely raised to the level where it impacts the overall boot performance relative to other optimization opportunities. I look forward to the performance data. The PERF_X() macros are the right service to use. On x86 it is based on the time stamp counter (TSC) which has very small granularity. Mike, We tracked this a while back with the PERF macros when we had some performance issues running diagnostics on a system with 3,000+ handles. The hot path was CoreValidateHandle(). I think the number of calls to CoreValidateHandle() explodes if you have more handles so it is not just the raw performance of CoreValidateHandle(), but also how many times it gets called. Thanks, Andrew Fish Mike -----Original Message----- From: devel@edk2.groups.io<mailto:devel@edk2.groups.io> <devel@edk2.groups.io<mailto:devel@edk2.groups.io>> On Behalf Of Pedro Falcato Sent: Friday, January 12, 2024 6:47 AM To: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>> Cc: devel@edk2.groups.io<mailto:devel@edk2.groups.io>; nhi@os.amperecomputing.com<mailto:nhi@os.amperecomputing.com>; ardb+tianocore@kernel.org<mailto:ardb+tianocore@kernel.org>; Andrew Fish <afish@apple.com<mailto:afish@apple.com>> Subject: Re: [edk2-devel] Memory Attribute for depex section On Fri, Jan 12, 2024 at 9:35 AM Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>> wrote: On 1/12/24 03:10, Pedro Falcato wrote: My idea was to make each handle an index - like a file descriptor - AFAIK there's no reason why it *needs* to be an actual pointer. We'd allocate indices when creating a handle, and return that (casted to VOID*). Huh, sneaky. I see two potential problems with this. First, per C std, these "pointers" would be invalid (they wouldn't actually point to valid memory), so even just evaluating them (not dereferencing them!) would invoke undefined behavior. May or may not matter in practice. With compilers getting smarter about optimization (and stricter about std conformance), there could be issues, perhaps. This is true. Stashing random integers in pointers is implementation-defined. But it's also super common. Win32 HANDLEs are exactly this, just integers (stashed in VOID*). The Linux kernel world also has a bunch of fun tricks with stashing flags in a pointer's bottom bits, magic pointer values, etc. I severely doubt we can run into issues with this. EDK2 will not exactly run on the C standard's abstract machine anyway ;) The other concern is a bit contrived, but I *guess* there could be code out there that actually dereferences EFI_HANDLEs. That code would crash. May or may not matter, because such code is arguably already semantically invalid. (It would not necessarily be invalid at the language level -- cf. my previous paragraph --, because passing an otherwise valid EFI_HANDLE to CopyMem, for copying just 1 byte out of the underlying opaque data structure, would not violate the language.) This is a feature, not a bug! :P Seriously though, IHANDLE is not even exposed in semi-public headers, so any code that's derefing an EFI_HANDLE will need to do something like typedef struct { /* ... */ } IHANDLE; EFI_HANDLE Handle = /* ... */; IHANDLE *HandleImpl = (IHANDLE *) Handle; and I'm a strong believer in "play stupid games, win stupid prizes". You can definitely make an argument for "this should definitely crash" instead of just "maybe crashing" (for instance, platforms that still map the NULL page (like OVMF!), or handles > 4096), so I'm inclined to think that if we indeed go this route, we should set one or two upper bits (on 64-bit platforms!) to make handles non-canonical addresses and therefore necessarily crash on dereference. I should note that I find it super hard to get a concrete idea on performance for EFI firmware without adequate tooling - I meant to write a standalone flamegraph tool a few weeks back (even posted in edk2-devel), but, as far as I could tell, the architectural timer protocol couldn't get me the interrupt frame[1]. Until then, whether any of this radix tree vs RB tree vs flat array stuff really matters... I find it hard to say. [1] x86 has 3 timers (PIT, LAPIC timer, HPET) and performance monitoring interrupts, and I couldn't freely use any of them :^) Edk2 has some form of profiling already (see "MdePkg/Include/Library/PerformanceLib.h"). Usually one sees core code peppered with PERF_CODE_BEGIN and PERF_CODE_END macros. I *think* there is something like a "display performance" (dp) shell application too, that can show the collected stats. But I've never used these facilities. The wiki seems to have two related articles: https://github.com/tianocore/tianocore.github.io/wiki/Edk2-Performance- Infrastructure https://github.com/tianocore/tianocore.github.io/wiki/PerformancePkg The former looks quite comprehensive, at a quick skim. /me nods I've seen those macros around, but I've never used them. In any case, this problem has piqued my interest, I'll see if I can find some free time this weekend to hack on a test benchmark and a PoC :) -- Pedro -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113755): https://edk2.groups.io/g/devel/message/113755 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- [-- Attachment #2: Type: text/html, Size: 15247 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-12 19:04 ` Michael D Kinney @ 2024-01-12 19:26 ` Andrew Fish via groups.io 0 siblings, 0 replies; 24+ messages in thread From: Andrew Fish via groups.io @ 2024-01-12 19:26 UTC (permalink / raw) To: Mike Kinney Cc: edk2-devel-groups-io, pedro.falcato@gmail.com, Laszlo Ersek, nhi@os.amperecomputing.com, ardb+tianocore@kernel.org [-- Attachment #1: Type: text/plain, Size: 8674 bytes --] > On Jan 12, 2024, at 11:04 AM, Kinney, Michael D <michael.d.kinney@intel.com> wrote: > > Agreed. Basically every API that takes an EF_HANDLE as input calls that API to make sure it is a valid handle. > > The first question is if we get value from making sure the EFI_HANDLE is a member of the active set of handles. > > A simple signature check in the EFI_HANDLE may be enough as long as all freed handles clear the signature. > > Then, the only way that the linked list walk adds value is if there it a call with an invalid handle that happens to > have the matching signature. > MIke, I seem to remember we ended up fixing the way you describe locally. Even if you convert the list into a tree given the index was a pointer to allocate memory it could get reused. The tree just made the lookup go faster. If we wanted to get tricky on 64-bit systems we could encode a monotonically increasing number in the non-canonical part of the virtual address, or above the max physical address if paging is not enabled. To use the EFI_HNDLE as a pointer you just remove count and replace it with sign extend canonical address (zero in our case). We could probably define a HOB to define the bits to use given the code constructing the page tables for DXE needs to know all the rules here. Thanks, Andrew Fish > The > > From: Andrew (EFI) Fish <afish@apple.com> > Sent: Friday, January 12, 2024 10:57 AM > To: edk2-devel-groups-io <devel@edk2.groups.io>; Kinney, Michael D <michael.d.kinney@intel.com> > Cc: pedro.falcato@gmail.com; Laszlo Ersek <lersek@redhat.com>; nhi@os.amperecomputing.com; ardb+tianocore@kernel.org > Subject: Re: [edk2-devel] Memory Attribute for depex section > > > > > On Jan 12, 2024, at 8:37 AM, Michael D Kinney <michael.d.kinney@intel.com <mailto:michael.d.kinney@intel.com>> wrote: > > Hi Pedro, > > Thank you for evaluating this idea change from linked list to improve > performance of the handle database. > > The concept of using integers for an EFI_HANDLE has been considered before. > One advantage over pointers is that a guarantee can be made that an EFI_HANDLE > value can be guaranteed to be unique. In the current implementation with > EFI_HANDLE being a pointer to an allocated buffer, an EFI_HANDLE value could > potentially be reused if an EFI_HANDLE is freed and later allocated for a > different EFI_HANDLE. > > If you continue to explore this path I agree that EFI_HANDLE value of 0 should > be reserved and never used. I also recommend that new EFI_HANDLE values are > always assigned a new unique value that freed EFI_HANDLE values are never reused. > > The overall linked list performance of the handle database has also been noted > before, but has rarely raised to the level where it impacts the overall boot > performance relative to other optimization opportunities. I look forward to > the performance data. The PERF_X() macros are the right service to use. On > x86 it is based on the time stamp counter (TSC) which has very small granularity. > > > Mike, > > We tracked this a while back with the PERF macros when we had some performance issues running diagnostics on a system with 3,000+ handles. The hot path was CoreValidateHandle(). I think the number of calls to CoreValidateHandle() explodes if you have more handles so it is not just the raw performance of CoreValidateHandle(), but also how many times it gets called. > > Thanks, > > Andrew Fish > > > Mike > > > > -----Original Message----- > From: devel@edk2.groups.io <mailto:devel@edk2.groups.io> <devel@edk2.groups.io <mailto:devel@edk2.groups.io>> On Behalf Of Pedro Falcato > Sent: Friday, January 12, 2024 6:47 AM > To: Laszlo Ersek <lersek@redhat.com <mailto:lersek@redhat.com>> > Cc: devel@edk2.groups.io <mailto:devel@edk2.groups.io>; nhi@os.amperecomputing.com <mailto:nhi@os.amperecomputing.com>; > ardb+tianocore@kernel.org <mailto:ardb+tianocore@kernel.org>; Andrew Fish <afish@apple.com <mailto:afish@apple.com>> > Subject: Re: [edk2-devel] Memory Attribute for depex section > > On Fri, Jan 12, 2024 at 9:35 AM Laszlo Ersek <lersek@redhat.com <mailto:lersek@redhat.com>> wrote: > > > On 1/12/24 03:10, Pedro Falcato wrote: > > My idea was to make each handle an index - like a file descriptor - > AFAIK there's no reason why it *needs* to be an actual pointer. > We'd allocate indices when creating a handle, and return that (casted to > VOID*). > > > Huh, sneaky. > > I see two potential problems with this. > > First, per C std, these "pointers" would be invalid (they wouldn't > actually point to valid memory), so even just evaluating them (not > dereferencing them!) would invoke undefined behavior. May or may not > matter in practice. With compilers getting smarter about optimization > (and stricter about std conformance), there could be issues, perhaps. > > This is true. Stashing random integers in pointers is > implementation-defined. But it's also super common. Win32 HANDLEs are > exactly this, just integers (stashed in VOID*). The Linux kernel world > also has a bunch of fun tricks with stashing flags in a pointer's > bottom bits, magic pointer values, etc. I severely doubt we can run > into issues with this. EDK2 will not exactly run on the C standard's > abstract machine anyway ;) > > > > The other concern is a bit contrived, but I *guess* there could be code > out there that actually dereferences EFI_HANDLEs. That code would crash. > May or may not matter, because such code is arguably already > semantically invalid. (It would not necessarily be invalid at the > language level -- cf. my previous paragraph --, because passing an > otherwise valid EFI_HANDLE to CopyMem, for copying just 1 byte out of > the underlying opaque data structure, would not violate the language.) > > This is a feature, not a bug! :P > > Seriously though, IHANDLE is not even exposed in semi-public headers, > so any code that's derefing an EFI_HANDLE will need to do something > like > > typedef struct { > /* ... */ > } IHANDLE; > > EFI_HANDLE Handle = /* ... */; > > IHANDLE *HandleImpl = (IHANDLE *) Handle; > > and I'm a strong believer in "play stupid games, win stupid prizes". > You can definitely make an argument for "this should definitely crash" > instead of just "maybe crashing" (for instance, platforms that still > map the NULL page (like OVMF!), or handles > 4096), so I'm inclined to > think that if we indeed go this route, we should set one or two upper > bits (on 64-bit platforms!) to make handles non-canonical addresses > and therefore necessarily crash on dereference. > > > > > I should note that I find it super hard to get a concrete idea on > performance for EFI firmware without adequate tooling - I meant to > write a standalone flamegraph tool a few weeks back (even posted in > edk2-devel), but, as far as I could tell, the architectural timer > protocol couldn't get me the interrupt frame[1]. Until then, whether > any of this radix tree vs RB tree vs flat array stuff really > matters... I find it hard to say. > > [1] x86 has 3 timers (PIT, LAPIC timer, HPET) and performance > monitoring interrupts, and I couldn't freely use any of them :^) > > Edk2 has some form of profiling already (see > "MdePkg/Include/Library/PerformanceLib.h"). Usually one sees core code > peppered with PERF_CODE_BEGIN and PERF_CODE_END macros. I *think* there > is something like a "display performance" (dp) shell application too, > that can show the collected stats. But I've never used these facilities. > > The wiki seems to have two related articles: > > https://github.com/tianocore/tianocore.github.io/wiki/Edk2-Performance- > Infrastructure > > > https://github.com/tianocore/tianocore.github.io/wiki/PerformancePkg > > The former looks quite comprehensive, at a quick skim. > > /me nods > I've seen those macros around, but I've never used them. > In any case, this problem has piqued my interest, I'll see if I can > find some free time this weekend to hack on a test benchmark and a PoC > :) > > -- > Pedro > > > > > > > > -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113775): https://edk2.groups.io/g/devel/message/113775 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- [-- Attachment #2: Type: text/html, Size: 18617 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-12 14:46 ` Pedro Falcato 2024-01-12 16:37 ` Michael D Kinney @ 2024-01-12 18:50 ` Andrew Fish via groups.io 1 sibling, 0 replies; 24+ messages in thread From: Andrew Fish via groups.io @ 2024-01-12 18:50 UTC (permalink / raw) To: Pedro Falcato Cc: Laszlo Ersek, edk2-devel-groups-io, nhi, ardb+tianocore@kernel.org > On Jan 12, 2024, at 6:46 AM, Pedro Falcato <pedro.falcato@gmail.com> wrote: > > On Fri, Jan 12, 2024 at 9:35 AM Laszlo Ersek <lersek@redhat.com> wrote: >> >> On 1/12/24 03:10, Pedro Falcato wrote: >>> My idea was to make each handle an index - like a file descriptor - >>> AFAIK there's no reason why it *needs* to be an actual pointer. >>> We'd allocate indices when creating a handle, and return that (casted to VOID*). >> >> Huh, sneaky. >> >> I see two potential problems with this. >> >> First, per C std, these "pointers" would be invalid (they wouldn't >> actually point to valid memory), so even just evaluating them (not >> dereferencing them!) would invoke undefined behavior. May or may not >> matter in practice. With compilers getting smarter about optimization >> (and stricter about std conformance), there could be issues, perhaps. > > This is true. Stashing random integers in pointers is > implementation-defined. But it's also super common. Win32 HANDLEs are > exactly this, just integers (stashed in VOID*). The Linux kernel world > also has a bunch of fun tricks with stashing flags in a pointer's > bottom bits, magic pointer values, etc. I severely doubt we can run > into issues with this. EDK2 will not exactly run on the C standard's > abstract machine anyway ;) > I’d point out that CPUs we support do magic things with bits of pointers. ARMv8.3 defines pointer authentication and on a Mac arm64e is an ABI with pointer authentication enabled. Given call return rules are different and you have to initialize signed pointers it is a different ABI, but never say never that a signed pointer ABI could get added to UEFI. Thanks, Andrew Fish >> >> The other concern is a bit contrived, but I *guess* there could be code >> out there that actually dereferences EFI_HANDLEs. That code would crash. >> May or may not matter, because such code is arguably already >> semantically invalid. (It would not necessarily be invalid at the >> language level -- cf. my previous paragraph --, because passing an >> otherwise valid EFI_HANDLE to CopyMem, for copying just 1 byte out of >> the underlying opaque data structure, would not violate the language.) > > This is a feature, not a bug! :P > > Seriously though, IHANDLE is not even exposed in semi-public headers, > so any code that's derefing an EFI_HANDLE will need to do something > like > > typedef struct { > /* ... */ > } IHANDLE; > > EFI_HANDLE Handle = /* ... */; > > IHANDLE *HandleImpl = (IHANDLE *) Handle; > > and I'm a strong believer in "play stupid games, win stupid prizes". > You can definitely make an argument for "this should definitely crash" > instead of just "maybe crashing" (for instance, platforms that still > map the NULL page (like OVMF!), or handles > 4096), so I'm inclined to > think that if we indeed go this route, we should set one or two upper > bits (on 64-bit platforms!) to make handles non-canonical addresses > and therefore necessarily crash on dereference. > >> >>> I should note that I find it super hard to get a concrete idea on >>> performance for EFI firmware without adequate tooling - I meant to >>> write a standalone flamegraph tool a few weeks back (even posted in >>> edk2-devel), but, as far as I could tell, the architectural timer >>> protocol couldn't get me the interrupt frame[1]. Until then, whether >>> any of this radix tree vs RB tree vs flat array stuff really >>> matters... I find it hard to say. >>> >>> [1] x86 has 3 timers (PIT, LAPIC timer, HPET) and performance >>> monitoring interrupts, and I couldn't freely use any of them :^) >> >> Edk2 has some form of profiling already (see >> "MdePkg/Include/Library/PerformanceLib.h"). Usually one sees core code >> peppered with PERF_CODE_BEGIN and PERF_CODE_END macros. I *think* there >> is something like a "display performance" (dp) shell application too, >> that can show the collected stats. But I've never used these facilities. >> >> The wiki seems to have two related articles: >> >> https://github.com/tianocore/tianocore.github.io/wiki/Edk2-Performance-Infrastructure >> >> https://github.com/tianocore/tianocore.github.io/wiki/PerformancePkg >> >> The former looks quite comprehensive, at a quick skim. > > /me nods > I've seen those macros around, but I've never used them. > In any case, this problem has piqued my interest, I'll see if I can > find some free time this weekend to hack on a test benchmark and a PoC > :) > > -- > Pedro -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113753): https://edk2.groups.io/g/devel/message/113753 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-10 13:45 ` Laszlo Ersek 2024-01-10 21:50 ` Pedro Falcato @ 2024-01-12 2:44 ` Andrew Fish via groups.io 2024-01-12 9:45 ` Laszlo Ersek 1 sibling, 1 reply; 24+ messages in thread From: Andrew Fish via groups.io @ 2024-01-12 2:44 UTC (permalink / raw) To: edk2-devel-groups-io, Laszlo Ersek; +Cc: nhi, ardb+tianocore@kernel.org [-- Attachment #1: Type: text/plain, Size: 4964 bytes --] Sorry need some more time to digest this…. First thoughts. 1) The actual performance issue we hit was the explosion of CoreValidateHandle() calls as the number of protocols got large for some diags. The newer handles tended to be at the end of the list if I remember correctly. a) It looks like CoreValidateHandle() is the only place gHandleList was walked, as the handle info is crossed referenced in the protocol database. So that is why we changed that. 2) If the issue at hand is MM why not just drop the optimizations? b) If we have so many MM protocols and handles that seems like its own problem? 3) The issue is patching the grammar in place, why can’t we just make a copy for the dispatcher grammer, and operate on the copy. Maybe via a copy on 1st update strategy? Thanks, Andrew Fish > On Jan 10, 2024, at 5:45 AM, Laszlo Ersek <lersek@redhat.com> wrote: > > (+ Andrew) > > On 1/10/24 14:09, Laszlo Ersek wrote: > >> I think that keeping the depex section read-only is valuable, so I'd >> rule out #2. I'd also not start with option #1 -- copying the depex to >> heap memory, just so this optimization can succeed. I'd simply start by >> removing the optimization, and measuring how much driver dispatch slows >> down in practice, on various platforms. >> >> Can you try this? (I have only build-tested and "uncrustified" it.) >> >> The patch removes the EFI_DEP_REPLACE_TRUE handling altogether, plus it >> CONST-ifies the Iterator pointer (which points into the DEPEX section), >> so that the compiler catch any possible accesses at *build time* that >> would write to the write-protected DEPEX memory area. > > On a tangent: the optimization in question highlights a more general > problem, namely that the DXE (and possibly MM/SMM) protocol databases > are considered slow, for some access patterns. > > Edk2 implements those protocol databases with linked lists, where lookup > costs O(n) operations (average and worst cases). And protocol lookups > are quite frequent (for example, in depex evaluations, they could be > considered "particularly frequent"). > > (1) The "Tasks" wiki page mentions something *similar* (but not the > same); see > > https://github.com/tianocore/tianocore.github.io/wiki/Tasks#datahub--gcd-scalability > > The description is: "The DXE core's DataHub and GCD (Global Coherency > Domain) layers don't scale well as the number of data items gets large, > since they are based on simple linked lists. Find better data structures." > > The same might apply more or less to the protocol database implementation. > > (2) More to the point, Andrew Fish reported earlier that at Apple, they > had rewritten the DXE protocol database, using the red-black tree > OrderedCollectionLib that I had contributed previously to edk2 -- and > they saw significant performance improvements. > > So upstreaming that feature to edk2 could be very valuable. (Red-black > trees have O(log(n)) time cost (worst case) for lookup, insertion and > deletion, and O(n) cost for iterating through the whole data structure.) > > Let me see if I can find the bugzilla ticket... > > Ah, I got it. Apologies, I misremembered: Andrew's comment was not about > the protocol database, but about the handle database. Here it is: > > https://bugzilla.tianocore.org/show_bug.cgi?id=988#c7 > > (the BZ is still CONFIRMED btw...) > > Still, I think it must be related in a way. Namely, an EFI handle exists > if and only if at least one protocol interface is installed on it. If > you uninstall the last protocol interface from a handle, then the handle > is destroyed -- in fact that's the *only* way to destroy a handle, to my > understanding. See EFI_BOOT_SERVICES.UninstallProtocolInterface() in the > UEFI spec: "If the last protocol interface is removed from a handle, the > handle is freed and is no longer valid". Furthermore, calling > InstallProtocolInterface() and InstallMultipleProtocolInterfaces() is > how one *creates* new handles. > > So given how handles and protocol interfaces are conceptually > interlinked, an rbtree-based protocol DB might have to maintain multiple > rbtrees internally (for the ability to search the database quickly with > different types of "keys"). I don't have a design ready in my mind at > all (I'm not that familiar with the *current*, list-based implementation > to begin with!). Upstreaming Apple's (experimental?) code could prove > very helpful. > > Laszlo > > > > > > -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113644): https://edk2.groups.io/g/devel/message/113644 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- [-- Attachment #2: Type: text/html, Size: 6116 bytes --] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-12 2:44 ` Andrew Fish via groups.io @ 2024-01-12 9:45 ` Laszlo Ersek 2024-01-15 13:07 ` Nhi Pham via groups.io 0 siblings, 1 reply; 24+ messages in thread From: Laszlo Ersek @ 2024-01-12 9:45 UTC (permalink / raw) To: Andrew (EFI) Fish, edk2-devel-groups-io; +Cc: nhi, ardb+tianocore@kernel.org On 1/12/24 03:44, Andrew (EFI) Fish wrote: > Sorry need some more time to digest this…. First thoughts. > > 1) The actual performance issue we hit was the explosion > of CoreValidateHandle() calls as the number of protocols got large for > some diags. The newer handles tended to be at the end of the list if I > remember correctly. > a) It looks like CoreValidateHandle() is the only > place *gHandleList* was walked, as the handle info is crossed > referenced in the protocol database. So that is why we changed that. > 2) If the issue at hand is MM why not just drop the optimizations? Yes, that's the first (and easy) idea. But I guess it needs some measurements (no perf regression). It would be nice to know whether Intel (?) measured any serious perf gains when they originally implemented (in the 2000s?) the optimization. > b) If we have so many MM protocols and handles that seems like its own > problem? I would agree; however, IIRC, the depex evaluator for the MM drivers also searches the UEFI (DXE) protocol database, not just the MM protocol database. (Independently: I think that's a valid thing to do for *SMM* drivers, because the entry point functions of those drivers are permitted to use both SMM and DXE/UEFI protocols. But whether the same is valid for the *standalone* MM drivers -- that looks questionable. Standalone MM drivers should not depend on UEFI/DXE protocols ever, IIUC.) > 3) The issue is patching the grammar in place, why can’t we just make a > copy for the dispatcher grammer, and operate on the copy. Maybe via a > copy on 1st update strategy? Yes, copying the depex to the heap, and patching it there, was Nhi's #1 fix proposal. I think that could be made work. But I'm not sure if the perf savings are worth the additional complexity. The heap allocation (where the writeable depex would exist) would have to be permanently associated with the loaded PE image -- because the dispatcher might need to reevaluate the depex across multiple rounds of dispatching. So that's a new field in some image-related structure, it also needs to be freed upon unload (?), what if the memory allocation fails during depex eval (just consider the depex to eval to FALSE?), etc. Doable, but hairy; not sure if the perf is worth that effort. Thanks! Laszlo -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113704): https://edk2.groups.io/g/devel/message/113704 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-12 9:45 ` Laszlo Ersek @ 2024-01-15 13:07 ` Nhi Pham via groups.io 2024-01-15 14:04 ` Ard Biesheuvel 0 siblings, 1 reply; 24+ messages in thread From: Nhi Pham via groups.io @ 2024-01-15 13:07 UTC (permalink / raw) To: Laszlo Ersek, Andrew (EFI) Fish, edk2-devel-groups-io Cc: ardb+tianocore@kernel.org On 1/12/2024 4:45 PM, Laszlo Ersek wrote: > (Independently: I think that's a valid thing to do for *SMM* drivers, > because the entry point functions of those drivers are permitted to use > both SMM and DXE/UEFI protocols. But whether the same is valid for the > *standalone* MM drivers -- that looks questionable. Standalone MM > drivers should not depend on UEFI/DXE protocols ever, IIUC.) > >> 3) The issue is patching the grammar in place, why can’t we just make a >> copy for the dispatcher grammer, and operate on the copy. Maybe via a >> copy on 1st update strategy? > > Yes, copying the depex to the heap, and patching it there, was Nhi's #1 > fix proposal. I think that could be made work. But I'm not sure if the > perf savings are worth the additional complexity. The heap allocation > (where the writeable depex would exist) would have to be permanently > associated with the loaded PE image -- because the dispatcher might need > to reevaluate the depex across multiple rounds of dispatching. So that's > a new field in some image-related structure, it also needs to be freed > upon unload (?), what if the memory allocation fails during depex eval > (just consider the depex to eval to FALSE?), etc. Doable, but hairy; not > sure if the perf is worth that effort. > Thanks so much, Laszlo for your valuable insights. The approach #1 works for me. I will do further check for your concerns above. I'm trying your suggested patch and investigating the performance being discussed here. Regards, Nhi -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113821): https://edk2.groups.io/g/devel/message/113821 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-15 13:07 ` Nhi Pham via groups.io @ 2024-01-15 14:04 ` Ard Biesheuvel 2024-01-15 19:00 ` Laszlo Ersek 0 siblings, 1 reply; 24+ messages in thread From: Ard Biesheuvel @ 2024-01-15 14:04 UTC (permalink / raw) To: Nhi Pham Cc: Laszlo Ersek, Andrew (EFI) Fish, edk2-devel-groups-io, ardb+tianocore@kernel.org On Mon, 15 Jan 2024 at 14:07, Nhi Pham <nhi@os.amperecomputing.com> wrote: > > On 1/12/2024 4:45 PM, Laszlo Ersek wrote: > > (Independently: I think that's a valid thing to do for *SMM* drivers, > > because the entry point functions of those drivers are permitted to use > > both SMM and DXE/UEFI protocols. But whether the same is valid for the > > *standalone* MM drivers -- that looks questionable. Standalone MM > > drivers should not depend on UEFI/DXE protocols ever, IIUC.) > > > >> 3) The issue is patching the grammar in place, why can’t we just make a > >> copy for the dispatcher grammer, and operate on the copy. Maybe via a > >> copy on 1st update strategy? > > > > Yes, copying the depex to the heap, and patching it there, was Nhi's #1 > > fix proposal. I think that could be made work. But I'm not sure if the > > perf savings are worth the additional complexity. The heap allocation > > (where the writeable depex would exist) would have to be permanently > > associated with the loaded PE image -- because the dispatcher might need > > to reevaluate the depex across multiple rounds of dispatching. So that's > > a new field in some image-related structure, it also needs to be freed > > upon unload (?), what if the memory allocation fails during depex eval > > (just consider the depex to eval to FALSE?), etc. Doable, but hairy; not > > sure if the perf is worth that effort. > > > > Thanks so much, Laszlo for your valuable insights. > > The approach #1 works for me. I will do further check for your concerns > above. > > I'm trying your suggested patch and investigating the performance being > discussed here. > Not sure what approach #1 means, but I'd prefer to just remove this optimization from standalone MM, given that not only a) it shouldn't have to deal with a large number of protocol GUIDs, but also b) the driver dispatch is much more straight-forward. (Typically, StMM drivers can be dispatched in the order they appear in the firmware volume, in which case each DEPEX is evaluated only once anyway) -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113822): https://edk2.groups.io/g/devel/message/113822 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-15 14:04 ` Ard Biesheuvel @ 2024-01-15 19:00 ` Laszlo Ersek 2024-01-18 6:00 ` Nhi Pham via groups.io 0 siblings, 1 reply; 24+ messages in thread From: Laszlo Ersek @ 2024-01-15 19:00 UTC (permalink / raw) To: Ard Biesheuvel, Nhi Pham Cc: Andrew (EFI) Fish, edk2-devel-groups-io, ardb+tianocore@kernel.org On 1/15/24 15:04, Ard Biesheuvel wrote: > On Mon, 15 Jan 2024 at 14:07, Nhi Pham <nhi@os.amperecomputing.com> wrote: >> >> On 1/12/2024 4:45 PM, Laszlo Ersek wrote: >>> (Independently: I think that's a valid thing to do for *SMM* drivers, >>> because the entry point functions of those drivers are permitted to use >>> both SMM and DXE/UEFI protocols. But whether the same is valid for the >>> *standalone* MM drivers -- that looks questionable. Standalone MM >>> drivers should not depend on UEFI/DXE protocols ever, IIUC.) >>> >>>> 3) The issue is patching the grammar in place, why can’t we just make a >>>> copy for the dispatcher grammer, and operate on the copy. Maybe via a >>>> copy on 1st update strategy? >>> >>> Yes, copying the depex to the heap, and patching it there, was Nhi's #1 >>> fix proposal. I think that could be made work. But I'm not sure if the >>> perf savings are worth the additional complexity. The heap allocation >>> (where the writeable depex would exist) would have to be permanently >>> associated with the loaded PE image -- because the dispatcher might need >>> to reevaluate the depex across multiple rounds of dispatching. So that's >>> a new field in some image-related structure, it also needs to be freed >>> upon unload (?), what if the memory allocation fails during depex eval >>> (just consider the depex to eval to FALSE?), etc. Doable, but hairy; not >>> sure if the perf is worth that effort. >>> >> >> Thanks so much, Laszlo for your valuable insights. >> >> The approach #1 works for me. I will do further check for your concerns >> above. >> >> I'm trying your suggested patch and investigating the performance being >> discussed here. >> > > Not sure what approach #1 means, (copying the depex to the heap, and maintaining it there, so that it can be patched) > but I'd prefer to just remove this > optimization from standalone MM, given that not only a) it shouldn't > have to deal with a large number of protocol GUIDs, but also b) the > driver dispatch is much more straight-forward. (Typically, StMM > drivers can be dispatched in the order they appear in the firmware > volume, in which case each DEPEX is evaluated only once anyway) Sounds like a promising basis for removing the optimization indeed! Laszlo -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113844): https://edk2.groups.io/g/devel/message/113844 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-15 19:00 ` Laszlo Ersek @ 2024-01-18 6:00 ` Nhi Pham via groups.io 2024-01-18 14:49 ` Laszlo Ersek 0 siblings, 1 reply; 24+ messages in thread From: Nhi Pham via groups.io @ 2024-01-18 6:00 UTC (permalink / raw) To: Laszlo Ersek, Ard Biesheuvel Cc: Andrew (EFI) Fish, edk2-devel-groups-io, ardb+tianocore@kernel.org Hi Laszlo, On 1/16/2024 2:00 AM, Laszlo Ersek wrote: > On 1/15/24 15:04, Ard Biesheuvel wrote: >> On Mon, 15 Jan 2024 at 14:07, Nhi Pham <nhi@os.amperecomputing.com> wrote: >>> >>> On 1/12/2024 4:45 PM, Laszlo Ersek wrote: >>>> (Independently: I think that's a valid thing to do for *SMM* drivers, >>>> because the entry point functions of those drivers are permitted to use >>>> both SMM and DXE/UEFI protocols. But whether the same is valid for the >>>> *standalone* MM drivers -- that looks questionable. Standalone MM >>>> drivers should not depend on UEFI/DXE protocols ever, IIUC.) >>>> >>>>> 3) The issue is patching the grammar in place, why can’t we just make a >>>>> copy for the dispatcher grammer, and operate on the copy. Maybe via a >>>>> copy on 1st update strategy? >>>> >>>> Yes, copying the depex to the heap, and patching it there, was Nhi's #1 >>>> fix proposal. I think that could be made work. But I'm not sure if the >>>> perf savings are worth the additional complexity. The heap allocation >>>> (where the writeable depex would exist) would have to be permanently >>>> associated with the loaded PE image -- because the dispatcher might need >>>> to reevaluate the depex across multiple rounds of dispatching. So that's >>>> a new field in some image-related structure, it also needs to be freed >>>> upon unload (?), what if the memory allocation fails during depex eval >>>> (just consider the depex to eval to FALSE?), etc. Doable, but hairy; not >>>> sure if the perf is worth that effort. >>>> >>> >>> Thanks so much, Laszlo for your valuable insights. >>> >>> The approach #1 works for me. I will do further check for your concerns >>> above. >>> >>> I'm trying your suggested patch and investigating the performance being >>> discussed here. >>> >> >> Not sure what approach #1 means, > > (copying the depex to the heap, and maintaining it there, so that it can > be patched) Thanks! > >> but I'd prefer to just remove this >> optimization from standalone MM, given that not only a) it shouldn't >> have to deal with a large number of protocol GUIDs, but also b) the >> driver dispatch is much more straight-forward. (Typically, StMM >> drivers can be dispatched in the order they appear in the firmware >> volume, in which case each DEPEX is evaluated only once anyway) > > Sounds like a promising basis for removing the optimization indeed! > Your patch suggested earlier works for me. And I don't see significant performance reduction compared with keeping optimization. I don't have strong reason on removing the optimization, but I think it would be simply good for now. Could you post your patch to edk2-devel for review and merge? Thanks, Nhi -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113986): https://edk2.groups.io/g/devel/message/113986 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-18 6:00 ` Nhi Pham via groups.io @ 2024-01-18 14:49 ` Laszlo Ersek 2024-01-19 4:43 ` Nhi Pham via groups.io 0 siblings, 1 reply; 24+ messages in thread From: Laszlo Ersek @ 2024-01-18 14:49 UTC (permalink / raw) To: Nhi Pham, Ard Biesheuvel Cc: Andrew (EFI) Fish, edk2-devel-groups-io, ardb+tianocore@kernel.org On 1/18/24 07:00, Nhi Pham wrote: > Hi Laszlo, > > On 1/16/2024 2:00 AM, Laszlo Ersek wrote: >> On 1/15/24 15:04, Ard Biesheuvel wrote: >>> On Mon, 15 Jan 2024 at 14:07, Nhi Pham <nhi@os.amperecomputing.com> >>> wrote: >>>> >>>> On 1/12/2024 4:45 PM, Laszlo Ersek wrote: >>>>> (Independently: I think that's a valid thing to do for *SMM* drivers, >>>>> because the entry point functions of those drivers are permitted to >>>>> use >>>>> both SMM and DXE/UEFI protocols. But whether the same is valid for the >>>>> *standalone* MM drivers -- that looks questionable. Standalone MM >>>>> drivers should not depend on UEFI/DXE protocols ever, IIUC.) >>>>> >>>>>> 3) The issue is patching the grammar in place, why can’t we just >>>>>> make a >>>>>> copy for the dispatcher grammer, and operate on the copy. Maybe via a >>>>>> copy on 1st update strategy? >>>>> >>>>> Yes, copying the depex to the heap, and patching it there, was >>>>> Nhi's #1 >>>>> fix proposal. I think that could be made work. But I'm not sure if the >>>>> perf savings are worth the additional complexity. The heap allocation >>>>> (where the writeable depex would exist) would have to be permanently >>>>> associated with the loaded PE image -- because the dispatcher might >>>>> need >>>>> to reevaluate the depex across multiple rounds of dispatching. So >>>>> that's >>>>> a new field in some image-related structure, it also needs to be freed >>>>> upon unload (?), what if the memory allocation fails during depex eval >>>>> (just consider the depex to eval to FALSE?), etc. Doable, but >>>>> hairy; not >>>>> sure if the perf is worth that effort. >>>>> >>>> >>>> Thanks so much, Laszlo for your valuable insights. >>>> >>>> The approach #1 works for me. I will do further check for your concerns >>>> above. >>>> >>>> I'm trying your suggested patch and investigating the performance being >>>> discussed here. >>>> >>> >>> Not sure what approach #1 means, >> >> (copying the depex to the heap, and maintaining it there, so that it can >> be patched) > > Thanks! > >> >>> but I'd prefer to just remove this >>> optimization from standalone MM, given that not only a) it shouldn't >>> have to deal with a large number of protocol GUIDs, but also b) the >>> driver dispatch is much more straight-forward. (Typically, StMM >>> drivers can be dispatched in the order they appear in the firmware >>> volume, in which case each DEPEX is evaluated only once anyway) >> >> Sounds like a promising basis for removing the optimization indeed! >> > > Your patch suggested earlier works for me. And I don't see significant > performance reduction compared with keeping optimization. > > I don't have strong reason on removing the optimization, but I think it > would be simply good for now. Could you post your patch to edk2-devel > for review and merge? That wouldn't be correct; I don't have any platform for testing StMM. I proposed the patch purely based on code analysis. I prefer not to post untested patches, if I can avoid it. You can however post my patch; simply add your S-o-b at the bottom. You can also preserve my authorship on the patch with --author=... on git-commit; but even that is unnecessary for such a simple patch (you don't even have to pick the patch up from the email, it's trivial to reimplement from scratch, just reading the email). Thanks Laszlo -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#113994): https://edk2.groups.io/g/devel/message/113994 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-18 14:49 ` Laszlo Ersek @ 2024-01-19 4:43 ` Nhi Pham via groups.io 2024-01-19 13:54 ` Laszlo Ersek 0 siblings, 1 reply; 24+ messages in thread From: Nhi Pham via groups.io @ 2024-01-19 4:43 UTC (permalink / raw) To: Laszlo Ersek, Ard Biesheuvel Cc: Andrew (EFI) Fish, edk2-devel-groups-io, ardb+tianocore@kernel.org On 1/18/2024 9:49 PM, Laszlo Ersek wrote: >>>> but I'd prefer to just remove this >>>> optimization from standalone MM, given that not only a) it shouldn't >>>> have to deal with a large number of protocol GUIDs, but also b) the >>>> driver dispatch is much more straight-forward. (Typically, StMM >>>> drivers can be dispatched in the order they appear in the firmware >>>> volume, in which case each DEPEX is evaluated only once anyway) >>> >>> Sounds like a promising basis for removing the optimization indeed! >>> >> >> Your patch suggested earlier works for me. And I don't see significant >> performance reduction compared with keeping optimization. >> >> I don't have strong reason on removing the optimization, but I think it >> would be simply good for now. Could you post your patch to edk2-devel >> for review and merge? > > That wouldn't be correct; I don't have any platform for testing StMM. I > proposed the patch purely based on code analysis. I prefer not to post > untested patches, if I can avoid it. I got it, thanks! I thought I could give Tested-by tag when you post the patch since I already verified the patch on a StMM platform > > You can however post my patch; simply add your S-o-b at the bottom. You > can also preserve my authorship on the patch with --author=... on > git-commit; but even that is unnecessary for such a simple patch (you > don't even have to pick the patch up from the email, it's trivial to > reimplement from scratch, just reading the email). I'm going to send the patch to edk2-devel and keep your authorship on the patch because there is no change compared with your suggestion in the email. Thanks, Nhi -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#114024): https://edk2.groups.io/g/devel/message/114024 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [edk2-devel] Memory Attribute for depex section 2024-01-19 4:43 ` Nhi Pham via groups.io @ 2024-01-19 13:54 ` Laszlo Ersek 0 siblings, 0 replies; 24+ messages in thread From: Laszlo Ersek @ 2024-01-19 13:54 UTC (permalink / raw) To: Nhi Pham, Ard Biesheuvel Cc: Andrew (EFI) Fish, edk2-devel-groups-io, ardb+tianocore@kernel.org On 1/19/24 05:43, Nhi Pham wrote: > On 1/18/2024 9:49 PM, Laszlo Ersek wrote: >>>>> but I'd prefer to just remove this >>>>> optimization from standalone MM, given that not only a) it shouldn't >>>>> have to deal with a large number of protocol GUIDs, but also b) the >>>>> driver dispatch is much more straight-forward. (Typically, StMM >>>>> drivers can be dispatched in the order they appear in the firmware >>>>> volume, in which case each DEPEX is evaluated only once anyway) >>>> >>>> Sounds like a promising basis for removing the optimization indeed! >>>> >>> >>> Your patch suggested earlier works for me. And I don't see significant >>> performance reduction compared with keeping optimization. >>> >>> I don't have strong reason on removing the optimization, but I think it >>> would be simply good for now. Could you post your patch to edk2-devel >>> for review and merge? >> >> That wouldn't be correct; I don't have any platform for testing StMM. I >> proposed the patch purely based on code analysis. I prefer not to post >> untested patches, if I can avoid it. > > I got it, thanks! > > I thought I could give Tested-by tag when you post the patch since I > already verified the patch on a StMM platform Oh, that seems to have eluded me, sorry -- it's not uncommon that a patch submitter needs help with testing (external testing) on *some* platforms, beside his or her main platform(s) of interest. What's quite uncommon though is total untestability for the submitter; I admit I drew a blank there. What you mention would have been viable. Thank you for posting the patch, ultimately! Laszlo > >> >> You can however post my patch; simply add your S-o-b at the bottom. You >> can also preserve my authorship on the patch with --author=... on >> git-commit; but even that is unnecessary for such a simple patch (you >> don't even have to pick the patch up from the email, it's trivial to >> reimplement from scratch, just reading the email). > > I'm going to send the patch to edk2-devel and keep your authorship on > the patch because there is no change compared with your suggestion in > the email. > > Thanks, > Nhi > -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#114048): https://edk2.groups.io/g/devel/message/114048 Mute This Topic: https://groups.io/mt/103594587/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/leave/12367111/7686176/1913456212/xyzzy [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2024-01-19 13:55 UTC | newest] Thread overview: 24+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-01-08 10:11 [edk2-devel] Memory Attribute for depex section Nhi Pham via groups.io 2024-01-10 0:48 ` Nhi Pham via groups.io 2024-01-10 13:09 ` Laszlo Ersek 2024-01-10 13:45 ` Laszlo Ersek 2024-01-10 21:50 ` Pedro Falcato 2024-01-11 8:46 ` Laszlo Ersek 2024-01-11 11:03 ` Laszlo Ersek 2024-01-12 2:10 ` Pedro Falcato 2024-01-12 9:35 ` Laszlo Ersek 2024-01-12 14:46 ` Pedro Falcato 2024-01-12 16:37 ` Michael D Kinney 2024-01-12 18:56 ` Andrew Fish via groups.io 2024-01-12 19:04 ` Michael D Kinney 2024-01-12 19:26 ` Andrew Fish via groups.io 2024-01-12 18:50 ` Andrew Fish via groups.io 2024-01-12 2:44 ` Andrew Fish via groups.io 2024-01-12 9:45 ` Laszlo Ersek 2024-01-15 13:07 ` Nhi Pham via groups.io 2024-01-15 14:04 ` Ard Biesheuvel 2024-01-15 19:00 ` Laszlo Ersek 2024-01-18 6:00 ` Nhi Pham via groups.io 2024-01-18 14:49 ` Laszlo Ersek 2024-01-19 4:43 ` Nhi Pham via groups.io 2024-01-19 13:54 ` Laszlo Ersek
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox