* [RFC] Incorrect memory ordering in ReleaseSpinLock()
From: sunguk-bin @ 2021-01-06 11:29 UTC
To: ard.biesheuvel@arm.com, devel@edk2.groups.io
Cc: gaoliming@byosoft.com.cn, Bin, Sung-Uk (Bin), Villatel, Maugan, Collison, Sean

Dear Ard and maintainers,

We are concerned that ReleaseSpinLock() does not have a memory barrier.
This has been reported at https://bugzilla.tianocore.org/show_bug.cgi?id=3005.
We'd like to hear from you whether the current implementation needs
improvement.

The concern comes from weak memory ordering on multi-core systems (we are
using AARCH64). The scenario that concerns us is as follows:

    AcquireSpinLock();  // contains 'dmb sy' and prevents "a = *b" from moving up
                        // (and unnecessarily prevents other things from moving down)
    a = *b;
    a = a + 1;
    *b = a;
    ReleaseSpinLock();  // No write barrier here, so "*b = a" can move down.
                        // Another core acquires the spinlock and can read stale data

Please let me know if it would be helpful to add MemoryFence() as below:

    SPIN_LOCK *
    EFIAPI
    ReleaseSpinLock (
      IN OUT  SPIN_LOCK  *SpinLock
      )
    {
      SPIN_LOCK  LockValue;

      ASSERT (SpinLock != NULL);

      MemoryFence ();

      LockValue = *SpinLock;
      ASSERT (SPIN_LOCK_ACQUIRED == LockValue || SPIN_LOCK_RELEASED == LockValue);

      *SpinLock = SPIN_LOCK_RELEASED;
      return SpinLock;
    }

MemoryFence() is implemented with 'dmb', but I wonder whether it is
acceptable not to implement it with 'dsb'.

* Attaching Linux documentation describing SMP barrier pairing
https://github.com/torvalds/linux/blob/master/Documentation/memory-barriers.txt

SMP BARRIER PAIRING
-------------------

When dealing with CPU-CPU interactions, certain types of memory barrier should
always be paired.  A lack of appropriate pairing is almost certainly an error.

General barriers pair with each other, though they also pair with most
other types of barriers, albeit without multicopy atomicity.  An acquire
barrier pairs with a release barrier, but both may also pair with other
barriers, including of course general barriers.  A write barrier pairs
with a data dependency barrier, a control dependency, an acquire barrier,
a release barrier, a read barrier, or a general barrier.  Similarly a
read barrier, control dependency, or a data dependency barrier pairs
with a write barrier, an acquire barrier, a release barrier, or a
general barrier:

        CPU 1                 CPU 2
        ===============       ===============
        WRITE_ONCE(a, 1);
        <write barrier>
        WRITE_ONCE(b, 2);     x = READ_ONCE(b);
                              <read barrier>
                              y = READ_ONCE(a);

Or:

        CPU 1                 CPU 2
        ===============       ===============================
        a = 1;
        <write barrier>
        WRITE_ONCE(b, &a);    x = READ_ONCE(b);
                              <data dependency barrier>
                              y = *x;

Or even:

        CPU 1                 CPU 2
        ===============       ===============================
        r1 = READ_ONCE(y);
        <general barrier>
        WRITE_ONCE(x, 1);     if (r2 = READ_ONCE(x)) {
                                 <implicit control dependency>
                                 WRITE_ONCE(y, 1);
                              }

        assert(r1 == 0 || r2 == 0);

Basically, the read barrier always has to be there, even though it can be of
the "weaker" type.

[!] Note that the stores before the write barrier would normally be expected to
match the loads after the read barrier or the data dependency barrier, and vice
versa:

        CPU 1                               CPU 2
        ===================                 ===================
        WRITE_ONCE(a, 1);    }----   --->{  v = READ_ONCE(c);
        WRITE_ONCE(b, 2);    }    \ /    {  w = READ_ONCE(d);
        <write barrier>            \        <read barrier>
        WRITE_ONCE(c, 3);    }    / \    {  x = READ_ONCE(a);
        WRITE_ONCE(d, 4);    }----   --->{  y = READ_ONCE(b);

Thanks,
Bin

From: bugzilla-daemon@bugzilla.tianocore.org
Sent: Wednesday, November 4, 2020 10:44 AM
To: Bin, Sung-Uk (Bin) <sunguk-bin@hp.com>
Subject: [Bug 3005] ReleaseSpinLock() requires a barrier at the beginning

https://bugzilla.tianocore.org/show_bug.cgi?id=3005

gaoliming@byosoft.com.cn changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|Lowest                      |Normal
             Status|UNCONFIRMED                 |CONFIRMED
                 CC|                            |leif@nuviainc.com
           Assignee|unassigned@tianocore.org    |ard.biesheuvel@arm.com
     Ever confirmed|0                           |1

--- Comment #5 from gaoliming@byosoft.com.cn ---
Ard: can you help check it? This issue is in AARCH64.

--
You are receiving this mail because:
You reported the bug.
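The barrier placement proposed in this message (a fence ahead of the store that frees the lock) can be sketched in portable C11. This is an illustration only, not EDK2 code: `DemoLock`, `DemoAcquire`, `DemoRelease` and `DemoIncrement` are hypothetical names, and the `<stdatomic.h>` calls are assumed stand-ins for MemoryFence() and the interlocked helpers the real library uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for SPIN_LOCK_RELEASED / SPIN_LOCK_ACQUIRED. */
#define DEMO_LOCK_RELEASED ((uintptr_t)1)
#define DEMO_LOCK_ACQUIRED ((uintptr_t)2)

typedef _Atomic uintptr_t DemoLock;

/* Spin until the lock changes from RELEASED to ACQUIRED (acquire ordering). */
static void DemoAcquire(DemoLock *Lock)
{
  uintptr_t Expected = DEMO_LOCK_RELEASED;

  while (!atomic_compare_exchange_weak_explicit(
            Lock, &Expected, DEMO_LOCK_ACQUIRED,
            memory_order_acquire, memory_order_relaxed)) {
    Expected = DEMO_LOCK_RELEASED;  /* a failed exchange overwrites Expected */
  }
}

/* Fence first, so earlier loads and stores are ordered before the
 * store that frees the lock. */
static void DemoRelease(DemoLock *Lock)
{
  atomic_thread_fence(memory_order_release);
  atomic_store_explicit(Lock, DEMO_LOCK_RELEASED, memory_order_relaxed);
}

/* The read-modify-write from the scenario, performed under the lock. */
static int DemoIncrement(DemoLock *Lock, int *b)
{
  int a;

  DemoAcquire(Lock);
  a = *b;
  a = a + 1;
  *b = a;
  DemoRelease(Lock);
  return a;
}
```

With the release fence in place, a core that later acquires the lock and observes the released value is also guaranteed to observe the updated `*b`; without it, a weakly ordered CPU may make the lock store visible first.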
* Re: [RFC] Incorrect memory ordering in ReleaseSpinLock()
From: Ard Biesheuvel @ 2021-01-06 13:27 UTC
To: Bin, Sung-Uk (Bin), devel@edk2.groups.io
Cc: gaoliming@byosoft.com.cn, Villatel, Maugan, Collison, Sean

On 1/6/21 12:29 PM, Bin, Sung-Uk (Bin) wrote:
> Dear, Ard and maintainers
>
> We are concerning that ReleaseSpinLock() does not have a memory barrier.
> This is reported to https://bugzilla.tianocore.org/show_bug.cgi?id=3005.
> We'd like to hear from you whether current implementation needs
> improvement or not.
>

I think you are correct that the current implementation is insufficient.
However, I would prefer for someone to do a comprehensive audit of all
the locking primitives for concurrency problems.

> The concern comes from 'weak memory ordering' and multi-core. (we are
> using AARCH64) And the scenario that we're concerning is like below:
>

When does UEFI run multi-core on an AArch64 system? The UEFI spec does
not permit SMP at boot time, and at runtime, the runtime services are
not reentrant, in which case we should be able to rely on barriers in
the OS's critical section code to ensure visibility when several cores
compete for the UEFI runtime services from the OS.

> AcquireSpinLock(); // contains 'dmb sy' and prevents "a = *b" from
> moving up (and unnecessarily prevents other things from moving down)
>
> a = *b;
>
> a = a + 1;
>
> *b = a;
>
> ReleaseSpinLock(); // No write barrier here, so "*b = a" can move down.
> Another core acquires the spinlock and can read stale data
>
> Please let me know if it would be helpful to add MemoryFence like below:
>

For symmetry, I'd prefer it if we could simply implement the release
side in terms of InterlockedCompareExchangePointer(), and ASSERT() on
the output.

*However*, looking at the current code, there seems to be something
seriously wrong: ReleaseSpinLock() has

    ASSERT (SPIN_LOCK_ACQUIRED == LockValue || SPIN_LOCK_RELEASED == LockValue);

which means you can release a released spinlock even on a DEBUG build
without a diagnostic being printed - that seems like a bug to me.

> SPIN_LOCK *
> EFIAPI
> ReleaseSpinLock (
>   IN OUT SPIN_LOCK *SpinLock
>   )
> {
>   SPIN_LOCK LockValue;
>
>   ASSERT (SpinLock != NULL);
>
>   MemoryFence();
>
>   LockValue = *SpinLock;
>   ASSERT (SPIN_LOCK_ACQUIRED == LockValue || SPIN_LOCK_RELEASED ==
> LockValue);
>
>   *SpinLock = SPIN_LOCK_RELEASED;
>   return SpinLock;
> }
>
> MemoryFence is implemented with 'dmb', but I just wonder if it is okay
> to not implement it with 'dsb'.
>

DSB is for cache and TLB maintenance, not for memory ordering. DMB
should be sufficient here. In fact, we don't need a system-wide DMB
here; an inner-shareable DMB should be sufficient (given that we don't
share spinlocks with DMA masters).
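The point made in this reply about barrier scope (inner shareable rather than full system) can be sketched as follows. This is an example under stated assumptions, not the EDK2 implementation: `DemoFenceInnerShareable` and `DemoPublishAndRelease` are hypothetical helpers, the inline assembly uses GCC/Clang syntax, and on non-AArch64 hosts the code falls back to a C11 fence so that it still builds and runs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: full data memory barrier, inner shareable domain. */
static inline void DemoFenceInnerShareable(void)
{
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
  /* The "memory" clobber also stops the compiler reordering across it. */
  __asm__ __volatile__("dmb ish" ::: "memory");
#else
  /* Portable fallback so the sketch builds on other hosts. */
  atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Write the protected data, fence, then free the lock word.
 * The value 0 is a placeholder for "released" in this sketch. */
static void DemoPublishAndRelease(int *Data, int Value,
                                  _Atomic unsigned *LockWord)
{
  assert(Data != NULL && LockWord != NULL);

  *Data = Value;
  DemoFenceInnerShareable();
  atomic_store_explicit(LockWord, 0u, memory_order_relaxed);
}
```

`dmb ish` orders accesses among observers in the inner shareable domain, which typically covers the CPU cores, whereas `dmb sy` extends to the whole system; the narrower scope is only appropriate under the assumption stated in the reply, namely that the lock is not shared with DMA masters.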
* Re: [RFC] Incorrect memory ordering in ReleaseSpinLock()
From: Bin, Sung-Uk (Bin) @ 2021-01-07 0:02 UTC
To: Ard Biesheuvel, devel@edk2.groups.io
Cc: gaoliming@byosoft.com.cn, Villatel, Maugan, Collison, Sean, Yeon, Jooyoung (연주영 (S/W Dev. Lab.)), Bin, Sung-Uk (Bin)

According to Ard's explanation, it seems that we do not need to worry
about multi-core issues. However, this assumes that spinlocks are not
shared with DMA masters.

Maugan, Sean,
If you have any other comments or concerns about this, please leave a
comment.

Following Ard's suggestion, we may improve the code as below if needed.

    SPIN_LOCK *
    EFIAPI
    ReleaseSpinLock (
      IN OUT  SPIN_LOCK  *SpinLock
      )
    {
      SPIN_LOCK  LockValue;

      ASSERT (SpinLock != NULL);

      LockValue = *SpinLock;
      ASSERT (SPIN_LOCK_ACQUIRED == LockValue || SPIN_LOCK_RELEASED == LockValue);

      InterlockedCompareExchangePointer (
        (VOID**)SpinLock,
        (VOID*)SPIN_LOCK_ACQUIRED,
        (VOID*)SPIN_LOCK_RELEASED
        );

      return SpinLock;
    }

--Bin
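The suggestion made earlier in the thread, to release through a compare-exchange and ASSERT() on its output, can be sketched as below. The code uses C11 atomics as an assumed stand-in for InterlockedCompareExchangePointer(); `DemoLock` and `DemoReleaseChecked` are hypothetical names, and returning a flag instead of asserting is a choice made here only so that both outcomes can be exercised.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for SPIN_LOCK_RELEASED / SPIN_LOCK_ACQUIRED. */
#define DEMO_LOCK_RELEASED ((uintptr_t)1)
#define DEMO_LOCK_ACQUIRED ((uintptr_t)2)

typedef _Atomic uintptr_t DemoLock;

/*
 * Release the lock with a compare-exchange (release ordering on success).
 * Returns true only if the lock was actually held; a real implementation
 * might ASSERT on this result instead of returning it.
 */
static bool DemoReleaseChecked(DemoLock *Lock)
{
  uintptr_t Expected = DEMO_LOCK_ACQUIRED;

  assert(Lock != NULL);

  return atomic_compare_exchange_strong_explicit(
           Lock, &Expected, DEMO_LOCK_RELEASED,
           memory_order_release, memory_order_relaxed);
}
```

Because the exchange succeeds only when the lock word holds the acquired value, checking its result would flag the release of an already released lock, which the existing two-value ASSERT lets through silently.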