* [PATCH v2 0/2] Add volatile for mNumberToFinish
@ 2016-11-15 14:08 Jeff Fan
2016-11-15 14:08 ` [PATCH v2 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for parameter NumberToFinish Jeff Fan
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Jeff Fan @ 2016-11-15 14:08 UTC (permalink / raw)
To: edk2-devel
Cc: Paolo Bonzini, Laszlo Ersek, Jiewen Yao, Feng Tian,
Michael D Kinney
v2:
Add patch #1 per Laszlo's comments
at https://lists.01.org/pipermail/edk2-devel/2016-November/004697.html
About the comments updated SynchronizationLib to add volatile for
input parameter, I will send in another serial patches.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jeff Fan <jeff.fan@intel.com>
Jeff Fan (2):
UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for parameter NumberToFinish
UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for mNumberToFinish
UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 4 ++--
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 4 ++--
UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 2 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 2 +-
4 files changed, 6 insertions(+), 6 deletions(-)
--
2.9.3.windows.2
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for parameter NumberToFinish
2016-11-15 14:08 [PATCH v2 0/2] Add volatile for mNumberToFinish Jeff Fan
@ 2016-11-15 14:08 ` Jeff Fan
2016-11-15 14:08 ` [PATCH v2 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for mNumberToFinish Jeff Fan
2016-11-15 16:10 ` [PATCH v2 0/2] " Laszlo Ersek
2 siblings, 0 replies; 10+ messages in thread
From: Jeff Fan @ 2016-11-15 14:08 UTC (permalink / raw)
To: edk2-devel
Cc: Paolo Bonzini, Laszlo Ersek, Jiewen Yao, Feng Tian,
Michael D Kinney
Adding *volatile* type for NumberToFinish parameter of TransferApToSafeState().
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jeff Fan <jeff.fan@intel.com>
---
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 4 ++--
UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 2 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
index 9760373..22e2d9a 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c
@@ -107,12 +107,12 @@ VOID
TransferApToSafeState (
IN UINT32 ApHltLoopCode,
IN UINT32 TopOfStack,
- IN UINT32 *NumberToFinish
+ IN volatile UINT32 *NumberToFinish
)
{
SwitchStack (
(SWITCH_STACK_ENTRY_POINT) (UINTN) ApHltLoopCode,
- NumberToFinish,
+ (UINT32 *) NumberToFinish,
NULL,
(VOID *) (UINTN) TopOfStack
);
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
index 88d9c85..6688f88 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h
@@ -837,7 +837,7 @@ VOID
TransferApToSafeState (
IN UINT32 ApHltLoopCode,
IN UINT32 TopOfStack,
- IN UINT32 *NumberToFinish
+ IN volatile UINT32 *NumberToFinish
);
#endif
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
index 6844c3f..8f0935d 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c
@@ -112,7 +112,7 @@ VOID
TransferApToSafeState (
IN UINT32 ApHltLoopCode,
IN UINT32 TopOfStack,
- IN UINT32 *NumberToFinish
+ IN volatile UINT32 *NumberToFinish
)
{
AsmDisablePaging64 (
--
2.9.3.windows.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for mNumberToFinish
2016-11-15 14:08 [PATCH v2 0/2] Add volatile for mNumberToFinish Jeff Fan
2016-11-15 14:08 ` [PATCH v2 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for parameter NumberToFinish Jeff Fan
@ 2016-11-15 14:08 ` Jeff Fan
2016-11-15 16:10 ` [PATCH v2 0/2] " Laszlo Ersek
2 siblings, 0 replies; 10+ messages in thread
From: Jeff Fan @ 2016-11-15 14:08 UTC (permalink / raw)
To: edk2-devel
Cc: Paolo Bonzini, Laszlo Ersek, Jiewen Yao, Feng Tian,
Michael D Kinney
The GCC 5.4 will optimize mNumberToFinish in EarlyInitializeCpu(). It will cause
S3 resume failure.
Adding *volatile* could make sure compiler does not so such optimization.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Jiewen Yao <jiewen.yao@intel.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Michael D Kinney <michael.d.kinney@intel.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jeff Fan <jeff.fan@intel.com>
---
UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
index 3fb6864..6324d7e 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c
@@ -55,7 +55,7 @@ AsmGetAddressMap (
#define LEGACY_REGION_BASE (0xA0000 - LEGACY_REGION_SIZE)
ACPI_CPU_DATA mAcpiCpuData;
-UINT32 mNumberToFinish;
+volatile UINT32 mNumberToFinish;
MP_CPU_EXCHANGE_INFO *mExchangeInfo;
BOOLEAN mRestoreSmmConfigurationInS3 = FALSE;
VOID *mGdtForAp = NULL;
@@ -371,7 +371,7 @@ EarlyMPRendezvousProcedure (
//
// Count down the number with lock mechanism.
//
- InterlockedDecrement (&mNumberToFinish);
+ InterlockedDecrement ((UINT32 *) &mNumberToFinish);
}
/**
--
2.9.3.windows.2
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] Add volatile for mNumberToFinish
2016-11-15 14:08 [PATCH v2 0/2] Add volatile for mNumberToFinish Jeff Fan
2016-11-15 14:08 ` [PATCH v2 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for parameter NumberToFinish Jeff Fan
2016-11-15 14:08 ` [PATCH v2 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for mNumberToFinish Jeff Fan
@ 2016-11-15 16:10 ` Laszlo Ersek
2016-11-16 18:18 ` Kinney, Michael D
2 siblings, 1 reply; 10+ messages in thread
From: Laszlo Ersek @ 2016-11-15 16:10 UTC (permalink / raw)
To: Jeff Fan, edk2-devel
Cc: Paolo Bonzini, Jiewen Yao, Feng Tian, Michael D Kinney
Jeff,
On 11/15/16 15:08, Jeff Fan wrote:
> v2:
> Add patch #1 per Laszlo's comments
> at https://lists.01.org/pipermail/edk2-devel/2016-November/004697.html
>
> About the comments updated SynchronizationLib to add volatile for
> input parameter, I will send in another serial patches.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Jiewen Yao <jiewen.yao@intel.com>
> Cc: Feng Tian <feng.tian@intel.com>
> Cc: Michael D Kinney <michael.d.kinney@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jeff Fan <jeff.fan@intel.com>
>
> Jeff Fan (2):
> UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for parameter NumberToFinish
> UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for mNumberToFinish
>
> UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 4 ++--
> UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 4 ++--
> UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 2 +-
> UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 2 +-
> 4 files changed, 6 insertions(+), 6 deletions(-)
>
if you want to keep GCC5.4 from optimizing away the access, the
synchronization object itself, and all pointers to it must remain
volatile. Wherever you cast away the volatile qualifier, for example in
a function call, GCC can break code on the next level, even if you don't
actually access the object through that pointer (i.e., if you cast the
pointer back to volatile just in time for the access).
So, the safe way to go about this is to change function prototypes from
callee towards callers -- first change the callee (because both volatile
and non-volatile can be accepted as volatile), then change the caller
(make sure what you pass in is volatile, and propagate it outwards).
It is also okay to convert the original volatile pointer to UINTN, and
to pass it to assembly code like that, or to convert it back to a
volatile pointer from UINTN before use.
>From the C99 standard:
6.3 Conversions
6.3.2.3 Pointers
2 For any qualifier q, a pointer to a non-q-qualified type may be
converted to a pointer to the q-qualified version of the type; the
values stored in the original and converted pointers shall compare
equal.
5 An integer may be converted to any pointer type. Except as
previously specified, the result is implementation-defined, might
not be correctly aligned, might not point to an entity of the
referenced type, and might be a trap representation.
6 Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
integer type.
6.7.3 Type qualifiers, paragraph 5:
If an attempt is made to modify an object defined with a
const-qualified type through use of an lvalue with
non-const-qualified type, the behavior is undefined. If an attempt
is made to refer to an object defined with a volatile-qualified
type through use of an lvalue with non-volatile-qualified type, the
behavior is undefined.
In summary:
- casting away "volatile" even just temporarily (without actual
accesses) may give gcc license to break the code (6.3.2.3 p2)
- accessing without volatile is known to break the code (6.7.3 p5)
- you can always cast from non-volatile to volatile (6.3.2.3 p2),
but not the other way around!
- you can cast from (volatile VOID *) to UINTN and back as much as
you want (6.3.2.3 p5 p6), because our execution environment makes
that safe ("implementation-defined")
We might want to play loose with 6.3.2.3 p2 -- that is, cast away
volatile from the pointer only temporarily, and cast it back just before
accessing the object through the pointer --, but that could be unsafe in
the long term. The *really* safe method is to cast it to UINTN, and then
back the same way.
Yes, this would affect functions like SwitchStack() too -- the Context1
and Context2 parameters would have to change their types to UINTN.
I think what we should discuss at this point is whether we'd like to
care about 6.3.2.3 p2; that is, whether we consider casting away
volatile temporarily.
The direction I've been experiencing with GCC is that its optimization
methods are becoming more aggressive. For some optimizations, there are
flags that disable them; I'm not sure they provide a specific flag for
preventing GCC from exploiting violations of 6.3.2.3 p2.
Thanks
Laszlo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] Add volatile for mNumberToFinish
2016-11-15 16:10 ` [PATCH v2 0/2] " Laszlo Ersek
@ 2016-11-16 18:18 ` Kinney, Michael D
2016-11-16 19:21 ` Laszlo Ersek
2016-11-16 19:40 ` Andrew Fish
0 siblings, 2 replies; 10+ messages in thread
From: Kinney, Michael D @ 2016-11-16 18:18 UTC (permalink / raw)
To: Laszlo Ersek, Fan, Jeff, edk2-devel@ml01.01.org,
Kinney, Michael D
Cc: Paolo Bonzini, Yao, Jiewen, Tian, Feng
Laszlo,
Thanks for the details from and ANSI C spec.
For this compiler issue, are there more details on the
assembly code generated by the GCC 5.4 compiler in the
failing mode?
I also see Liming's observation that the internal
implementation of the SynchronizationLib adds volatile
in some internal worker functions. Other implementation
artifacts include the use of a read/write memory barrier
to prevent the optimizing compiler from optimizing
across a boundary. These read/write barriers are used
in the spin lock functions, but not the Interlocked*()
functions.
I want to make sure we have studied the code generation
to see if the issue is related to volatile or a read/write
memory barrier. It could be adding volatile forced the
compiler to internally add read/write barrier.
Also, given the implementation I see in the SynchronizationLib
I am not sure the cast to (UINT32 *) is required in the
proposed patch. Do we get compiler warnings/errors if those
casts are not included?
The second topic is what the SynchronizationLib API interfaces
should have been from the beginning. In retrospect, I think
they should have been defined with volatile pointers. The spin
lock APIs do use volatile pointers, but that is embedded in the
typedef for SPIN_LOCK, so it is not as obvious.
Thanks,
Mike
> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Tuesday, November 15, 2016 8:10 AM
> To: Fan, Jeff <jeff.fan@intel.com>; edk2-devel@ml01.01.org
> Cc: Paolo Bonzini <pbonzini@redhat.com>; Yao, Jiewen <jiewen.yao@intel.com>; Tian,
> Feng <feng.tian@intel.com>; Kinney, Michael D <michael.d.kinney@intel.com>
> Subject: Re: [PATCH v2 0/2] Add volatile for mNumberToFinish
>
> Jeff,
>
> On 11/15/16 15:08, Jeff Fan wrote:
> > v2:
> > Add patch #1 per Laszlo's comments
> > at https://lists.01.org/pipermail/edk2-devel/2016-November/004697.html
> >
> > About the comments updated SynchronizationLib to add volatile for
> > input parameter, I will send in another serial patches.
> >
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Laszlo Ersek <lersek@redhat.com>
> > Cc: Jiewen Yao <jiewen.yao@intel.com>
> > Cc: Feng Tian <feng.tian@intel.com>
> > Cc: Michael D Kinney <michael.d.kinney@intel.com>
> > Contributed-under: TianoCore Contribution Agreement 1.0
> > Signed-off-by: Jeff Fan <jeff.fan@intel.com>
> >
> > Jeff Fan (2):
> > UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for parameter NumberToFinish
> > UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for mNumberToFinish
> >
> > UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 4 ++--
> > UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 4 ++--
> > UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 2 +-
> > UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 2 +-
> > 4 files changed, 6 insertions(+), 6 deletions(-)
> >
>
> if you want to keep GCC5.4 from optimizing away the access, the
> synchronization object itself, and all pointers to it must remain
> volatile. Wherever you cast away the volatile qualifier, for example in
> a function call, GCC can break code on the next level, even if you don't
> actually access the object through that pointer (i.e., if you cast the
> pointer back to volatile just in time for the access).
>
> So, the safe way to go about this is to change function prototypes from
> callee towards callers -- first change the callee (because both volatile
> and non-volatile can be accepted as volatile), then change the caller
> (make sure what you pass in is volatile, and propagate it outwards).
>
> It is also okay to convert the original volatile pointer to UINTN, and
> to pass it to assembly code like that, or to convert it back to a
> volatile pointer from UINTN before use.
>
> From the C99 standard:
>
> 6.3 Conversions
> 6.3.2.3 Pointers
>
> 2 For any qualifier q, a pointer to a non-q-qualified type may be
> converted to a pointer to the q-qualified version of the type; the
> values stored in the original and converted pointers shall compare
> equal.
>
> 5 An integer may be converted to any pointer type. Except as
> previously specified, the result is implementation-defined, might
> not be correctly aligned, might not point to an entity of the
> referenced type, and might be a trap representation.
>
> 6 Any pointer type may be converted to an integer type. Except as
> previously specified, the result is implementation-defined. If the
> result cannot be represented in the integer type, the behavior is
> undefined. The result need not be in the range of values of any
> integer type.
>
> 6.7.3 Type qualifiers, paragraph 5:
>
> If an attempt is made to modify an object defined with a
> const-qualified type through use of an lvalue with
> non-const-qualified type, the behavior is undefined. If an attempt
> is made to refer to an object defined with a volatile-qualified
> type through use of an lvalue with non-volatile-qualified type, the
> behavior is undefined.
>
> In summary:
>
> - casting away "volatile" even just temporarily (without actual
> accesses) may give gcc license to break the code (6.3.2.3 p2)
>
> - accessing without volatile is known to break the code (6.7.3 p5)
>
> - you can always cast from non-volatile to volatile (6.3.2.3 p2),
> but not the other way around!
>
> - you can cast from (volatile VOID *) to UINTN and back as much as
> you want (6.3.2.3 p5 p6), because our execution environment makes
> that safe ("implementation-defined")
>
> We might want to play loose with 6.3.2.3 p2 -- that is, cast away
> volatile from the pointer only temporarily, and cast it back just before
> accessing the object through the pointer --, but that could be unsafe in
> the long term. The *really* safe method is to cast it to UINTN, and then
> back the same way.
>
> Yes, this would affect functions like SwitchStack() too -- the Context1
> and Context2 parameters would have to change their types to UINTN.
>
> I think what we should discuss at this point is whether we'd like to
> care about 6.3.2.3 p2; that is, whether we consider casting away
> volatile temporarily.
>
> The direction I've been experiencing with GCC is that its optimization
> methods are becoming more aggressive. For some optimizations, there are
> flags that disable them; I'm not sure they provide a specific flag for
> preventing GCC from exploiting violations of 6.3.2.3 p2.
>
> Thanks
> Laszlo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] Add volatile for mNumberToFinish
2016-11-16 18:18 ` Kinney, Michael D
@ 2016-11-16 19:21 ` Laszlo Ersek
2016-11-16 21:14 ` Andrew Fish
2016-11-16 19:40 ` Andrew Fish
1 sibling, 1 reply; 10+ messages in thread
From: Laszlo Ersek @ 2016-11-16 19:21 UTC (permalink / raw)
To: Kinney, Michael D, Fan, Jeff, edk2-devel@ml01.01.org
Cc: Paolo Bonzini, Yao, Jiewen, Tian, Feng
On 11/16/16 19:18, Kinney, Michael D wrote:
> Laszlo,
>
> Thanks for the details from and ANSI C spec.
>
> For this compiler issue, are there more details on the
> assembly code generated by the GCC 5.4 compiler in the
> failing mode?
Unfortunately, I have little first-hand experience with this kind of
failure, I've just read different accounts of it. Perhaps Paolo can
comment -- my experience with contacting GCC maintainers directly hasn't
been stellar.
> I also see Liming's observation that the internal
> implementation of the SynchronizationLib adds volatile
> in some internal worker functions. Other implementation
> artifacts include the use of a read/write memory barrier
> to prevent the optimizing compiler from optimizing
> across a boundary. These read/write barriers are used
> in the spin lock functions, but not the Interlocked*()
> functions.
>
> I want to make sure we have studied the code generation
> to see if the issue is related to volatile or a read/write
> memory barrier. It could be adding volatile forced the
> compiler to internally add read/write barrier.
In my understanding, MemoryFence()
[MdePkg/Library/BaseLib/X64/GccInline.c] is just a compiler-level
barrier, not a processor level barrier.
__asm__ __volatile__ ("":::"memory");
Processor level barriers shouldn't be a problem here; as far as I recall
from the SDM from a few days ago, the XCHG instruction "engages the
CPU's locking protocol even without the LOCK prefix", plus I see an
explicit LOCK in InternalSyncDecrement()
[MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c] for example.
I'm pretty sure the compiler opimized away the access because it saw no
intervening access (it doesn't know about the code running on other
CPUs). I think "volatile" and MemoryFence() -- with the latter telling
GCC that all memory should be considered clobbered -- should fix the
issue just the same, in this situation.
> Also, given the implementation I see in the SynchronizationLib
> I am not sure the cast to (UINT32 *) is required in the
> proposed patch. Do we get compiler warnings/errors if those
> casts are not included?
Yes.
Passing an argument to a function whose prototype is visible invokes
type conversion just like in assignment:
6.5.2.2 Function calls
7 If the expression that denotes the called function has a type that
does include a prototype, the arguments are implicitly converted,
as if by assignment, to the types of the corresponding parameters,
taking the type of each parameter to be the unqualified version
of its declared type. [...]
(IMPORTANT: the de-qualification mentioned above refers to the pointer
object itself, not to the pointed-to object! It just means if you have a
function parameter "const long x", and pass the argument 5 (which has
type int), then it is converted to "long", and not "const long".)
So, passing a (volatile UINT32 *) argument to a (UINT32 *) parameter is
the same as the similar assignment
volatile UINT32 *x;
UINT32 *y;
y = x;
Then, regarding assignment, we have
6.5.16.1 Simple assignment
Constraints
1 One of the following shall hold:
[...]
- both operands are pointers to qualified or unqualified versions
of compatible types, and the type pointed to by the left has all
the qualifiers of the type pointed to by the right;
[...]
Finally we have diagnostics,
5.1.1.3 Diagnostics
1 A conforming implementation shall produce at least one diagnostic
message (identified in an implementation-defined manner) if a
preprocessing translation unit or translation unit contains a
violation of any syntax rule or constraint, even if the behavior is
also explicitly specified as undefined or implementation-defined.
[...]
In summary, without adding the explicit cast, we break the constraint on
simple assignment, which requires the compiler to emit a diagnostic message.
(In turn if we add the explicit cast, we break other requirements, as
discussed in my other email -- and here we should investigate whether we
care about breaking that exact rule.)
> The second topic is what the SynchronizationLib API interfaces
> should have been from the beginning. In retrospect, I think
> they should have been defined with volatile pointers.
I agree!
> The spin
> lock APIs do use volatile pointers, but that is embedded in the
> typedef for SPIN_LOCK, so it is not as obvious.
Thanks
Laszlo
>> -----Original Message-----
>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>> Sent: Tuesday, November 15, 2016 8:10 AM
>> To: Fan, Jeff <jeff.fan@intel.com>; edk2-devel@ml01.01.org
>> Cc: Paolo Bonzini <pbonzini@redhat.com>; Yao, Jiewen <jiewen.yao@intel.com>; Tian,
>> Feng <feng.tian@intel.com>; Kinney, Michael D <michael.d.kinney@intel.com>
>> Subject: Re: [PATCH v2 0/2] Add volatile for mNumberToFinish
>>
>> Jeff,
>>
>> On 11/15/16 15:08, Jeff Fan wrote:
>>> v2:
>>> Add patch #1 per Laszlo's comments
>>> at https://lists.01.org/pipermail/edk2-devel/2016-November/004697.html
>>>
>>> About the comments updated SynchronizationLib to add volatile for
>>> input parameter, I will send in another serial patches.
>>>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Laszlo Ersek <lersek@redhat.com>
>>> Cc: Jiewen Yao <jiewen.yao@intel.com>
>>> Cc: Feng Tian <feng.tian@intel.com>
>>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>> Signed-off-by: Jeff Fan <jeff.fan@intel.com>
>>>
>>> Jeff Fan (2):
>>> UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for parameter NumberToFinish
>>> UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for mNumberToFinish
>>>
>>> UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 4 ++--
>>> UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 4 ++--
>>> UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 2 +-
>>> UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 2 +-
>>> 4 files changed, 6 insertions(+), 6 deletions(-)
>>>
>>
>> if you want to keep GCC5.4 from optimizing away the access, the
>> synchronization object itself, and all pointers to it must remain
>> volatile. Wherever you cast away the volatile qualifier, for example in
>> a function call, GCC can break code on the next level, even if you don't
>> actually access the object through that pointer (i.e., if you cast the
>> pointer back to volatile just in time for the access).
>>
>> So, the safe way to go about this is to change function prototypes from
>> callee towards callers -- first change the callee (because both volatile
>> and non-volatile can be accepted as volatile), then change the caller
>> (make sure what you pass in is volatile, and propagate it outwards).
>>
>> It is also okay to convert the original volatile pointer to UINTN, and
>> to pass it to assembly code like that, or to convert it back to a
>> volatile pointer from UINTN before use.
>>
>> From the C99 standard:
>>
>> 6.3 Conversions
>> 6.3.2.3 Pointers
>>
>> 2 For any qualifier q, a pointer to a non-q-qualified type may be
>> converted to a pointer to the q-qualified version of the type; the
>> values stored in the original and converted pointers shall compare
>> equal.
>>
>> 5 An integer may be converted to any pointer type. Except as
>> previously specified, the result is implementation-defined, might
>> not be correctly aligned, might not point to an entity of the
>> referenced type, and might be a trap representation.
>>
>> 6 Any pointer type may be converted to an integer type. Except as
>> previously specified, the result is implementation-defined. If the
>> result cannot be represented in the integer type, the behavior is
>> undefined. The result need not be in the range of values of any
>> integer type.
>>
>> 6.7.3 Type qualifiers, paragraph 5:
>>
>> If an attempt is made to modify an object defined with a
>> const-qualified type through use of an lvalue with
>> non-const-qualified type, the behavior is undefined. If an attempt
>> is made to refer to an object defined with a volatile-qualified
>> type through use of an lvalue with non-volatile-qualified type, the
>> behavior is undefined.
>>
>> In summary:
>>
>> - casting away "volatile" even just temporarily (without actual
>> accesses) may give gcc license to break the code (6.3.2.3 p2)
>>
>> - accessing without volatile is known to break the code (6.7.3 p5)
>>
>> - you can always cast from non-volatile to volatile (6.3.2.3 p2),
>> but not the other way around!
>>
>> - you can cast from (volatile VOID *) to UINTN and back as much as
>> you want (6.3.2.3 p5 p6), because our execution environment makes
>> that safe ("implementation-defined")
>>
>> We might want to play loose with 6.3.2.3 p2 -- that is, cast away
>> volatile from the pointer only temporarily, and cast it back just before
>> accessing the object through the pointer --, but that could be unsafe in
>> the long term. The *really* safe method is to cast it to UINTN, and then
>> back the same way.
>>
>> Yes, this would affect functions like SwitchStack() too -- the Context1
>> and Context2 parameters would have to change their types to UINTN.
>>
>> I think what we should discuss at this point is whether we'd like to
>> care about 6.3.2.3 p2; that is, whether we consider casting away
>> volatile temporarily.
>>
>> The direction I've been experiencing with GCC is that its optimization
>> methods are becoming more aggressive. For some optimizations, there are
>> flags that disable them; I'm not sure they provide a specific flag for
>> preventing GCC from exploiting violations of 6.3.2.3 p2.
>>
>> Thanks
>> Laszlo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] Add volatile for mNumberToFinish
2016-11-16 18:18 ` Kinney, Michael D
2016-11-16 19:21 ` Laszlo Ersek
@ 2016-11-16 19:40 ` Andrew Fish
1 sibling, 0 replies; 10+ messages in thread
From: Andrew Fish @ 2016-11-16 19:40 UTC (permalink / raw)
To: Mike Kinney
Cc: Laszlo Ersek, Fan, Jeff, edk2-devel@ml01.01.org, Paolo Bonzini,
Tian, Feng, Yao, Jiewen
> On Nov 16, 2016, at 10:18 AM, Kinney, Michael D <michael.d.kinney@intel.com> wrote:
>
> Laszlo,
>
> Thanks for the details from and ANSI C spec.
>
> For this compiler issue, are there more details on the
> assembly code generated by the GCC 5.4 compiler in the
> failing mode?
>
> I also see Liming's observation that the internal
> implementation of the SynchronizationLib adds volatile
> in some internal worker functions. Other implementation
> artifacts include the use of a read/write memory barrier
> to prevent the optimizing compiler from optimizing
> across a boundary. These read/write barriers are used
> in the spin lock functions, but not the Interlocked*()
> functions.
>
> I want to make sure we have studied the code generation
> to see if the issue is related to volatile or a read/write
> memory barrier. It could be adding volatile forced the
> compiler to internally add read/write barrier.
>
Mike,
I don't think volatile variables imply memory barriers as they are different constraints placed on the compiler.
The volatile variable forces the memory operations to alway happen to memory. Thus the compiler can not optimize out a read or write. If two values are written in a row (the 1st write could be optimized away). The compiler also can not cache the variable in a register, or imply it is a constant (last value written etc.).
The memory barrier (also called fence) just forces the compiler to complete load store operations before the barrier before it can perform operations after the barrier. Lots of optimization and reordering can happen between the barriers. You are still running the optimizer so code can get optimized away and reordered, only the scope of how much code can be optimized was changed.
Thus the volatile is a constraint placed on an individual variable, and the barrier is a constraint places on how big a chunk of code can be optimized. I guess if you placed a barrier between every access to a variable you could simulate volatile behavior, but with a lot more side effects (less optimization than just using volatile).
Here is a simple example:
a) Nothing() writes 3 to 0x12345678 and returns constant 3
b) Barrier() writes 2 to 0x12345678, then writes 3 and returns constant 3
c) Volatile() writes 1 to 0x12345678, writes 2, writes 3, then reads 3 from memory to return it.
d) AllBarrier() writes 1 to 0x12345678, writes 2, writes 3, then reads 3 from memory to return it.
$ cat barriers.c
#define _ReadWriteBarrier() do { __asm__ __volatile__ ("": : : "memory"); } while(0)
int Nothing ()
{
int *ptr = (int *)0x12345678;
*ptr = 1;
*ptr = 2;
*ptr = 3;
return *ptr;
}
int Barrier ()
{
int *ptr = (int *)0x12345678;
*ptr = 1;
*ptr = 2;
_ReadWriteBarrier();
*ptr = 3;
return *ptr;
}
int Volatile ()
{
volatile int *ptr = (int *)0x12345678;
*ptr = 1;
*ptr = 2;
*ptr = 3;
return *ptr;
}
int AllBarrier ()
{
int *ptr = (int *)0x12345678;
*ptr = 1;
_ReadWriteBarrier();
*ptr = 2;
_ReadWriteBarrier();
*ptr = 3;
_ReadWriteBarrier();
return *ptr;
}
$ clang -Os -S barriers.c
$ cat barriers.S
.globl _Nothing
_Nothing: ## @Nothing
pushq %rbp
movq %rsp, %rbp
movl $3, 305419896
movl $3, %eax
popq %rbp
retq
.globl _Barrier
_Barrier: ## @Barrier
pushq %rbp
movq %rsp, %rbp
movl $2, 305419896
## InlineAsm Start
## InlineAsm End
movl $3, 305419896
movl $3, %eax
popq %rbp
retq
.globl _Volatile
_Volatile: ## @Volatile
pushq %rbp
movq %rsp, %rbp
movl $1, 305419896
movl $2, 305419896
movl $3, 305419896
movl 305419896, %eax
popq %rbp
retq
_AllBarrier: ## @AllBarrier
pushq %rbp
movq %rsp, %rbp
movl $1, 305419896
## InlineAsm Start
## InlineAsm End
movl $2, 305419896
## InlineAsm Start
## InlineAsm End
movl $3, 305419896
## InlineAsm Start
## InlineAsm End
movl 305419896, %eax
popq %rbp
retq
Thanks,
Andrew Fish
PS These are just X86 examples on a architecture that required synchronizing instructions things are a little more complex and there could be more difference between volatile and barriers. But this topic was complex enough ....
> Also, given the implementation I see in the SynchronizationLib
> I am not sure the cast to (UINT32 *) is required in the
> proposed patch. Do we get compiler warnings/errors if those
> casts are not included?
>
> The second topic is what the SynchronizationLib API interfaces
> should have been from the beginning. In retrospect, I think
> they should have been defined with volatile pointers. The spin
> lock APIs do use volatile pointers, but that is embedded in the
> typedef for SPIN_LOCK, so it is not as obvious.
>
> Thanks,
>
> Mike
>
>> -----Original Message-----
>> From: Laszlo Ersek [mailto:lersek@redhat.com]
>> Sent: Tuesday, November 15, 2016 8:10 AM
>> To: Fan, Jeff <jeff.fan@intel.com>; edk2-devel@ml01.01.org
>> Cc: Paolo Bonzini <pbonzini@redhat.com>; Yao, Jiewen <jiewen.yao@intel.com>; Tian,
>> Feng <feng.tian@intel.com>; Kinney, Michael D <michael.d.kinney@intel.com>
>> Subject: Re: [PATCH v2 0/2] Add volatile for mNumberToFinish
>>
>> Jeff,
>>
>> On 11/15/16 15:08, Jeff Fan wrote:
>>> v2:
>>> Add patch #1 per Laszlo's comments
>>> at https://lists.01.org/pipermail/edk2-devel/2016-November/004697.html
>>>
>>> About the comments updated SynchronizationLib to add volatile for
>>> input parameter, I will send in another serial patches.
>>>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Laszlo Ersek <lersek@redhat.com>
>>> Cc: Jiewen Yao <jiewen.yao@intel.com>
>>> Cc: Feng Tian <feng.tian@intel.com>
>>> Cc: Michael D Kinney <michael.d.kinney@intel.com>
>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>> Signed-off-by: Jeff Fan <jeff.fan@intel.com>
>>>
>>> Jeff Fan (2):
>>> UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for parameter NumberToFinish
>>> UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for mNumberToFinish
>>>
>>> UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 4 ++--
>>> UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 4 ++--
>>> UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 2 +-
>>> UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 2 +-
>>> 4 files changed, 6 insertions(+), 6 deletions(-)
>>>
>>
>> if you want to keep GCC5.4 from optimizing away the access, the
>> synchronization object itself, and all pointers to it must remain
>> volatile. Wherever you cast away the volatile qualifier, for example in
>> a function call, GCC can break code on the next level, even if you don't
>> actually access the object through that pointer (i.e., if you cast the
>> pointer back to volatile just in time for the access).
>>
>> So, the safe way to go about this is to change function prototypes from
>> callee towards callers -- first change the callee (because both volatile
>> and non-volatile can be accepted as volatile), then change the caller
>> (make sure what you pass in is volatile, and propagate it outwards).
>>
>> It is also okay to convert the original volatile pointer to UINTN, and
>> to pass it to assembly code like that, or to convert it back to a
>> volatile pointer from UINTN before use.
>>
>> From the C99 standard:
>>
>> 6.3 Conversions
>> 6.3.2.3 Pointers
>>
>> 2 For any qualifier q, a pointer to a non-q-qualified type may be
>> converted to a pointer to the q-qualified version of the type; the
>> values stored in the original and converted pointers shall compare
>> equal.
>>
>> 5 An integer may be converted to any pointer type. Except as
>> previously specified, the result is implementation-defined, might
>> not be correctly aligned, might not point to an entity of the
>> referenced type, and might be a trap representation.
>>
>> 6 Any pointer type may be converted to an integer type. Except as
>> previously specified, the result is implementation-defined. If the
>> result cannot be represented in the integer type, the behavior is
>> undefined. The result need not be in the range of values of any
>> integer type.
>>
>> 6.7.3 Type qualifiers, paragraph 5:
>>
>> If an attempt is made to modify an object defined with a
>> const-qualified type through use of an lvalue with
>> non-const-qualified type, the behavior is undefined. If an attempt
>> is made to refer to an object defined with a volatile-qualified
>> type through use of an lvalue with non-volatile-qualified type, the
>> behavior is undefined.
>>
>> In summary:
>>
>> - casting away "volatile" even just temporarily (without actual
>> accesses) may give gcc license to break the code (6.3.2.3 p2)
>>
>> - accessing without volatile is known to break the code (6.7.3 p5)
>>
>> - you can always cast from non-volatile to volatile (6.3.2.3 p2),
>> but not the other way around!
>>
>> - you can cast from (volatile VOID *) to UINTN and back as much as
>> you want (6.3.2.3 p5 p6), because our execution environment makes
>> that safe ("implementation-defined")
>>
>> We might want to play loose with 6.3.2.3 p2 -- that is, cast away
>> volatile from the pointer only temporarily, and cast it back just before
>> accessing the object through the pointer --, but that could be unsafe in
>> the long term. The *really* safe method is to cast it to UINTN, and then
>> back the same way.
>>
>> Yes, this would affect functions like SwitchStack() too -- the Context1
>> and Context2 parameters would have to change their types to UINTN.
>>
>> I think what we should discuss at this point is whether we'd like to
>> care about 6.3.2.3 p2; that is, whether we consider casting away
>> volatile temporarily.
>>
>> The direction I've been experiencing with GCC is that its optimization
>> methods are becoming more aggressive. For some optimizations, there are
>> flags that disable them; I'm not sure they provide a specific flag for
>> preventing GCC from exploiting violations of 6.3.2.3 p2.
>>
>> Thanks
>> Laszlo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] Add volatile for mNumberToFinish
2016-11-16 19:21 ` Laszlo Ersek
@ 2016-11-16 21:14 ` Andrew Fish
2016-11-17 1:10 ` Kinney, Michael D
0 siblings, 1 reply; 10+ messages in thread
From: Andrew Fish @ 2016-11-16 21:14 UTC (permalink / raw)
To: Laszlo Ersek
Cc: Mike Kinney, Fan, Jeff, edk2-devel@ml01.01.org, Paolo Bonzini,
Tian, Feng, Yao, Jiewen
> On Nov 16, 2016, at 11:21 AM, Laszlo Ersek <lersek@redhat.com> wrote:
>
> On 11/16/16 19:18, Kinney, Michael D wrote:
>> Laszlo,
>>
>> Thanks for the details from and ANSI C spec.
>>
>> For this compiler issue, are there more details on the
>> assembly code generated by the GCC 5.4 compiler in the
>> failing mode?
>
> Unfortunately, I have little first-hand experience with this kind of
> failure, I've just read different accounts of it. Perhaps Paolo can
> comment -- my experience with contacting GCC maintainers directly hasn't
> been stellar.
>
If it reproduces in clang I've had a lot of luck getting help from those folks.
In general the best thing to do if you think you have a compiler bug it to write a small command line program that reproduces the issues. If it is a code generation issue, you don't even need to make the code link or run, you can just compile the code directly to assembler and write up in the bug report why you think the code generation is in error. Basically like the examples I post to this list from time to time. Well unless it is a link time optimization issue...
>> I also see Liming's observation that the internal
>> implementation of the SynchronizationLib adds volatile
>> in some internal worker functions. Other implementation
>> artifacts include the use of a read/write memory barrier
>> to prevent the optimizing compiler from optimizing
>> across a boundary. These read/write barriers are used
>> in the spin lock functions, but not the Interlocked*()
>> functions.
>>
>> I want to make sure we have studied the code generation
>> to see if the issue is related to volatile or a read/write
>> memory barrier. It could be adding volatile forced the
>> compiler to internally add read/write barrier.
>
> In my understanding, MemoryFence()
> [MdePkg/Library/BaseLib/X64/GccInline.c] is just a compiler-level
> barrier, not a processor level barrier.
>
> __asm__ __volatile__ ("":::"memory");
>
> Processor level barriers shouldn't be a problem here; as far as I recall
> from the SDM from a few days ago, the XCHG instruction "engages the
> CPU's locking protocol even without the LOCK prefix", plus I see an
> explicit LOCK in InternalSyncDecrement()
> [MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c] for example.
>
> I'm pretty sure the compiler opimized away the access because it saw no
> intervening access (it doesn't know about the code running on other
> CPUs). I think "volatile" and MemoryFence() -- with the latter telling
> GCC that all memory should be considered clobbered -- should fix the
> issue just the same, in this situation.
>
There is a good chance it is similar to my example from my previous email when the return value is a constant since the compiler wrote the value and remember the value it wrote. You will need to make the global volatile or place a barrier between the write and the read to tell the compiler what optimizations are legal.
There can also be other insidious things in place that hide bugs. For example in this code:
mNumberToFinish = mAcpiCpuData.NumberOfCpus - 1;
mExchangeInfo->ApFunction = (VOID *) (UINTN) EarlyMPRendezvousProcedure;
//
// Send INIT IPI - SIPI to all APs
//
SendInitSipiSipiAllExcludingSelf ((UINT32)mAcpiCpuData.StartupVector);
while (mNumberToFinish > 0) {
CpuPause ();
}
What if some versions of SendInitSipiSipiAllExcludingSelf() have side effects that make it hard to optimize and some do not. You could get a different result...
Anyway I think this optimization is more subtle than my previous examples. If you just compile this code to assembler it looks correct (you see a loop). But if you do link time optimization the compiler will make more assumptions about how mNumberToFinish is used globally and be more aggressive about optimizing it out. You can get the same behavior from the assembler generation if you make mNumberToFinish a static local.
~/work/Compiler>cat barriers.c
#define _ReadWriteBarrier() do { __asm__ __volatile__ ("": : : "memory"); } while(0)
unsigned int mNumberToFinish = 0;
void SendInit2()
{
*(int *)0x12345678 = 0;
}
int main()
{
mNumberToFinish = 3;
SendInit2();
while (mNumberToFinish > 0) {
;
}
return 0;
}
~/work/Compiler>clang -Os -flto -g barriers.c
~/work/Compiler>lldb a.out
(lldb) target create "a.out"
Current executable set to 'a.out' (x86_64).
(lldb) dis -n main -b
a.out`main:
a.out[0x100000fa7] <+0>: 55 pushq %rbp
a.out[0x100000fa8] <+1>: 48 89 e5 movq %rsp, %rbp
a.out[0x100000fab] <+4>: c7 04 25 78 56 34 12 00 00 00 00 movl $0x0, 0x12345678
a.out[0x100000fb6] <+15>: eb fe jmp 0x100000fb6 ; <+15> at barriers.c:16
~/work/Compiler>clang -Os -S barriers.c
~/work/Compiler>cat barriers.s
.globl _SendInit2
_SendInit2: ## @SendInit2
pushq %rbp
movq %rsp, %rbp
movl $0, 305419896
popq %rbp
retq
.globl _main
_main: ## @main
pushq %rbp
movq %rsp, %rbp
movl $3, _mNumberToFinish(%rip)
xorl %eax, %eax
movl %eax, 305419896
cmpl %eax, _mNumberToFinish(%rip)
je LBB1_2
LBB1_1: ## =>This Inner Loop Header: Depth=1
jmp LBB1_1
LBB1_2:
xorl %eax, %eax
popq %rbp
retq
.globl _mNumberToFinish ## @mNumberToFinish
.zerofill __DATA,__common,_mNumberToFinish,4,2
Live in fear of the optimizer as it is getting smarter every day. Sorry if I picked the wrong bit of code as the example.
Thanks,
Andrew Fish
PS This is a good example of how barriers don't do what you may think. Notice how the barrier only adds a single read and then a return or an infinite loop. It makes sense if you think about it.... After the barrier the compiler is clueless as to what is going on so it reads the global. Given the global can't change it decides to return or fall into the infinite loop. This is really good example of the difference between a barrier and a volatile variable. This really highlights that barrier is a one shot vs. the volatile is a property for the variable.
~/work/Compiler>cat barriers.c
#define _ReadWriteBarrier() do { __asm__ __volatile__ ("": : : "memory"); } while(0)
unsigned int mNumberToFinish = 0;
void SendInit2()
{
*(int *)0x12345678 = 0;
}
int main()
{
mNumberToFinish = 3;
SendInit2();
_ReadWriteBarrier();
while (mNumberToFinish > 0) {
;
}
return 0;
}
~/work/Compiler>clang -Os -flto -g barriers.c
~/work/Compiler>lldb a.out
(lldb) target create "a.out"
Current executable set to 'a.out' (x86_64).
(lldb) dis -n main -b
a.out`main:
a.out[0x100000f8d] <+0>: 55 pushq %rbp
a.out[0x100000f8e] <+1>: 48 89 e5 movq %rsp, %rbp
a.out[0x100000f91] <+4>: c6 05 68 00 00 00 01 movb $0x1, 0x68(%rip)
a.out[0x100000f98] <+11>: c7 04 25 78 56 34 12 00 00 00 00 movl $0x0, 0x12345678
a.out[0x100000fa3] <+22>: 0f b6 05 56 00 00 00 movzbl 0x56(%rip), %eax ; mNumberToFinish
a.out[0x100000faa] <+29>: 83 e0 01 andl $0x1, %eax
a.out[0x100000fad] <+32>: 83 f8 01 cmpl $0x1, %eax
a.out[0x100000fb0] <+35>: 75 02 jne 0x100000fb4 ; <+39> at barriers.c:21
a.out[0x100000fb2] <+37>: eb fe jmp 0x100000fb2 ; <+37> at barriers.c:17
a.out[0x100000fb4] <+39>: 31 c0 xorl %eax, %eax
a.out[0x100000fb6] <+41>: 5d popq %rbp
a.out[0x100000fb7] <+42>: c3 retq
>> Also, given the implementation I see in the SynchronizationLib
>> I am not sure the cast to (UINT32 *) is required in the
>> proposed patch. Do we get compiler warnings/errors if those
>> casts are not included?
>
> Yes.
>
> Passing an argument to a function whose prototype is visible invokes
> type conversion just like in assignment:
>
> 6.5.2.2 Function calls
>
> 7 If the expression that denotes the called function has a type that
> does include a prototype, the arguments are implicitly converted,
> as if by assignment, to the types of the corresponding parameters,
> taking the type of each parameter to be the unqualified version
> of its declared type. [...]
>
> (IMPORTANT: the de-qualification mentioned above refers to the pointer
> object itself, not to the pointed-to object! It just means if you have a
> function parameter "const long x", and pass the argument 5 (which has
> type int), then it is converted to "long", and not "const long".)
>
> So, passing a (volatile UINT32 *) argument to a (UINT32 *) parameter is
> the same as the similar assignment
>
> volatile UINT32 *x;
> UINT32 *y;
> y = x;
>
> Then, regarding assignment, we have
>
> 6.5.16.1 Simple assignment
>
> Constraints
>
> 1 One of the following shall hold:
>
> [...]
> - both operands are pointers to qualified or unqualified versions
> of compatible types, and the type pointed to by the left has all
> the qualifiers of the type pointed to by the right;
> [...]
>
> Finally we have diagnostics,
>
> 5.1.1.3 Diagnostics
>
> 1 A conforming implementation shall produce at least one diagnostic
> message (identified in an implementation-defined manner) if a
> preprocessing translation unit or translation unit contains a
> violation of any syntax rule or constraint, even if the behavior is
> also explicitly specified as undefined or implementation-defined.
> [...]
>
> In summary, without adding the explicit cast, we break the constraint on
> simple assignment, which requires the compiler to emit a diagnostic message.
>
> (In turn if we add the explicit cast, we break other requirements, as
> discussed in my other email -- and here we should investigate whether we
> care about breaking that exact rule.)
>
>> The second topic is what the SynchronizationLib API interfaces
>> should have been from the beginning. In retrospect, I think
>> they should have been defined with volatile pointers.
>
> I agree!
>
>> The spin
>> lock APIs do use volatile pointers, but that is embedded in the
>> typedef for SPIN_LOCK, so it is not as obvious.
>
> Thanks
> Laszlo
>
>>> -----Original Message-----
>>> From: Laszlo Ersek [mailto:lersek@redhat.com <mailto:lersek@redhat.com>]
>>> Sent: Tuesday, November 15, 2016 8:10 AM
>>> To: Fan, Jeff <jeff.fan@intel.com <mailto:jeff.fan@intel.com>>; edk2-devel@ml01.01.org <mailto:edk2-devel@ml01.01.org>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com <mailto:pbonzini@redhat.com>>; Yao, Jiewen <jiewen.yao@intel.com <mailto:jiewen.yao@intel.com>>; Tian,
>>> Feng <feng.tian@intel.com <mailto:feng.tian@intel.com>>; Kinney, Michael D <michael.d.kinney@intel.com <mailto:michael.d.kinney@intel.com>>
>>> Subject: Re: [PATCH v2 0/2] Add volatile for mNumberToFinish
>>>
>>> Jeff,
>>>
>>> On 11/15/16 15:08, Jeff Fan wrote:
>>>> v2:
>>>> Add patch #1 per Laszlo's comments
>>>> at https://lists.01.org/pipermail/edk2-devel/2016-November/004697.html <https://lists.01.org/pipermail/edk2-devel/2016-November/004697.html>
>>>>
>>>> About the comments updated SynchronizationLib to add volatile for
>>>> input parameter, I will send in another serial patches.
>>>>
>>>> Cc: Paolo Bonzini <pbonzini@redhat.com <mailto:pbonzini@redhat.com>>
>>>> Cc: Laszlo Ersek <lersek@redhat.com <mailto:lersek@redhat.com>>
>>>> Cc: Jiewen Yao <jiewen.yao@intel.com <mailto:jiewen.yao@intel.com>>
>>>> Cc: Feng Tian <feng.tian@intel.com <mailto:feng.tian@intel.com>>
>>>> Cc: Michael D Kinney <michael.d.kinney@intel.com <mailto:michael.d.kinney@intel.com>>
>>>> Contributed-under: TianoCore Contribution Agreement 1.0
>>>> Signed-off-by: Jeff Fan <jeff.fan@intel.com <mailto:jeff.fan@intel.com>>
>>>>
>>>> Jeff Fan (2):
>>>> UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for parameter NumberToFinish
>>>> UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for mNumberToFinish
>>>>
>>>> UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 4 ++--
>>>> UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 4 ++--
>>>> UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 2 +-
>>>> UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 2 +-
>>>> 4 files changed, 6 insertions(+), 6 deletions(-)
>>>>
>>>
>>> if you want to keep GCC5.4 from optimizing away the access, the
>>> synchronization object itself, and all pointers to it must remain
>>> volatile. Wherever you cast away the volatile qualifier, for example in
>>> a function call, GCC can break code on the next level, even if you don't
>>> actually access the object through that pointer (i.e., if you cast the
>>> pointer back to volatile just in time for the access).
>>>
>>> So, the safe way to go about this is to change function prototypes from
>>> callee towards callers -- first change the callee (because both volatile
>>> and non-volatile can be accepted as volatile), then change the caller
>>> (make sure what you pass in is volatile, and propagate it outwards).
>>>
>>> It is also okay to convert the original volatile pointer to UINTN, and
>>> to pass it to assembly code like that, or to convert it back to a
>>> volatile pointer from UINTN before use.
>>>
>>> From the C99 standard:
>>>
>>> 6.3 Conversions
>>> 6.3.2.3 Pointers
>>>
>>> 2 For any qualifier q, a pointer to a non-q-qualified type may be
>>> converted to a pointer to the q-qualified version of the type; the
>>> values stored in the original and converted pointers shall compare
>>> equal.
>>>
>>> 5 An integer may be converted to any pointer type. Except as
>>> previously specified, the result is implementation-defined, might
>>> not be correctly aligned, might not point to an entity of the
>>> referenced type, and might be a trap representation.
>>>
>>> 6 Any pointer type may be converted to an integer type. Except as
>>> previously specified, the result is implementation-defined. If the
>>> result cannot be represented in the integer type, the behavior is
>>> undefined. The result need not be in the range of values of any
>>> integer type.
>>>
>>> 6.7.3 Type qualifiers, paragraph 5:
>>>
>>> If an attempt is made to modify an object defined with a
>>> const-qualified type through use of an lvalue with
>>> non-const-qualified type, the behavior is undefined. If an attempt
>>> is made to refer to an object defined with a volatile-qualified
>>> type through use of an lvalue with non-volatile-qualified type, the
>>> behavior is undefined.
>>>
>>> In summary:
>>>
>>> - casting away "volatile" even just temporarily (without actual
>>> accesses) may give gcc license to break the code (6.3.2.3 p2)
>>>
>>> - accessing without volatile is known to break the code (6.7.3 p5)
>>>
>>> - you can always cast from non-volatile to volatile (6.3.2.3 p2),
>>> but not the other way around!
>>>
>>> - you can cast from (volatile VOID *) to UINTN and back as much as
>>> you want (6.3.2.3 p5 p6), because our execution environment makes
>>> that safe ("implementation-defined")
>>>
>>> We might want to play loose with 6.3.2.3 p2 -- that is, cast away
>>> volatile from the pointer only temporarily, and cast it back just before
>>> accessing the object through the pointer --, but that could be unsafe in
>>> the long term. The *really* safe method is to cast it to UINTN, and then
>>> back the same way.
>>>
>>> Yes, this would affect functions like SwitchStack() too -- the Context1
>>> and Context2 parameters would have to change their types to UINTN.
>>>
>>> I think what we should discuss at this point is whether we'd like to
>>> care about 6.3.2.3 p2; that is, whether we consider casting away
>>> volatile temporarily.
>>>
>>> The direction I've been experiencing with GCC is that its optimization
>>> methods are becoming more aggressive. For some optimizations, there are
>>> flags that disable them; I'm not sure they provide a specific flag for
>>> preventing GCC from exploiting violations of 6.3.2.3 p2.
>>>
>>> Thanks
>>> Laszlo
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org <mailto:edk2-devel@lists.01.org>
> https://lists.01.org/mailman/listinfo/edk2-devel <https://lists.01.org/mailman/listinfo/edk2-devel>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] Add volatile for mNumberToFinish
2016-11-16 21:14 ` Andrew Fish
@ 2016-11-17 1:10 ` Kinney, Michael D
2016-11-19 0:34 ` Andrew Fish
0 siblings, 1 reply; 10+ messages in thread
From: Kinney, Michael D @ 2016-11-17 1:10 UTC (permalink / raw)
To: afish@apple.com, Laszlo Ersek, Kinney, Michael D
Cc: Fan, Jeff, edk2-devel@ml01.01.org, Paolo Bonzini, Tian, Feng,
Yao, Jiewen
Hi Andrew,
Thanks for all the examples!
I have done a more detailed review of the BaseSynchronizationLib. I do not think there are any missing memory barriers in the implementation.
I have also experimented with adding volatile qualifier to parameters in the Interlocked*() functions, and for the builds I have done so far, that appears to be a compatible change. I did have to update the internal functions in that lib to also consistently use volatile, but that makes sense based on the ANSI C spec items that Laszlo quoted.
I have also been looking at Jeff's patch to CpuS3.c to add volatile to the mNumberToFinish global variable, and I have a slightly different patch I have evaluated to avoid the type case from a volatile variable to a non-volatile variable in all cases. It does require that the calls to SwitchStack() and DisablePaging64() treat the parameter as a UINTN address of the semaphore, instead of a volatile pointer to the semaphore.
I will send some patches soon for review.
Mike
From: afish@apple.com [mailto:afish@apple.com]
Sent: Wednesday, November 16, 2016 1:14 PM
To: Laszlo Ersek <lersek@redhat.com>
Cc: Kinney, Michael D <michael.d.kinney@intel.com>; Fan, Jeff <jeff.fan@intel.com>; edk2-devel@ml01.01.org; Paolo Bonzini <pbonzini@redhat.com>; Tian, Feng <feng.tian@intel.com>; Yao, Jiewen <jiewen.yao@intel.com>
Subject: Re: [edk2] [PATCH v2 0/2] Add volatile for mNumberToFinish
On Nov 16, 2016, at 11:21 AM, Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>> wrote:
On 11/16/16 19:18, Kinney, Michael D wrote:
Laszlo,
Thanks for the details from and ANSI C spec.
For this compiler issue, are there more details on the
assembly code generated by the GCC 5.4 compiler in the
failing mode?
Unfortunately, I have little first-hand experience with this kind of
failure, I've just read different accounts of it. Perhaps Paolo can
comment -- my experience with contacting GCC maintainers directly hasn't
been stellar.
If it reproduces in clang I've had a lot of luck getting help from those folks.
In general the best thing to do if you think you have a compiler bug it to write a small command line program that reproduces the issues. If it is a code generation issue, you don't even need to make the code link or run, you can just compile the code directly to assembler and write up in the bug report why you think the code generation is in error. Basically like the examples I post to this list from time to time. Well unless it is a link time optimization issue...
I also see Liming's observation that the internal
implementation of the SynchronizationLib adds volatile
in some internal worker functions. Other implementation
artifacts include the use of a read/write memory barrier
to prevent the optimizing compiler from optimizing
across a boundary. These read/write barriers are used
in the spin lock functions, but not the Interlocked*()
functions.
I want to make sure we have studied the code generation
to see if the issue is related to volatile or a read/write
memory barrier. It could be adding volatile forced the
compiler to internally add read/write barrier.
In my understanding, MemoryFence()
[MdePkg/Library/BaseLib/X64/GccInline.c] is just a compiler-level
barrier, not a processor level barrier.
__asm__ __volatile__ ("":::"memory");
Processor level barriers shouldn't be a problem here; as far as I recall
from the SDM from a few days ago, the XCHG instruction "engages the
CPU's locking protocol even without the LOCK prefix", plus I see an
explicit LOCK in InternalSyncDecrement()
[MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c] for example.
I'm pretty sure the compiler opimized away the access because it saw no
intervening access (it doesn't know about the code running on other
CPUs). I think "volatile" and MemoryFence() -- with the latter telling
GCC that all memory should be considered clobbered -- should fix the
issue just the same, in this situation.
There is a good chance it is similar to my example from my previous email when the return value is a constant since the compiler wrote the value and remember the value it wrote. You will need to make the global volatile or place a barrier between the write and the read to tell the compiler what optimizations are legal.
There can also be other insidious things in place that hide bugs. For example in this code:
mNumberToFinish = mAcpiCpuData.NumberOfCpus - 1;
mExchangeInfo->ApFunction = (VOID *) (UINTN) EarlyMPRendezvousProcedure;
//
// Send INIT IPI - SIPI to all APs
//
SendInitSipiSipiAllExcludingSelf ((UINT32)mAcpiCpuData.StartupVector);
while (mNumberToFinish > 0) {
CpuPause ();
}
What if some versions of SendInitSipiSipiAllExcludingSelf() have side effects that make it hard to optimize and some do not. You could get a different result...
Anyway I think this optimization is more subtle than my previous examples. If you just compile this code to assembler it looks correct (you see a loop). But if you do link time optimization the compiler will make more assumptions about how mNumberToFinish is used globally and be more aggressive about optimizing it out. You can get the same behavior from the assembler generation if you make mNumberToFinish a static local.
~/work/Compiler>cat barriers.c
#define _ReadWriteBarrier() do { __asm__ __volatile__ ("": : : "memory"); } while(0)
unsigned int mNumberToFinish = 0;
void SendInit2()
{
*(int *)0x12345678 = 0;
}
int main()
{
mNumberToFinish = 3;
SendInit2();
while (mNumberToFinish > 0) {
;
}
return 0;
}
~/work/Compiler>clang -Os -flto -g barriers.c
~/work/Compiler>lldb a.out
(lldb) target create "a.out"
Current executable set to 'a.out' (x86_64).
(lldb) dis -n main -b
a.out`main:
a.out[0x100000fa7] <+0>: 55 pushq %rbp
a.out[0x100000fa8] <+1>: 48 89 e5 movq %rsp, %rbp
a.out[0x100000fab] <+4>: c7 04 25 78 56 34 12 00 00 00 00 movl $0x0, 0x12345678
a.out[0x100000fb6] <+15>: eb fe jmp 0x100000fb6 ; <+15> at barriers.c:16
~/work/Compiler>clang -Os -S barriers.c
~/work/Compiler>cat barriers.s
.globl _SendInit2
_SendInit2: ## @SendInit2
pushq %rbp
movq %rsp, %rbp
movl $0, 305419896
popq %rbp
retq
.globl _main
_main: ## @main
pushq %rbp
movq %rsp, %rbp
movl $3, _mNumberToFinish(%rip)
xorl %eax, %eax
movl %eax, 305419896
cmpl %eax, _mNumberToFinish(%rip)
je LBB1_2
LBB1_1: ## =>This Inner Loop Header: Depth=1
jmp LBB1_1
LBB1_2:
xorl %eax, %eax
popq %rbp
retq
.globl _mNumberToFinish ## @mNumberToFinish
.zerofill __DATA,__common,_mNumberToFinish,4,2
Live in fear of the optimizer as it is getting smarter every day. Sorry if I picked the wrong bit of code as the example.
Thanks,
Andrew Fish
PS This is a good example of how barriers don't do what you may think. Notice how the barrier only adds a single read and then a return or an infinite loop. It makes sense if you think about it.... After the barrier the compiler is clueless as to what is going on so it reads the global. Given the global can't change it decides to return or fall into the infinite loop. This is really good example of the difference between a barrier and a volatile variable. This really highlights that barrier is a one shot vs. the volatile is a property for the variable.
~/work/Compiler>cat barriers.c
#define _ReadWriteBarrier() do { __asm__ __volatile__ ("": : : "memory"); } while(0)
unsigned int mNumberToFinish = 0;
void SendInit2()
{
*(int *)0x12345678 = 0;
}
int main()
{
mNumberToFinish = 3;
SendInit2();
_ReadWriteBarrier();
while (mNumberToFinish > 0) {
;
}
return 0;
}
~/work/Compiler>clang -Os -flto -g barriers.c
~/work/Compiler>lldb a.out
(lldb) target create "a.out"
Current executable set to 'a.out' (x86_64).
(lldb) dis -n main -b
a.out`main:
a.out[0x100000f8d] <+0>: 55 pushq %rbp
a.out[0x100000f8e] <+1>: 48 89 e5 movq %rsp, %rbp
a.out[0x100000f91] <+4>: c6 05 68 00 00 00 01 movb $0x1, 0x68(%rip)
a.out[0x100000f98] <+11>: c7 04 25 78 56 34 12 00 00 00 00 movl $0x0, 0x12345678
a.out[0x100000fa3] <+22>: 0f b6 05 56 00 00 00 movzbl 0x56(%rip), %eax ; mNumberToFinish
a.out[0x100000faa] <+29>: 83 e0 01 andl $0x1, %eax
a.out[0x100000fad] <+32>: 83 f8 01 cmpl $0x1, %eax
a.out[0x100000fb0] <+35>: 75 02 jne 0x100000fb4 ; <+39> at barriers.c:21
a.out[0x100000fb2] <+37>: eb fe jmp 0x100000fb2 ; <+37> at barriers.c:17
a.out[0x100000fb4] <+39>: 31 c0 xorl %eax, %eax
a.out[0x100000fb6] <+41>: 5d popq %rbp
a.out[0x100000fb7] <+42>: c3 retq
Also, given the implementation I see in the SynchronizationLib
I am not sure the cast to (UINT32 *) is required in the
proposed patch. Do we get compiler warnings/errors if those
casts are not included?
Yes.
Passing an argument to a function whose prototype is visible invokes
type conversion just like in assignment:
6.5.2.2 Function calls
7 If the expression that denotes the called function has a type that
does include a prototype, the arguments are implicitly converted,
as if by assignment, to the types of the corresponding parameters,
taking the type of each parameter to be the unqualified version
of its declared type. [...]
(IMPORTANT: the de-qualification mentioned above refers to the pointer
object itself, not to the pointed-to object! It just means if you have a
function parameter "const long x", and pass the argument 5 (which has
type int), then it is converted to "long", and not "const long".)
So, passing a (volatile UINT32 *) argument to a (UINT32 *) parameter is
the same as the similar assignment
volatile UINT32 *x;
UINT32 *y;
y = x;
Then, regarding assignment, we have
6.5.16.1 Simple assignment
Constraints
1 One of the following shall hold:
[...]
- both operands are pointers to qualified or unqualified versions
of compatible types, and the type pointed to by the left has all
the qualifiers of the type pointed to by the right;
[...]
Finally we have diagnostics,
5.1.1.3 Diagnostics
1 A conforming implementation shall produce at least one diagnostic
message (identified in an implementation-defined manner) if a
preprocessing translation unit or translation unit contains a
violation of any syntax rule or constraint, even if the behavior is
also explicitly specified as undefined or implementation-defined.
[...]
In summary, without adding the explicit cast, we break the constraint on
simple assignment, which requires the compiler to emit a diagnostic message.
(In turn if we add the explicit cast, we break other requirements, as
discussed in my other email -- and here we should investigate whether we
care about breaking that exact rule.)
The second topic is what the SynchronizationLib API interfaces
should have been from the beginning. In retrospect, I think
they should have been defined with volatile pointers.
I agree!
The spin
lock APIs do use volatile pointers, but that is embedded in the
typedef for SPIN_LOCK, so it is not as obvious.
Thanks
Laszlo
-----Original Message-----
From: Laszlo Ersek [mailto:lersek@redhat.com]
Sent: Tuesday, November 15, 2016 8:10 AM
To: Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>
Cc: Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Tian,
Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Subject: Re: [PATCH v2 0/2] Add volatile for mNumberToFinish
Jeff,
On 11/15/16 15:08, Jeff Fan wrote:
v2:
Add patch #1 per Laszlo's comments
at https://lists.01.org/pipermail/edk2-devel/2016-November/004697.html
About the comments updated SynchronizationLib to add volatile for
input parameter, I will send in another serial patches.
Cc: Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>
Cc: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
Cc: Jiewen Yao <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
Cc: Feng Tian <feng.tian@intel.com<mailto:feng.tian@intel.com>>
Cc: Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Jeff Fan <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
Jeff Fan (2):
UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for parameter NumberToFinish
UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for mNumberToFinish
UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 4 ++--
UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 4 ++--
UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 2 +-
UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 2 +-
4 files changed, 6 insertions(+), 6 deletions(-)
if you want to keep GCC5.4 from optimizing away the access, the
synchronization object itself, and all pointers to it must remain
volatile. Wherever you cast away the volatile qualifier, for example in
a function call, GCC can break code on the next level, even if you don't
actually access the object through that pointer (i.e., if you cast the
pointer back to volatile just in time for the access).
So, the safe way to go about this is to change function prototypes from
callee towards callers -- first change the callee (because both volatile
and non-volatile can be accepted as volatile), then change the caller
(make sure what you pass in is volatile, and propagate it outwards).
It is also okay to convert the original volatile pointer to UINTN, and
to pass it to assembly code like that, or to convert it back to a
volatile pointer from UINTN before use.
>From the C99 standard:
6.3 Conversions
6.3.2.3 Pointers
2 For any qualifier q, a pointer to a non-q-qualified type may be
converted to a pointer to the q-qualified version of the type; the
values stored in the original and converted pointers shall compare
equal.
5 An integer may be converted to any pointer type. Except as
previously specified, the result is implementation-defined, might
not be correctly aligned, might not point to an entity of the
referenced type, and might be a trap representation.
6 Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
integer type.
6.7.3 Type qualifiers, paragraph 5:
If an attempt is made to modify an object defined with a
const-qualified type through use of an lvalue with
non-const-qualified type, the behavior is undefined. If an attempt
is made to refer to an object defined with a volatile-qualified
type through use of an lvalue with non-volatile-qualified type, the
behavior is undefined.
In summary:
- casting away "volatile" even just temporarily (without actual
accesses) may give gcc license to break the code (6.3.2.3 p2)
- accessing without volatile is known to break the code (6.7.3 p5)
- you can always cast from non-volatile to volatile (6.3.2.3 p2),
but not the other way around!
- you can cast from (volatile VOID *) to UINTN and back as much as
you want (6.3.2.3 p5 p6), because our execution environment makes
that safe ("implementation-defined")
We might want to play loose with 6.3.2.3 p2 -- that is, cast away
volatile from the pointer only temporarily, and cast it back just before
accessing the object through the pointer --, but that could be unsafe in
the long term. The *really* safe method is to cast it to UINTN, and then
back the same way.
Yes, this would affect functions like SwitchStack() too -- the Context1
and Context2 parameters would have to change their types to UINTN.
I think what we should discuss at this point is whether we'd like to
care about 6.3.2.3 p2; that is, whether we consider casting away
volatile temporarily.
The direction I've been experiencing with GCC is that its optimization
methods are becoming more aggressive. For some optimizations, there are
flags that disable them; I'm not sure they provide a specific flag for
preventing GCC from exploiting violations of 6.3.2.3 p2.
Thanks
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
https://lists.01.org/mailman/listinfo/edk2-devel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 0/2] Add volatile for mNumberToFinish
2016-11-17 1:10 ` Kinney, Michael D
@ 2016-11-19 0:34 ` Andrew Fish
0 siblings, 0 replies; 10+ messages in thread
From: Andrew Fish @ 2016-11-19 0:34 UTC (permalink / raw)
To: Mike Kinney
Cc: Laszlo Ersek, Paolo Bonzini, edk2-devel@ml01.01.org, Tian, Feng,
Yao, Jiewen, Fan, Jeff
> On Nov 16, 2016, at 5:10 PM, Kinney, Michael D <michael.d.kinney@intel.com> wrote:
>
> Hi Andrew,
>
> Thanks for all the examples!
>
Mike,
Thanks for driving the fixes.
I find the examples make it more clear what is going on, and make the language make more sense. It will be good to have these examples on the mailing list for future reference.
Thanks,
Andrew Fish
> I have done a more detailed review of the BaseSynchronizationLib. I do not think there are any missing memory barriers in the implementation.
>
> I have also experimented with adding volatile qualifier to parameters in the Interlocked*() functions, and for the builds I have done so far, that appears to be a compatible change. I did have to update the internal functions in that lib to also consistently use volatile, but that makes sense based on the ANSI C spec items that Laszlo quoted.
>
> I have also been looking at Jeff's patch to CpuS3.c to add volatile to the mNumberToFinish global variable, and I have a slightly different patch I have evaluated to avoid the type case from a volatile variable to a non-volatile variable in all cases. It does require that the calls to SwitchStack() and DisablePaging64() treat the parameter as a UINTN address of the semaphore, instead of a volatile pointer to the semaphore.
>
> I will send some patches soon for review.
>
> Mike
>
> From: afish@apple.com [mailto:afish@apple.com]
> Sent: Wednesday, November 16, 2016 1:14 PM
> To: Laszlo Ersek <lersek@redhat.com>
> Cc: Kinney, Michael D <michael.d.kinney@intel.com>; Fan, Jeff <jeff.fan@intel.com>; edk2-devel@ml01.01.org; Paolo Bonzini <pbonzini@redhat.com>; Tian, Feng <feng.tian@intel.com>; Yao, Jiewen <jiewen.yao@intel.com>
> Subject: Re: [edk2] [PATCH v2 0/2] Add volatile for mNumberToFinish
>
>
> On Nov 16, 2016, at 11:21 AM, Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>> wrote:
>
> On 11/16/16 19:18, Kinney, Michael D wrote:
>
> Laszlo,
>
> Thanks for the details from and ANSI C spec.
>
> For this compiler issue, are there more details on the
> assembly code generated by the GCC 5.4 compiler in the
> failing mode?
>
> Unfortunately, I have little first-hand experience with this kind of
> failure, I've just read different accounts of it. Perhaps Paolo can
> comment -- my experience with contacting GCC maintainers directly hasn't
> been stellar.
>
>
> If it reproduces in clang I've had a lot of luck getting help from those folks.
>
> In general the best thing to do if you think you have a compiler bug it to write a small command line program that reproduces the issues. If it is a code generation issue, you don't even need to make the code link or run, you can just compile the code directly to assembler and write up in the bug report why you think the code generation is in error. Basically like the examples I post to this list from time to time. Well unless it is a link time optimization issue...
>
>
> I also see Liming's observation that the internal
> implementation of the SynchronizationLib adds volatile
> in some internal worker functions. Other implementation
> artifacts include the use of a read/write memory barrier
> to prevent the optimizing compiler from optimizing
> across a boundary. These read/write barriers are used
> in the spin lock functions, but not the Interlocked*()
> functions.
>
> I want to make sure we have studied the code generation
> to see if the issue is related to volatile or a read/write
> memory barrier. It could be adding volatile forced the
> compiler to internally add read/write barrier.
>
> In my understanding, MemoryFence()
> [MdePkg/Library/BaseLib/X64/GccInline.c] is just a compiler-level
> barrier, not a processor level barrier.
>
> __asm__ __volatile__ ("":::"memory");
>
> Processor level barriers shouldn't be a problem here; as far as I recall
> from the SDM from a few days ago, the XCHG instruction "engages the
> CPU's locking protocol even without the LOCK prefix", plus I see an
> explicit LOCK in InternalSyncDecrement()
> [MdePkg/Library/BaseSynchronizationLib/X64/GccInline.c] for example.
>
> I'm pretty sure the compiler opimized away the access because it saw no
> intervening access (it doesn't know about the code running on other
> CPUs). I think "volatile" and MemoryFence() -- with the latter telling
> GCC that all memory should be considered clobbered -- should fix the
> issue just the same, in this situation.
>
>
> There is a good chance it is similar to my example from my previous email when the return value is a constant since the compiler wrote the value and remember the value it wrote. You will need to make the global volatile or place a barrier between the write and the read to tell the compiler what optimizations are legal.
>
> There can also be other insidious things in place that hide bugs. For example in this code:
>
> mNumberToFinish = mAcpiCpuData.NumberOfCpus - 1;
> mExchangeInfo->ApFunction = (VOID *) (UINTN) EarlyMPRendezvousProcedure;
>
> //
> // Send INIT IPI - SIPI to all APs
> //
> SendInitSipiSipiAllExcludingSelf ((UINT32)mAcpiCpuData.StartupVector);
>
> while (mNumberToFinish > 0) {
> CpuPause ();
> }
>
> What if some versions of SendInitSipiSipiAllExcludingSelf() have side effects that make it hard to optimize and some do not. You could get a different result...
>
> Anyway I think this optimization is more subtle than my previous examples. If you just compile this code to assembler it looks correct (you see a loop). But if you do link time optimization the compiler will make more assumptions about how mNumberToFinish is used globally and be more aggressive about optimizing it out. You can get the same behavior from the assembler generation if you make mNumberToFinish a static local.
>
> ~/work/Compiler>cat barriers.c
>
> #define _ReadWriteBarrier() do { __asm__ __volatile__ ("": : : "memory"); } while(0)
>
> unsigned int mNumberToFinish = 0;
>
> void SendInit2()
> {
> *(int *)0x12345678 = 0;
> }
>
> int main()
> {
> mNumberToFinish = 3;
>
> SendInit2();
> while (mNumberToFinish > 0) {
> ;
> }
>
> return 0;
> }
>
>
> ~/work/Compiler>clang -Os -flto -g barriers.c
> ~/work/Compiler>lldb a.out
> (lldb) target create "a.out"
> Current executable set to 'a.out' (x86_64).
> (lldb) dis -n main -b
> a.out`main:
> a.out[0x100000fa7] <+0>: 55 pushq %rbp
> a.out[0x100000fa8] <+1>: 48 89 e5 movq %rsp, %rbp
> a.out[0x100000fab] <+4>: c7 04 25 78 56 34 12 00 00 00 00 movl $0x0, 0x12345678
> a.out[0x100000fb6] <+15>: eb fe jmp 0x100000fb6 ; <+15> at barriers.c:16
>
>
> ~/work/Compiler>clang -Os -S barriers.c
> ~/work/Compiler>cat barriers.s
>
> .globl _SendInit2
> _SendInit2: ## @SendInit2
> pushq %rbp
> movq %rsp, %rbp
> movl $0, 305419896
> popq %rbp
> retq
>
>
> .globl _main
> _main: ## @main
> pushq %rbp
> movq %rsp, %rbp
> movl $3, _mNumberToFinish(%rip)
> xorl %eax, %eax
> movl %eax, 305419896
> cmpl %eax, _mNumberToFinish(%rip)
> je LBB1_2
> LBB1_1: ## =>This Inner Loop Header: Depth=1
> jmp LBB1_1
> LBB1_2:
> xorl %eax, %eax
> popq %rbp
> retq
>
> .globl _mNumberToFinish ## @mNumberToFinish
> .zerofill __DATA,__common,_mNumberToFinish,4,2
>
>
> Live in fear of the optimizer as it is getting smarter every day. Sorry if I picked the wrong bit of code as the example.
>
> Thanks,
>
> Andrew Fish
>
> PS This is a good example of how barriers don't do what you may think. Notice how the barrier only adds a single read and then a return or an infinite loop. It makes sense if you think about it.... After the barrier the compiler is clueless as to what is going on so it reads the global. Given the global can't change it decides to return or fall into the infinite loop. This is really good example of the difference between a barrier and a volatile variable. This really highlights that barrier is a one shot vs. the volatile is a property for the variable.
>
> ~/work/Compiler>cat barriers.c
>
> #define _ReadWriteBarrier() do { __asm__ __volatile__ ("": : : "memory"); } while(0)
>
> unsigned int mNumberToFinish = 0;
>
> void SendInit2()
> {
> *(int *)0x12345678 = 0;
> }
>
> int main()
> {
> mNumberToFinish = 3;
>
> SendInit2();
> _ReadWriteBarrier();
> while (mNumberToFinish > 0) {
> ;
> }
>
> return 0;
> }
>
>
> ~/work/Compiler>clang -Os -flto -g barriers.c
> ~/work/Compiler>lldb a.out
> (lldb) target create "a.out"
> Current executable set to 'a.out' (x86_64).
> (lldb) dis -n main -b
> a.out`main:
> a.out[0x100000f8d] <+0>: 55 pushq %rbp
> a.out[0x100000f8e] <+1>: 48 89 e5 movq %rsp, %rbp
> a.out[0x100000f91] <+4>: c6 05 68 00 00 00 01 movb $0x1, 0x68(%rip)
> a.out[0x100000f98] <+11>: c7 04 25 78 56 34 12 00 00 00 00 movl $0x0, 0x12345678
> a.out[0x100000fa3] <+22>: 0f b6 05 56 00 00 00 movzbl 0x56(%rip), %eax ; mNumberToFinish
> a.out[0x100000faa] <+29>: 83 e0 01 andl $0x1, %eax
> a.out[0x100000fad] <+32>: 83 f8 01 cmpl $0x1, %eax
> a.out[0x100000fb0] <+35>: 75 02 jne 0x100000fb4 ; <+39> at barriers.c:21
> a.out[0x100000fb2] <+37>: eb fe jmp 0x100000fb2 ; <+37> at barriers.c:17
> a.out[0x100000fb4] <+39>: 31 c0 xorl %eax, %eax
> a.out[0x100000fb6] <+41>: 5d popq %rbp
> a.out[0x100000fb7] <+42>: c3 retq
>
>
>
>
>
> Also, given the implementation I see in the SynchronizationLib
> I am not sure the cast to (UINT32 *) is required in the
> proposed patch. Do we get compiler warnings/errors if those
> casts are not included?
>
> Yes.
>
> Passing an argument to a function whose prototype is visible invokes
> type conversion just like in assignment:
>
> 6.5.2.2 Function calls
>
> 7 If the expression that denotes the called function has a type that
> does include a prototype, the arguments are implicitly converted,
> as if by assignment, to the types of the corresponding parameters,
> taking the type of each parameter to be the unqualified version
> of its declared type. [...]
>
> (IMPORTANT: the de-qualification mentioned above refers to the pointer
> object itself, not to the pointed-to object! It just means if you have a
> function parameter "const long x", and pass the argument 5 (which has
> type int), then it is converted to "long", and not "const long".)
>
> So, passing a (volatile UINT32 *) argument to a (UINT32 *) parameter is
> the same as the similar assignment
>
> volatile UINT32 *x;
> UINT32 *y;
> y = x;
>
> Then, regarding assignment, we have
>
> 6.5.16.1 Simple assignment
>
> Constraints
>
> 1 One of the following shall hold:
>
> [...]
> - both operands are pointers to qualified or unqualified versions
> of compatible types, and the type pointed to by the left has all
> the qualifiers of the type pointed to by the right;
> [...]
>
> Finally we have diagnostics,
>
> 5.1.1.3 Diagnostics
>
> 1 A conforming implementation shall produce at least one diagnostic
> message (identified in an implementation-defined manner) if a
> preprocessing translation unit or translation unit contains a
> violation of any syntax rule or constraint, even if the behavior is
> also explicitly specified as undefined or implementation-defined.
> [...]
>
> In summary, without adding the explicit cast, we break the constraint on
> simple assignment, which requires the compiler to emit a diagnostic message.
>
> (In turn if we add the explicit cast, we break other requirements, as
> discussed in my other email -- and here we should investigate whether we
> care about breaking that exact rule.)
>
>
> The second topic is what the SynchronizationLib API interfaces
> should have been from the beginning. In retrospect, I think
> they should have been defined with volatile pointers.
>
> I agree!
>
>
> The spin
> lock APIs do use volatile pointers, but that is embedded in the
> typedef for SPIN_LOCK, so it is not as obvious.
>
> Thanks
> Laszlo
>
>
> -----Original Message-----
> From: Laszlo Ersek [mailto:lersek@redhat.com]
> Sent: Tuesday, November 15, 2016 8:10 AM
> To: Fan, Jeff <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>; edk2-devel@ml01.01.org<mailto:edk2-devel@ml01.01.org>
> Cc: Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>; Yao, Jiewen <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>; Tian,
> Feng <feng.tian@intel.com<mailto:feng.tian@intel.com>>; Kinney, Michael D <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
> Subject: Re: [PATCH v2 0/2] Add volatile for mNumberToFinish
>
> Jeff,
>
> On 11/15/16 15:08, Jeff Fan wrote:
>
> v2:
> Add patch #1 per Laszlo's comments
> at https://lists.01.org/pipermail/edk2-devel/2016-November/004697.html
>
> About the comments updated SynchronizationLib to add volatile for
> input parameter, I will send in another serial patches.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com<mailto:pbonzini@redhat.com>>
> Cc: Laszlo Ersek <lersek@redhat.com<mailto:lersek@redhat.com>>
> Cc: Jiewen Yao <jiewen.yao@intel.com<mailto:jiewen.yao@intel.com>>
> Cc: Feng Tian <feng.tian@intel.com<mailto:feng.tian@intel.com>>
> Cc: Michael D Kinney <michael.d.kinney@intel.com<mailto:michael.d.kinney@intel.com>>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Jeff Fan <jeff.fan@intel.com<mailto:jeff.fan@intel.com>>
>
> Jeff Fan (2):
> UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for parameter NumberToFinish
> UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for mNumberToFinish
>
> UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 4 ++--
> UefiCpuPkg/PiSmmCpuDxeSmm/Ia32/SmmFuncsArch.c | 4 ++--
> UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 2 +-
> UefiCpuPkg/PiSmmCpuDxeSmm/X64/SmmFuncsArch.c | 2 +-
> 4 files changed, 6 insertions(+), 6 deletions(-)
>
> if you want to keep GCC5.4 from optimizing away the access, the
> synchronization object itself, and all pointers to it must remain
> volatile. Wherever you cast away the volatile qualifier, for example in
> a function call, GCC can break code on the next level, even if you don't
> actually access the object through that pointer (i.e., if you cast the
> pointer back to volatile just in time for the access).
>
> So, the safe way to go about this is to change function prototypes from
> callee towards callers -- first change the callee (because both volatile
> and non-volatile can be accepted as volatile), then change the caller
> (make sure what you pass in is volatile, and propagate it outwards).
>
> It is also okay to convert the original volatile pointer to UINTN, and
> to pass it to assembly code like that, or to convert it back to a
> volatile pointer from UINTN before use.
>
> From the C99 standard:
>
> 6.3 Conversions
> 6.3.2.3 Pointers
>
> 2 For any qualifier q, a pointer to a non-q-qualified type may be
> converted to a pointer to the q-qualified version of the type; the
> values stored in the original and converted pointers shall compare
> equal.
>
> 5 An integer may be converted to any pointer type. Except as
> previously specified, the result is implementation-defined, might
> not be correctly aligned, might not point to an entity of the
> referenced type, and might be a trap representation.
>
> 6 Any pointer type may be converted to an integer type. Except as
> previously specified, the result is implementation-defined. If the
> result cannot be represented in the integer type, the behavior is
> undefined. The result need not be in the range of values of any
> integer type.
>
> 6.7.3 Type qualifiers, paragraph 5:
>
> If an attempt is made to modify an object defined with a
> const-qualified type through use of an lvalue with
> non-const-qualified type, the behavior is undefined. If an attempt
> is made to refer to an object defined with a volatile-qualified
> type through use of an lvalue with non-volatile-qualified type, the
> behavior is undefined.
>
> In summary:
>
> - casting away "volatile" even just temporarily (without actual
> accesses) may give gcc license to break the code (6.3.2.3 p2)
>
> - accessing without volatile is known to break the code (6.7.3 p5)
>
> - you can always cast from non-volatile to volatile (6.3.2.3 p2),
> but not the other way around!
>
> - you can cast from (volatile VOID *) to UINTN and back as much as
> you want (6.3.2.3 p5 p6), because our execution environment makes
> that safe ("implementation-defined")
>
> We might want to play loose with 6.3.2.3 p2 -- that is, cast away
> volatile from the pointer only temporarily, and cast it back just before
> accessing the object through the pointer --, but that could be unsafe in
> the long term. The *really* safe method is to cast it to UINTN, and then
> back the same way.
>
> Yes, this would affect functions like SwitchStack() too -- the Context1
> and Context2 parameters would have to change their types to UINTN.
>
> I think what we should discuss at this point is whether we'd like to
> care about 6.3.2.3 p2; that is, whether we consider casting away
> volatile temporarily.
>
> The direction I've been experiencing with GCC is that its optimization
> methods are becoming more aggressive. For some optimizations, there are
> flags that disable them; I'm not sure they provide a specific flag for
> preventing GCC from exploiting violations of 6.3.2.3 p2.
>
> Thanks
> Laszlo
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org<mailto:edk2-devel@lists.01.org>
> https://lists.01.org/mailman/listinfo/edk2-devel
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2016-11-19 0:34 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-15 14:08 [PATCH v2 0/2] Add volatile for mNumberToFinish Jeff Fan
2016-11-15 14:08 ` [PATCH v2 1/2] UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for parameter NumberToFinish Jeff Fan
2016-11-15 14:08 ` [PATCH v2 2/2] UefiCpuPkg/PiSmmCpuDxeSmm: Add volatile for mNumberToFinish Jeff Fan
2016-11-15 16:10 ` [PATCH v2 0/2] " Laszlo Ersek
2016-11-16 18:18 ` Kinney, Michael D
2016-11-16 19:21 ` Laszlo Ersek
2016-11-16 21:14 ` Andrew Fish
2016-11-17 1:10 ` Kinney, Michael D
2016-11-19 0:34 ` Andrew Fish
2016-11-16 19:40 ` Andrew Fish
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox