From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: mx.groups.io; dkim=missing; spf=pass (domain: redhat.com, ip: 209.132.183.28, mailfrom: lersek@redhat.com) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by groups.io with SMTP; Tue, 30 Apr 2019 03:03:40 -0700 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id BBFD6D2EE5; Tue, 30 Apr 2019 10:03:39 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-121-42.rdu2.redhat.com [10.10.121.42]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8E6866B90B; Tue, 30 Apr 2019 10:03:38 +0000 (UTC) Subject: Re: [edk2-devel] [Patch 1/4] MdePkg/BaseLib: Verify SSE2 support in IA32 AsmLfence() To: "Kinney, Michael D" , "Gao, Liming" , "devel@edk2.groups.io" , "brian.johnson@hpe.com" References: <20190425175334.5944-1-michael.d.kinney@intel.com> <20190425175334.5944-2-michael.d.kinney@intel.com> <7b0ca337-3642-2070-0fff-515704a24bdf@hpe.com> <5858ea67-9618-3345-5151-f107d190eb5f@redhat.com> <4A89E2EF3DFEDB4C8BFDE51014F606A14E43F599@SHSMSX104.ccr.corp.intel.com> From: "Laszlo Ersek" Message-ID: <778c34ed-41b3-b523-36ed-b6c4d8fac52d@redhat.com> Date: Tue, 30 Apr 2019 12:03:37 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Tue, 30 Apr 2019 10:03:39 +0000 (UTC) Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable On 04/29/19 20:23, Kinney, Michael D wrote: > Hi, >=20 > Thanks for the detailed feedback. >=20 > I did consider moving the CPUID check up a level so > AsmLfence() would always do the requested instruction. > This would have been a larger patch set that would > introduce an IA32 and X64 specific version of > SpeculationBarrier(). >=20 > I agree that a build time PCD would is a better > approach. It should have a default value of enabled, > and only platforms that use CPUs that do not support > LFENCE would avoid the call to AsmLfence() and use an > alternate serializing instruction. >=20 > Here is the proposed PCD: >=20 > [PcdsFixedAtBuild] > ## Indicates the type of instruction sequence to use > # for a speculation barrier. The default instruction > # sequence is LFENCE.
> # 0x00 - No operation.
> # 0x01 - LFENCE (IA32/X64).
> # 0x02 - CPUID (IA32/X64).
> # Other - reserved > # @Prompt Speculation Barrier Type. > gEfiMdePkgTokenSpaceGuid.PcdSpeculationBarrierType|0x01|UINT8|0x300010= 18 Looks great to me. Thanks, Laszlo >=20 > Best regards, >=20 > Mike >=20 >> -----Original Message----- >> From: Gao, Liming >> Sent: Monday, April 29, 2019 7:10 AM >> To: devel@edk2.groups.io; lersek@redhat.com; >> brian.johnson@hpe.com; Kinney, Michael D >> >> Subject: RE: [edk2-devel] [Patch 1/4] MdePkg/BaseLib: >> Verify SSE2 support in IA32 AsmLfence() >> >>> -----Original Message----- >>> From: devel@edk2.groups.io >> [mailto:devel@edk2.groups.io] On Behalf Of Laszlo Ersek >>> Sent: Monday, April 29, 2019 7:15 PM >>> To: devel@edk2.groups.io; brian.johnson@hpe.com; >> Kinney, Michael D >>> Cc: Gao, Liming >>> Subject: Re: [edk2-devel] [Patch 1/4] MdePkg/BaseLib: >> Verify SSE2 support in IA32 AsmLfence() >>> >>> On 04/26/19 23:08, Brian J. Johnson wrote: >>>> On 4/26/19 3:27 PM, Laszlo Ersek wrote: >>>>> On 04/25/19 19:53, Michael D Kinney wrote: >>>>>> Use CPUID in IA32 implementation of AsmLfence() to >> verify >>>>>> that SSE2 is supported before using the LFENCE >> instruction. >>>>>> >>>>>> Cc: Liming Gao >>>>>> Signed-off-by: Michael D Kinney >> >>>>>> --- >>>>>> =C2=A0 MdePkg/Library/BaseLib/Ia32/Lfence.nasm | 14 >> +++++++++++++- >>>>>> =C2=A0 1 file changed, 13 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git >> a/MdePkg/Library/BaseLib/Ia32/Lfence.nasm >>>>>> b/MdePkg/Library/BaseLib/Ia32/Lfence.nasm >>>>>> index 44478be35f..0a60ae1d57 100644 >>>>>> --- a/MdePkg/Library/BaseLib/Ia32/Lfence.nasm >>>>>> +++ b/MdePkg/Library/BaseLib/Ia32/Lfence.nasm >>>>>> @@ -1,5 +1,5 @@ >>>>>> >>>>>> ;------------------------------------------------- >> ----------------------------- >>>>>> ; >>>>>> -; Copyright (c) 2018, Intel Corporation. All >> rights reserved.
>>>>>> +; Copyright (c) 2018 - 2019, Intel Corporation. >> All rights >>>>>> reserved.
>>>>>> =C2=A0 ; SPDX-License-Identifier: BSD-2-Clause-Patent >>>>>> =C2=A0 ; >>>>>> =C2=A0 ; Module Name: >>>>>> @@ -26,5 +26,17 @@ >>>>>> >>>>>> ;------------------------------------------------- >> ----------------------------- >>>>>> >>>>>> =C2=A0 global ASM_PFX(AsmLfence) >>>>>> =C2=A0 ASM_PFX(AsmLfence): >>>>>> +=C2=A0=C2=A0=C2=A0 ; >>>>>> +=C2=A0=C2=A0=C2=A0 ; Use CPUID instruction >> (CPUID.01H:EDX.SSE2[bit 26] =3D 1) to test >>>>>> whether the >>>>>> +=C2=A0=C2=A0=C2=A0 ; processor supports SSE2 >> instruction.=C2=A0 Save/restore >>>>>> non-volatile register >>>>>> +=C2=A0=C2=A0=C2=A0 ; EBX that is modified by CPUID >>>>>> +=C2=A0=C2=A0=C2=A0 ; >>>>>> +=C2=A0=C2=A0=C2=A0 push=C2=A0=C2=A0=C2=A0 ebx >>>>>> +=C2=A0=C2=A0=C2=A0 mov=C2=A0=C2=A0=C2=A0=C2=A0 eax, 1 >>>>>> +=C2=A0=C2=A0=C2=A0 cpuid >>>>>> +=C2=A0=C2=A0=C2=A0 bt=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 edx, 26 >>>>>> +=C2=A0=C2=A0=C2=A0 jnc=C2=A0=C2=A0=C2=A0=C2=A0 Done >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 lfence >>>>>> +Done: >>>>>> +=C2=A0=C2=A0=C2=A0 pop=C2=A0=C2=A0=C2=A0=C2=A0 ebx >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ret >>>>>> >>>>> >>>>> The SDM seems to confirm that lfence depends on >> SSE2. >>>>> >>>>> However, that raises another question: >>>>> >>>>> AsmLfence() is used for implementing >> SpeculationBarrier() on IA32/X64. >>>>> And so I wonder: the plaforms where lfence is >> unavailable, do they *not* >>>>> need a speculation barrier at all, or do they need >> a *different* >>>>> implementation? >>>>> >>>>> Thanks, >>>>> Laszlo >>>> >>>> Several issues: >>>> >>>> This patch introduces stronger fencing than is >> required.=C2=A0 The SDM says, >>>> "Reads or writes cannot be reordered with I/O >> instructions, locked >>>> instructions, or serializing instructions." (vol 3a, >> sec 8.2.2.)=C2=A0 The >>>> cpuid instruction is a "serializing instruction" >> (sec 8.3).=C2=A0 So the >>>> cpuid is essentially a load fence plus a store fence >> (plus other >>>> functionality, such as draining the memory >> queues.)=C2=A0 That makes the >>>> lfence following it redundant. >>>> >>>> Cpuid is a fairly heavyweight instruction due to its >> serializing >>>> behavior.=C2=A0 It provides the load fencing semantics of >> lfence, but can >>>> introduce a significant performance hit if it's >> called often.=C2=A0 Lfence is >>>> a lot lighter weight instruction.=C2=A0 So using cpuid in >> AsmLfence may make >>>> it a lot slower than the caller expects. >>>> >>>> Also, skipping a fencing instruction if it's not >> supported doesn't seem >>>> like the right approach in any case.=C2=A0 The caller >> expects AsmLfence to >>>> provide certain fencing semantics... the >> implementation isn't free to >>>> just do nothing (unless it's on an architecture >> where load fencing is >>>> not required, since memory is always >> ordered.)=C2=A0 Plus, the routine is >>>> called "AsmLfence", so I'd expect it to always use >> lfence, and cause an >>>> exception if the instruction isn't implemented.=C2=A0 I'd >> think the callers >>>> should be modified to call AsmLfence or routines >> using other >>>> instructions (such as cpuid) to provide fencing >> according to the CPU >>>> architecture they are on. >>> >>> That appears the right approach to me -- let AsmLfence >> do what its name >>> says, and let platforms pick the right implementation >> for >>> SpeculationBarrier -- possibly at build time, possibly >> at boot time. >>> >>>> Is there a way to make this a compile-time >> decision?=C2=A0 I haven't tried to >>>> track down all the callers of AsmLfence.... >>> >>> Right now AsmLfence is only called from >>> "MdePkg/Library/BaseLib/X86SpeculationBarrier.c". >>> >> This is a good suggestion to decide it at build time. >> One PCD can be introduced to control its logic. >>> SpeculationBarrier() is called in SMM drivers / lib >> instances: >>> - FaultTolerantWriteSmm, >> FaultTolerantWriteStandaloneMm >>> - SmmLockBoxLib >>> - VariableSmm, VariableStandaloneMm >>> - PiSmmCpuDxeSmm >>> >>> Given that there's a mode switch anyway, the runtime >> cost of a simple >>> CPUID-based implementation might be tolerable. >>> >>> Thanks, >>> Laszlo >>> >>>=20 >=20