From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: mx.groups.io; dkim=missing; spf=pass (domain: redhat.com, ip: 209.132.183.28, mailfrom: lersek@redhat.com) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by groups.io with SMTP; Mon, 29 Apr 2019 04:15:29 -0700 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 050353199379; Mon, 29 Apr 2019 11:15:29 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-121-31.rdu2.redhat.com [10.10.121.31]) by smtp.corp.redhat.com (Postfix) with ESMTP id B48FF60C62; Mon, 29 Apr 2019 11:15:27 +0000 (UTC) Subject: Re: [edk2-devel] [Patch 1/4] MdePkg/BaseLib: Verify SSE2 support in IA32 AsmLfence() To: devel@edk2.groups.io, brian.johnson@hpe.com, michael.d.kinney@intel.com Cc: Liming Gao References: <20190425175334.5944-1-michael.d.kinney@intel.com> <20190425175334.5944-2-michael.d.kinney@intel.com> <7b0ca337-3642-2070-0fff-515704a24bdf@hpe.com> From: "Laszlo Ersek" Message-ID: <5858ea67-9618-3345-5151-f107d190eb5f@redhat.com> Date: Mon, 29 Apr 2019 13:15:26 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <7b0ca337-3642-2070-0fff-515704a24bdf@hpe.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.41]); Mon, 29 Apr 2019 11:15:29 +0000 (UTC) Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable On 04/26/19 23:08, Brian J. Johnson wrote: > On 4/26/19 3:27 PM, Laszlo Ersek wrote: >> On 04/25/19 19:53, Michael D Kinney wrote: >>> Use CPUID in IA32 implementation of AsmLfence() to verify >>> that SSE2 is supported before using the LFENCE instruction. >>> >>> Cc: Liming Gao >>> Signed-off-by: Michael D Kinney >>> --- >>> =C2=A0 MdePkg/Library/BaseLib/Ia32/Lfence.nasm | 14 +++++++++++++- >>> =C2=A0 1 file changed, 13 insertions(+), 1 deletion(-) >>> >>> diff --git a/MdePkg/Library/BaseLib/Ia32/Lfence.nasm >>> b/MdePkg/Library/BaseLib/Ia32/Lfence.nasm >>> index 44478be35f..0a60ae1d57 100644 >>> --- a/MdePkg/Library/BaseLib/Ia32/Lfence.nasm >>> +++ b/MdePkg/Library/BaseLib/Ia32/Lfence.nasm >>> @@ -1,5 +1,5 @@ >>> =C2=A0 >>> ;--------------------------------------------------------------------= ---------- >>> ; >>> -; Copyright (c) 2018, Intel Corporation. All rights reserved.
>>> +; Copyright (c) 2018 - 2019, Intel Corporation. All rights >>> reserved.
>>> =C2=A0 ; SPDX-License-Identifier: BSD-2-Clause-Patent >>> =C2=A0 ; >>> =C2=A0 ; Module Name: >>> @@ -26,5 +26,17 @@ >>> =C2=A0 >>> ;--------------------------------------------------------------------= ---------- >>> >>> =C2=A0 global ASM_PFX(AsmLfence) >>> =C2=A0 ASM_PFX(AsmLfence): >>> +=C2=A0=C2=A0=C2=A0 ; >>> +=C2=A0=C2=A0=C2=A0 ; Use CPUID instruction (CPUID.01H:EDX.SSE2[bit 2= 6] =3D 1) to test >>> whether the >>> +=C2=A0=C2=A0=C2=A0 ; processor supports SSE2 instruction.=C2=A0 Save= /restore >>> non-volatile register >>> +=C2=A0=C2=A0=C2=A0 ; EBX that is modified by CPUID >>> +=C2=A0=C2=A0=C2=A0 ; >>> +=C2=A0=C2=A0=C2=A0 push=C2=A0=C2=A0=C2=A0 ebx >>> +=C2=A0=C2=A0=C2=A0 mov=C2=A0=C2=A0=C2=A0=C2=A0 eax, 1 >>> +=C2=A0=C2=A0=C2=A0 cpuid >>> +=C2=A0=C2=A0=C2=A0 bt=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 edx, 26 >>> +=C2=A0=C2=A0=C2=A0 jnc=C2=A0=C2=A0=C2=A0=C2=A0 Done >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 lfence >>> +Done: >>> +=C2=A0=C2=A0=C2=A0 pop=C2=A0=C2=A0=C2=A0=C2=A0 ebx >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ret >>> >> >> The SDM seems to confirm that lfence depends on SSE2. >> >> However, that raises another question: >> >> AsmLfence() is used for implementing SpeculationBarrier() on IA32/X64. >> And so I wonder: the plaforms where lfence is unavailable, do they *no= t* >> need a speculation barrier at all, or do they need a *different* >> implementation? >> >> Thanks, >> Laszlo >=20 > Several issues: >=20 > This patch introduces stronger fencing than is required.=C2=A0 The SDM = says, > "Reads or writes cannot be reordered with I/O instructions, locked > instructions, or serializing instructions." (vol 3a, sec 8.2.2.)=C2=A0 = The > cpuid instruction is a "serializing instruction" (sec 8.3).=C2=A0 So th= e > cpuid is essentially a load fence plus a store fence (plus other > functionality, such as draining the memory queues.)=C2=A0 That makes th= e > lfence following it redundant. >=20 > Cpuid is a fairly heavyweight instruction due to its serializing > behavior.=C2=A0 It provides the load fencing semantics of lfence, but c= an > introduce a significant performance hit if it's called often.=C2=A0 Lfe= nce is > a lot lighter weight instruction.=C2=A0 So using cpuid in AsmLfence may= make > it a lot slower than the caller expects. >=20 > Also, skipping a fencing instruction if it's not supported doesn't seem > like the right approach in any case.=C2=A0 The caller expects AsmLfence= to > provide certain fencing semantics... the implementation isn't free to > just do nothing (unless it's on an architecture where load fencing is > not required, since memory is always ordered.)=C2=A0 Plus, the routine = is > called "AsmLfence", so I'd expect it to always use lfence, and cause an > exception if the instruction isn't implemented.=C2=A0 I'd think the cal= lers > should be modified to call AsmLfence or routines using other > instructions (such as cpuid) to provide fencing according to the CPU > architecture they are on. That appears the right approach to me -- let AsmLfence do what its name says, and let platforms pick the right implementation for SpeculationBarrier -- possibly at build time, possibly at boot time. > Is there a way to make this a compile-time decision?=C2=A0 I haven't tr= ied to > track down all the callers of AsmLfence.... Right now AsmLfence is only called from "MdePkg/Library/BaseLib/X86SpeculationBarrier.c". SpeculationBarrier() is called in SMM drivers / lib instances: - FaultTolerantWriteSmm, FaultTolerantWriteStandaloneMm - SmmLockBoxLib - VariableSmm, VariableStandaloneMm - PiSmmCpuDxeSmm Given that there's a mode switch anyway, the runtime cost of a simple CPUID-based implementation might be tolerable. Thanks, Laszlo