From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=209.132.183.28; helo=mx1.redhat.com; envelope-from=lersek@redhat.com; receiver=edk2-devel@lists.01.org Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id EED0E21B02822 for ; Mon, 15 Oct 2018 08:51:59 -0700 (PDT) Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 8CD1EC0BEAA3; Mon, 15 Oct 2018 15:51:58 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-121-109.rdu2.redhat.com [10.10.121.109]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5258D77100; Mon, 15 Oct 2018 15:51:57 +0000 (UTC) To: Eric Dong , edk2-devel@lists.01.org Cc: Ruiyu Ni References: <20181015024948.228-1-eric.dong@intel.com> From: Laszlo Ersek Message-ID: <69fcdb2b-26ca-d786-f3e9-2080305afe02@redhat.com> Date: Mon, 15 Oct 2018 17:51:51 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <20181015024948.228-1-eric.dong@intel.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Mon, 15 Oct 2018 15:51:58 +0000 (UTC) Subject: Re: [Patch 0/4] Fix performance issue caused by Set MSR task. X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Oct 2018 15:52:00 -0000 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Hi Eric, On 10/15/18 04:49, Eric Dong wrote: > In a system which has multiple cores, current set register value task costs huge times. > After investigation, current set MSR task costs most of the times. Current logic uses > SpinLock to let set MSR task as an single thread task for all cores. Because MSR has > scope attribute which may cause GP fault if multiple APs set MSR at the same time, > current logic use an easiest solution (use SpinLock) to avoid this issue, but it will > cost huge times. > > In order to fix this performance issue, new solution will set MSRs base on their scope > attribute. After this, the SpinLock will not needed. Without SpinLock, new issue raised > which is caused by MSR dependence. For example, MSR A depends on MSR B which means MSR A > must been set after MSR B has been set. Also MSR B is package scope level and MSR A is > thread scope level. If system has multiple threads, Thread 1 needs to set the thread level > MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for thread 1 > and thread 2 like below: > > Thread 1 Thread 2 > MSR B N Y > MSR A Y Y > > If driver don't control execute MSR order, for thread 1, it will execute MSR A first, but > at this time, MSR B not been executed yet by thread 2. system may trig exception at this > time. > > In order to fix the above issue, driver introduces semaphore logic to control the MSR > execute sequence. For the above case, a semaphore will be add between MSR A and B for > all threads. Semaphore has scope info for it. The possible scope value is core or package. > For each thread, when it meets a semaphore during it set registers, it will 1) release > semaphore (+1) for each threads in this core or package(based on the scope info for this > semaphore) 2) acquire semaphore (-1) for all the threads in this core or package(based > on the scope info for this semaphore). With these two steps, driver can control MSR > sequence. Sample code logic like below: > > // > // First increase semaphore count by 1 for processors in this package. > // > for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) { > LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]); > } > // > // Second, check whether the count has reach the check number. > // > for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) { > LibWaitForSemaphore (&SemaphorePtr[ApOffset]); > } > > Platform Requirement: > 1. This change requires register MSR setting base on MSR scope info. If still register MSR > for all threads, exception may raised. Do you mean that platforms are responsible for updating their register tables in: - ACPI_CPU_DATA.PreSmmInitRegisterTable, - ACPI_CPU_DATA.RegisterTable so that the tables utilize the new Semaphore REGISTER_TYPE as appropriate? > > Known limitation: > 1. Current CpuFeatures driver supports DXE instance and PEI instance. But semaphore logic > requires Aps execute in async mode which is not supported by PEI driver. So CpuFeature > PEI instance not works after this change. We plan to support async mode for PEI in phase > 2 for this task. > 2. Current execute MSR task code in duplicated in PiSmmCpuDxeSmm driver and > RegisterCpuFeaturesLib library because the schedule limitation. I don't understand what you mean by "schedule limitation". Are you alluding to the upcoming edk2 stable tag (in November), or some other schedule? > Will merge the code to > RegisterCpuFeaturesLib and export as an API in phase 2 for this task. While I agree that common code (especially complex code like this) should belong to libraries, there are platforms that consume PiSmmCpuDxeSmm, but don't consume RegisterCpuFeaturesLib in any way. Do you plan to add the new function(s) to a RegisterCpuFeaturesLib instance, and make PiSmmCpuDxeSmm dependent on RegisterCpuFeaturesLib? If so, I think it can work, but then the RegisterCpuFeaturesLib instance in question should do nothing at all in the constructor. On platforms that don't use this feature at all -- i.e., the Semaphore REGISTER_TYPE --, there should be no impact. (BTW, DxeRegisterCpuFeaturesLib is currently restricted to DXE_DRIVER modules.) > Extra Notes: > I will send the other patch to set MSR base on scope info and check in it before check in > this serial. I don't understand. I assume that you are referring to some concrete platform (?) where the Semaphore REGISTER_TYPE *must* be used, in order to successfully boot (and/or perform S3), if this series is applied. What platform is that? And, if that other patch is indeed a pre-requisite for *this* set (on some specific platform anyway), then people on that platform will not be able to test this series until you post those patches. My point here is that, on that platform, the testing cannot be performed in separation, so it's not enough to establish the right dependency order *just* before check-in. It should be offered on the list as well. Thanks, Laszlo > > Cc: Ruiyu Ni > Cc: Laszlo Ersek > Contributed-under: TianoCore Contribution Agreement 1.1 > Signed-off-by: Eric Dong > > Eric Dong (4): > UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information. > UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types. > UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore > type. > UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type. > > UefiCpuPkg/Include/AcpiCpuData.h | 23 +- > .../Include/Library/RegisterCpuFeaturesLib.h | 25 +- > .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 324 ++++++++++++--- > .../DxeRegisterCpuFeaturesLib.c | 71 +++- > .../DxeRegisterCpuFeaturesLib.inf | 3 + > .../PeiRegisterCpuFeaturesLib.c | 55 ++- > .../PeiRegisterCpuFeaturesLib.inf | 1 + > .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h | 51 ++- > .../RegisterCpuFeaturesLib.c | 452 ++++++++++++++++++--- > UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 316 +++++++------- > UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c | 3 - > UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 3 +- > 12 files changed, 1063 insertions(+), 264 deletions(-) >