From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=134.134.136.31; helo=mga06.intel.com; envelope-from=eric.dong@intel.com; receiver=edk2-devel@lists.01.org Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 8A47221163DE9 for ; Thu, 18 Oct 2018 19:06:44 -0700 (PDT) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Oct 2018 19:06:43 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,398,1534834800"; d="scan'208";a="266942995" Received: from ydong10-win10.ccr.corp.intel.com ([10.239.9.125]) by orsmga005.jf.intel.com with ESMTP; 18 Oct 2018 19:06:27 -0700 From: Eric Dong To: edk2-devel@lists.01.org Cc: Ruiyu Ni , Laszlo Ersek Date: Fri, 19 Oct 2018 10:06:16 +0800 Message-Id: <20181019020622.6864-1-eric.dong@intel.com> X-Mailer: git-send-email 2.15.0.windows.1 Subject: [Patch v4 0/6] Fix performance issue caused by Set MSR task X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Oct 2018 02:06:44 -0000 V4 changes include: 1. Add SpinLock to serial console log for different threads when set register table. 2. For PiSmmCpuDxeSmm driver, check the AcpiCpuData structure data before use it to avoid potential assert case. V3 changes include: 1. Add comments for some complicate logic. 2. Use global variable instead of internal function to return string for register type and dependence type. 3. Verify the solution with internal server platform(2 socket/56 Core/112 thread), set MSRs costs time change from 2849ms to 87ms. V2 changes include: 1. Include the change for CpuCommonFeaturesLib which used to set MSR base on its scope info. 2. Include the change for CpuS3DataDxe driver which also handle the AcpiCpuData data. 3. Update code base on feedback for V1 changes. V1 changes include: In a system which has multiple cores, current set register value task costs huge times. After investigation, current set MSR task costs most of the times. Current logic uses SpinLock to let set MSR task as an single thread task for all cores. Because MSR has scope attribute which may cause GP fault if multiple APs set MSR at the same time, current logic use an easiest solution (use SpinLock) to avoid this issue, but it will cost huge times. In order to fix this performance issue, new solution will set MSRs base on their scope attribute. After this, the SpinLock will not needed. Without SpinLock, new issue raised which is caused by MSR dependence. For example, MSR A depends on MSR B which means MSR A must been set after MSR B has been set. Also MSR B is package scope level and MSR A is thread scope level. If system has multiple threads, Thread 1 needs to set the thread level MSRs and thread 2 needs to set thread and package level MSRs. Set MSRs task for thread 1 and thread 2 like below: Thread 1 Thread 2 MSR B N Y MSR A Y Y If driver don't control execute MSR order, for thread 1, it will execute MSR A first, but at this time, MSR B not been executed yet by thread 2. system may trig exception at this time. In order to fix the above issue, driver introduces semaphore logic to control the MSR execute sequence. For the above case, a semaphore will be add between MSR A and B for all threads. Semaphore has scope info for it. The possible scope value is core or package. For each thread, when it meets a semaphore during it set registers, it will 1) release semaphore (+1) for each threads in this core or package(based on the scope info for this semaphore) 2) acquire semaphore (-1) for all the threads in this core or package(based on the scope info for this semaphore). With these two steps, driver can control MSR sequence. Sample code logic like below: // // First increase semaphore count by 1 for processors in this package. // for (ProcessorIndex = 0; ProcessorIndex < PackageThreadsCount ; ProcessorIndex ++) { LibReleaseSemaphore ((UINT32 *) &SemaphorePtr[PackageOffset + ProcessorIndex]); } // // Second, check whether the count has reach the check number. // for (ProcessorIndex = 0; ProcessorIndex < ValidApCount; ProcessorIndex ++) { LibWaitForSemaphore (&SemaphorePtr[ApOffset]); } Platform Requirement: 1. This change requires register MSR setting base on MSR scope info. If still register MSR for all threads, exception may raised. Known limitation: 1. Current CpuFeatures driver supports DXE instance and PEI instance. But semaphore logic requires Aps execute in async mode which is not supported by PEI driver. So CpuFeature PEI instance not works after this change. We plan to support async mode for PEI in phase 2 for this task. 2. Current execute MSR task code in duplicated in PiSmmCpuDxeSmm driver and RegisterCpuFeaturesLib library because the schedule limitation. Will merge the code to RegisterCpuFeaturesLib and export as an API in phase 2 for this task. Cc: Ruiyu Ni Cc: Laszlo Ersek Contributed-under: TianoCore Contribution Agreement 1.1 Signed-off-by: Eric Dong Eric Dong (6): UefiCpuPkg/Include/AcpiCpuData.h: Add Semaphore related Information. UefiCpuPkg/RegisterCpuFeaturesLib.h: Add new dependence types. UefiCpuPkg/RegisterCpuFeaturesLib: Add logic to support semaphore type. UefiCpuPkg/PiSmmCpuDxeSmm: Add logic to support semaphore type. UefiCpuPkg/CpuS3DataDxe: Keep old data if value already existed. UefiCpuPkg/CpuCommonFeaturesLib: Register MSR base on scope Info. UefiCpuPkg/CpuS3DataDxe/CpuS3Data.c | 2 + UefiCpuPkg/Include/AcpiCpuData.h | 67 ++- .../Include/Library/RegisterCpuFeaturesLib.h | 21 +- UefiCpuPkg/Library/CpuCommonFeaturesLib/C1e.c | 8 + UefiCpuPkg/Library/CpuCommonFeaturesLib/Eist.c | 12 + .../Library/CpuCommonFeaturesLib/ExecuteDisable.c | 10 + .../Library/CpuCommonFeaturesLib/FastStrings.c | 12 + .../Library/CpuCommonFeaturesLib/FeatureControl.c | 38 ++ .../CpuCommonFeaturesLib/LimitCpuIdMaxval.c | 14 + .../Library/CpuCommonFeaturesLib/MachineCheck.c | 38 ++ .../Library/CpuCommonFeaturesLib/MonitorMwait.c | 15 + .../Library/CpuCommonFeaturesLib/PendingBreak.c | 11 + UefiCpuPkg/Library/CpuCommonFeaturesLib/Ppin.c | 11 + .../Library/CpuCommonFeaturesLib/ProcTrace.c | 11 + UefiCpuPkg/Library/CpuCommonFeaturesLib/X2Apic.c | 10 + .../RegisterCpuFeaturesLib/CpuFeaturesInitialize.c | 396 ++++++++++++++--- .../DxeRegisterCpuFeaturesLib.c | 71 ++- .../DxeRegisterCpuFeaturesLib.inf | 2 + .../PeiRegisterCpuFeaturesLib.c | 55 ++- .../RegisterCpuFeaturesLib/RegisterCpuFeatures.h | 62 ++- .../RegisterCpuFeaturesLib.c | 485 ++++++++++++++++++--- UefiCpuPkg/PiSmmCpuDxeSmm/CpuS3.c | 413 ++++++++++++------ UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c | 3 - UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 3 +- 24 files changed, 1489 insertions(+), 281 deletions(-) -- 2.15.0.windows.1