From: "Yao, Jiewen"
To: Paolo Bonzini, Laszlo Ersek
CC: "Tian, Feng", "edk2-devel@ml01.01.org", "Kinney, Michael D", "Fan, Jeff", "Zeng, Star"
Thread-Topic: [edk2] [PATCH V2 0/6] Enable SMM page level protection.
Date: Wed, 9 Nov 2016 15:01:58 +0000
Message-ID: <74D8A39837DF1E4DA445A8C0B3885C50386C10BD@shsmsx102.ccr.corp.intel.com>
References: <1478251854-14660-1-git-send-email-jiewen.yao@intel.com> <08406bf5-4377-63a1-8dd9-34479c015d4b@redhat.com> <74D8A39837DF1E4DA445A8C0B3885C50386C0CB8@shsmsx102.ccr.corp.intel.com>
Subject: Re: [PATCH V2 0/6] Enable SMM page level protection.

What I found is that the BSP doesn't wait for the AP rendezvous before closing SMRAM.

[Jiewen] That is a good catch. Thanks for explaining.
I believe that is more convincing than the AP getting an interrupt. :)

We have several places where the BSP talks to the APs during S3:
1) CpuS3.c - EarlyInitializeCpu()
2) CpuS3.c - SmmRelocateBases()
3) CpuS3.c - InitializeCpu()
4) S3Resume.c - SendSmiIpiAllExcludingSelf()

I believe we can guarantee that 1/2/3 are good, because I found that the BSP checks mNumberToFinish.
4 is a risk, because there is no AP finish check. If the AP is below 1M with CR3 in SMRAM, it will be in trouble.
Once the AP executes RSM and returns to non-SMM, the CR3 is no longer valid and the AP must crash immediately. Wow!

The fix, I believe, is the same.
We should make sure that 1) the AP is in reserved memory above 1M, and 2) the AP is in protected mode with paging disabled.

Thank you
Yao Jiewen

From: Paolo Bonzini [mailto:paolo.bonzini@gmail.com] On Behalf Of Paolo Bonzini
Sent: Wednesday, November 9, 2016 7:30 PM
To: Yao, Jiewen; Laszlo Ersek
Cc: Tian, Feng; edk2-devel@ml01.01.org; Kinney, Michael D; Fan, Jeff; Zeng, Star
Subject: Re: [edk2] [PATCH V2 0/6] Enable SMM page level protection.

On 09/11/2016 07:25, Yao, Jiewen wrote:
> Current BSP just uses its own context to initialize AP. So that AP
> takes BSP CR3, which is SMM CR3, unfortunately. After BSP initialized
> APs, the AP is put to HALT-LOOP in X64 mode. It is the last straw,
> because X64 mode halt still need paging.
>
> 3) The error happen, once the AP receives an interrupt (for
> whatever reason), AP starts executing code. However, at that time
> the AP might not be in SMM mode. It means SMM CR3 is not available.
> And then we see this.
>
> 4) I guess we did not see the error, or this is RANDOM issue,
> because it depends on if AP receives an interrupt before BSP send
> INIT-SIPI-SIPI.
>
> 5) The fix, I think, should be below: We should always put AP to
> protected mode, so that no paging is needed. We should put AP in
> above 1M reserved memory, instead of <1M memory, because <1M memory
> is restored.

For what it's worth, this is not what I observed. What I found is that
the BSP doesn't wait for the AP rendezvous before closing SMRAM. I'm
not sure if the two things are related, but (3) would be a much worse
bug. APs should not be receiving an interrupt. Perhaps an NMI if the AP
is sitting in a CLI;HLT loop, but this is not what is happening.

Paolo
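
To make the ordering problem discussed in this thread concrete, here is a minimal single-threaded model in C. It is only a sketch under stated assumptions, not edk2 code: `ApModel`, `ResumeModel`, `step_aps()` and `bsp_resume()` are hypothetical names invented for illustration, and `number_to_finish` merely plays the role that `mNumberToFinish` is described as playing in CpuS3.c. The model counts how many APs are still running on SMRAM-resident state at the moment the BSP closes SMRAM.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one AP; not an edk2 structure. */
typedef struct {
  unsigned work_left; /* steps the AP still has to run on SMRAM-resident state */
  bool     faulted;   /* set if the AP ran a step after SMRAM was closed */
} ApModel;

/* Hypothetical model of the resume path; not an edk2 structure. */
typedef struct {
  ApModel *aps;
  size_t   ap_count;
  size_t   number_to_finish; /* stands in for a counter like mNumberToFinish */
  bool     smram_open;
} ResumeModel;

/* Advance every unfinished AP by one step. */
static void step_aps(ResumeModel *m) {
  for (size_t i = 0; i < m->ap_count; i++) {
    ApModel *ap = &m->aps[i];
    if (ap->work_left == 0) {
      continue; /* this AP has already reported completion */
    }
    if (!m->smram_open) {
      ap->faulted = true; /* its CR3 pointed into SMRAM, which is now closed */
    }
    if (--ap->work_left == 0) {
      m->number_to_finish--; /* AP reports that it is done */
    }
  }
}

/* Model the BSP: start the APs, optionally wait for them, then close SMRAM.
 * Returns the number of APs that ran after SMRAM was closed. */
size_t bsp_resume(ResumeModel *m, bool wait_for_aps) {
  size_t faults = 0;

  m->smram_open = true;
  m->number_to_finish = 0;
  for (size_t i = 0; i < m->ap_count; i++) {
    m->aps[i].faulted = false;
    if (m->aps[i].work_left > 0) {
      m->number_to_finish++;
    }
  }

  step_aps(m); /* the APs begin running */

  if (wait_for_aps) {
    /* The AP finish check: do not proceed until every AP has reported in. */
    while (m->number_to_finish != 0) {
      step_aps(m);
    }
  }

  m->smram_open = false; /* the BSP closes SMRAM */

  /* Any AP that is still unfinished now runs with SMRAM closed. */
  while (m->number_to_finish != 0) {
    step_aps(m);
  }

  for (size_t i = 0; i < m->ap_count; i++) {
    if (m->aps[i].faulted) {
      faults++;
    }
  }
  return faults;
}
```

Calling `bsp_resume()` with `wait_for_aps` set to false models path 4 above (no AP finish check): any AP that needs more than one step is caught running after SMRAM is closed. With `wait_for_aps` set to true, the model reports no faulted APs. The model says nothing about where the AP code lives or which CPU mode it runs in, so it does not replace the proposed fix of parking APs above 1M in protected mode with paging disabled.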