From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Dong, Eric"
To: Laszlo Ersek, "edk2-devel@lists.01.org"
CC: "Ni, Ruiyu"
Thread-Topic: [edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter.
Date: Wed, 25 Jul 2018 03:50:59 +0000
References: <20180629032047.6340-1-eric.dong@intel.com>
 <2eac3f3f-972f-9844-6567-5503a0403a85@redhat.com>
 <3ec340cf-3bf1-ad22-3b7b-aa1b2c1fcaa8@redhat.com>
 <055fb2f1-cd73-e5a9-11b2-407f31e81305@redhat.com>
In-Reply-To: <055fb2f1-cd73-e5a9-11b2-407f31e81305@redhat.com>
Subject: Re: [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant parameter.
List-Id: EDK II Development
Content-Type: text/plain; charset="us-ascii"

Hi Laszlo,

I have root-caused this issue. It is triggered by an AP hanging in the
procedure that runs when the PiSmmCpuDxeSmm driver starts up.

When the PiSmmCpuDxeSmm driver starts up, it calls StartAllAps to set
memory attributes. In StartAllAps, after calling WakeUpAp to start the
APs, it calls CheckAllAps to wait for all APs to finish the task.
CheckAllAps inspects each AP's state to decide whether the AP has
finished its task. The old code checked whether the AP state was
CpuStateFinished. That state is only set by the AP when it has truly
finished the task. In the new logic, CpuStateFinished has been replaced
with CpuStateIdle, and CpuStateIdle is also the initial state of the AP.
The AP changes its state from CpuStateIdle to CpuStateBusy when it
starts executing the procedure, and after it finishes the procedure it
changes the state back to CpuStateIdle.

So when the hang occurs, the AP state has not yet been changed to
CpuStateBusy when the BSP calls CheckAllAps to check whether the AP has
finished its task.
So the AP is still in CpuStateIdle, but the BSP thinks the AP has
finished its task. In this case, the BSP thinks all the APs have
finished their tasks and it continues to boot. But some AP may wake up
later and fail to return from the procedure. In that case, the AP state
stays at CpuStateBusy. Later, in the ChangeApLoopCallback function,
because this AP is still in CpuStateBusy, it will not run the
procedure. But the BSP waits for all APs to run the procedure (the BSP
waits for the APs to decrement mNumberToFinish in the procedure) before
it continues the boot, so the hang occurs.

I think we should keep an intermediate state that tells us whether the
AP has truly finished its task. I will send another patch series for
this issue. Please help check the new patches.

Thanks,
Eric

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> Laszlo Ersek
> Sent: Saturday, July 21, 2018 12:30 AM
> To: Dong, Eric; edk2-devel@lists.01.org
> Cc: Ni, Ruiyu
> Subject: Re: [edk2] [Patch V2] UefiCpuPkg/MpInitLib: Remove redundant
> parameter.
>
> On 07/20/18 08:53, Dong, Eric wrote:
> >> -----Original Message----- From: Laszlo Ersek
> >> [mailto:lersek@redhat.com]
>
> >> Therefore, please upgrade the host to Fedora 26. In Fedora 26, QEMU
> >> 2.9 is shipped:
> >>
> >> https://koji.fedoraproject.org/koji/buildinfo?buildID=986762
> >>
> >> ... It's even better if you can upgrade to Fedora 27, as Fedora 27 is
> >> the oldest Fedora release still supported at this point. The
> >> following article describes the recommended upgrade method:
> >>
> >> https://fedoraproject.org/wiki/DNF_system_upgrade
> >>
> >
> > I updated the system to fedora 28, but it failed to boot. :( so I
> > borrowed an exited fedora 27 DVD and installed it. With this OS, I can
> > reproduce this issue now. I found this issue is an random issue, I
> > booted 5 times and met the issue. I'm checking the issue.
>
> Awesome!
>
> (I'm not happy about the problem itself, of course, but I'm *very*
> thankful that you took the time to install a Linux box, for testing
> with KVM!!!)
>
> Laszlo
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
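
The race described at the top of this message can be modeled in a few
lines of C. This is only a sketch under stated assumptions, not the
MpInitLib code: SimAp, ap_step, bsp_thinks_done, premature_completion
and steps_until_done are hypothetical names, the three states are a
simplification, and the AP is stepped by hand instead of running on
another processor. It shows why a completion check against the idle
state can succeed before the AP has started, while a check against a
separate finished state cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical AP states (not the real MpInitLib enum). */
typedef enum { AP_IDLE, AP_BUSY, AP_FINISHED } ApState;

/* Simulated AP: current state plus a task queued by the BSP. */
typedef struct {
    ApState state;
    bool    task_pending;
} SimAp;

/*
 * Advance the simulated AP by one step.
 * done_state is the state the AP enters when the procedure returns:
 * AP_IDLE models the logic that reuses the idle state,
 * AP_FINISHED models a dedicated completion state.
 */
void ap_step(SimAp *ap, ApState done_state)
{
    assert(ap != NULL);
    if (ap->state == AP_IDLE && ap->task_pending) {
        ap->task_pending = false;
        ap->state = AP_BUSY;          /* AP picks up the procedure */
    } else if (ap->state == AP_BUSY) {
        ap->state = done_state;       /* procedure has returned */
    }
}

/* The BSP's completion test: compare against the "done" state. */
bool bsp_thinks_done(const SimAp *ap, ApState done_state)
{
    assert(ap != NULL);
    return ap->state == done_state;
}

/*
 * The BSP queues a task and checks at once, before the AP has run.
 * Returns true if that first check already reports completion.
 */
bool premature_completion(ApState done_state)
{
    SimAp ap = { AP_IDLE, true };
    return bsp_thinks_done(&ap, done_state);
}

/* Number of AP steps that elapse before the BSP sees completion. */
int steps_until_done(ApState done_state)
{
    SimAp ap = { AP_IDLE, true };
    int steps = 0;
    while (!bsp_thinks_done(&ap, done_state)) {
        ap_step(&ap, done_state);
        steps++;
    }
    return steps;
}
```

In this model, premature_completion(AP_IDLE) is true and
steps_until_done(AP_IDLE) is 0: the BSP moves on before the AP has done
anything. With AP_FINISHED as the completion state, the first check
fails and the BSP only sees completion after the AP has gone through
busy and finished (two steps).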