From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by ml01.01.org (Postfix) with ESMTPS id 00BD081E02
	for ; Mon, 14 Nov 2016 10:07:28 -0800 (PST)
Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id 25A877F09D;
	Mon, 14 Nov 2016 18:07:33 +0000 (UTC)
Received: from lacos-laptop-7.usersys.redhat.com (ovpn-116-50.phx2.redhat.com [10.3.116.50])
	by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id uAEI7VkL015152;
	Mon, 14 Nov 2016 13:07:32 -0500
To: Paolo Bonzini , "Fan, Jeff"
References: <20161111054545.19616-1-jeff.fan@intel.com>
 <542CF652F8836A4AB8DBFAAD40ED192A4A2DB4F5@shsmsx102.ccr.corp.intel.com>
 <00b6828b-78c5-af4f-ab98-de4460b1b8ec@redhat.com>
 <4dc14e5c-9b43-4338-c7a5-9750e8a9547a@redhat.com>
 <3e61ffc4-9eaf-0015-11a7-e2d698624acb@redhat.com>
 <648314a4-6c17-7c88-7e47-98c4de95fe2d@redhat.com>
Cc: "edk2-devel@ml01.01.org" , "Yao, Jiewen"
From: Laszlo Ersek
Message-ID:
Date: Mon, 14 Nov 2016 19:07:30 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <648314a4-6c17-7c88-7e47-98c4de95fe2d@redhat.com>
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
 (mx1.redhat.com [10.5.110.26]); Mon, 14 Nov 2016 18:07:33 +0000 (UTC)
Subject: Re: [PATCH v2 0/3] Put AP into safe hlt-loop code on S3 path
X-BeenThere: edk2-devel@lists.01.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: EDK II Development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 14 Nov 2016 18:07:29 -0000
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit

On 11/14/16 13:00, Paolo Bonzini wrote:
>
>
> On 14/11/2016 12:27, Laszlo Ersek wrote:
>> Well...
>>
>> http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html
>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00125.html
>> http://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00563.html
>>
>> Are you suggesting that I resurrect this patch? That would be my
>> pleasure. Please say yes.
>
> It's hard to say no when someone has written the code already. :)

Thanks.

I refreshed both patches (OVMF and QEMU -- no code changes just more
precise commit messages).

Unfortunately, quite a few things seem broken, although these patches
worked a year ago. My QEMU base commit is current master 83c83f9a5266.
My host kernel is 3.10.0-514.el7.x86_64.

***

So, when I test these two patches, based on edk2 master (no on-list
patches), Ia32 target, my boot hangs (spins) with the log ending in:

> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0

That is, MpInitChangeApLoopCallback() is entered, but it never finishes.
"info cpus" prints:

* CPU #0: pc=0x000000007f1f7763 thread_id=17395
  CPU #1: pc=0x000000007f2ce01e (halted) thread_id=17396
  CPU #2: pc=0x000000007f2ce01e (halted) thread_id=17397
  CPU #3: pc=0x00000000fffffff0 thread_id=17398

and I've also seen a case where all the APs were stuck at the reset
vector (0x00000000fffffff0), *not* halted, like VCPU#3 above. They don't
spin, they're just stuck. The spinning comes from CPU#0, apparently in
MpInitChangeApLoopCallback.

***

I flipped the AP sync mode to traditional (considering the relaxed mode
shouldn't be required with the broadcast SMIs). This time the log ends
with:

> SmmInstallProtocolInterface: [EdkiiSmmExitBootServicesProtocol] 0
> MpInitChangeApLoopCallback() done!
but then QEMU abort()s:

> kvm_io_ioeventfd_add: error adding ioeventfd: File exists
> 2016-11-14 17:00:41.405+0000: shutting down, reason=crashed

I see some ioeventfd stuff in the recent QEMU history; do you think it's
related?

***

My last attempt was even more strange. I applied Jeff's v2 (this
series), returned to the relaxed (= currently in-tree) sync mode, and
(of course) the broadcast SMI patches on both sides. This time I didn't
even boot an OS, I just entered the setup TUI, and selected the Reset
option. QEMU crashed again with:

> kvm_io_ioeventfd_add: error adding ioeventfd: File exists
> 2016-11-14 17:00:41.405+0000: shutting down, reason=crashed

I don't know what to look at, honestly. I think I'll check the reflog
for my local QEMU master branch, and return to one of my earlier pulls,
or else use v2.7.0 for testing.

FWIW, the broadcast SMIs work just fine as long as I'm in the firmware
(not booting an OS and not resetting, just browsing around); I verified
with GDB that the broadcast SMI branch was taken in QEMU repeatedly.

Thanks
Laszlo
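
For reference, the "info cpus" output quoted earlier in this message can
be triaged mechanically when the hang has to be reproduced many times.
The following is only a minimal Python sketch, assuming the monitor
prints lines in exactly the format shown in this message; the names
classify_cpus() and RESET_VECTOR are hypothetical, invented here for
illustration, and are not part of QEMU or edk2.

```python
import re

# x86 reset vector, as reported for CPU #3 in the quoted output.
RESET_VECTOR = 0xFFFFFFF0

# Matches lines such as:
#   "* CPU #0: pc=0x000000007f1f7763 thread_id=17395"
#   "  CPU #1: pc=0x000000007f2ce01e (halted) thread_id=17396"
# The format is assumed from this message only.
CPU_RE = re.compile(
    r"CPU #(?P<index>\d+):\s+pc=(?P<pc>0x[0-9a-fA-F]+)"
    r"(?P<halted>\s+\(halted\))?\s+thread_id=(?P<tid>\d+)"
)


def classify_cpus(text):
    """Map each VCPU index to a coarse state string."""
    states = {}
    for match in CPU_RE.finditer(text):
        pc = int(match.group("pc"), 16)
        halted = match.group("halted") is not None
        if halted:
            state = "halted"
        elif pc == RESET_VECTOR:
            # Not halted, yet parked on the reset vector.
            state = "at reset vector, not halted"
        else:
            state = "not halted"
        states[int(match.group("index"))] = state
    return states


SAMPLE = """\
* CPU #0: pc=0x000000007f1f7763 thread_id=17395
  CPU #1: pc=0x000000007f2ce01e (halted) thread_id=17396
  CPU #2: pc=0x000000007f2ce01e (halted) thread_id=17397
  CPU #3: pc=0x00000000fffffff0 thread_id=17398
"""

if __name__ == "__main__":
    for index, state in sorted(classify_cpus(SAMPLE).items()):
        print(f"CPU #{index}: {state}")
```

The sketch only separates the three cases described in this message
(halted, not halted at the reset vector, not halted elsewhere); it does
not tell a spinning VCPU from one making progress, which still needs
repeated sampling or GDB.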