From: Laszlo Ersek
To: Evgeny Yakovlev, edk2-devel@ml01.01.org
Cc: eyakovlev@virtuozzo.com, den@virtuozzo.com, Jeff Fan
Date: Tue, 22 Nov 2016 14:41:23 +0100
Subject: Re: OvmfPkg: VM crashed trying to write to RO memory from CommonInterruptEntry
Message-ID: <2340021c-4bcb-2622-07a8-6e6173f94d81@redhat.com>

Hello Evgeny,

On 11/22/16 13:57, Evgeny Yakovlev wrote:
> We are running windows UEFI-based VMs on QEMU/KVM with OvmfPkg.
>
> Very rarely we are experiencing a crash when VM tries to write to RO memory
> very early during UEFI boot process.
>
> Crash happens when VM tries to execute this code in interrupt handler:
> https://github.com/tianocore/edk2/blob/master/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.asm#L244-L246
>
> fxsave [rdi], where RDI = 0xffe60
>
> Which is bad - it points to ISA BIOS F-segment area.
>
> This memory was mapped by qemu for read only access, which is reflected in
> KVM EPT:
> 00000000000e0000-00000000000fffff (prio 1, R-): isa-bios
>
> This is a very early IRQ0 interrupt, presumably during early initialization
> phase (Sec or Pei).
>
> Looks like CommonInterruptHandler does not switch to a separate stack and
> works on interrupted context's stack, which was fairly close to 1MB
> boundary when IRQ0 fired (RSP around 1002c0). When CommonInterruptEntry
> reached highlighted code it subtracted 512 bytes from current RSP which
> dropped to 0xffe60, below 1MB and into QEMU RO region.
>
> We were figuring out how to best fix this. Possible solutions are to switch
> to a separate stack in CommonInterruptEntry, relocate early OvmfPkg stack
> to somewhere farther away from 1MB, to run with interrupts disabled until
> we reach a later phase or maybe something else.
>
> Any comments would be very appreciated!

I wonder if you started to see this issue very recently. I suspect
(hope!) that the symptoms you are experiencing are a consequence of a
bug in UefiCpuPkg that I've debugged and fixed just today. (I hope to
post the patches today.)

While testing those patches on your end will of course tell us if your
issue has the same root cause, you could gather a few more symptoms even
before I get around to posting the patches.

The bug that I'm working on has extremely varied crash symptoms
(basically the APs wander off into the weeds), and some of those
symptoms have involved CpuExceptionHandlerLib.
The point is, by the time we get into CpuExceptionHandlerLib, all is
lost -- it is executing on an AP whose state is corrupt anyway. The
fxsave symptom is a red herring, most likely.

CpuExceptionHandlerLib works fine otherwise, especially when invoked
from the BSP -- we've used the output dumped by CpuExceptionHandlerLib
to the serial port several times to track down issues.

So, my request is that you please capture the OVMF debug log (please see
the "OvmfPkg/README" file for how). I'm curious if it crashes where and
how I suspect it crashes.

Also, it would help if you provided
- your host CPU model,
- the host kernel (KVM) version,
- the guest CPU model,
- the guest CPU topology.

Thanks!
Laszlo
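
Note on the quoted addresses: the arithmetic in the quoted report can be checked with a short sketch. The Python below is illustrative only. The constant and function names are invented for this example, the RSP value is the approximate one from the report ("around 1002c0"), and the only values used are those quoted above. It checks that the fxsave destination lies inside the read-only isa-bios range, and that the handler had consumed more stack than the space remaining above 1MB when the interrupt fired.

```python
# Values copied from the quoted report; all names here are hypothetical.
ISA_BIOS_START = 0xE0000    # start of the read-only isa-bios range
ISA_BIOS_END = 0xFFFFF      # end of that range (inclusive)
ONE_MB = 0x100000
FXSAVE_AREA_SIZE = 512      # bytes written by FXSAVE

rsp_at_interrupt = 0x1002C0  # approximate RSP when IRQ0 fired
fxsave_dest = 0xFFE60        # RDI at the faulting "fxsave [rdi]"


def in_isa_bios(addr: int) -> bool:
    """Return True if addr falls inside the read-only isa-bios range."""
    return ISA_BIOS_START <= addr <= ISA_BIOS_END


# Stack space left above the 1MB boundary when the interrupt fired.
headroom_above_1mb = rsp_at_interrupt - ONE_MB

# Total stack consumed by the handler up to the fxsave destination
# (the 512-byte save area plus whatever was pushed before it).
stack_used = rsp_at_interrupt - fxsave_dest

# First address past the 512-byte save image.
fxsave_end = fxsave_dest + FXSAVE_AREA_SIZE

print(in_isa_bios(fxsave_dest))          # True
print(hex(headroom_above_1mb))           # 0x2c0
print(hex(stack_used))                   # 0x460
print(stack_used > headroom_above_1mb)   # True: the frame crosses 1MB
print(hex(fxsave_end))                   # 0x100060
print(fxsave_dest % 16 == 0)             # True: FXSAVE needs 16-byte alignment
```

With these numbers the save image would start at 0xffe60 and end just above 1MB, so its first bytes land in the read-only range. This only restates the reported symptom; it says nothing about why the stack pointer was that close to 1MB in the first place.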