From: Laszlo Ersek
To: Evgeny Yakovlev
Cc: edk2-devel@ml01.01.org, eyakovlev@virtuozzo.com, den@virtuozzo.com, Jeff Fan
Subject: Re: OvmfPkg: VM crashed trying to write to RO memory from CommonInterruptEntry
Date: Tue, 22 Nov 2016 17:58:45 +0100
Message-ID: <9fcf577d-cf9e-db6f-c0f8-6842baf8bb83@redhat.com>
References: <2340021c-4bcb-2622-07a8-6e6173f94d81@redhat.com>
List-Id: EDK II Development

On 11/22/16 14:58, Evgeny Yakovlev wrote:
> Wow, that is more than i expected :)
>
>> I wonder if you started to see this issue very recently.
> Very recently, however we use a pretty old OVMF build, circa 2015

Ugh. Please update OVMF first... A whole lot of things have changed in
edk2 this year.

>
>> OVMF debug log
> Sorry, we hadn't had it enabled when VM crashed and these crashes are very
> rare. We will try to capture it when it happens again
>
>> - your host CPU model,
> cpu family : 6
> model : 42
> model name : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
> stepping : 7
>
>> - the host kernel (KVM) version,
> Our kernel is roughly based on RHEL7.2 (kernel version 3.10.0-327.36.1). We
> also have some upstream KVM patches backported.
>
>> - the guest CPU model,
> -cpu
> SandyBridge,+vme,+ds,+acpi,+ss,+ht,+tm,+pbe,+dtes64,+monitor,+ds_cpl,+smx,+est,+tm2,+xtpr,+pdcm,+pcid,+osxsave,-arat,-xsaveopt,-xgetbv1,-vmx,-xsavec,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vpindex,hv_runtime,hv_synic,hv_stimer,hv_reset,hv_crash
>
>> - the guest CPU topology.
> 8 sockets, 1 core per socket, 1 thread per core
>
> Hope that helps!

The fact that you are using 8 VCPUs is definitely relevant. However, I
don't think it would make sense to try to analyze any errors with an
OVMF / edk2 tree this old. Please try to reproduce the issue with a
fresh build from master.

Thanks!
Laszlo

> 2016-11-22 16:41 GMT+03:00 Laszlo Ersek :
>
>> Hello Evgeny,
>>
>> On 11/22/16 13:57, Evgeny Yakovlev wrote:
>>> We are running windows UEFI-based VMs on QEMU/KVM with OvmfPkg.
>>>
>>> Very rarely we are experiencing a crash when VM tries to write to RO memory
>>> very early during UEFI boot process.
>>>
>>> Crash happens when VM tries to execute this code in interrupt handler:
>>> https://github.com/tianocore/edk2/blob/master/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.asm#L244-L246
>>>
>>>
>>> fxsave [rdi], where RDI = 0xffe60
>>>
>>> Which is bad - it points to ISA BIOS F-segment area.
>>>
>>> This memory was mapped by qemu for read only access, which is reflected in
>>> KVM EPT:
>>> 00000000000e0000-00000000000fffff (prio 1, R-): isa-bios
>>>
>>> This is a very early IRQ0 interrupt, presumably during early initialization
>>> phase (Sec or Pei).
>>>
>>> Looks like CommonInterruptHandler does not switch to a separate stack and
>>> works on interrupted context's stack, which was fairly close to 1MB
>>> boundary when IRQ0 fired (RSP around 1002c0). When CommonInterruptEntry
>>> reached highlighted code it subtracted 512 bytes from current RSP which
>>> dropped to 0xffe60, below 1MB and into QEMU RO region.
>>>
>>> We were figuring out how to best fix this. Possible solutions are to switch
>>> to a separate stack in CommonInterruptEntry, relocate early OvmfPkg stack
>>> to somewhere farther away from 1MB, to run with interrupts disabled until
>>> we reach a later phase or maybe something else.
>>>
>>> Any comments would be very appreciated!
>>
>> I wonder if you started to see this issue very recently.
>>
>> I suspect (hope!) that the symptoms you are experiencing are a
>> consequence of a bug in UefiCpuPkg that I've debugged and fixed just
>> today. (I hope to post the patches today.)
>>
>> While testing those patches on your end will of course tell us if your
>> issue has the same root cause, you could gather a few more symptoms even
>> before I get around posting the patches. The bug that I'm working on has
>> extremely varied crash symptoms (basically the APs wander off into the
>> weeds), and some of those symptoms have involved CpuExceptionHandlerLib.
>> The point is, by the time we get into CpuExceptionHandlerLib, all is
>> lost -- it is executing on an AP whose state is corrupt anyway. The
>> fxsave symptom is a red herring, most likely.
>>
>> CpuExceptionHandlerLib works fine otherwise, especially when invoked
>> from the BSP -- we've used the output dumped by CpuExceptionHandlerLib
>> to the serial port several times to track down issues.
>>
>> So, my request is that you please capture the OVMF debug log (please see
>> the "OvmfPkg/README" file for how). I'm curious if it crashes where and
>> how I suspect it crashes.
>>
>> Also, it would help if you provided
>> - your host CPU model,
>> - the host kernel (KVM) version,
>> - the guest CPU model,
>> - the guest CPU topology.
>>
>> Thanks!
>> Laszlo
>>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
>
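
For readers following the addresses quoted in the original report, here is a
rough sketch of the arithmetic, not code from edk2 or QEMU. It assumes that
FXSAVE stores a 512-byte image starting at its operand address, and it takes
the isa-bios range and the RDI / RSP values verbatim from the report. All
names in it are hypothetical, chosen only for illustration.

```python
# Values copied from the report quoted in this thread.
ISA_BIOS_FIRST = 0x000E0000   # first byte of the read-only isa-bios range
ISA_BIOS_LAST = 0x000FFFFF    # last byte of that range (inclusive)
REPORTED_RDI = 0xFFE60        # operand of "fxsave [rdi]"
REPORTED_RSP = 0x1002C0       # approximate RSP when IRQ0 fired

# Assumption: FXSAVE writes a 512-byte image at its operand address.
FXSAVE_IMAGE_SIZE = 512


def fxsave_overlaps_isa_bios(operand_addr: int) -> bool:
    """Return True if a 512-byte store at operand_addr touches isa-bios."""
    first_written = operand_addr
    last_written = operand_addr + FXSAVE_IMAGE_SIZE - 1
    return first_written <= ISA_BIOS_LAST and last_written >= ISA_BIOS_FIRST


if __name__ == "__main__":
    # Distance between the interrupted stack pointer and the FXSAVE operand.
    print(hex(REPORTED_RSP - REPORTED_RDI))          # 0x460
    # Does the store at the reported RDI reach into the read-only range?
    print(fxsave_overlaps_isa_bios(REPORTED_RDI))    # True
```

The gap between the two reported values is 0x460 (1120) bytes, more than the
512 bytes reserved for the FXSAVE image. The remainder is presumably whatever
the handler had pushed before that point; that is an inference, not something
stated in the thread.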