From: Evgeny Yakovlev
Date: Wed, 23 Nov 2016 17:31:22 +0300
To: Laszlo Ersek
Cc: edk2-devel@ml01.01.org, eyakovlev@virtuozzo.com, den@virtuozzo.com, Jeff Fan
Subject: Re: OvmfPkg: VM crashed trying to write to RO memory from CommonInterruptEntry
In-Reply-To: <9fcf577d-cf9e-db6f-c0f8-6842baf8bb83@redhat.com>
References: <2340021c-4bcb-2622-07a8-6e6173f94d81@redhat.com> <9fcf577d-cf9e-db6f-c0f8-6842baf8bb83@redhat.com>

Looks like we're actually based on the OVMF tree for RHEL7.2:
ovmf/rhel-20150414-2.gitc9e5618.el7, ovmf/rhel-srpm-7.2

So maybe this affects those deployments as well.

2016-11-22 19:58 GMT+03:00 Laszlo Ersek:

> On 11/22/16 14:58, Evgeny Yakovlev wrote:
> > Wow, that is more than I expected :)
> >
> >> I wonder if you started to see this issue very recently.
> > Very recently, however we use a pretty old OVMF build, circa 2015
>
> Ugh. Please update OVMF first... A whole lot of things have changed in
> edk2 in this year.
>
> >> OVMF debug log
> > Sorry, we didn't have it enabled when the VM crashed, and these crashes
> > are very rare. We will try to capture it when it happens again.
> >
> >> - your host CPU model,
> > cpu family : 6
> > model : 42
> > model name : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
> > stepping : 7
> >
> >> - the host kernel (KVM) version,
> > Our kernel is roughly based on RHEL7.2 (kernel version 3.10.0-327.36.1).
> > We also have some upstream KVM patches backported.
> >
> >> - the guest CPU model,
> > -cpu
> > SandyBridge,+vme,+ds,+acpi,+ss,+ht,+tm,+pbe,+dtes64,+monitor,+ds_cpl,
> > +smx,+est,+tm2,+xtpr,+pdcm,+pcid,+osxsave,-arat,-xsaveopt,-xgetbv1,
> > -vmx,-xsavec,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,
> > hv_vpindex,hv_runtime,hv_synic,hv_stimer,hv_reset,hv_crash
> >
> >> - the guest CPU topology.
> > 8 sockets, 1 core per socket, 1 thread per core
> >
> > Hope that helps!
>
> The fact that you are using 8 VCPUs is definitely relevant. However, I
> don't think it would make sense to try to analyze any errors with an
> OVMF / edk2 tree this old. Please try to reproduce the issue with a
> fresh build from master.
>
> Thanks!
> Laszlo
>
> > 2016-11-22 16:41 GMT+03:00 Laszlo Ersek:
> >
> >> Hello Evgeny,
> >>
> >> On 11/22/16 13:57, Evgeny Yakovlev wrote:
> >>> We are running Windows UEFI-based VMs on QEMU/KVM with OvmfPkg.
> >>>
> >>> Very rarely we experience a crash when the VM tries to write to RO
> >>> memory very early during the UEFI boot process.
> >>>
> >>> The crash happens when the VM tries to execute this code in the
> >>> interrupt handler:
> >>> https://github.com/tianocore/edk2/blob/master/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.asm#L244-L246
> >>>
> >>> fxsave [rdi], where RDI = 0xffe60
> >>>
> >>> Which is bad - it points to the ISA BIOS F-segment area.
> >>>
> >>> This memory was mapped by qemu for read only access, which is
> >>> reflected in KVM EPT:
> >>> 00000000000e0000-00000000000fffff (prio 1, R-): isa-bios
> >>>
> >>> This is a very early IRQ0 interrupt, presumably during an early
> >>> initialization phase (Sec or Pei).
> >>>
> >>> Looks like CommonInterruptEntry does not switch to a separate stack
> >>> and works on the interrupted context's stack, which was fairly close
> >>> to the 1MB boundary when IRQ0 fired (RSP around 0x1002c0).
> >>> When CommonInterruptEntry reached the highlighted code it subtracted
> >>> 512 bytes from the current RSP, which dropped to 0xffe60, below 1MB
> >>> and into the QEMU RO region.
> >>>
> >>> We were figuring out how best to fix this. Possible solutions are to
> >>> switch to a separate stack in CommonInterruptEntry, to relocate the
> >>> early OvmfPkg stack somewhere farther away from 1MB, to run with
> >>> interrupts disabled until we reach a later phase, or maybe something
> >>> else.
> >>>
> >>> Any comments would be much appreciated!
> >>
> >> I wonder if you started to see this issue very recently.
> >>
> >> I suspect (hope!) that the symptoms you are experiencing are a
> >> consequence of a bug in UefiCpuPkg that I've debugged and fixed just
> >> today. (I hope to post the patches today.)
> >>
> >> While testing those patches on your end will of course tell us if your
> >> issue has the same root cause, you could gather a few more symptoms even
> >> before I get around to posting the patches. The bug that I'm working on
> >> has extremely varied crash symptoms (basically the APs wander off into
> >> the weeds), and some of those symptoms have involved
> >> CpuExceptionHandlerLib. The point is, by the time we get into
> >> CpuExceptionHandlerLib, all is lost -- it is executing on an AP whose
> >> state is corrupt anyway. The fxsave symptom is a red herring, most
> >> likely.
> >>
> >> CpuExceptionHandlerLib works fine otherwise, especially when invoked
> >> from the BSP -- we've used the output dumped by CpuExceptionHandlerLib
> >> to the serial port several times to track down issues.
> >>
> >> So, my request is that you please capture the OVMF debug log (please see
> >> the "OvmfPkg/README" file for how). I'm curious if it crashes where and
> >> how I suspect it crashes.
> >>
> >> Also, it would help if you provided
> >> - your host CPU model,
> >> - the host kernel (KVM) version,
> >> - the guest CPU model,
> >> - the guest CPU topology.
> >>
> >> Thanks!
> >> Laszlo
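
For reference, the stack arithmetic in the original report quoted above can be checked with a small sketch. It parses the memory-region line exactly as quoted and tests whether the 512-byte fxsave store at RDI = 0xffe60 overlaps a non-writable region. Everything else here is an assumption made for illustration: the helper names (parse_region, write_hits_read_only), the reading of "R-" as "readable, not writable", and the pre-subtraction RSP of 0x100060, which is simply 0xffe60 + 512 and is not a value given in the thread (the report only says "around 1002c0" at the time IRQ0 fired).

```python
import re

# Memory-region line exactly as quoted in the report.
MTREE_LINE = "00000000000e0000-00000000000fffff (prio 1, R-): isa-bios"

FXSAVE_AREA_SIZE = 512  # fxsave stores a 512-byte image


def parse_region(line):
    """Hypothetical helper: return (start, end_inclusive, writable, name)."""
    m = re.match(
        r"\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) \(prio -?\d+, (..)\): (.+)", line
    )
    if m is None:
        raise ValueError("unrecognized line: %r" % line)
    start = int(m.group(1), 16)
    end = int(m.group(2), 16)
    # Assumption: a "W" in the two-character flag field means writable.
    writable = "W" in m.group(3)
    return start, end, writable, m.group(4).strip()


def write_hits_read_only(dest, size, region):
    """True if [dest, dest + size) overlaps a region that is not writable."""
    start, end, writable, _name = region
    overlaps = dest <= end and dest + size - 1 >= start
    return overlaps and not writable


region = parse_region(MTREE_LINE)
rdi = 0xFFE60                            # fxsave destination from the report
rsp_before_sub = rdi + FXSAVE_AREA_SIZE  # derived, not stated in the thread
bytes_below_1mb = 0x100000 - rdi         # part of the store landing below 1MB

print("region:", [hex(region[0]), hex(region[1])], region[3])
print("RSP before the 512-byte subtraction:", hex(rsp_before_sub))
print("bytes of the fxsave image below 1MB:", bytes_below_1mb)
print("store hits RO memory:", write_hits_read_only(rdi, FXSAVE_AREA_SIZE, region))
```

The same check could be run against any candidate relocation of the early stack to see how much headroom above 1MB the handler's frame would need, under the same assumptions.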