public inbox for devel@edk2.groups.io
From: Laszlo Ersek <lersek@redhat.com>
To: Evgeny Yakovlev <insoreiges@gmail.com>
Cc: edk2-devel@ml01.01.org, eyakovlev@virtuozzo.com,
	den@virtuozzo.com, Jeff Fan <jeff.fan@intel.com>
Subject: Re: OvmfPkg: VM crashed trying to write to RO memory from CommonInterruptEntry
Date: Wed, 23 Nov 2016 17:38:59 +0100
Message-ID: <0792b1f6-468a-a5dc-02a7-188ca7b8ab61@redhat.com>
In-Reply-To: <CAM0BJjQ_oPMwPCLTGVVibUgDKMipp-NFwT__zCgHgj_4Q8zsgg@mail.gmail.com>

On 11/23/16 15:31, Evgeny Yakovlev wrote:
> Looks like we're actually based on OVMF tree for RHEL7.2:
> ovmf/rhel-20150414-2.gitc9e5618.el7, ovmf/rhel-srpm-7.2
> 
> So maybe this affects those deployments as well

Maybe, but OVMF is Tech Preview (--> totally unsupported) even in
RHEL-7.3. :)

Laszlo

> 2016-11-22 19:58 GMT+03:00 Laszlo Ersek <lersek@redhat.com>:
> 
>     On 11/22/16 14:58, Evgeny Yakovlev wrote:
>     > Wow, that is more than I expected :)
>     >
>     >> I wonder if you started to see this issue very recently.
>     > Very recently, however we use a pretty old OVMF build, circa 2015
> 
>     Ugh. Please update OVMF first... A whole lot has changed in
>     edk2 this year.
> 
>     >
>     >>  OVMF debug log
>     > Sorry, we didn't have it enabled when the VM crashed, and these crashes
>     > are very rare. We will try to capture it when it happens again.
>     >
>     >> - your host CPU model,
>     > cpu family      : 6
>     > model           : 42
>     > model name      : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
>     > stepping        : 7
>     >
>     >> - the host kernel (KVM) version,
>     > Our kernel is roughly based on RHEL7.2 (kernel version 3.10.0-327.36.1). We
>     > also have some upstream KVM patches backported.
>     >
>     >> - the guest CPU model,
>     > -cpu
>     > SandyBridge,+vme,+ds,+acpi,+ss,+ht,+tm,+pbe,+dtes64,+monitor,+ds_cpl,+smx,+est,+tm2,+xtpr,+pdcm,+pcid,+osxsave,-arat,-xsaveopt,-xgetbv1,-vmx,-xsavec,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vpindex,hv_runtime,hv_synic,hv_stimer,hv_reset,hv_crash
>     >
>     >> - the guest CPU topology.
>     > 8 sockets, 1 core per socket, 1 thread per core
>     >
>     > Hope that helps!
> 
>     The fact that you are using 8 VCPUs is definitely relevant. However, I
>     don't think it would make sense to try to analyze any errors with an
>     OVMF / edk2 tree this old. Please try to reproduce the issue with a
>     fresh build from master.
> 
>     Thanks!
>     Laszlo
> 
>     > 2016-11-22 16:41 GMT+03:00 Laszlo Ersek <lersek@redhat.com>:
>     >
>     >> Hello Evgeny,
>     >>
>     >> On 11/22/16 13:57, Evgeny Yakovlev wrote:
>     >>> We are running windows UEFI-based VMs on QEMU/KVM with OvmfPkg.
>     >>>
>     >>> Very rarely we are experiencing a crash when VM tries to write to RO
>     >>> memory very early during UEFI boot process.
>     >>>
>     >>> Crash happens when VM tries to execute this code in interrupt handler:
>     >>>
>     >>> https://github.com/tianocore/edk2/blob/master/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.asm#L244-L246
>     >>>
>     >>>
>     >>> fxsave [rdi], where RDI = 0xffe60
>     >>>
>     >>> Which is bad - it points to ISA BIOS F-segment area.
>     >>>
>     >>> This memory was mapped by qemu for read only access, which is reflected
>     >>> in KVM EPT:
>     >>> 00000000000e0000-00000000000fffff (prio 1, R-): isa-bios
>     >>>
>     >>> This is a very early IRQ0 interrupt, presumably during early
>     >>> initialization phase (Sec or Pei).
>     >>>
>     >>> Looks like CommonInterruptHandler does not switch to a separate stack
>     >>> and works on interrupted context's stack, which was fairly close to 1MB
>     >>> boundary when IRQ0 fired (RSP around 1002c0). When CommonInterruptEntry
>     >>> reached highlighted code it subtracted 512 bytes from current RSP which
>     >>> dropped to 0xffe60, below 1MB and into QEMU RO region.
>     >>>
>     >>> We were figuring out how to best fix this. Possible solutions are to
>     >>> switch to a separate stack in CommonInterruptEntry, relocate early
>     >>> OvmfPkg stack to somewhere farther away from 1MB, to run with interrupts
>     >>> disabled until we reach a later phase or maybe something else.
>     >>>
>     >>> Any comments would be very appreciated!
>     >>
>     >> I wonder if you started to see this issue very recently.
>     >>
>     >> I suspect (hope!) that the symptoms you are experiencing are a
>     >> consequence of a bug in UefiCpuPkg that I've debugged and fixed just
>     >> today. (I hope to post the patches today.)
>     >>
>     >> While testing those patches on your end will of course tell us if your
>     >> issue has the same root cause, you could gather a few more symptoms even
>     >> before I get around to posting the patches. The bug that I'm working on
>     >> has extremely varied crash symptoms (basically the APs wander off into
>     >> the weeds), and some of those symptoms have involved
>     >> CpuExceptionHandlerLib.
>     >> The point is, by the time we get into CpuExceptionHandlerLib, all is
>     >> lost -- it is executing on an AP whose state is corrupt anyway. The
>     >> fxsave symptom is a red herring, most likely.
>     >>
>     >> CpuExceptionHandlerLib works fine otherwise, especially when invoked
>     >> from the BSP -- we've used the output dumped by CpuExceptionHandlerLib
>     >> to the serial port several times to track down issues.
>     >>
>     >> So, my request is that you please capture the OVMF debug log (please
>     >> see the "OvmfPkg/README" file for how). I'm curious if it crashes where
>     >> and how I suspect it crashes.
>     >>
>     >> Also, it would help if you provided
>     >> - your host CPU model,
>     >> - the host kernel (KVM) version,
>     >> - the guest CPU model,
>     >> - the guest CPU topology.
>     >>
>     >> Thanks!
>     >> Laszlo
>     >>
>     >
> 
> 
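
The addresses quoted in the original report can be sanity-checked with a
little arithmetic. The sketch below is only an illustration: the constant
and function names are hypothetical, the 512-byte save area and the
0xe0000-0xfffff isa-bios range are taken from the report, and the "bytes
already pushed" figure assumes the reported RSP of "around 1002c0" is
exact, which the report does not claim.

```python
# Values quoted in the report (names are hypothetical, for illustration only).
FXSAVE_AREA_SIZE = 512          # bytes reserved on the stack before fxsave
ISA_BIOS_START = 0xE0000        # start of the read-only isa-bios region
ISA_BIOS_END = 0xFFFFF          # last byte of that region (inclusive)
FXSAVE_TARGET = 0xFFE60         # RDI at the faulting "fxsave [rdi]"
RSP_AT_INTERRUPT = 0x1002C0     # approximate RSP when IRQ0 fired


def overlap_bytes(start, size, ro_start, ro_end_inclusive):
    """Return how many bytes of [start, start + size) fall inside the
    read-only range [ro_start, ro_end_inclusive]."""
    lo = max(start, ro_start)
    hi = min(start + size, ro_end_inclusive + 1)
    return max(0, hi - lo)


# RSP just before the handler reserved the 512-byte save area.
rsp_before_reserve = FXSAVE_TARGET + FXSAVE_AREA_SIZE

# Stack already consumed by the handler before reserving that area,
# assuming the reported starting RSP is exact.
bytes_already_pushed = RSP_AT_INTERRUPT - rsp_before_reserve

# Portion of the save area that lands in the read-only region.
ro_bytes = overlap_bytes(FXSAVE_TARGET, FXSAVE_AREA_SIZE,
                         ISA_BIOS_START, ISA_BIOS_END)

if __name__ == "__main__":
    print(hex(rsp_before_reserve), bytes_already_pushed, ro_bytes)
```

Under these assumptions the save area starts 416 bytes below the 1MB
boundary, so most of the fxsave write would target the read-only range.
That is consistent with the reported write fault, but it says nothing
about why the stack pointer was there in the first place.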


