public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: Laszlo Ersek <lersek@redhat.com>
To: Evgeny Yakovlev <insoreiges@gmail.com>
Cc: edk2-devel@ml01.01.org, eyakovlev@virtuozzo.com,
	den@virtuozzo.com, Jeff Fan <jeff.fan@intel.com>
Subject: Re: OvmfPkg: VM crashed trying to write to RO memory from CommonInterruptEntry
Date: Wed, 23 Nov 2016 17:54:41 +0100	[thread overview]
Message-ID: <c26c4e7c-0c56-6813-7f91-8591e5ca666d@redhat.com> (raw)
In-Reply-To: <CAM0BJjQX4jq1vu3ng_16Xpc7iO9sM=6pBPX51oxKOD2D0pqgVg@mail.gmail.com>

On 11/23/16 09:37, Evgeny Yakovlev wrote:
> You are right of course about the old tree, no objections here. I will
> try to advocate for an update however i am pretty sure we're stuck with
> our version for some time at least.
> 
> Still, my original question was about is it normal for OVMF Sec/Pei
> stage to have its stack so close to 0x100000

The SEC phase, and the part of the PEI phase that runs before installing
and migrating to the permanent PEI RAM, use the range 8256KB..8288KB as
heap and stack (half/half, 16 KB and 16 KB). This range is above 8MB.

The address you specify is 1MB. I don't think we ever set up such a
stack intentionally.

It is possible that the failure occurs in one of the AP startup routines
(the APs start in real mode), and the exception handler is executed by
the AP.

To know more, the serial port and debug port outputs would be
interesting; but, I should note, based on past experience, when the APs
run into such issues in their startup routines, the final symptoms are
usually garbage / chaotic. These issues are usually fixed by preventing
the APs from wandering off into the woods in the first place (for
example, nailing down race conditions between BSP and APs), and edk2 has
seen a lot of improvements in that area, with the arrival of MpInitLib.

> and/or why interrupt
> handler in UefiCpuPkg/Library/CpuExceptionHandlerLib/X64 does not switch
> to a separate stack.

I hope Jeff can provide some insight here.

Thanks
Laszlo

> Code in UefiCpuPkg/Library/CpuExceptionHandlerLib/X64 hasn't been
> touched for 2 years so our version is still relevant.
> 
> 2016-11-22 19:58 GMT+03:00 Laszlo Ersek <lersek@redhat.com
> <mailto:lersek@redhat.com>>:
> 
>     On 11/22/16 14:58, Evgeny Yakovlev wrote:
>     > Wow, that is more than i expected :)
>     >
>     >> I wonder if you started to see this issue very recently.
>     > Very recently, however we use a pretty old OVMF build, circa 2015
> 
>     Ugh. Please update OVMF first... A whole lot of things has changed in
>     edk2 in this year.
> 
>     >
>     >>  OVMF debug log
>     > Sorry, we hadn't had it enabled when VM crashed and these crashes are very
>     > rare. We will try to capture it when it happens again
>     >
>     >> - your host CPU model,
>     > cpu family      : 6
>     > model           : 42
>     > model name      : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
>     > stepping        : 7
>     >
>     >> - the host kernel (KVM) version,
>     > Our kernel is roughly based on RHEL7.2 (kernel version 3.10.0-327.36.1). We
>     > also have some upstream KVM patches backported.
>     >
>     >> - the guest CPU model,
>     > -cpu
>     > SandyBridge,+vme,+ds,+acpi,+ss,+ht,+tm,+pbe,+dtes64,+monitor,+ds_cpl,+smx,+est,+tm2,+xtpr,+pdcm,+pcid,+osxsave,-arat,-xsaveopt,-xgetbv1,-vmx,-xsavec,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vpindex,hv_runtime,hv_synic,hv_stimer,hv_reset,hv_crash
>     >
>     >> - the guest CPU topology.
>     > 8 sockets, 1 core per socket, 1 thread per core
>     >
>     > Hope that helps!
> 
>     The fact that you are using 8 VCPUs is definitely relevant. However, I
>     don't think it would make sense to try to analyze any errors with an
>     OVMF / edk2 tree this old. Please try to reproduce the issue with a
>     fresh build from master.
> 
>     Thanks!
>     Laszlo
> 
>     > 2016-11-22 16:41 GMT+03:00 Laszlo Ersek <lersek@redhat.com
>     <mailto:lersek@redhat.com>>:
>     >
>     >> Hello Evgeny,
>     >>
>     >> On 11/22/16 13:57, Evgeny Yakovlev wrote:
>     >>> We are running windows UEFI-based VMs on QEMU/KVM with OvmfPkg.
>     >>>
>     >>> Very rarely we are experiencing a crash when VM tries to write to RO
>     >> memory
>     >>> very early during UEFI boot process.
>     >>>
>     >>> Crash happens when VM tries to execute this code in interrupt
>     handler:
>     >>>
>     https://github.com/tianocore/edk2/blob/master/UefiCpuPkg/Library/
>     <https://github.com/tianocore/edk2/blob/master/UefiCpuPkg/Library/>
>     >> CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.asm#L244-L246
>     >>>
>     >>>
>     >>> fxsave [rdi], where RDI = 0xffe60
>     >>>
>     >>> Which is bad - it points to ISA BIOS F-segment area.
>     >>>
>     >>> This memory was mapped by qemu for read only access, which is
>     reflected
>     >> in
>     >>> KVM EPT:
>     >>> 00000000000e0000-00000000000fffff (prio 1, R-): isa-bios
>     >>>
>     >>> This is a very early IRQ0 interrupt, presumably during early
>     >> initialization
>     >>> phase (Sec or Pei).
>     >>>
>     >>> Looks like CommonInterruptHandler does not switch to a separate
>     stack and
>     >>> works on interrupted context's stack, which was fairly close to 1MB
>     >>> boundary when IRQ0 fired (RSP around 1002c0). When
>     CommonInterruptEntry
>     >>> reached highlighted code it subtracted 512 bytes from current
>     RSP which
>     >>> dropped to 0xffe60, below 1MB and into QEMU RO region.
>     >>>
>     >>> We were figuring out how to best fix this. Possible solutions are to
>     >> switch
>     >>> to a separate stack in CommonInterruptEntry, relocate early
>     OvmfPkg stack
>     >>> to somewhere farther away from 1MB, to run with interrupts
>     disabled until
>     >>> we reach a later phase or maybe something else.
>     >>>
>     >>> Any comments would be very appreciated!
>     >>
>     >> I wonder if you started to see this issue very recently.
>     >>
>     >> I suspect (hope!) that the symptoms you are experiencing are a
>     >> consequence of a bug in UefiCpuPkg that I've debugged and fixed just
>     >> today. (I hope to post the patches today.)
>     >>
>     >> While testing those patches on your end will of course tell us if
>     your
>     >> issue has the same root cause, you could gather a few more
>     symptoms even
>     >> before I get around posting the patches. The bug that I'm working
>     on has
>     >> extremely varied crash symptoms (basically the APs wander off
>     into the
>     >> weeds), and some of those symptoms have involved
>     CpuExceptionHandlerLib.
>     >> The point is, by the time we get into CpuExceptionHandlerLib, all is
>     >> lost -- it is executing on an AP whose state is corrupt anyway. The
>     >> fxsave symptom is a red herring, most likely.
>     >>
>     >> CpuExceptionHandlerLib works fine otherwise, especially when invoked
>     >> from the BSP -- we've used the output dumped by
>     CpuExceptionHandlerLib
>     >> to the serial port several times to track down issues.
>     >>
>     >> So, my request is that you please capture the OVMF debug log
>     (please see
>     >> the "OvmfPkg/README" file for how). I'm curious if it crashes
>     where and
>     >> how I suspect it crashes.
>     >>
>     >> Also, it would help if you provided
>     >> - your host CPU model,
>     >> - the host kernel (KVM) version,
>     >> - the guest CPU model,
>     >> - the guest CPU topology.
>     >>
>     >> Thanks!
>     >> Laszlo
>     >>
>     > _______________________________________________
>     > edk2-devel mailing list
>     > edk2-devel@lists.01.org <mailto:edk2-devel@lists.01.org>
>     > https://lists.01.org/mailman/listinfo/edk2-devel
>     <https://lists.01.org/mailman/listinfo/edk2-devel>
>     >
> 
> 



  reply	other threads:[~2016-11-23 16:54 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-22 12:57 OvmfPkg: VM crashed trying to write to RO memory from CommonInterruptEntry Evgeny Yakovlev
2016-11-22 13:41 ` Laszlo Ersek
2016-11-22 13:58   ` Evgeny Yakovlev
2016-11-22 16:58     ` Laszlo Ersek
2016-11-23  8:37       ` Evgeny Yakovlev
2016-11-23 16:54         ` Laszlo Ersek [this message]
2016-12-07  9:11           ` Evgeny Yakovlev
2016-11-23 14:31       ` Evgeny Yakovlev
2016-11-23 16:38         ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c26c4e7c-0c56-6813-7f91-8591e5ca666d@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox