Re: OvmfPkg: VM crashed trying to write to RO memory from CommonInterruptEntry

From: Evgeny Yakovlev <insoreiges@gmail.com>
To: Laszlo Ersek <lersek@redhat.com>
Cc: edk2-devel@ml01.01.org, eyakovlev@virtuozzo.com,
	den@virtuozzo.com,  Jeff Fan <jeff.fan@intel.com>
Subject: Re: OvmfPkg: VM crashed trying to write to RO memory from CommonInterruptEntry
Date: Wed, 7 Dec 2016 12:11:55 +0300	[thread overview]
Message-ID: <CAM0BJjTmfk0DzA3wmp-5tWB3OCJ0rY8e_ZufFLmUpis7c9Pm8Q@mail.gmail.com> (raw)
In-Reply-To: <c26c4e7c-0c56-6813-7f91-8591e5ca666d@redhat.com>

Hi Laszlo

So i managed to reproduce this with debug logs this time and figured out
what happened.
tl;dr: KVM is spamming guest firmware with 8254 PIT timer IRQs after
accumulating a lot of missed timer expiration events while win 2k16 boot
loader was waiting for debugger connection on COM2.

As in previous bugs we see recursive entry into IRQ0 handler in OVMF. OVMF
IRQ0 handler workload is O(1) however it enables interrupts during normal
processing so reenters are possible. If we manually slow down IRQ0 handler
enough we can reproduce this easily.

Repro always happens strictly during UEFI BDS phase, when 2k16 boot loader
is already executing but still as a UEFI application (ExitBootServices
hasn't been called yet). During this time boot loader waits for debugger
connection on COM2 (installation ISO was configured this way).

While it does so a lot of IRQ3's are generated (COM2) and guest does not
react to IRQ0 interrupts which are generated by kvm-pit. KVM-PIT keeps
track of all non-acked interrupts. When any ACK comes KVM tries to
immediately reinject all accumulated pending interrupts without any delays:

kvm_pit_ack_irq:
248         /* in this case, we had multiple outstanding pit interrupts
249          * that we needed to inject.  Reinject
250          */
251         queue_kthread_work(&ps->pit->worker, &ps->pit->expired);

In one case KVM managed to accumulate 0x108b interrupts to reinject:
CPU 0/KVM-47696 [000] d... 63779.924496: kvm_pit_ack_irq_44:
(kvm_pit_ack_irq+0x2c/0x70 [kvm]) pending=108b

Normally OVMF configures PIT to generate IRQ0 each 10 milliseconds, however
guest now sees IRQ0 each 500-600 microseconds:
kvm-pit/47688-47704 [007] .... 63779.924512: kvm_pic_set_irq: chip 0 pin 0
(edge)
kvm-pit/47688-47704 [007] .... 63779.925132: kvm_pic_set_irq: chip 0 pin 0
(edge)

Which is not enough time for normal OVMF IRQ0 handler to exit so when it
enables interrupts IRQ0 handler is immediately reintered. After about 300
reinjects machine runs out of stack space and hits RO area below 1MB.

Just though you'd be interested to know!

2016-11-23 19:54 GMT+03:00 Laszlo Ersek <lersek@redhat.com>:

> On 11/23/16 09:37, Evgeny Yakovlev wrote:
> > You are right of course about the old tree, no objections here. I will
> > try to advocate for an update however i am pretty sure we're stuck with
> > our version for some time at least.
> >
> > Still, my original question was about is it normal for OVMF Sec/Pei
> > stage to have its stack so close to 0x100000
>
> The SEC phase, and the part of the PEI phase that runs before installing
> and migrating to the permanent PEI RAM, use the range 8256KB..8288KB as
> heap and stack (half/half, 16 KB and 16 KB). This range is above 8MB.
>
> The address you specify is 1MB. I don't think we ever set up such a
> stack intentionally.
>
> It is possible that the failure occurs in one of the AP startup routines
> (the APs start in real mode), and the exception handler is executed by
> the AP.
>
> To know more, the serial port and debug port outputs would be
> interesting; but, I should note, based on past experience, when the APs
> run into such issues in their startup routines, the final symptoms are
> usually garbage / chaotic. These issues are usually fixed by preventing
> the APs from wandering off into the woods in the first place (for
> example, nailing down race conditions between BSP and APs), and edk2 has
> seen a lot of improvements in that area, with the arrival of MpInitLib.
>
> > and/or why interrupt
> > handler in UefiCpuPkg/Library/CpuExceptionHandlerLib/X64 does not switch
> > to a separate stack.
>
> I hope Jeff can provide some insight here.
>
> Thanks
> Laszlo
>
> > Code in UefiCpuPkg/Library/CpuExceptionHandlerLib/X64 hasn't been
> > touched for 2 years so our version is still relevant.
> >
> > 2016-11-22 19:58 GMT+03:00 Laszlo Ersek <lersek@redhat.com
> > <mailto:lersek@redhat.com>>:
> >
> >     On 11/22/16 14:58, Evgeny Yakovlev wrote:
> >     > Wow, that is more than i expected :)
> >     >
> >     >> I wonder if you started to see this issue very recently.
> >     > Very recently, however we use a pretty old OVMF build, circa 2015
> >
> >     Ugh. Please update OVMF first... A whole lot of things has changed in
> >     edk2 in this year.
> >
> >     >
> >     >>  OVMF debug log
> >     > Sorry, we hadn't had it enabled when VM crashed and these crashes
> are very
> >     > rare. We will try to capture it when it happens again
> >     >
> >     >> - your host CPU model,
> >     > cpu family      : 6
> >     > model           : 42
> >     > model name      : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
> >     > stepping        : 7
> >     >
> >     >> - the host kernel (KVM) version,
> >     > Our kernel is roughly based on RHEL7.2 (kernel version
> 3.10.0-327.36.1). We
> >     > also have some upstream KVM patches backported.
> >     >
> >     >> - the guest CPU model,
> >     > -cpu
> >     > SandyBridge,+vme,+ds,+acpi,+ss,+ht,+tm,+pbe,+dtes64,+
> monitor,+ds_cpl,+smx,+est,+tm2,+xtpr,+pdcm,+pcid,+
> osxsave,-arat,-xsaveopt,-xgetbv1,-vmx,-xsavec,hv_time,
> hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vpindex,
> hv_runtime,hv_synic,hv_stimer,hv_reset,hv_crash
> >     >
> >     >> - the guest CPU topology.
> >     > 8 sockets, 1 core per socket, 1 thread per core
> >     >
> >     > Hope that helps!
> >
> >     The fact that you are using 8 VCPUs is definitely relevant. However,
> I
> >     don't think it would make sense to try to analyze any errors with an
> >     OVMF / edk2 tree this old. Please try to reproduce the issue with a
> >     fresh build from master.
> >
> >     Thanks!
> >     Laszlo
> >
> >     > 2016-11-22 16:41 GMT+03:00 Laszlo Ersek <lersek@redhat.com
> >     <mailto:lersek@redhat.com>>:
> >     >
> >     >> Hello Evgeny,
> >     >>
> >     >> On 11/22/16 13:57, Evgeny Yakovlev wrote:
> >     >>> We are running windows UEFI-based VMs on QEMU/KVM with OvmfPkg.
> >     >>>
> >     >>> Very rarely we are experiencing a crash when VM tries to write
> to RO
> >     >> memory
> >     >>> very early during UEFI boot process.
> >     >>>
> >     >>> Crash happens when VM tries to execute this code in interrupt
> >     handler:
> >     >>>
> >     https://github.com/tianocore/edk2/blob/master/UefiCpuPkg/Library/
> >     <https://github.com/tianocore/edk2/blob/master/UefiCpuPkg/Library/>
> >     >> CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.asm#L244-L246
> >     >>>
> >     >>>
> >     >>> fxsave [rdi], where RDI = 0xffe60
> >     >>>
> >     >>> Which is bad - it points to ISA BIOS F-segment area.
> >     >>>
> >     >>> This memory was mapped by qemu for read only access, which is
> >     reflected
> >     >> in
> >     >>> KVM EPT:
> >     >>> 00000000000e0000-00000000000fffff (prio 1, R-): isa-bios
> >     >>>
> >     >>> This is a very early IRQ0 interrupt, presumably during early
> >     >> initialization
> >     >>> phase (Sec or Pei).
> >     >>>
> >     >>> Looks like CommonInterruptHandler does not switch to a separate
> >     stack and
> >     >>> works on interrupted context's stack, which was fairly close to
> 1MB
> >     >>> boundary when IRQ0 fired (RSP around 1002c0). When
> >     CommonInterruptEntry
> >     >>> reached highlighted code it subtracted 512 bytes from current
> >     RSP which
> >     >>> dropped to 0xffe60, below 1MB and into QEMU RO region.
> >     >>>
> >     >>> We were figuring out how to best fix this. Possible solutions
> are to
> >     >> switch
> >     >>> to a separate stack in CommonInterruptEntry, relocate early
> >     OvmfPkg stack
> >     >>> to somewhere farther away from 1MB, to run with interrupts
> >     disabled until
> >     >>> we reach a later phase or maybe something else.
> >     >>>
> >     >>> Any comments would be very appreciated!
> >     >>
> >     >> I wonder if you started to see this issue very recently.
> >     >>
> >     >> I suspect (hope!) that the symptoms you are experiencing are a
> >     >> consequence of a bug in UefiCpuPkg that I've debugged and fixed
> just
> >     >> today. (I hope to post the patches today.)
> >     >>
> >     >> While testing those patches on your end will of course tell us if
> >     your
> >     >> issue has the same root cause, you could gather a few more
> >     symptoms even
> >     >> before I get around posting the patches. The bug that I'm working
> >     on has
> >     >> extremely varied crash symptoms (basically the APs wander off
> >     into the
> >     >> weeds), and some of those symptoms have involved
> >     CpuExceptionHandlerLib.
> >     >> The point is, by the time we get into CpuExceptionHandlerLib, all
> is
> >     >> lost -- it is executing on an AP whose state is corrupt anyway.
> The
> >     >> fxsave symptom is a red herring, most likely.
> >     >>
> >     >> CpuExceptionHandlerLib works fine otherwise, especially when
> invoked
> >     >> from the BSP -- we've used the output dumped by
> >     CpuExceptionHandlerLib
> >     >> to the serial port several times to track down issues.
> >     >>
> >     >> So, my request is that you please capture the OVMF debug log
> >     (please see
> >     >> the "OvmfPkg/README" file for how). I'm curious if it crashes
> >     where and
> >     >> how I suspect it crashes.
> >     >>
> >     >> Also, it would help if you provided
> >     >> - your host CPU model,
> >     >> - the host kernel (KVM) version,
> >     >> - the guest CPU model,
> >     >> - the guest CPU topology.
> >     >>
> >     >> Thanks!
> >     >> Laszlo
> >     >>
> >     > _______________________________________________
> >     > edk2-devel mailing list
> >     > edk2-devel@lists.01.org <mailto:edk2-devel@lists.01.org>
> >     > https://lists.01.org/mailman/listinfo/edk2-devel
> >     <https://lists.01.org/mailman/listinfo/edk2-devel>
> >     >
> >
> >
>
>