From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <insoreiges@gmail.com>
Received: from mail-io0-x241.google.com (mail-io0-x241.google.com
 [IPv6:2607:f8b0:4001:c06::241])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by ml01.01.org (Postfix) with ESMTPS id 3C53F81EA4
 for <edk2-devel@ml01.01.org>; Wed, 23 Nov 2016 00:37:35 -0800 (PST)
Received: by mail-io0-x241.google.com with SMTP id r94so866492ioe.1
 for <edk2-devel@ml01.01.org>; Wed, 23 Nov 2016 00:37:35 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc; bh=DujKyPuH+mIbUIvJljaEPXopYPSNegCN9qquc6hdSYA=;
 b=ipmLAqDcjOCFZhSYVhq5+IfrIzWhBV2IcE4/YmE9EBHLa4BbsSuXcWLZaQWg7bXUdq
 X/3jh+hIDznUzspaXQvQ3ReN0BTs4cyCVbWPPQ44fpcwur8mnh0Z8Jxs3j6fZB55AAoj
 9I1Dh4mRDa+oaAbxQYgueN/IdUEbKtMKBiNC+CTnquFCXw1UMFtO971ZPnKFoBrbIh6x
 pqwTsvbV7nBgTxCfbCVbA1pfE4xzOj0KN1UbJOpspHSG2lniQhMXS5B9Y2wA9oZuP+y6
 C0Mn/tWYPqQw/6PQ4AbdBB2O6Pv2fqegpsl81usW/6ndvcmG8FMCt2e4LOz4uR1ni5Tz
 ArBw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=DujKyPuH+mIbUIvJljaEPXopYPSNegCN9qquc6hdSYA=;
 b=ifNY0wNBQaBxWvQqdYE2kR33xppfJEwhvkV8kLX/nWplUZUUaVdzE7/hxXp1xvXjMS
 ediEmrqPWgFWvmw4LIsQVPpzTMW0/5V1CxPzKErdrkA7rnGNj+ox+ZnWZf1jjD/gyDvn
 v0IZpQdfrBInNphDH6c9an4nip/jwUi3JogCw1lqZvA3Hrjr0CVBVjG3FWNTHEobh+2D
 0kI9FkHX1OfYK0qHq3KzBDUfM5HV7tanBoroJadsX9GQffRGa98TPfRVkCpbdflWGMWD
 87K2LHEL9Jw+AAYJl5laTjyYF+wyQBWLumtRXS2wc3kljknaGkm7QgPSpj+gInXFfz+G
 DklA==
X-Gm-Message-State: AKaTC03rFbl1r2KJod5ZIvo6i1qoApTjoR1guLydnCrsbaxk5shLPxmQIouThNpeOOxis3wvAjx0t5W3QeoNIA==
X-Received: by 10.107.46.25 with SMTP id i25mr2140348ioo.145.1479890254533;
 Wed, 23 Nov 2016 00:37:34 -0800 (PST)
MIME-Version: 1.0
Received: by 10.36.113.196 with HTTP; Wed, 23 Nov 2016 00:37:34 -0800 (PST)
In-Reply-To: <9fcf577d-cf9e-db6f-c0f8-6842baf8bb83@redhat.com>
References: <CAM0BJjTMEH4pqtmUU2wSSnDeSz6SqFRopJygBZ28mmBdhHE0ow@mail.gmail.com>
 <2340021c-4bcb-2622-07a8-6e6173f94d81@redhat.com>
 <CAM0BJjQDdwPULhik-F2d77jpWKUX=oyHnvS9FMU+ECusT4SeGQ@mail.gmail.com>
 <9fcf577d-cf9e-db6f-c0f8-6842baf8bb83@redhat.com>
From: Evgeny Yakovlev <insoreiges@gmail.com>
Date: Wed, 23 Nov 2016 11:37:34 +0300
Message-ID: <CAM0BJjQX4jq1vu3ng_16Xpc7iO9sM=6pBPX51oxKOD2D0pqgVg@mail.gmail.com>
To: Laszlo Ersek <lersek@redhat.com>
Cc: edk2-devel@ml01.01.org, eyakovlev@virtuozzo.com, den@virtuozzo.com, 
 Jeff Fan <jeff.fan@intel.com>
X-Content-Filtered-By: Mailman/MimeDel 2.1.21
Subject: Re: OvmfPkg: VM crashed trying to write to RO memory from CommonInterruptEntry
X-BeenThere: edk2-devel@lists.01.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: EDK II Development  <edk2-devel.lists.01.org>
List-Unsubscribe: <https://lists.01.org/mailman/options/edk2-devel>,
 <mailto:edk2-devel-request@lists.01.org?subject=unsubscribe>
List-Archive: <http://lists.01.org/pipermail/edk2-devel/>
List-Post: <mailto:edk2-devel@lists.01.org>
List-Help: <mailto:edk2-devel-request@lists.01.org?subject=help>
List-Subscribe: <https://lists.01.org/mailman/listinfo/edk2-devel>,
 <mailto:edk2-devel-request@lists.01.org?subject=subscribe>
X-List-Received-Date: Wed, 23 Nov 2016 08:37:35 -0000
Content-Type: text/plain; charset=UTF-8

You are right, of course, about the old tree; no objections here. I will try
to advocate for an update, but I am pretty sure we're stuck with our version
for some time at least.

Still, my original question was twofold: is it normal for the OVMF Sec/Pei
stage to have its stack so close to 0x100000, and why does the interrupt
handler in UefiCpuPkg/Library/CpuExceptionHandlerLib/X64 not switch to a
separate stack?
The code in UefiCpuPkg/Library/CpuExceptionHandlerLib/X64 hasn't been touched
for 2 years, so our version is still relevant.
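
For reference, the arithmetic behind the report can be checked with a short
script. This is only a sketch: the addresses are the ones quoted further down
in this thread (the RSP value is approximate), and names such as in_isa_bios
are made up for illustration, not taken from edk2 or QEMU.

```python
# Sketch only: re-derives the numbers from the crash report in this thread.
# Constant and function names are illustrative, not edk2 or QEMU identifiers.

FXSAVE_AREA_SIZE = 512      # FXSAVE stores a 512-byte, 16-byte-aligned area
ONE_MB = 0x100000
ISA_BIOS_START = 0xE0000    # QEMU: 00000000000e0000-00000000000fffff, R-
ISA_BIOS_END = 0xFFFFF      # inclusive end of the read-only isa-bios range


def in_isa_bios(addr):
    """Return True if addr falls inside the read-only isa-bios range."""
    return ISA_BIOS_START <= addr <= ISA_BIOS_END


rsp_at_interrupt = 0x1002C0   # approximate RSP when IRQ0 fired (from report)
fxsave_dest = 0xFFE60         # RDI at the faulting fxsave (from report)

# Stack headroom above the 1MB boundary when the interrupt arrived.
headroom = rsp_at_interrupt - ONE_MB

# Total stack consumed by the handler, including the FXSAVE area itself.
total_used = rsp_at_interrupt - fxsave_dest

# Portion consumed before the 512-byte FXSAVE area was reserved.
used_before_fxsave = total_used - FXSAVE_AREA_SIZE

# Bytes of the FXSAVE area that lie below 1MB, i.e. in read-only memory.
bytes_in_ro = ONE_MB - fxsave_dest

print(f"headroom above 1MB:   {headroom} bytes")
print(f"handler stack usage:  {total_used} bytes")
print(f"used before fxsave:   {used_before_fxsave} bytes")
print(f"fxsave bytes in RO:   {bytes_in_ro} of {FXSAVE_AREA_SIZE}")
print(f"dest 16-byte aligned: {fxsave_dest % 16 == 0}")
print(f"dest in isa-bios:     {in_isa_bios(fxsave_dest)}")
```

With the reported values this gives 704 bytes of headroom against roughly
1120 bytes consumed by the handler, so the save area starts 416 bytes below
1MB, inside the isa-bios range.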

2016-11-22 19:58 GMT+03:00 Laszlo Ersek <lersek@redhat.com>:

> On 11/22/16 14:58, Evgeny Yakovlev wrote:
> > Wow, that is more than i expected :)
> >
> >> I wonder if you started to see this issue very recently.
> > Very recently, however we use a pretty old OVMF build, circa 2015
>
> Ugh. Please update OVMF first... A whole lot of things have changed in
> edk2 this year.
>
> >
> >>  OVMF debug log
> > Sorry, we hadn't had it enabled when VM crashed and these crashes are
> > very rare. We will try to capture it when it happens again
> >
> >> - your host CPU model,
> > cpu family      : 6
> > model           : 42
> > model name      : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
> > stepping        : 7
> >
> >> - the host kernel (KVM) version,
> > Our kernel is roughly based on RHEL7.2 (kernel version 3.10.0-327.36.1).
> > We also have some upstream KVM patches backported.
> >
> >> - the guest CPU model,
> > -cpu
> > SandyBridge,+vme,+ds,+acpi,+ss,+ht,+tm,+pbe,+dtes64,+monitor,+ds_cpl,+smx,+est,+tm2,+xtpr,+pdcm,+pcid,+osxsave,-arat,-xsaveopt,-xgetbv1,-vmx,-xsavec,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vpindex,hv_runtime,hv_synic,hv_stimer,hv_reset,hv_crash
> >
> >> - the guest CPU topology.
> > 8 sockets, 1 core per socket, 1 thread per core
> >
> > Hope that helps!
>
> The fact that you are using 8 VCPUs is definitely relevant. However, I
> don't think it would make sense to try to analyze any errors with an
> OVMF / edk2 tree this old. Please try to reproduce the issue with a
> fresh build from master.
>
> Thanks!
> Laszlo
>
> > 2016-11-22 16:41 GMT+03:00 Laszlo Ersek <lersek@redhat.com>:
> >
> >> Hello Evgeny,
> >>
> >> On 11/22/16 13:57, Evgeny Yakovlev wrote:
> >>> We are running windows UEFI-based VMs on QEMU/KVM with OvmfPkg.
> >>>
> >>> Very rarely we are experiencing a crash when VM tries to write to RO
> >>> memory very early during UEFI boot process.
> >>>
> >>> Crash happens when VM tries to execute this code in interrupt handler:
> >>> https://github.com/tianocore/edk2/blob/master/UefiCpuPkg/Library/CpuExceptionHandlerLib/X64/ExceptionHandlerAsm.asm#L244-L246
> >>>
> >>>
> >>> fxsave [rdi], where RDI = 0xffe60
> >>>
> >>> Which is bad - it points to ISA BIOS F-segment area.
> >>>
> >>> This memory was mapped by qemu for read only access, which is reflected
> >>> in KVM EPT:
> >>> 00000000000e0000-00000000000fffff (prio 1, R-): isa-bios
> >>>
> >>> This is a very early IRQ0 interrupt, presumably during early
> >>> initialization phase (Sec or Pei).
> >>>
> >>> Looks like CommonInterruptHandler does not switch to a separate stack
> >>> and works on interrupted context's stack, which was fairly close to 1MB
> >>> boundary when IRQ0 fired (RSP around 1002c0). When CommonInterruptEntry
> >>> reached highlighted code it subtracted 512 bytes from current RSP which
> >>> dropped to 0xffe60, below 1MB and into QEMU RO region.
> >>>
> >>> We were figuring out how to best fix this. Possible solutions are to
> >>> switch to a separate stack in CommonInterruptEntry, relocate early
> >>> OvmfPkg stack to somewhere farther away from 1MB, to run with interrupts
> >>> disabled until we reach a later phase or maybe something else.
> >>>
> >>> Any comments would be very appreciated!
> >>
> >> I wonder if you started to see this issue very recently.
> >>
> >> I suspect (hope!) that the symptoms you are experiencing are a
> >> consequence of a bug in UefiCpuPkg that I've debugged and fixed just
> >> today. (I hope to post the patches today.)
> >>
> >> While testing those patches on your end will of course tell us if your
> >> issue has the same root cause, you could gather a few more symptoms even
> >> before I get around posting the patches. The bug that I'm working on has
> >> extremely varied crash symptoms (basically the APs wander off into the
> >> weeds), and some of those symptoms have involved CpuExceptionHandlerLib.
> >> The point is, by the time we get into CpuExceptionHandlerLib, all is
> >> lost -- it is executing on an AP whose state is corrupt anyway. The
> >> fxsave symptom is a red herring, most likely.
> >>
> >> CpuExceptionHandlerLib works fine otherwise, especially when invoked
> >> from the BSP -- we've used the output dumped by CpuExceptionHandlerLib
> >> to the serial port several times to track down issues.
> >>
> >> So, my request is that you please capture the OVMF debug log (please see
> >> the "OvmfPkg/README" file for how). I'm curious if it crashes where and
> >> how I suspect it crashes.
> >>
> >> Also, it would help if you provided
> >> - your host CPU model,
> >> - the host kernel (KVM) version,
> >> - the guest CPU model,
> >> - the guest CPU topology.
> >>
> >> Thanks!
> >> Laszlo
> >>