From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [edk2-devel] Getting Synchronous Exception while run avocado-vt tests
To: , , , Zhanghailiang
References: <4e8a0c5f50b642538b310a8edd9ce248@huawei.com> <6256d296-1985-5719-c89a-6b959be6cbc6@redhat.com>
CC: "edk2-devel@lists.01.org"
From: guoheyi@huawei.com
Message-ID: <2b645729-0617-0618-a960-a2ad064eb9ce@huawei.com>
Date: Wed, 28 Aug 2019 14:12:01 +0800
In-Reply-To: <6256d296-1985-5719-c89a-6b959be6cbc6@redhat.com>

Hi Ard, Laszlo,

Thanks for taking the time to help investigate the issue.
We finally found that it is caused by KVM and is fixed by this patch:

https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git/commit/?id=2113c5f62b7423e4a72b890bd479704aa85c81ba

KVM: arm/arm64: Only skip MMIO insn once

If after an MMIO exit to userspace a VCPU is immediately run with an
immediate_exit request, such as when a signal is delivered or an MMIO
emulation completion is needed, then the VCPU completes the MMIO
emulation and immediately returns to userspace.
As the exit_reason does not get changed from KVM_EXIT_MMIO in these cases
we have to be careful not to complete the MMIO emulation again, when the
VCPU is eventually run again, because the emulation does an instruction
skip (and doing too many skips would be a waste of guest code :-) We need
to use additional VCPU state to track if the emulation is complete. As
luck would have it, we already have 'mmio_needed', which even appears to
be used in this way by other architectures already.

Fixes: 0d640732dbeb ("arm64: KVM: Skip MMIO insn after emulation")
Acked-by: Mark Rutland
Signed-off-by: Andrew Jones
Signed-off-by: Marc Zyngier

Before this patch, an MMIO instruction could be skipped more than once
when the VCPU was requested to exit immediately, and the mmio32 assembly
function is not far from mmio16...

Thanks,

Heyi

On 2019/8/23 2:56, Laszlo Ersek wrote:
> On 08/22/19 11:24, Ard Biesheuvel wrote:
>> On Thu, 22 Aug 2019 at 10:40, Zhanghailiang
>> wrote:
>>> Hi All,
>>>
>>> We caught a ‘Synchronous Exception’ error while booting a VM with UEFI firmware in the avocado-vt tests.
>>>
>>> The edk2 version we used is edk2-stable201905. The QEMU version is qemu-4.0.0 and the kernel version is 4.19.0.
>>>
>>> Part of the log we got from serial is below; you can get the full log from the attachment.
>>>
>>> We can easily reproduce this issue by running the avocado-vt tests. We also tried the new edk2 from upstream,
>>>
>>> and it can still be reproduced.
>>>
>>> Reproduce command:
>>>
>>> # avocado run type_specific.io-github-autotest-qemu.qmp_event_notification --vt-type qemu --vt-guest-os Guest.Linux.Fedora.29
>>>
>>> Qemu command is :
>>>
>> ..
>>> The log reports that this is an alignment fault. We analyzed the callstack from the log:
>>>
>>> VirtioScsiPassThru-> VirtioFlush->Virtio10SetQueueNotify->Virtio10Transfer->PciIoMemWrite-> CpuMemoryServiceWrite-> MmioWrite32 <- here, the address is not aligned.
>>>
>> The faulting address ends in 0x16, so the access is to the QueueSelect
>> field in VIRTIO_PCI_COMMON_CFG. This is a UINT16 field, so the access
>> should be 16 bits wide, not 32.
>>
>> Could you dump the instructions leading up to the first
>> Virtio10Transfer() call in Virtio10SetQueueNotify()? (from
>> Build/ArmVirtQemu-AARCH64/DEBUG_GCC49/AARCH64/OvmfPkg/Virtio10Dxe/Virtio10/DEBUG/Virtio10.dll)
>>
>>     2280: aa0103e5    mov x5, x1
>>     2284: d2800044    mov x4, #0x2     // #2
>>     2288: d28002c3    mov x3, #0x16    // #22
>>     228c: 52800002    mov w2, #0x0     // #0
>>     2290: aa0003e1    mov x1, x0
>>     2294: aa0603e0    mov x0, x6
>>     2298: 97fffcf3    bl 1664
>>
>> If the size is passed correctly here, we'll have to track down how the
>> call gets routed to MmioWrite32() instead of MmioWrite16(). Do you have
>> any patches on top of edk2-stable201905?
> Right -- checking the "QueueSelect" (whole word) references in
> Virtio10SetQueueNotify(), the "FieldSize" arguments passed to
> Virtio10Transfer() are:
>
> - sizeof SavedQueueSelect
> - sizeof Index
> - sizeof SavedQueueSelect
>
> and both "SavedQueueSelect" and "Index" are of type UINT16.
>
> Virtio10Transfer() maps (FieldSize==2) to "EfiPciIoWidthUint16".
>
> PciIoMemWrite() can only decrease "Width" (provided
> "PcdUnalignedPciIoEnable" is set to TRUE -- which is not the case in
> ArmVirtPkg). So "Width" is passed to RootBridgeIoMemWrite() unchanged,
> in "MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c".
>
> The latter passes "Width" unchanged to CpuMemoryServiceWrite(), in
> "ArmPkg/Drivers/ArmPciCpuIo2Dxe/ArmPciCpuIo2Dxe.c".
>
> That function seems to set "OperationWidth" to "EfiCpuIoWidthUint16"
> (value 1, unchanged), which should result in a call to MmioWrite16()...
>
>
> I have a different question. We recently saw a bunch of Synchronous
> Exceptions, but those were not deterministic. Whenever they fired (which
> was not always), they popped up in different spots. It turned out to be
> a KVM regression, apparently a problem with the vtimer. I believe it was
> fixed by a backport of upstream commit 6bc210003dff ("KVM: arm/arm64:
> Don't emulate virtual timers on userspace ioctls", 2019-04-25). I could
> be totally off-target, of course.
>
> (The RHBZ is , but
> *of course* it has to be a private bug; it was reported for the kernel
> after all! /s)
>
> Thanks
> Laszlo