From: Hou Qiming
Date: Fri, 17 Apr 2020 11:22:14 +0800
Subject: Re: [edk2-discuss] Load Option passing. Either bugs or my confusion.
To: Laszlo Ersek
Cc: valerij zaporogeci, discuss@edk2.groups.io, Gerd Hoffmann, "Marcel Apfelbaum (GMail address)", edk2-devel-groups-io, qemu devel list

I'm glad we can reach a consensus that ramfb needs sanity checks. And well, I'm probably at fault with the hijacking.

Your QEMU/TCG-in-QEMU/TCG example also made me realize a deeper problem, though: your setup still can't escape the host display / physical GPU issue. The middle display layers may be bochs or whatever, but as long as the framebuffer content and resolution values are propagated, and the end result is displayed at all on the host, the host GPU attack surface remains exposed to the L2 guest, and checks are needed. Everything shown on the screen involves the display driver stack, whether GPU, GTK, SDL, or tty; you can't avoid that. ramfb-kvmgt just happened to be the shortest pipeline where every stage neglected the checks, which is what exposed this problem. Blaming it on ramfb is unfair, since in your scenario the checks are better done in the display subsystems.

TL;DR: you made me realize just now that there is a very real risk that an AARCH64 Windows guest could exploit an x64 host's display driver by specifying a crafted framebuffer with an overflowing resolution. I don't want to break the original use case, but I'd prefer a broken state over an insecure state.

I'm not quite sure whether this thread is enough for tracking. But given the scope this discussion is taking, maybe it's more of a bug than a regression.
On Thu, Apr 16, 2020 at 10:12 PM Laszlo Ersek wrote:
> On 04/16/20 06:38, Hou Qiming wrote:
> > Very good point, I did neglect ramfb resolution changes... But there is one
> > important thing: it *can* cause a QEMU crash, a potentially exploitable
> > one, not always a guest crash. That's what motivated my heavy-handed
> > approach since allowing resolution change would have necessitated a good
> > deal of security checks. It has crashed my host *kernel* quite a few times.
> >
> > The point is, while the QemuRamfbDxe driver may behave properly, nothing
> > prevents the guest from writing garbage or *malicious* values to the ramfb
> > config space. Then the values are sent to the display component without any
> > sanity check. For some GUI frontends, this means allocating an OpenGL
> > texture with guest-supplied dimensions and uploading guest memory content
> > to it, which means that guest memory content goes straight into a *kernel
> > driver*, *completely unchecked*. Some integer overflow and a lenient GPU
> > driver later, and the guest escapes straight to kernel.
> >
> > The proper way to enable ramfb resolution change again is adding sanity
> > checks for ramfb resolution / pointer / etc. on the QEMU side. We have to
> > make sure it doesn't exceed what the host GPU driver supports. Maybe clamp
> > both width and height to between 1 and 2048? We also need to validate that
> > OpenGL texture dimension update succeeds. Note that OpenGL is not obliged
> > to validate anything and everything has to be checked on the QEMU side.
>
> I agree that QEMU should sanity check the resolution requested by the
> guest. I also agree that "arbitrary" limits are acceptable, for
> preventing integer overflows and -- hopefully -- memory allocation
> failures too.
>
> But I don't see the host kernel / OpenGL / physical GPU angle, at least
> not directly. That angle seems to be specific to your particular use
> case (particular choice of display backend).
>
> For example, if you nest QEMU/TCG in QEMU/TCG, with no KVM and no device
> assignment in the picture anywhere, and OVMF drives ramfb in L2, and the
> display *backend* (such as GTK or SDL GUI window) for the QEMU process
> running in L1 sits on top of a virtual device (such as bochs-display)
> provided by QEMU running in L0, then the ramfb stuff (including the
> resolution changes and the range checks) should work just the same,
> between L2 and L1.
>
> I kinda feel like ramfb has been hijacked for providing a boot time
> display crutch for kvmgt. (I might not be using the correct terminology
> here; sorry about that). That's *not* what ramfb was originally intended
> for, as far as I recall. Compare:
>
> - 59926de9987c ("Merge remote-tracking branch
> 'remotes/kraxel/tags/vga-20180618-pull-request' into staging", 2018-06-19)
>
> - dddb37495b84 ("Merge remote-tracking branch
> 'remotes/awilliam/tags/vfio-updates-20181015.0' into staging", 2018-10-15)
>
> IIRC, Gerd originally invented ramfb for giving AARCH64 Windows the
> linear framebuffer that the latter so badly wants, in particular so that
> the framebuffer exist in guest RAM (not in guest MMIO), in order to
> avoid the annoying S1/S2 caching behavior of AARCH64/KVM when the guest
> maps an area as MMIO that is mapped as RAM on the host [1]. See:
>
> - https://bugzilla.tianocore.org/show_bug.cgi?id=785#c4
> - https://bugzilla.tianocore.org/show_bug.cgi?id=785#c7
> - https://bugzilla.tianocore.org/show_bug.cgi?id=785#c8
>
> and the further references given in those bugzilla comments.
>
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1679680#c0
>
> Component reuse is obviously *hugely* important, and it would be silly
> for me to argue against reusing ramfb wherever it applies. Just please
> don't break the original use case.
>
> Should I file a bug report in LaunchPad, or is this thread enough for
> tracking the QEMU regression?
>
> Thanks
> Laszlo
>
> >
> > Qiming
> >
> >
> > On Wed, Apr 15, 2020 at 11:05 PM Laszlo Ersek wrote:
> >
> >> (CC Gerd, Qiming, Marcel, qemu-devel for ramfb:)
> >>
> >> On 04/14/20 23:20, valerij zaporogeci wrote:
> >>
> >> [snip]
> >>
> >>> There is a Boot Manager UI display problem, I don't know if this is
> >>> qemu problem, but with the ARM (both 32 and 64 bits, the qemu version
> >>> is 4.2.0, the OVMF is fresh), and using "ramfb" device, the Boot
> >>> Manager has troubles with drawing - it's interfase looks entirely
> >>> broken, like this (I'll try to attach the screenshot). UEFI shell
> >>> doesn't have this problem. switching to "serial" (which is -serial vc)
> >>> doesn't produce it too. Only when ramfb is chosen, the Boot Manager UI
> >>> gets smeared. But it takes input and presumable works properly, except
> >>> displaying things. qemu writes these messages in the command prompt:
> >>> ramfb_fw_cfg_write: 640x480 @ 0x4bd00000
> >>> ramfb_fw_cfg_write: resolution locked, change rejected
> >>> ramfb_fw_cfg_write: 800x600 @ 0x4bd00000
> >>> ramfb_fw_cfg_write: resolution locked, change rejected
> >>
> >> Gerd contributed the OVMF QemuRamfbDxe driver in edk2 commit
> >> 1d25ff51af5c ("OvmfPkg: add QemuRamfbDxe", 2018-06-14). Note the date:
> >> June 2018.
> >>
> >> The then-latest (released) QEMU version was v2.12.0, and v2.12.1 /
> >> v3.0.0 were in the making.
> >>
> >> At that time, the resolution change definitely worked -- note my
> >> "Tested-by" on the edk2 commit message.
> >>
> >>
> >> Running "git blame" on the QEMU source, I now find commit a9e0cb67b7f4
> >> ("hw/display/ramfb: lock guest resolution after it's set", 2019-05-24).
> >>
> >> Again, note the date: May 2019 (and this commit was released with QEMU
> >> v4.1.0).
> >>
> >> So I would say that the symptom you see is a QEMU v4.1.0 regression. The
> >> QemuRamfbGraphicsOutputSetMode() function in the OVMF ramfb driver
> >> certainly needs the QemuFwCfgWriteBytes() call to work, for changing the
> >> resolution.
> >>
> >>
> >> Now, I'm not familiar with the reasons behind QEMU commit a9e0cb67b7f4.
> >> It says it intends to "prevent[] a crash when the guest writes garbage
> >> to the configuration space (e.g. when rebooting)".
> >>
> >> But I don't understand why locking the resolution was necessary for
> >> preventing "a crash":
> >>
> >> (1) Registering a device reset handler in QEMU seems sufficient, so that
> >> QEMU forget about the currently shared RAMFB area at platform reset.
> >>
> >> (2) The crash in question is apparently not a *QEMU* crash -- which
> >> might otherwise justify a heavy-handed approach. Instead, it is a
> >> *guest* crash. See the references below:
> >>
> >> (2a)
> >> http://mid.mail-archive.com/CABSdmrmU7FK90Bupq_ySowcc9Uk=8nQxNLHgzvDsNYdp_QLogA@mail.gmail.com
> >> https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg02299.html
> >>
> >> (2b) https://github.com/intel/gvt-linux/issues/23#issuecomment-483651476
> >>
> >> Therefore, I don't think that locking the resolution was justified!
> >>
> >> Importantly:
> >>
> >> - The QemuRamfbDxe driver allocates the framebuffer in *reserved*
> >> memory, therefore any well-behaving OS will *never* touch the
> >> framebuffer.
> >>
> >> - The QemuRamfbDxe driver allocates the framebuffer memory only once,
> >> namely for such a resolution that needs the largest amount of
> >> framebuffer memory. Therefore, framebuffer re-allocations in the guest
> >> driver -- and thereby guest RAM *re-mapping* in QEMU -- are *not*
> >> necessary, upon resolution change.
> >>
> >> The ramfb device reset handler in QEMU is justified (for unmapping /
> >> forgetting the previously shared RAMFB area).
> >>
> >> The resolution locking is *NOT* justified, and it breaks the OVMF
> >> driver. I suggest backing out the resolution locking from QEMU.
> >>
> >> Reference (2a) above indicates 'It could be a misguided attempt to
> >> "resize ramfb" by the guest Intel driver'. If that is the case, then
> >> please fix the Intel guest driver, without regressing the QEMU device
> >> model.
> >>
> >> I'm sad that the QEMU device model change was not regression-tested
> >> against the *upstream* OVMF driver (which, by then, had been upstream
> >> for almost a year).
> >>
> >> Laszlo