From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mout02.posteo.de (mout02.posteo.de [185.67.36.66]) by mx.groups.io with SMTP id smtpd.web10.79931.1675764007786896878 for ; Tue, 07 Feb 2023 02:00:08 -0800 Authentication-Results: mx.groups.io; dkim=pass header.i=@posteo.de header.s=2017 header.b=hObWcijE; spf=pass (domain: posteo.de, ip: 185.67.36.66, mailfrom: mhaeuser@posteo.de) Received: from submission (posteo.de [185.67.36.169]) by mout02.posteo.de (Postfix) with ESMTPS id D432E2404E8 for ; Tue, 7 Feb 2023 11:00:05 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.de; s=2017; t=1675764005; bh=wdRsgYUxp5ioVFI6GK8Gxtqkwxukcs1tInukcKt0Pls=; h=From:Subject:Date:Cc:To:From; b=hObWcijEFTuBkiQTKQTn/xumc+Fcq4SWPu26kfnDIg4Wrl7xfelm0/MzKDg8oZCbo 7L13Ru9sO5iMK+f46yDoY+Qp6KKv05RerEdy3UEQzeQfoE4eXSt8AR6/82b78tlUgu CxnLtH51hhPxBvSIP+TgKr9gwn9Dt0z+2+7CKGzGqwRdHnBsJ85O7cXCJzy3xTpjBD Y2xO13odeP/wFZ57MKuG9GKNjiimOAyG8pg4Rn3iPMwFsVwOFNByDX9B293h9nxWxF DoGfwkriNmgl2PMuMplbGniKiUFV6KK99wxRxVE4wAvYZOiBrDbs8DmsHg9Qyer1g5 alzIVZW6r/b1g== Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4P9zBr5mq6z9rxF; Tue, 7 Feb 2023 11:00:04 +0100 (CET) From: =?UTF-8?B?TWFydmluIEjDpHVzZXI=?= Mime-Version: 1.0 (1.0) Subject: Re: [edk2-devel] [PATCH 4/4] ArmPkg/CpuDxe: Implement EFI memory attributes protocol Date: Tue, 7 Feb 2023 10:00:04 +0000 Message-Id: <3C95E4B6-F597-4CB1-973C-58AA7FE1C6B5@posteo.de> References: Cc: devel@edk2.groups.io, t@taylorbeebe.com In-Reply-To: To: Ard Biesheuvel Content-Type: multipart/alternative; boundary=Apple-Mail-60CB70D9-F567-4D2E-893E-963A22DCE5B6 Content-Transfer-Encoding: 7bit --Apple-Mail-60CB70D9-F567-4D2E-893E-963A22DCE5B6 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable > On 7. Feb 2023, at 10:16, Ard Biesheuvel wrote: >=20 > =EF=BB=BFOn Tue, 7 Feb 2023 at 09:56, Marvin H=C3=A4user wrote: >>=20 >> Hi Taylor and Ard, >>=20 >>>> On 7. Feb 2023, at 09:29, Ard Biesheuvel wrote: >>>=20 >>> =EF=BB=BFOn Tue, 7 Feb 2023 at 02:18, Taylor Beebe w= rote: >>>>=20 >>>> I can't see the Bugzilla you referenced so I requested security Bugzill= a >>>> access. But, yes, that's the bug to which I was referring :) >>>>=20 >>>=20 >>> I cannot see that bugzilla entry either. >>=20 >> I CC=E2=80=99d you both. >>=20 >=20 > Thanks. >=20 > I wrote that code but nobody ever involved me or mentioned that there > was anything wrong with it. Sorry, I filed this privately because I wasn=E2=80=99t sure whether or how y= ou could exploit this - after all, you can affect memory permissions by load= ing an image that after all is not even executed. I didn=E2=80=99t expect th= ere to be any such issues with ARM at the time (and did expect that if there= were, the security team would bring whoever can help in themselves), so I d= idn=E2=80=99t expect needing to consult the author (you as the patch author a= nd you as an ARM maintainer are two separate roles to me). I thought it was j= ust an oversight. Hence (at first), minimal ACL. Sorry, my mistake. Past a point, I realised security is not a priority of TianoCore at a scale (= no offence, just a mere observation form various factors, also should not re= flect back on individual contributors) and CC=E2=80=99d people for whom this= could be relevant to know, including downstream folks. I should have CC=E2=80= =99d you then at the very latest. Though, I will certainly never in my life a= gain file a private BZ with TianoCore - which will allow me to CC more peopl= e from the start! :) The BZ should just be unembargoed straight away. >=20 >>>=20 >>>> Once Ard's change to add Memory Attribute Protocol support to ARM >>>> platforms is in, the change you linked may be palatable for the >>>> upstream. However, ARM platforms could run into the infinite loop I >>>> outlined if that change is pushed upstream because of the lack of >>>> per-allocated page tables and a software semaphore to prevent looping. >>>>=20 >>>=20 >>> I still don't see how this happens. There is an ASSERT in >>> CoreInitializeMemoryProtection() to ensure that EfiConventialMemory >>> and EfiBootServicesData have the same memory type, and I added that >>> (in commit 7eb927db3e25a) for precisely this reason, i.e., to ensure >>> that the plumbing of this feature wouldn't recurse. >>>=20 >>> Could this be related to heap guard, perhaps? I could see how changing >>> the boundaries of allocations might trigger a split even if the old >>> and new type should have the exact same type, and perhaps we should >>> use unguarded pages for page tables. >>=20 >> I know you meant recursing, but that might be related to the BZ, if I und= erstood Taylor correctly. The only scenario I first-hand experienced this bu= g with was unloading a PE image. I don=E2=80=99t have the time right now to c= heck the guarding page code in detail, but from what I just saw, I can very w= ell imagine it can trigger the BZ bug (and thus potentially the recursion is= sue?). >=20 > if we disregard explicit invocations of > EFI_DXE_SERVICES.SetMemorySpaceAttributes (), the typical region > transitions from EfiConventionalMemory to some other type and back, > and never directly from one type to another. So I would expect that > unloading a PE image would result in a FreePages () call with > EfiConventionalMemory as the new type. Yes. >=20 > However, it appears that UnprotectUefiImage does not restore the > region's permissions to whatever is configured as the default for the > associated memory type, and so we end up with EfiConventialMemory > regions that lack the XP attribute. Yes, that=E2=80=99s the issue. I=E2=80=99m not sure though whether this shou= ld be a design requirement? I certainly don=E2=80=99t remember this being do= cumented anywhere and ignoring this particular bug, at a high level, that fe= els like a pointless chore when the permissions will be replaced by the Conv= Mem ones again anyway. The stateful design also is prone to errors, as we ca= n see. I=E2=80=99d prefer a stateless design, where the permissions are alwa= ys forced to the same constant value, no matter the previous configuration. Then again, I lack the entire context of how the ARM VM code works and this m= ight be so much easier at a low level that the stateful design is indeed wor= th it. I=E2=80=99ll shut up and leave it up to you. :) >=20 > I'll send out a separate fix for that. We can resume the discussion on > your patch on the bugzilla at the same time. Tyvm! Not sure I have anything else to contribute though. >=20 >>=20 >>>=20 >>>=20 >>>> I implemented pre-allocated pages for ARM a while back in a private rep= o >>>> but never committed it to Mu. Maybe that would also be worth committing= >>>> and pushing upstream. >>>>=20 >>>=20 >>> I'd like to understand better whether or not there is a way to avoid >>> the need for this, but if not, I'd be happy to review your solution. >>> Does the issue only exist on ARM? Does it still happen after I rewrote >>> the MMU library? (in 2020) >>=20 >> Sorry to interject with no contribution, but for x86 platforms, our downs= tream fork removed the requirement that BSData and ConvMem need to have the s= ame permissions. In fact, ideally ConvMem is just unmapped. Can this be enab= led for ARM without a solution like Taylor=E2=80=99s? You said you added the= requirement as a mitigation. >>=20 >=20 > Mapping a single page in a large unmapped region may require the > allocation of page tables. If populating that page table means we have > to map it first, we have a circular dependency and the recursion, so > we cannot deal with that currently. >=20 > Note that the page table walker does not need this page to be mapped, > it is only the code running on the CPU that needs a mapping while it > creates the entries. So this is fixable in principle, by mapping the > page somewhere else in the address space temporarily, just so we can > populate its contents without the need for recursing into the page > table code to map the page table. >=20 > This would take a bit of work, though, and I'd like to avoid this if possi= ble. Tyvm! >=20 >> Unrelated FYI, we also removed the XP checks for code downstream and all t= ypes but ConvMem (which is unmapped or read-only, I forgot) have default per= missions of RW. The reason for that is that unlike an OS, we don=E2=80=99t h= ave a fully-featured VM system and especially things like mmap are absent. T= hus, any data or code must first be written to the memory before it can be e= xecuted. The execute flag is added after loading the code to ensure W^X. >>=20 >=20 > I would like to do the same, but this is currently not feasible. I > recently removed the exec permissions from EfiLoaderData regions in > ArmVirtQemu, and even though GRUB was fixed 5+ years ago not to rely > on this, a couple of distro forks got broken, and so they had to > revert this change in their builds. :( >=20 > With the EFI memory attributes protocol, the OS can tolerate this, and > I am adding handling of this to Linux so it no longer relies on any > allocation being executable by default. Related: https://lwn.net/ml/linux-kernel/20220303142120.1975-1-baskov@ispras= .ru/ Best regards, Marvin= --Apple-Mail-60CB70D9-F567-4D2E-893E-963A22DCE5B6 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: quoted-printable
On 7. Feb 2023, at 10:16, A= rd Biesheuvel <ardb@kernel.org> wrote:

=EF=BB=BFOn Tue, 7 Feb 2023 at 0= 9:56, Marvin H=C3=A4user <mhaeuser@posteo.de> wrote:

<= span>Hi Taylor and Ard,

On 7. Feb 2023, at 09:29, Ard Biesheuvel <ardb@kernel.org> wr= ote:

=EF=BB=BFOn Tue, 7 Feb 2023 at 02:= 18, Taylor Beebe <t@taylorbeebe.com> wrote:

= I can't see the Bugzilla you referenced so I requested security Bugzilla
access. But, yes, that= 's the bug to which I was referring :)
<= /blockquote>


<= /blockquote>
I cann= ot see that bugzilla entry either.

I CC=E2=80=99d you both.


Thanks.
=
I wrote that code but nobody ever involved me or ment= ioned that there
was anything wrong with it.

Sorry, I filed this privately because I wa= sn=E2=80=99t sure whether or how you could exploit this - after all, you can= affect memory permissions by loading an image that after all is not even ex= ecuted. I didn=E2=80=99t expect there to be any such issues with ARM at the t= ime (and did expect that if there were, the security team would bring whoeve= r can help in themselves), so I didn=E2=80=99t expect needing to consult the= author (you as the patch author and you as an ARM maintainer are two separa= te roles to me). I thought it was just an oversight. Hence (at first), minim= al ACL. Sorry, my mistake.

Past a point, I realised= security is not a priority of TianoCore at a scale (no offence, just a mere= observation form various factors, also should not reflect back on individua= l contributors) and CC=E2=80=99d people for whom this could be relevant to k= now, in= cluding downstream folks. I should have CC=E2=80=99d you then at the v= ery latest. Though, I will certainly never in my life again file a private B= Z with TianoCore - which will allow me to CC more people from the start! :)<= /div>

The BZ should just be unembargoed straight away.


Once Ard's change to add Memory Attribute Protocol support= to ARM
platforms i= s in, the change you linked may be palatable for the
=
upstream. However, ARM platforms could run= into the infinite loop I
<= blockquote type=3D"cite">
outlined if that change is pushed upstream because of the lack of
per-allocated page tab= les and a software semaphore to prevent looping.


<= span>I still don't see how this happens. There is an ASSERT in
<= span>CoreInitializeMemoryProtection() to ensure that EfiConventialMemory
and EfiBootServicesData have the same memory type, and I added t= hat
(in commit 7eb927db3e25a) for precisely this reason, i= .e., to ensure
that the plumbing of this feature wouldn't r= ecurse.

Could this be related to heap g= uard, perhaps? I could see how changing
=
the boundaries of a= llocations might trigger a split even if the old
and new t= ype should have the exact same type, and perhaps we should
use unguarded pages for page tables.

I know you meant recursing, but that might be related to the BZ, i= f I understood Taylor correctly. The only scenario I first-hand experienced t= his bug with was unloading a PE image. I don=E2=80=99t have the time right n= ow to check the guarding page code in detail, but from what I just saw, I ca= n very well imagine it can trigger the BZ bug (and thus potentially the recu= rsion issue?).

if we disregard= explicit invocations of
EFI_DXE_SERVICES.SetMemorySpaceAttr= ibutes (), the typical region
transitions from EfiConvention= alMemory to some other type and back,
and never directly fro= m one type to another. So I would expect that
unloading a PE= image would result in a FreePages () call with
EfiConventio= nalMemory as the new type.

= Yes.


However, it appears that UnprotectUefiImage does not restore the<= br>region's permissions to whatever is configured as the default for t= he
associated memory type, and so we end up with EfiConventi= alMemory
regions that lack the XP attribute.

Yes, that=E2=80=99s the issue. I=E2=80=99m= not sure though whether this should be a design requirement? I certainly do= n=E2=80=99t remember this being documented anywhere and ignoring this partic= ular bug, at a high level, that feels like a pointless chore when the permis= sions will be replaced by the ConvMem ones again anyway. The stateful design= also is prone to errors, as we can see. I=E2=80=99d prefer a stateless desi= gn, where the permissions are always forced to the same constant value, no m= atter the previous configuration.

Then again, I lac= k the entire context of how the ARM VM code works and this might be so much e= asier at a low level that the stateful design is indeed worth it. I=E2=80=99= ll shut up and leave it up to you. :)


I'll send out a separate fix for that. W= e can resume the discussion on
your patch on the bugzilla at= the same time.

Tyvm! Not s= ure I have anything else to contribute though.





I implement= ed pre-allocated pages for ARM a while back in a private repo
but never committed it to Mu. May= be that would also be worth committing
<= /blockquote>
and pushing upstream.


I'd= like to understand better whether or not there is a way to avoid
=
the need for this, but if not, I'd be happy to review your solution.=
Does the issue only exist on ARM? Does it still happen af= ter I rewrote
=
the MMU library? (in 2020)

<= blockquote type=3D"cite">Sorry to interject with no contribution, but f= or x86 platforms, our downstream fork removed the requirement that BSData an= d ConvMem need to have the same permissions. In fact, ideally ConvMem is jus= t unmapped. Can this be enabled for ARM without a solution like Taylor=E2=80= =99s? You said you added the requirement as a mitigation.

Mapping a single page in a large unmapped region may require the
allocation of page tables. If populating that page table means w= e have
to map it first, we have a circular dependency and th= e recursion, so
we cannot deal with that currently.
Note that the page table walker does not need this p= age to be mapped,
it is only the code running on the CPU tha= t needs a mapping while it
creates the entries. So this is f= ixable in principle, by mapping the
page somewhere else in t= he address space temporarily, just so we can
populate its co= ntents without the need for recursing into the page
table co= de to map the page table.

This would take a= bit of work, though, and I'd like to avoid this if possible.

Tyvm!

=

Unrelated = FYI, we also removed the XP checks for code downstream and all types but Con= vMem (which is unmapped or read-only, I forgot) have default permissions of R= W. The reason for that is that unlike an OS, we don=E2=80=99t have a fully-f= eatured VM system and especially things like mmap are absent. Thus, any data= or code must first be written to the memory before it can be executed. The e= xecute flag is added after loading the code to ensure W^X.

<= br>I would like to do the same, but this is currently not feasible. I<= /span>
recently removed the exec permissions from EfiLoaderData reg= ions in
ArmVirtQemu, and even though GRUB was fixed 5+ years= ago not to rely
on this, a couple of distro forks got broke= n, and so they had to
revert this change in their builds.

:(


With the EFI memory attribute= s protocol, the OS can tolerate this, and
I am adding handli= ng of this to Linux so it no longer relies on any
allocation= being executable by default.


Best regards,
Marvin
<= /body>= --Apple-Mail-60CB70D9-F567-4D2E-893E-963A22DCE5B6--