On 7. Feb 2023, at 10:16, Ard Biesheuvel <ardb@kernel.org> wrote:

On Tue, 7 Feb 2023 at 09:56, Marvin Häuser <mhaeuser@posteo.de> wrote:

Hi Taylor and Ard,

On 7. Feb 2023, at 09:29, Ard Biesheuvel <ardb@kernel.org> wrote:

On Tue, 7 Feb 2023 at 02:18, Taylor Beebe <t@taylorbeebe.com> wrote:

I can't see the Bugzilla you referenced so I requested security Bugzilla
access. But, yes, that's the bug to which I was referring :)


I cannot see that bugzilla entry either.

I CC’d you both.


Thanks.

I wrote that code but nobody ever involved me or mentioned that there
was anything wrong with it.

Sorry, I filed this privately because I wasn’t sure whether or how you could exploit this - after all, you can affect memory permissions by loading an image that after all is not even executed. I didn’t expect there to be any such issues with ARM at the time (and did expect that if there were, the security team would bring whoever can help in themselves), so I didn’t expect needing to consult the author (you as the patch author and you as an ARM maintainer are two separate roles to me). I thought it was just an oversight. Hence (at first), minimal ACL. Sorry, my mistake.

Past a point, I realised security is not a priority of TianoCore at a scale (no offence, just a mere observation form various factors, also should not reflect back on individual contributors) and CC’d people for whom this could be relevant to know, including downstream folks. I should have CC’d you then at the very latest. Though, I will certainly never in my life again file a private BZ with TianoCore - which will allow me to CC more people from the start! :)

The BZ should just be unembargoed straight away.



Once Ard's change to add Memory Attribute Protocol support to ARM
platforms is in, the change you linked may be palatable for the
upstream. However, ARM platforms could run into the infinite loop I
outlined if that change is pushed upstream because of the lack of
per-allocated page tables and a software semaphore to prevent looping.


I still don't see how this happens. There is an ASSERT in
CoreInitializeMemoryProtection() to ensure that EfiConventialMemory
and EfiBootServicesData have the same memory type, and I added that
(in commit 7eb927db3e25a) for precisely this reason, i.e., to ensure
that the plumbing of this feature wouldn't recurse.

Could this be related to heap guard, perhaps? I could see how changing
the boundaries of allocations might trigger a split even if the old
and new type should have the exact same type, and perhaps we should
use unguarded pages for page tables.

I know you meant recursing, but that might be related to the BZ, if I understood Taylor correctly. The only scenario I first-hand experienced this bug with was unloading a PE image. I don’t have the time right now to check the guarding page code in detail, but from what I just saw, I can very well imagine it can trigger the BZ bug (and thus potentially the recursion issue?).

if we disregard explicit invocations of
EFI_DXE_SERVICES.SetMemorySpaceAttributes (), the typical region
transitions from EfiConventionalMemory to some other type and back,
and never directly from one type to another. So I would expect that
unloading a PE image would result in a FreePages () call with
EfiConventionalMemory as the new type.

Yes.


However, it appears that UnprotectUefiImage does not restore the
region's permissions to whatever is configured as the default for the
associated memory type, and so we end up with EfiConventialMemory
regions that lack the XP attribute.

Yes, that’s the issue. I’m not sure though whether this should be a design requirement? I certainly don’t remember this being documented anywhere and ignoring this particular bug, at a high level, that feels like a pointless chore when the permissions will be replaced by the ConvMem ones again anyway. The stateful design also is prone to errors, as we can see. I’d prefer a stateless design, where the permissions are always forced to the same constant value, no matter the previous configuration.

Then again, I lack the entire context of how the ARM VM code works and this might be so much easier at a low level that the stateful design is indeed worth it. I’ll shut up and leave it up to you. :)


I'll send out a separate fix for that. We can resume the discussion on
your patch on the bugzilla at the same time.

Tyvm! Not sure I have anything else to contribute though.





I implemented pre-allocated pages for ARM a while back in a private repo
but never committed it to Mu. Maybe that would also be worth committing
and pushing upstream.


I'd like to understand better whether or not there is a way to avoid
the need for this, but if not, I'd be happy to review your solution.
Does the issue only exist on ARM? Does it still happen after I rewrote
the MMU library? (in 2020)

Sorry to interject with no contribution, but for x86 platforms, our downstream fork removed the requirement that BSData and ConvMem need to have the same permissions. In fact, ideally ConvMem is just unmapped. Can this be enabled for ARM without a solution like Taylor’s? You said you added the requirement as a mitigation.


Mapping a single page in a large unmapped region may require the
allocation of page tables. If populating that page table means we have
to map it first, we have a circular dependency and the recursion, so
we cannot deal with that currently.

Note that the page table walker does not need this page to be mapped,
it is only the code running on the CPU that needs a mapping while it
creates the entries. So this is fixable in principle, by mapping the
page somewhere else in the address space temporarily, just so we can
populate its contents without the need for recursing into the page
table code to map the page table.

This would take a bit of work, though, and I'd like to avoid this if possible.

Tyvm!


Unrelated FYI, we also removed the XP checks for code downstream and all types but ConvMem (which is unmapped or read-only, I forgot) have default permissions of RW. The reason for that is that unlike an OS, we don’t have a fully-featured VM system and especially things like mmap are absent. Thus, any data or code must first be written to the memory before it can be executed. The execute flag is added after loading the code to ensure W^X.


I would like to do the same, but this is currently not feasible. I
recently removed the exec permissions from EfiLoaderData regions in
ArmVirtQemu, and even though GRUB was fixed 5+ years ago not to rely
on this, a couple of distro forks got broken, and so they had to
revert this change in their builds.

:(


With the EFI memory attributes protocol, the OS can tolerate this, and
I am adding handling of this to Linux so it no longer relies on any
allocation being executable by default.

Related: https://lwn.net/ml/linux-kernel/20220303142120.1975-1-baskov@ispras.ru/

Best regards,
Marvin