From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20230208175812.700129-1-ardb@kernel.org>
 <20230208175812.700129-3-ardb@kernel.org>
 <97EEE177-AE23-47CD-946F-F8578988C5EF@posteo.de>
 <2b7edb49-0c46-0654-ca78-23c56b6add92@taylorbeebe.com>
In-Reply-To: <2b7edb49-0c46-0654-ca78-23c56b6add92@taylorbeebe.com>
From: "Ard Biesheuvel"
Date: Wed, 8 Feb 2023 23:52:49 +0100
Subject: Re: [edk2-devel] [PATCH 2/3] ArmPkg/CpuDxe: Perform preliminary NX remap of free memory
To: Taylor Beebe
Cc: devel@edk2.groups.io, mhaeuser@posteo.de, Michael Kinney,
 Liming Gao, Jiewen Yao, Michael Kubacki, Sean Brogan,
 Rebecca Cran, Leif Lindholm, Sami Mujawar
Content-Type: text/plain; charset="UTF-8"

On Wed, 8 Feb 2023 at 19:57, Taylor Beebe wrote:
>
>
> On 2/8/2023 10:49 AM, Ard Biesheuvel wrote:
> > On Wed, 8 Feb 2023 at 19:32, Marvin Häuser wrote:
> >>
> >> Thanks!! :) Comments inline.
> >>
> >>> On 8. Feb 2023, at 18:58, Ard Biesheuvel wrote:
> >>>
> >>> The DXE core implementation of PcdDxeNxMemoryProtectionPolicy already
> >>> contains an assertion that EfiConventionalMemory and EfiBootServicesData
> >>> are subjected to the same policy when it comes to the use of NX
> >>> permissions. The reason for this is that we may otherwise end up with
> >>> unbounded recursion in the page table code, given that allocating a page
> >>> table would then involve a permission attribute change, and this could
> >>> result in the need for a block entry to be split, which would trigger
> >>> the allocation of a page table recursively.
> >>>
> >>> For the same reason, a shortcut exists in ApplyMemoryProtectionPolicy()
> >>> where, instead of setting the memory attributes unconditionally, we
> >>> compare the NX policies and avoid touching the page tables if they are
> >>> the same for the old and the new memory types.
> >>> Without this shortcut, we
> >>> may end up in a situation where, as the CPU arch protocol DXE driver is
> >>> ramping up, the same unbounded recursion is triggered, due to the fact
> >>> that the NX policy for EfiConventionalMemory has not been applied yet.
> >>>
> >>> To break this cycle, let's remap all EfiConventionalMemory regions
> >>> according to the NX policy for EfiBootServicesData before exposing the
> >>> CPU arch protocol to the DXE core and other drivers. This ensures that
> >>> creating EfiBootServicesData allocations does not result in memory
> >>> attribute changes, and therefore no recursion.
> >>>
> >>> Signed-off-by: Ard Biesheuvel
> >>> ---
> >>>  ArmPkg/Drivers/CpuDxe/CpuDxe.c   | 77 ++++++++++++++++++++
> >>>  ArmPkg/Drivers/CpuDxe/CpuDxe.inf |  2 +
> >>>  2 files changed, 79 insertions(+)
> >>>
> >>> diff --git a/ArmPkg/Drivers/CpuDxe/CpuDxe.c b/ArmPkg/Drivers/CpuDxe/CpuDxe.c
> >>> index d04958e79e52..83fd6fd4e476 100644
> >>> --- a/ArmPkg/Drivers/CpuDxe/CpuDxe.c
> >>> +++ b/ArmPkg/Drivers/CpuDxe/CpuDxe.c
> >>> @@ -11,6 +11,8 @@
> >>>
> >>>  #include
> >>>
> >>> +#include
> >>> +
> >>>  BOOLEAN  mIsFlushingGCD;
> >>>
> >>>  /**
> >>> @@ -227,6 +229,69 @@ InitializeDma (
> >>>    CpuArchProtocol->DmaBufferAlignment = ArmCacheWritebackGranule ();
> >>>  }
> >>>
> >>> +STATIC
> >>> +VOID
> >>> +RemapUnusedMemoryNx (
> >>> +  VOID
> >>> +  )
> >>> +{
>
> This feels somewhat hacky, but it's strictly better than what we have
> right now and a value add, so I'm all for this change.
>
> >>> +  UINT64                 TestBit;
> >>> +  UINTN                  MemoryMapSize;
> >>> +  UINTN                  MapKey;
> >>> +  UINTN                  DescriptorSize;
> >>> +  UINT32                 DescriptorVersion;
> >>> +  EFI_MEMORY_DESCRIPTOR  *MemoryMap;
> >>> +  EFI_MEMORY_DESCRIPTOR  *MemoryMapEntry;
> >>> +  EFI_MEMORY_DESCRIPTOR  *MemoryMapEnd;
> >>> +  EFI_STATUS             Status;
> >>> +
> >>> +  TestBit = LShiftU64 (1, EfiBootServicesData);
> >>> +  if ((PcdGet64 (PcdDxeNxMemoryProtectionPolicy) & TestBit) == 0) {
> >>> +    return;
> >>> +  }
> >>> +
> >>> +  MemoryMapSize = 0;
> >>> +  MemoryMap     = NULL;
> >>> +
> >>> +  Status = gBS->GetMemoryMap (
> >>> +                  &MemoryMapSize,
> >>> +                  MemoryMap,
> >>> +                  &MapKey,
> >>> +                  &DescriptorSize,
> >>> +                  &DescriptorVersion
> >>> +                  );
> >>> +  ASSERT (Status == EFI_BUFFER_TOO_SMALL);
> >>> +  do {
> >>> +    MemoryMap = (EFI_MEMORY_DESCRIPTOR *)AllocatePool (MemoryMapSize);
> >>
> >> nitpick: *If* there is a V2, maybe drop the cast?
> >>
> >>> +    ASSERT (MemoryMap != NULL);
> >>
> >> I know ASSERTing for NULL is a common pattern for some reason, but I'd
> >> rather not have more code with this added.
> >>
> >>> +    Status = gBS->GetMemoryMap (
> >>> +                    &MemoryMapSize,
> >>> +                    MemoryMap,
> >>> +                    &MapKey,
> >>> +                    &DescriptorSize,
> >>> +                    &DescriptorVersion
> >>> +                    );
> >>
> >> Another nitpick, isn't it common practice to call the Core* functions
> >> directly within *Core? I know there is code that doesn't, but the
> >> former should be more efficient.
> >>
> >>> +    if (EFI_ERROR (Status)) {
> >>> +      FreePool (MemoryMap);
> >>> +    }
> >>> +  } while (Status == EFI_BUFFER_TOO_SMALL);
> >>
> >> Is this guaranteed to terminate? I mean, I get the idea - try again
> >> with the larger size, and when allocating the bigger buffer, its
> >> potential new entry will already be accounted for. However, I saw
> >> downstream code that tried something like this (they actually added a
> >> constant overhead ahead of time) bounded by something like 5 iterations.
> >> Might just have been defensive
> >> programming - probably was - but it's not trivial to verify, I think,
> >> is it?
> >>
> >> Regarding the added constant, the spec actually says the following,
> >> which obviously is just to shortcut a second round of GetMemoryMap(),
> >> but still:
> >> "The actual size of the buffer allocated for the consequent call to
> >> GetMemoryMap() should be bigger then the value returned in
> >> MemoryMapSize"
> >>
> >> It appears the spec did not define a canonical way to call
> >> GetMemoryMap(), did it? :(
> >>
> >
> > This is all copy-pasted from MdeModulePkg/Core/Dxe/Misc/MemoryProtection.c
> >
> >>> +
> >>> +  ASSERT_EFI_ERROR (Status);
> >>> +
> >>> +  MemoryMapEntry = MemoryMap;
> >>> +  MemoryMapEnd   = (EFI_MEMORY_DESCRIPTOR *)((UINT8 *)MemoryMap + MemoryMapSize);
> >>> +  while ((UINTN)MemoryMapEntry < (UINTN)MemoryMapEnd) {
> >>> +    if (MemoryMapEntry->Type == EfiConventionalMemory) {
> >>> +      ArmSetMemoryRegionNoExec (
> >>> +        MemoryMapEntry->PhysicalStart,
> >>> +        EFI_PAGES_TO_SIZE (MemoryMapEntry->NumberOfPages)
> >>> +        );
> >>> +    }
> >>> +
> >>> +    MemoryMapEntry = NEXT_MEMORY_DESCRIPTOR (MemoryMapEntry, DescriptorSize);
> >>> +  }
> >>> +}
> >>> +
> >>>  EFI_STATUS
> >>>  CpuDxeInitialize (
> >>>    IN EFI_HANDLE        ImageHandle,
> >>> @@ -240,6 +305,18 @@ CpuDxeInitialize (
> >>>
> >>>    InitializeDma (&mCpu);
> >>>
> >>> +  //
> >>> +  // Once we install the CPU arch protocol, the DXE core's memory
> >>> +  // protection routines will invoke it to manage the permissions of page
> >>> +  // allocations as they are created. Given that this includes pages
> >>> +  // allocated for page tables by this driver, we must ensure that unused
> >>> +  // memory is mapped with the same permissions as boot services data
> >>> +  // regions.
> >>> +  // Otherwise, we may end up with unbounded recursion, due to the
> >>> +  // fact that updating permissions on a newly allocated page table may
> >>> +  // trigger a block entry split, which triggers a page table
> >>> +  // allocation, etc.
> >>> +  //
> >>> +  RemapUnusedMemoryNx ();
> >>
> >> Hmm. I might be too tired, but why is this not already done by
> >> InitializeDxeNxMemoryProtectionPolicy(), assuming BS_Data and ConvMem
> >> have the same XP configuration?
> >>
> >> This reassures me that the CPU Arch protocol producers should be linked
> >> into DxeCore rather than loaded at some arbitrary point in time...
> >> Unrelated to the patch, of course.
> >>
> >
> > The ordering here is a bit tricky. As soon as the CPU arch protocol is
> > exposed, every allocation will be remapped individually, resulting in
> > page table splits and therefore recursion.
> >
>
> As Ard says, this is done in InitializeDxeNxMemoryProtectionPolicy(),
> but at that point the calls to ApplyMemoryProtectionPolicy() will cause
> the infinite recursion issue if you no longer skip applying attributes
> when the to-from types have the same NX policy.
>
> Yes, it is arbitrary that protection is linked to the CPU Arch Protocol.
> But it would also be bad practice to apply the memory protection policy
> before the interface for manipulating attributes has been published for
> use by modules outside of the core. Page table and translation table
> manipulation is architecturally defined, so I think a good long-term
> goal should be to allow the manipulation of attributes via a library
> rather than a protocol in DXE, as ARM platforms currently do.
>

Note that the routine above is slightly dodgy in the sense that the
memory map might change while it is being traversed, due to the page
allocations that result from splitting block entries.
This is why it is important that EfiConventionalMemory and
EfiBootServicesData use the same NX policy, so that such changes to the
memory map do not affect the outcome of the traversal.

If we want to move this into generic code, I think it should be feasible
for the DXE core to
- grab the CPU arch protocol but not make it available globally right
  away, so that AllocatePages() will not recurse
- call the CPU arch protocol to map all EfiConventionalMemory as NX
  (with the caveat above)
- proceed with remapping the active regions, which will now set the
  attributes on all EfiBootServicesData regions again

That would make the thing a bit more robust, but I am reluctant to add
this to generic code and deal with the fallout on x86 or RISC-V if it
breaks something there.