From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail03.groups.io (mail03.groups.io [45.79.227.220]) by spool.mail.gandi.net (Postfix) with ESMTPS id DCEC8740038 for ; Fri, 15 Mar 2024 22:56:56 +0000 (UTC) DKIM-Signature: a=rsa-sha256; bh=8PzQeGPlHqu54Svu2mYpvzPSfgQwVaMORNs97XevTQ8=; c=relaxed/simple; d=groups.io; h=DKIM-Filter:Message-ID:Date:MIME-Version:User-Agent:Subject:To:Cc:References:From:In-Reply-To:Precedence:List-Subscribe:List-Help:Sender:List-Id:Mailing-List:Delivered-To:Resent-Date:Reply-To:List-Unsubscribe-Post:List-Unsubscribe:Content-Language:Content-Type:Content-Transfer-Encoding; s=20240206; t=1710543415; v=1; b=aA+DBPTin2dsvFaWxmln+/V62AFMS0ocpTOTeCVpxIinTGbYxxv8BuRlSq16LTVfxQavKu2h cGmb6Ac3U8VrPFo79vEQ7M7/Ns5M0cHRvMYoq+PPpmuK2bTm0m/qpMJeQYWD8u5nq2U+9HmYKee FG2NOEgtZrNnPpap2X9H/t1Ag2tl3JSyvpUQpxx3Fb/Q0MmffngduVsFeqHtyeirRed7qY0+o37 nHQihRDaWI9hqmAyEbi3QtNeYK4Zmp4vu14CgciuWAInqlkYHbCtTX8Pcp1KCkWHDSjm81XKG1F CL/h4ydMf2TM8Ou/YrT5fcBspRFdu3FZtv/xutkrL0G9Q== X-Received: by 127.0.0.2 with SMTP id jmlJYY7687511x82HKy1zQrK; Fri, 15 Mar 2024 15:56:55 -0700 X-Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by mx.groups.io with SMTP id smtpd.web10.6842.1710543414678978949 for ; Fri, 15 Mar 2024 15:56:54 -0700 X-Received: from [10.137.194.171] (unknown [131.107.159.43]) by linux.microsoft.com (Postfix) with ESMTPSA id D45AD20B74C0; Fri, 15 Mar 2024 15:56:53 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com D45AD20B74C0 Message-ID: <58edd77a-1282-4e3a-b2c9-9d5a8dd8b1d2@linux.microsoft.com> Date: Fri, 15 Mar 2024 15:56:53 -0700 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [edk2-devel][PATCH v1 1/3] MdeModulePkg: ImagePropertiesRecordLib: Use SectionAlignment for CodeSize To: devel@edk2.groups.io, mhaeuser@posteo.de Cc: Leif Lindholm , Ard Biesheuvel , Sami Mujawar , Taylor Beebe References: <20240227202721.30070-1-osde@...> <20240227202721.30070-2-osde@...> <5d95548d-d25f-4306-a271-a2960cc81ccf@...> <531b3cfe-b57c-4b5a-97fd-45e428f97853@...> <8DF936BA-F1D0-4513-A8F4-36A2CBEFD84A@posteo.de> <9d6f07c2-3579-4555-94eb-fb7b79a8e05a@linux.microsoft.com> From: "Oliver Smith-Denny" In-Reply-To: Precedence: Bulk List-Subscribe: List-Help: Sender: devel@edk2.groups.io List-Id: Mailing-List: list devel@edk2.groups.io; contact devel+owner@edk2.groups.io Resent-Date: Fri, 15 Mar 2024 15:56:54 -0700 Reply-To: devel@edk2.groups.io,osde@linux.microsoft.com List-Unsubscribe-Post: List-Unsubscribe=One-Click List-Unsubscribe: X-Gm-Message-State: efM9Ke6Q2lZujbGEZ7SNPjR9x7686176AA= Content-Language: en-US Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: quoted-printable X-GND-Status: LEGIT Authentication-Results: spool.mail.gandi.net; dkim=pass header.d=groups.io header.s=20240206 header.b=aA+DBPTi; dmarc=fail reason="SPF not aligned (relaxed), DKIM not aligned (relaxed)" header.from=linux.microsoft.com (policy=none); spf=pass (spool.mail.gandi.net: domain of bounce@groups.io designates 45.79.227.220 as permitted sender) smtp.mailfrom=bounce@groups.io On 3/14/2024 2:57 PM, Marvin H=C3=A4user wrote: >=20 >> On 14. Mar 2024, at 15:45, Oliver Smith-Denny=20 >> wrote: >> >> This does not appear to be the case with MSVC built binaries. I am >> seeing that VirtualSize is the size of the data-initialized part of the >> section. >=20 > What? :( So you are saying modern MSVC generates PEs that either have=20 > non-contiguous sections, or they have sections that contain=20 > zero-initialized portions explicitly? Both variants sound badly borked=20 > and should be investigated, even if only for resource constraints. >=20 I don't think this is what I'm saying. What I am trying to say is that on MSVC, I see PE images getting created that have VirtualSize set to the actual number of initialized bytes in that section (not padded to the section alignment). On ElfConverted binaries, I see the VirtualSize is padded to the section alignment. I've dropped an example below > Just thinking about it, is there any chance you set ALIGN =3D FILEALIGN?= =20 No, the specific case where I was researching this was explicitly setting /ALIGN:0x10000 and /FILEALIGN:0x1000 for DXE_RUNTIME_DRIVERs on ARM64 (a UEFI spec requirement). So I would see the SizeOfRawData is aligned to the file alignment, as expected, but VirtualSize would be the actual size of the data. Again, the troubling thing here for me is that the same binary built with gcc has the VirtualSize aligned to the section alignment. And I have seen other code that loads PE images that relies on VirtualSize not including the padding. The spec is vague here, it says VirtualSize is the size of the section as loaded in memory (which would lead me to believe this should include padding) but it does not explicitly say it should be a multiple of the section alignment (as other fields do). But at a minimum I think we should have different toolchains doing the same behavior here. > Because then this will be invoked and your file will be converted to=20 > XIP:=20 > https://github.com/tianocore/edk2/blob/5572b43c6767f7cc46b074ae1fc288f6ec= cdc65d/BaseTools/Source/C/GenFw/GenFw.c#L612 >=20 > BaseTools has no concept of what should be a XIP and what should not, so= =20 > if a file somewhat qualifies, it will just always emit a XIP:=20 > https://github.com/tianocore/edk2/blob/5572b43c6767f7cc46b074ae1fc288f6ec= cdc65d/BaseTools/Source/C/GenFw/GenFw.c#L2150 >=20 > In AUDK I addressed this by tying XIP to reloc stripping, but that is=20 > more of a convenient heuristic:=20 > https://github.com/acidanthera/audk/blob/13f99fa7cbde0829ba98e1e4f2a03891= a0791ac6/BaseTools/Source/Python/GenFds/EfiSection.py#L293 >=20 >> Whereas on our ElfConverted binaries, what you say is true. The >> difference concerns me. >=20 > Yes, very concerning. Would you mind sharing dumps (section table, etc.)= =20 > and/or binaries (unless you indeed hit what I outlined above)? I recall= =20 > MSVC to rather be more aggressive with space-savings, whereas ElfConvert= =20 > would include section-alignment padding within the file (but not=20 > zero-init data). Correct, ElfConvert sets the file alignment and the section alignment to the same value, based on common-page-size/max-page-size. I will show some examples at the bottom. >=20 >> Agreed. I have done further research and v2 uses VirtualSize. I was >> initially led astray here because the ElfConvert tool sets VirtualSize >> and SizeOfRawData to the SectionAlignment universally: >> >> https://github.com/tianocore/edk2/blob/5572b43c6767f7cc46b074ae1fc288f6e= ccdc65d/BaseTools/Source/C/GenFw/ElfConvert.c#L134-L136 >=20 > Yes=E2=80=A6 plenty of more things in the tool to lead people astray. :) >=20 >> Yep, the PE headers are definitely filled with information that can be >> deduced from other sources and the spec is not very verbose, which is >> why I definitely want to tread carefully here. For example, it says >> SizeOfImage is the size of the image loaded into memory, which >> current implementations seem to take to mean the sum of the headers >> plus each section's VirtualSize rounded to the SectionAlignment (which >> the spec does call out that SizeOfImage must be aligned to the >> SectionAlignment). VirtualSize it says is also the size of the section >> as loaded into memory, but current implementations (other than our >> ElfConvert script which I'm currently investigating if we should >> change that) seem to take this to mean the size of the initialized >> data in memory, i.e. the "real" data from the image, not the zeroed >> data to get to the SectionAlignment. >=20 > Do you have an example for that? I=E2=80=99m quite certain SizeOfImage sh= ould=20 > always cover the whole thing. See below for the VirtualSize examples, I'm confused on your comment on SizeOfImage. I agree that SizeOfImage covers everything as loaded into memory and I have not seen any issues there. >=20 >> But the language in the spec >> is almost identical between the two. And again, we have a difference >> in our MSVC and gcc built binaries here. >=20 > Do we? :( >=20 >> I agree with you there. That being said, the current Image record code >> is, unsurprisingly, broken, and does need to be fixed. Can you review >> the v2 of this? The only difference is that I changed to VirtualSize. >=20 > LGTM. Do you mind adding your RB to v2? And certainly if you have any other comments that is greatly appreciated. >=20 >> Do you think the code should be updated for the case of VirtualSize >> =3D=3D 0? In the current implementation, it will round 0 up to the >> SectionAlignment >=20 > It should not. Well, it will technically round, but it will remain 0. It= =20 > does not round up to the strictly next boundary, but if the current=20 > value is a multiple of the alignment (0 is a multiple of any alignment),= =20 > it remains unchanged. >=20 >> , which is what I believe will happen in the PE >> loader, also, but want to get your thoughts on it. >> >> This would still be a short term solution to fix the active bug >> (ARM64 cannot boot a 64k pagesize Linux with a broken MAT table or >> we will boot without the benefit of the MAT, if we happen to have >> an aligned total region but not each entry, testing shows). I do >> think it is a broken pattern to try to determine the image records >> after we've loaded them into memory, the PE loader should create >> them with the factual data of what was loaded. >=20 > There are many more broken patterns everywhere. Did you notice yet that= =20 > the API itself is broken from the ground up? Not to keep =E2=80=9Cadverti= sing=E2=80=9D=20 > it, but maybe our paper will give you some fun too. :)=20 > https://arxiv.org/pdf/2012.05471.pdf I'll take a look. It's been on my list to look at for awhile but as always other things come up :) Examples of the differences I see between MSVC and gcc binaries: I originally noticed this on ARM64 on edk2, but wanted to make sure I saw it on x64 too, so this is with binaries from Project Mu's QemuQ35Pkg (edk2 doesn't have VS2022 support and I didn't feel like adding it or reverting back to VS2019). For reference, this is building the current top of tree at a4dd5d9785f48302c95fec852338d6bd26dd961a. I dumped ReportStatusCodeRouterRuntimeDxe.efi from both using dumpbin (from VS2022) to examine the PE headers. MSVC selected header values: Main header: 0x3200 size of code 0x2400 size of initialized data 0x0 size of uninitialized data 0x1000 section alignment 0x200 file alignment 0xB000 size of image 6 sections: .data, .pdata, .rdata, .reloc, .text, .xdata .text section: 0x30DF virtual size 0x3200 size of raw data .data section: 0x130 virtual size 0x200 size of raw data GCC ElfConverted selected header values: Main header: 0x4000 size of code 0x1000 size of initialized data 0x0 size of uninitialized data 0x1000 section alignment 0x1000 file alignment 0x7000 size of image 3 sections: .data, .text, .reloc .text section: 0x4000 virtual size 0x4000 size of raw data .data section: 0x1000 virtual size 0x1000 size of raw data So my concern here is that ElfConvert takes a different view of VirtualSize, that is should be section aligned, whereas MSVC binaries take VirtualSize to be the actual size without padding. I think the correct thing to do would be change ElfConvert to do what MSVC does (although the spec is vague and so I can understand the confusion). In practice this will tend to work as is, since we are using SizeOfImage to allocate with, but as you have pointed out there are so many edge cases that having a difference here makes me worried we would see something weird with only binaries built by one toolchain. I am curious to hear your thoughts though. This is very easy to reproduce, build any UEFI binary with MSVC and GCC and compare the headers. Thanks, Oliver -=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#116819): https://edk2.groups.io/g/devel/message/116819 Mute This Topic: https://groups.io/mt/104610770/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-