public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Oliver Smith-Denny" <osde@linux.microsoft.com>
To: devel@edk2.groups.io, mhaeuser@posteo.de
Cc: Leif Lindholm <quic_llindhol@quicinc.com>,
	Ard Biesheuvel <ardb+tianocore@kernel.org>,
	Sami Mujawar <sami.mujawar@arm.com>,
	Taylor Beebe <Taylor.Beebe@microsoft.com>
Subject: Re: [edk2-devel][PATCH v1 1/3] MdeModulePkg: ImagePropertiesRecordLib: Use SectionAlignment for CodeSize
Date: Fri, 15 Mar 2024 15:56:53 -0700	[thread overview]
Message-ID: <58edd77a-1282-4e3a-b2c9-9d5a8dd8b1d2@linux.microsoft.com> (raw)
In-Reply-To: <B36FF71F-0EDD-47AA-B52B-AC2063EEC900@posteo.de>

On 3/14/2024 2:57 PM, Marvin Häuser wrote:
> 
>> On 14. Mar 2024, at 15:45, Oliver Smith-Denny 
>> <osde@linux.microsoft.com> wrote:
>>
>> This does not appear to be the case with MSVC built binaries. I am
>> seeing that VirtualSize is the size of the data-initialized part of the
>> section.
> 
> What? :( So you are saying modern MSVC generates PEs that either have 
> non-contiguous sections, or they have sections that contain 
> zero-initialized portions explicitly? Both variants sound badly borked 
> and should be investigated, even if only for resource constraints.
> 

I don't think this is what I'm saying. What I am trying to say is that
on MSVC, I see PE images getting created that have VirtualSize set to
the actual number of initialized bytes in that section (not padded to
the section alignment). On ElfConverted binaries, I see the VirtualSize
is padded to the section alignment. I've dropped an example below

> Just thinking about it, is there any chance you set ALIGN = FILEALIGN? 

No, the specific case where I was researching this was explicitly
setting /ALIGN:0x10000 and /FILEALIGN:0x1000 for DXE_RUNTIME_DRIVERs
on ARM64 (a UEFI spec requirement). So I would see the SizeOfRawData
is aligned to the file alignment, as expected, but VirtualSize would
be the actual size of the data. Again, the troubling thing here for
me is that the same binary built with gcc has the VirtualSize aligned
to the section alignment. And I have seen other code that loads PE
images that relies on VirtualSize not including the padding. The spec
is vague here, it says VirtualSize is the size of the section as
loaded in memory (which would lead me to believe this should include
padding) but it does not explicitly say it should be a multiple of
the section alignment (as other fields do). But at a minimum I think
we should have different toolchains doing the same behavior here.

> Because then this will be invoked and your file will be converted to 
> XIP: 
> https://github.com/tianocore/edk2/blob/5572b43c6767f7cc46b074ae1fc288f6eccdc65d/BaseTools/Source/C/GenFw/GenFw.c#L612 <https://github.com/tianocore/edk2/blob/5572b43c6767f7cc46b074ae1fc288f6eccdc65d/BaseTools/Source/C/GenFw/GenFw.c#L612>
> 
> BaseTools has no concept of what should be a XIP and what should not, so 
> if a file somewhat qualifies, it will just always emit a XIP: 
> https://github.com/tianocore/edk2/blob/5572b43c6767f7cc46b074ae1fc288f6eccdc65d/BaseTools/Source/C/GenFw/GenFw.c#L2150 <https://github.com/tianocore/edk2/blob/5572b43c6767f7cc46b074ae1fc288f6eccdc65d/BaseTools/Source/C/GenFw/GenFw.c#L2150>
> 
> In AUDK I addressed this by tying XIP to reloc stripping, but that is 
> more of a convenient heuristic: 
> https://github.com/acidanthera/audk/blob/13f99fa7cbde0829ba98e1e4f2a03891a0791ac6/BaseTools/Source/Python/GenFds/EfiSection.py#L293 <https://github.com/acidanthera/audk/blob/13f99fa7cbde0829ba98e1e4f2a03891a0791ac6/BaseTools/Source/Python/GenFds/EfiSection.py#L293>
> 
>> Whereas on our ElfConverted binaries, what you say is true. The
>> difference concerns me.
> 
> Yes, very concerning. Would you mind sharing dumps (section table, etc.) 
> and/or binaries (unless you indeed hit what I outlined above)? I recall 
> MSVC to rather be more aggressive with space-savings, whereas ElfConvert 
> would include section-alignment padding within the file (but not 
> zero-init data).

Correct, ElfConvert sets the file alignment and the section alignment
to the same value, based on common-page-size/max-page-size. I will show
some examples at the bottom.

> 
>> Agreed. I have done further research and v2 uses VirtualSize. I was
>> initially led astray here because the ElfConvert tool sets VirtualSize
>> and SizeOfRawData to the SectionAlignment universally:
>>
>> https://github.com/tianocore/edk2/blob/5572b43c6767f7cc46b074ae1fc288f6eccdc65d/BaseTools/Source/C/GenFw/ElfConvert.c#L134-L136 <https://github.com/tianocore/edk2/blob/5572b43c6767f7cc46b074ae1fc288f6eccdc65d/BaseTools/Source/C/GenFw/ElfConvert.c#L134-L136>
> 
> Yes… plenty of more things in the tool to lead people astray. :)
> 
>> Yep, the PE headers are definitely filled with information that can be
>> deduced from other sources and the spec is not very verbose, which is
>> why I definitely want to tread carefully here. For example, it says
>> SizeOfImage is the size of the image loaded into memory, which
>> current implementations seem to take to mean the sum of the headers
>> plus each section's VirtualSize rounded to the SectionAlignment (which
>> the spec does call out that SizeOfImage must be aligned to the
>> SectionAlignment). VirtualSize it says is also the size of the section
>> as loaded into memory, but current implementations (other than our
>> ElfConvert script which I'm currently investigating if we should
>> change that) seem to take this to mean the size of the initialized
>> data in memory, i.e. the "real" data from the image, not the zeroed
>> data to get to the SectionAlignment.
> 
> Do you have an example for that? I’m quite certain SizeOfImage should 
> always cover the whole thing.

See below for the VirtualSize examples, I'm confused on your comment on
SizeOfImage. I agree that SizeOfImage covers everything as loaded into
memory and I have not seen any issues there.

> 
>> But the language in the spec
>> is almost identical between the two. And again, we have a difference
>> in our MSVC and gcc built binaries here.
> 
> Do we? :(
> 
>> I agree with you there. That being said, the current Image record code
>> is, unsurprisingly, broken, and does need to be fixed. Can you review
>> the v2 of this? The only difference is that I changed to VirtualSize.
> 
> LGTM.

Do you mind adding your RB to v2? And certainly if you have any other
comments that is greatly appreciated.

> 
>> Do you think the code should be updated for the case of VirtualSize
>> == 0? In the current implementation, it will round 0 up to the
>> SectionAlignment
> 
> It should not. Well, it will technically round, but it will remain 0. It 
> does not round up to the strictly next boundary, but if the current 
> value is a multiple of the alignment (0 is a multiple of any alignment), 
> it remains unchanged.
> 
>> , which is what I believe will happen in the PE
>> loader, also, but want to get your thoughts on it.
>>
>> This would still be a short term solution to fix the active bug
>> (ARM64 cannot boot a 64k pagesize Linux with a broken MAT table or
>> we will boot without the benefit of the MAT, if we happen to have
>> an aligned total region but not each entry, testing shows). I do
>> think it is a broken pattern to try to determine the image records
>> after we've loaded them into memory, the PE loader should create
>> them with the factual data of what was loaded.
> 
> There are many more broken patterns everywhere. Did you notice yet that 
> the API itself is broken from the ground up? Not to keep “advertising” 
> it, but maybe our paper will give you some fun too. :) 
> https://arxiv.org/pdf/2012.05471.pdf <https://arxiv.org/pdf/2012.05471.pdf>

I'll take a look. It's been on my list to look at for awhile but as
always other things come up :)

Examples of the differences I see between MSVC and gcc binaries:

I originally noticed this on ARM64 on edk2, but wanted to make sure I
saw it on x64 too, so this is with binaries from Project Mu's QemuQ35Pkg
(edk2 doesn't have VS2022 support and I didn't feel like adding it
or reverting back to VS2019). For reference, this is building the
current top of tree at a4dd5d9785f48302c95fec852338d6bd26dd961a.

I dumped ReportStatusCodeRouterRuntimeDxe.efi from both using dumpbin
(from VS2022) to examine the PE headers.

MSVC selected header values:

Main header:
0x3200 size of code
0x2400 size of initialized data
0x0    size of uninitialized data
0x1000 section alignment
0x200  file alignment
0xB000 size of image

6 sections: .data, .pdata, .rdata, .reloc, .text, .xdata

.text section:
0x30DF virtual size
0x3200 size of raw data

.data section:
0x130 virtual size
0x200 size of raw data


GCC ElfConverted selected header values:

Main header:
0x4000 size of code
0x1000 size of initialized data
0x0    size of uninitialized data
0x1000 section alignment
0x1000 file alignment
0x7000 size of image

3 sections: .data, .text, .reloc

.text section:
0x4000 virtual size
0x4000 size of raw data

.data section:
0x1000 virtual size
0x1000 size of raw data

So my concern here is that ElfConvert takes a
different view of VirtualSize, that is should be
section aligned, whereas MSVC binaries take
VirtualSize to be the actual size without padding.
I think the correct thing to do would be change
ElfConvert to do what MSVC does (although the spec
is vague and so I can understand the confusion).
In practice this will tend to work as is, since we
are using SizeOfImage to allocate with, but as you
have pointed out there are so many edge cases that
having a difference here makes me worried we would
see something weird with only binaries built by one toolchain.

I am curious to hear your thoughts though. This is
very easy to reproduce, build any UEFI binary with
MSVC and GCC and compare the headers.

Thanks,
Oliver


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116819): https://edk2.groups.io/g/devel/message/116819
Mute This Topic: https://groups.io/mt/104610770/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-



  reply	other threads:[~2024-03-15 22:56 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20240227202721.30070-1-osde@...>
     [not found] ` <20240227202721.30070-2-osde@...>
     [not found]   ` <CAMj1kXGDCsFZmPsKe4GD1XdT+t9SKHVStNSx-FsA_YwBiVh6ng@...>
     [not found]     ` <5d95548d-d25f-4306-a271-a2960cc81ccf@...>
     [not found]       ` <CAMj1kXGsso6BjOajEgqE3AOC1ZSdStQmeYmCoJMQo_tK1QHTWw@...>
     [not found]         ` <531b3cfe-b57c-4b5a-97fd-45e428f97853@...>
2024-03-14  9:57           ` [edk2-devel][PATCH v1 1/3] MdeModulePkg: ImagePropertiesRecordLib: Use SectionAlignment for CodeSize Marvin Häuser
2024-03-14 14:45             ` Oliver Smith-Denny
2024-03-14 21:57               ` Marvin Häuser
2024-03-15 22:56                 ` Oliver Smith-Denny [this message]
2024-03-15 23:45                   ` Marvin Häuser
2024-03-16 15:47                     ` Oliver Smith-Denny
2024-02-27 20:27 [edk2-devel][PATCH v1 0/3] MdeModulePkg: ImagePropertiesRecordLib Fixes Oliver Smith-Denny
2024-02-27 20:27 ` [edk2-devel][PATCH v1 1/3] MdeModulePkg: ImagePropertiesRecordLib: Use SectionAlignment for CodeSize Oliver Smith-Denny
2024-03-01 11:58   ` Ard Biesheuvel
2024-03-04 17:49     ` Oliver Smith-Denny
2024-03-04 18:54       ` Ard Biesheuvel
2024-03-04 19:24         ` Oliver Smith-Denny
     [not found]         ` <17B9A63355E03AE5.30946@groups.io>
2024-03-04 22:38           ` Oliver Smith-Denny
     [not found]           ` <17B9B0CE1BDDC06B.30946@groups.io>
2024-03-11 19:34             ` Oliver Smith-Denny
2024-03-12  8:32               ` Ard Biesheuvel
2024-03-12 14:38                 ` Oliver Smith-Denny

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=58edd77a-1282-4e3a-b2c9-9d5a8dd8b1d2@linux.microsoft.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox