> On 14. Mar 2024, at 15:45, Oliver Smith-Denny <osde@linux.microsoft.com> wrote:
> 
> This does not appear to be the case with MSVC built binaries. I am
> seeing that VirtualSize is the size of the data-initialized part of the
> section.

What? :( So you are saying modern MSVC generates PEs that either have non-contiguous sections, or they have sections that contain zero-initialized portions explicitly? Both variants sound badly borked and should be investigated, even if only for resource constraints.

Just thinking about it, is there any chance you set ALIGN = FILEALIGN? Because then this will be invoked and your file will be converted to XIP: https://github.com/tianocore/edk2/blob/5572b43c6767f7cc46b074ae1fc288f6eccdc65d/BaseTools/Source/C/GenFw/GenFw.c#L612

BaseTools has no concept of what should be a XIP and what should not, so if a file somewhat qualifies, it will just always emit a XIP: https://github.com/tianocore/edk2/blob/5572b43c6767f7cc46b074ae1fc288f6eccdc65d/BaseTools/Source/C/GenFw/GenFw.c#L2150

In AUDK I addressed this by tying XIP to reloc stripping, but that is more of a convenient heuristic: https://github.com/acidanthera/audk/blob/13f99fa7cbde0829ba98e1e4f2a03891a0791ac6/BaseTools/Source/Python/GenFds/EfiSection.py#L293

> Whereas on our ElfConverted binaries, what you say is true. The
> difference concerns me.

Yes, very concerning. Would you mind sharing dumps (section table, etc.) and/or binaries (unless you indeed hit what I outlined above)? I recall MSVC to rather be more aggressive with space-savings, whereas ElfConvert would include section-alignment padding within the file (but not zero-init data).

> Agreed. I have done further research and v2 uses VirtualSize. I was
> initially led astray here because the ElfConvert tool sets VirtualSize
> and SizeOfRawData to the SectionAlignment universally:
> 
> https://github.com/tianocore/edk2/blob/5572b43c6767f7cc46b074ae1fc288f6eccdc65d/BaseTools/Source/C/GenFw/ElfConvert.c#L134-L136

Yes… plenty of more things in the tool to lead people astray. :)

> Yep, the PE headers are definitely filled with information that can be
> deduced from other sources and the spec is not very verbose, which is
> why I definitely want to tread carefully here. For example, it says
> SizeOfImage is the size of the image loaded into memory, which
> current implementations seem to take to mean the sum of the headers
> plus each section's VirtualSize rounded to the SectionAlignment (which
> the spec does call out that SizeOfImage must be aligned to the
> SectionAlignment). VirtualSize it says is also the size of the section
> as loaded into memory, but current implementations (other than our
> ElfConvert script which I'm currently investigating if we should
> change that) seem to take this to mean the size of the initialized
> data in memory, i.e. the "real" data from the image, not the zeroed
> data to get to the SectionAlignment.

Do you have an example for that? I’m quite certain SizeOfImage should always cover the whole thing.

> But the language in the spec
> is almost identical between the two. And again, we have a difference
> in our MSVC and gcc built binaries here.

Do we? :(

> I agree with you there. That being said, the current Image record code
> is, unsurprisingly, broken, and does need to be fixed. Can you review
> the v2 of this? The only difference is that I changed to VirtualSize.

LGTM.

> Do you think the code should be updated for the case of VirtualSize
> == 0? In the current implementation, it will round 0 up to the
> SectionAlignment

It should not. Well, it will technically round, but it will remain 0. It does not round up to the strictly next boundary, but if the current value is a multiple of the alignment (0 is a multiple of any alignment), it remains unchanged.

> , which is what I believe will happen in the PE
> loader, also, but want to get your thoughts on it.
> 
> This would still be a short term solution to fix the active bug
> (ARM64 cannot boot a 64k pagesize Linux with a broken MAT table or
> we will boot without the benefit of the MAT, if we happen to have
> an aligned total region but not each entry, testing shows). I do
> think it is a broken pattern to try to determine the image records
> after we've loaded them into memory, the PE loader should create
> them with the factual data of what was loaded.

There are many more broken patterns everywhere. Did you notice yet that the API itself is broken from the ground up? Not to keep “advertising” it, but maybe our paper will give you some fun too. :) https://arxiv.org/pdf/2012.05471.pdf

Best regards,
Marvin

> 
> Thanks,
> Oliver



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#116780): https://edk2.groups.io/g/devel/message/116780
Mute This Topic: https://groups.io/mt/104610770/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-