On 31. Mar 2023, at 11:36, Ard Biesheuvel <ardb@kernel.org> wrote:

On Fri, 31 Mar 2023 at 11:27, Marvin Häuser <mhaeuser@posteo.de> wrote:


On 31. Mar 2023, at 10:59, Ard Biesheuvel <ardb@kernel.org> wrote:

On Fri, 31 Mar 2023 at 10:29, Marvin Häuser <mhaeuser@posteo.de> wrote:


On 31. Mar 2023, at 09:39, Ard Biesheuvel <ardb@kernel.org> wrote:
Hi Marvin,

Thanks for the context.


On Thu, 30 Mar 2023 at 23:54, Marvin Häuser <mhaeuser@posteo.de> wrote:

Hi Ard,

Sorry, I cannot preserve the CC list as the groups.io interface doesn't seem to allow it. Can you please CC me on future revisions?

This patch will badly corrupt binaries. I cannot cite a source right now (if you want me to, please remind me in your response, so I can look it up tomorrow), but for X64 (but not IA32, which is why this is enabled there), relocs are relative to the first *writable* segment. In other words, any relocation to __TEXT will badly corrupt binaries this way.
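
A minimal sketch of the arithmetic behind this, taking the claim above as given (X64 relocs are measured from the first writable segment, IA32 relocs from the first segment). The segment layout and the helper names `reloc_base` and `reloc_offset` are hypothetical, and the 32-bit wraparound is an assumption made only to show what an underflowed offset looks like to a consumer that reads it as unsigned:

```python
from dataclasses import dataclass

VM_PROT_WRITE = 0x2  # Mach-O protection bit: segment is writable


@dataclass
class Segment:
    name: str
    vmaddr: int
    initprot: int  # VM_PROT_READ=1 | VM_PROT_WRITE=2 | VM_PROT_EXECUTE=4


def reloc_base(segments, x86_64):
    """Return the address that relocation offsets are measured from."""
    if x86_64:
        # Per the claim above: first *writable* segment in load order.
        for seg in segments:
            if seg.initprot & VM_PROT_WRITE:
                return seg.vmaddr
        raise ValueError("no writable segment")
    # IA32: first segment, whatever its permissions.
    return segments[0].vmaddr


def reloc_offset(target, base):
    """Offset of a reloc target from the base, wrapped to 32 bits (assumption)."""
    return (target - base) & 0xFFFFFFFF


# Hypothetical layout: __TEXT at 0x0 (r-x), __DATA at 0x4000 (rw-).
segments = [Segment("__TEXT", 0x0, 0x5), Segment("__DATA", 0x4000, 0x3)]
text_target = 0x1000  # a fixup location inside __TEXT

ia32_offset = reloc_offset(text_target, reloc_base(segments, x86_64=False))
x64_offset = reloc_offset(text_target, reloc_base(segments, x86_64=True))
```

With this layout the IA32 offset is simply 0x1000, while the X64 offset is computed against 0x4000 and wraps around, because the target in __TEXT lies below the base.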

OMG.

I can't believe how buggy all this stuff is. But I can confirm that
the resulting binaries don't look right, even though they appear to
boot fine.

Codegen does not change from the suppress flag, so there will be no additional text relocs beyond those you introduced. As they target the exception handler, I guess you’d need to actively provoke the broken code paths (and may end up with a nice recursion :) ).


I understand that the codegen is the same. I was specifically talking
about the PE relocations, which seem to be lacking entirely.

Sure, I was just elaborating on the “appear to boot fine” part, which does make sense. When I last tried, the relocs were not absent but underflowed. It might be that mtoc drops them somehow; I think I inspected the Mach-O directly. Either way, you reproduced the issue. :)


Fair enough.


In particular, when I dump the PE relocations using
llvm-readobj --coff-basereloc, I don't see any relocations referring
to the .text section.
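
The same check can be scripted. The following is a rough sketch, not a replacement for llvm-readobj: it walks the raw bytes of a PE base relocation table (blocks of a 32-bit page RVA, a 32-bit block size, then 16-bit entries with the type in the top 4 bits and the page offset in the low 12) and lists the fixups that fall inside a given RVA range. Extracting the .reloc bytes from the file is left out, and the .text bounds used in the example are made up:

```python
import struct


def base_reloc_rvas(reloc_data):
    """Yield (rva, type) for every non-padding entry in a base relocation table."""
    pos = 0
    while pos + 8 <= len(reloc_data):
        page_rva, block_size = struct.unpack_from("<II", reloc_data, pos)
        if block_size < 8:
            break  # malformed or terminating block
        end = min(pos + block_size, len(reloc_data))
        for entry_pos in range(pos + 8, end - 1, 2):
            (entry,) = struct.unpack_from("<H", reloc_data, entry_pos)
            reloc_type, offset = entry >> 12, entry & 0x0FFF
            if reloc_type != 0:  # type 0 (ABSOLUTE) is alignment padding
                yield page_rva + offset, reloc_type
        pos += block_size


def relocs_in_range(reloc_data, start_rva, end_rva):
    """Return the RVAs of all fixups with start_rva <= rva < end_rva."""
    return [rva for rva, _ in base_reloc_rvas(reloc_data)
            if start_rva <= rva < end_rva]


# Synthetic table: one DIR64 (type 10) fixup in page 0x1000, one in page 0x3000.
table = (struct.pack("<IIHH", 0x1000, 12, (10 << 12) | 0x010, 0)
         + struct.pack("<IIHH", 0x3000, 12, (10 << 12) | 0x020, 0))

# Hypothetical .text range 0x1000..0x2000.
text_relocs = relocs_in_range(table, 0x1000, 0x2000)
```

An empty result for the real .text range of an image that is known to contain absolute references into .text would match the observation above.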

In AUDK, we support this with two essential changes. The first is that we always generate a writable dummy segment at the beginning of the address space [1], making the relocs relative to the image base. The second is that in ocmtoc, our fork of the abandoned (and pretty badly-bugged) Apple mtoc, we explicitly require this segment to be present and verify its virtual address is the minimum virtual address [2]. It is then omitted from the conversion process [3]. I suggest you replicate these changes and fully switch to ocmtoc for XCODE5 builds.
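
A minimal sketch of the check described above, not the actual ocmtoc code: require a writable segment, verify that the first one sits at the minimum virtual address (so that the reloc base equals the image base), and leave it out of the list of segments to convert. The function name, the tuple layout, and the segment name `__DUMMY` are hypothetical:

```python
VM_PROT_WRITE = 0x2  # Mach-O protection bit: segment is writable


def segments_to_convert(segments):
    """segments: list of (name, vmaddr, initprot) tuples in load-command order.

    Returns the segments that remain after dropping the dummy segment, or
    raises ValueError if the layout does not satisfy the requirement.
    """
    writable = [seg for seg in segments if seg[2] & VM_PROT_WRITE]
    if not writable:
        raise ValueError("no writable segment present")

    first_writable = writable[0]
    min_vmaddr = min(seg[1] for seg in segments)
    if first_writable[1] != min_vmaddr:
        raise ValueError(
            "first writable segment %s is not at the minimum virtual address"
            % first_writable[0]
        )

    # The dummy segment only anchors the reloc base; it is not converted.
    return [seg for seg in segments if seg is not first_writable]


# Hypothetical layout with a writable dummy segment at the image base.
good = [("__DUMMY", 0x0, 0x3), ("__TEXT", 0x1000, 0x5), ("__DATA", 0x5000, 0x3)]
kept = [name for name, _, _ in segments_to_convert(good)]
```

A layout without the dummy segment, such as __TEXT at 0x0 followed by a writable __DATA, is rejected, because the first writable segment is then above the image base.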

I'm not going to do any of that. Instead, I am going to drop this
change, and do the following:

- modify the SecPei version of CpuExceptionHandlerLib to put the
vector templates in .data, as I proposed before. This works around the
issue, and given that SEC/PEI is assumed to be read-only anyway (as it
may execute in place from flash) and does not use page alignment for
the sections due to size constraints, it is reasonable to assume that
.text and .data will be mapped executable anyway.

Well, that assumption is more than fair to make for the status quo platforms, but this is just another rock in the way of doing things the right way (even if it’s just VMs).


How so? SEC and PEI could be mapped read-only today, it's just that we
never bother.

I’m not concerned about read-only but NX.


We don't have writable data in SEC or PEI, so this would require SEC,
PEI_CORE and every PEIM to have split .text and .rodata, and round
them up to page size. Not sure this is worth it, especially given the
fact that CoCo targets seem to be skipping the PEI phase entirely.

CoCo = Confidential Computing? Right, I actually hope that’s true. :) But there are also some plans for real hardware here.



Cc Gerd for an OVMF security perspective. Is PEI-time memory protection something you’d be interested in in the future?


My WXN series for ARM maps all PEIMs read-only, and turns off
shadowing entirely (which makes no sense for VMs). So we have most of
what we need to do that, and this change has no bearing on that.

Well yes, if everything is read-only, you guarantee W^X implicitly, but downstream we have plans for the full deal including NX data. It’s however shelved for the distant future, so as long as this is changed with the intention of reverting it once XCODE5 is fixed or dropped, that sounds fine to me. I just don’t like the notion of intentionally breaking the memory permission model as a hack. I rather hope we’ll make some swift progress on removing XCODE5 as a source of frustration. :)


Pardon my bluntness, but why should I care about the shelved future
plans of some downstream project?

No worries. The part you should care about is that this violates a well-established, well-reasoned, and important convention. This generates objectively broken binaries that only happen to work due to current implementation details. The “future plans” part was an explanation of why I’m persistent about it, stating that some folks want to depend on said convention. As your change affects XCODE5 only (and thus there will be no future changes that rely on this hack), I’m fine to drop this. Basically I was scared this will become part of the design and folks will magically start depending on this hack. :)




- update the version that performs the runtime fixups to only do so
when using the XCODE toolchain - we can phase that out once we drop
XCODE support.

I agree if there’s an actual plan on doing that. We depend on XCODE5 downstream, but I think it would literally be easier for us if the upstream version was dropped than rebasing against hacks that our slightly modded variant does not require.

Cc Andrew and Rebecca. I don’t know anyone else who might still be using XCODE5. Any objections to dropping it? If so, any plans to pick up my proposed changes instead?


I wouldn't mind dropping it. In fact, I'm wondering - given that you
need to install nasm and iasl anyway - if it wouldn't make more sense
to use the CLANGPDB toolchain on macOS, and avoid the mtoc mess
entirely?

I’d say using XCODE5 is a historical thing for us. Years ago, Vitaly evaluated both CLANG38 and CLANGPDB and found various things including debugging to be badly broken. In fact, CLANG38 turned out to have issues like misaligning UINT64s *for years*.

Wow, that is very bad. Was that reported to the mailing list?

Yes, with my 2021 patch ignored, of course. Pedro’s respin was merged though: https://github.com/tianocore/edk2/commit/c5d68ef6e7553ab2894f541eba4e982428ecbd53


However, those issues might have been fixed and it’s not impossible Vitaly will give it another try eventually. In any case, I think our downstream variant of XCODE5 doesn’t require any level of special care, so it doesn’t really matter to us.

(Another thing to consider is that, even if the bugs are fixed, mtoc has a much higher overall code quality and more safety checks than GenFw, which is used for CLANGDWARF.)

The upstream toolchain has no future in my opinion, as mtoc has been deprecated and already failed to compile certain things (for example, it lacked Standalone MM types). The reason it still “worked” was because Homebrew silently shipped a variant with a subset of our ocmtoc patches. So as I see it, taking our changes or dropping it entirely are the only sane options, even regardless of this particular issue you’re trying to fix. Personally, I have no preference.


I think both GenFw and mtoc are horrible hacks that should be phased
out once we can - with good cross-architecture Clang support for
native PE binaries, I'd hope macOS could move to CLANGPDB for all
targets.

That’s a bit too utopian (or actually, dystopian). First, to my understanding, this will also break GCC, no? Maybe there’s good support for generating PEs now, who knows. As far as I’m aware, pairing PE with DWARF isn’t really a well-supported thing either. But again, old experiences, may be better now.

Yet, even if all of this works fine now, this is still a PE lock-in. For cross-platform and cross-format support, something like GenFw and mtoc is unavoidable. In fact, my current thesis topic is designing a replacement for the TE format and a tool to generate it from PEs and ELFs. In contrast to GenFw and mtoc, the design is not to attempt to translate the details of one format into another, but to define an abstract model of a UEFI image file and use this both on the consumer side (a generic loader library) and the producer side (the generation tool uses an intermediate representation for conversion rather than doing format-to-format). This actually works very well. :)

Best regards,
Marvin