On Tue, Nov 22, 2022 at 12:20 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
Hi Pedro,

On Tue, Nov 22, 2022 at 12:35 PM Pedro Falcato <pedro.falcato@gmail.com> wrote:
> Given this patch plus the corresponding linux-efi patches wrt RNG, I'm
> mildly concerned about buggy RDRAND implementations compromising the
> kernel's RNG. Is this not a concern?
 
Hi,

Thank you for your thorough response, glad to have you in this thread.

Speaking with my kernel RNG maintainer hat on, no, this is not really a
concern, for several reasons:

- The kernel's RNG takes input from multiple sources, continuously, and
  tries to mix in new inputs rather quickly, especially at early boot.

I am aware, but I'm more worried about very early boot (think Linux's EFI stub or some other bootloader). I can see how
an ill-advised RNG_PROTOCOL user could try to rely on it exclusively (if it's available, which I don't believe it is at the moment on non-virtio-rng OVMF) instead of
mixing in the few sources available at that point, making important things like KASLR addresses or possibly even a stack canary 100% guessable.

- The kernel will use RDRAND on its own, even in the case that EFI
  doesn't provide it, so it's not gaining anything here.

Yes, but as you said, the kernel mixes RDRAND with a lot of other entropy sources and also does proper sanity checking on it.
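
To make "sanity checking" concrete, here is a minimal sketch of the kind of smoke test meant here. It assumes the caller has already collected a few raw RDRAND outputs; the helper name samples_look_stuck is hypothetical, and this is not the kernel's actual code. It only catches a generator that returns the same value every time (for example all-ones), which is how some known-bad implementations have failed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if every sample equals the first one,
 * i.e. the generator looks stuck on a constant value.
 * This is a smoke test only; passing it says nothing about entropy quality.
 */
static bool samples_look_stuck(const uint64_t *samples, size_t count)
{
    if (count < 2)
        return true; /* too few samples to tell; treat the source as untrusted */

    for (size_t i = 1; i < count; i++) {
        if (samples[i] != samples[0])
            return false; /* at least one value differs, so not a constant */
    }
    return true;
}
```

A caller would draw a handful of values, run the check, and refuse to use the source alone if it reports a stuck generator.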

- EFI on actual baremetal firmware, as opposed to OVMF, already provides
  EFI, so this is par for the course.

Hm? What do you mean?

- Most of those RDRAND bugs have concerned coming in and out of various
  sleep states, which doesn't really apply to early boot EFI.

- And again, just to reinforce the first point, the kernel takes inputs
  from many sources. Having EFI provide its own thing -- via RDRAND or
  any other mechanism -- is complementary and will only help.

Regarding your "corresponding linux-efi patches wrt RNG", I'm not quite
sure what you're referring to. If anything, recent work during this
cycle has been aimed around shuffling more sources of randomness in from
elsewhere. The EFI RNG protocol stuff has been in there already for a
long time. So maybe you misunderstood those? Or I'm misunderstanding
what you're referring to?

Ah yes, I haven't been paying close attention to the RNG patches themselves, but now that I've taken a closer look, I can see you're right.

As a general point, this question of "do we have enough entropy?" or
"are we initialized yet?" is an impossible proposition. Entropy
estimation is impossible, and is only ever a guess, and that guess can
be sometimes wrong, even wildly wrong. So the kernel is increasingly
moving away from /relying/ on that, and is more focused on getting more
sources faster -- incorporating anything it can find, and mixing it into
the output stream more continuously. To that end, if EFI's got a DXE to
do something or another, please hook it up.

Lastly, I think the concern you raised is inappropriate for Ard's
patchset, as it actually doesn't apply to it at all. This patchset is
about hooking an existing DXE up to OVMF, one that is hooked up
elsewhere, but wasn't for OVMF. This alone has nothing to do with the
concern. Rather, the concerns you raised are about the DXE itself. So to
that end, perhaps you should start a new thread or send some patches or
do something to the DXE that you're concerned about (e.g. a basic boring
power-on selftest like what the kernel has or something, if you're extra
worried). Or maybe not, for the reasons I listed above of things being
basically fine.

I know, and I'm not yelling at Ard for the (questionable?) choices made in the BaseRngLib code, but I'm concerned this patch may negatively affect any sort of early boot RNG,
particularly for the more naive users of RNG_PROTOCOL, by exposing the possibly flawed RDRAND code. If the EFI subsystem/EFISTUB code handles this case well by still mixing
in whatever sources it can get before using this entropy, then that's great, but providing something like a non-random RNG_PROTOCOL sounds very broken and very unsafe to me (again, think of
the KASLR at very early boot example).

Also, the CPUID check seems like an important step towards not breaking old CPUs.
All I'm saying is that we shouldn't just hook up the RNG DXE driver without carefully considering what the code is doing.
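
For reference, a minimal sketch of the CPUID check in question. On x86, RDRAND support is reported in bit 30 of ECX for CPUID leaf 1. The helper name ecx_reports_rdrand is hypothetical, and the sketch assumes the caller has already read the leaf 1 ECX value (for instance with __get_cpuid from GCC/Clang's <cpuid.h>); it is not the BaseRngLib code.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX bit 30: the CPU advertises the RDRAND instruction. */
#define CPUID_LEAF1_ECX_RDRAND (UINT32_C(1) << 30)

/*
 * Hypothetical helper: takes the ECX value returned by CPUID leaf 1 and
 * reports whether the RDRAND feature bit is set. Executing RDRAND on a CPU
 * without this bit raises an invalid-opcode fault, hence the check.
 */
static bool ecx_reports_rdrand(uint32_t leaf1_ecx)
{
    return (leaf1_ecx & CPUID_LEAF1_ECX_RDRAND) != 0;
}
```

Note that the feature bit only says the instruction exists; it does not say the output is any good, so this complements an output sanity check rather than replacing it.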

Thanks,
Pedro