public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: Laszlo Ersek <lersek@redhat.com>
To: Jordan Justen <jordan.l.justen@intel.com>,
	edk2-devel-01 <edk2-devel@lists.01.org>
Cc: Gary Ching-Pang Lin <glin@suse.com>
Subject: Re: [PATCH 2/3] OvmfPkg: introduce FD_SIZE_4MB (mainly) for Windows HCK
Date: Mon, 1 May 2017 12:51:42 +0200	[thread overview]
Message-ID: <c010a543-1d66-05f1-a441-59e2c06c98c6@redhat.com> (raw)
In-Reply-To: <149358697668.23065.6363402854761002239@jljusten-skl>

On 04/30/17 23:16, Jordan Justen wrote:
> On 2017-04-30 07:42:36, Laszlo Ersek wrote:
>> On 04/30/17 02:48, Jordan Justen wrote:
>>>
>>> I tested a 2MB RELEASE build of OvmfIa32X64 on GCC 4.9 with:
>>
>> RELEASE builds of virtual UEFI firmware are unsupportable in an
>> enterprise distribution. On tenths of occasions I've been able to
>> analyze OVMF and ArmVirtQemu error reports from:
>> - failed ASSERTs, and
>> - DEBUG messages
>> that would have been all lost in a RELEASE build.
>>
>> QEMU (even upstream QEMU) rejects being built with NDEBUG; they consider
>> the asserts so important. For example, in "hw/virtio/virtio.c":
> 
> We're discussing space issues, and a RELEASE build is considered
> unusable? I don't know what that says about the qemu project, but it
> certainly is not how EDK II was designed.

I didn't write "unusable". A RELEASE build can certainly be run by end
users. I wrote "unsupportable".

The norm in an enterprise distro is not that a package is given to end
users and they use it happily ever after without encountering bugs.

The norm is that users encounter bugs or just plain non-intuitive
behavior all the time, and they report them. I have to be able to
support them, preferably without asking for access to their systems.
DEBUG logs and working ASSERTs are very important for this.

> 
> Nevertheless, I agree that it is very nice to be able to enable DEBUG
> mode with the standard features included.

It's not "very nice", it is a supportability requirement.

Here's a real life analogy. My work laptop is a Lenovo ThinkPad W541. On
random occasions it locks up during boot, while in the firmware. It
doesn't print anything at all (which is not surprising if the bug hits
before the video card is bound by the GOP driver).

Now, if the laptop had an actual serial port (which could be printed to
even during SEC, via DebugLib / SerialPortLib), *and* the firmware
shipped on the machine were a DEBUG build, I could perhaps file a
sensible bug report somewhere. It's not the case, so any engineer on the
vendor's side would be clueless. I would be instructed to install a
firmware upgrade, which I wouldn't do, because it might or might not
brick the laptop completely.

And this is supposed to be a "mobile workstation" class laptop.

The fact is that the vendor doesn't really intend to support this
laptop, otherwise they'd have built a serial port into it.

I intend to support OVMF in RHEL-7.4+ the best way I can, and for that,
I absolutely need DEBUGs and ASSERTs to be in place.

BTW, status code reporting is also entirely useless. On my APM Mustang
(an aarch64 development box), there used to be serious firmware bugs,
and I figured I'd look into some of those. The firmware by default only
printed status codes to the serial port, which provided zero value. Only
after I rebuilt & reflashed the firmware with DEBUG_VERBOSE enabled, and
could correlate those messages with the kernel logs, did I manage to fix
some issues.


Regarding the design of edk2; I'm not qualified to comment on that (I
wasn't there), but the fact remains that in many places ASSERT()s are
used to catch common errors. Even if that wasn't the case, and ASSERT()s
were only used for catching impossible situations, I do want those
situations caught and logged in a production / customer / partner
environment too.

> Regarding 'standard
> features', see my IP6/HTTP/TLS question below.
> 
>>>
>>> I also tested a 2MB RELEASE build of OvmfIa32X64 on GCC 4.9 with:
>>>
>>>  * SECURE_BOOT_ENABLE=1
>>>  * NETWORK_IP6_ENABLE=1
>>>  * HTTP_BOOT_ENABLE=1
>>>  * SMM_REQUIRE=1
>>>  * TLS_ENABLE=1
>>>
>>> I don't think we consider those network items as standard
>>> requirements, yet there was still ~373k available. In order to bump
>>> the variables to 120k, we need to use 2 * (120 - 56) => 128k.
>>
>> Do I understand correctly that you suggest adding 64K to the live area,
>> 64K to the spare area, and decreasing FVMAIN_COMPACT by 128k?
>>
>> I think that's a step in the wrong direction.
> 
> It is starting to look less and less like 1MB is a feasible size for
> OVMF. Maybe going forward we'll drop 1MB, and support 2/4MB. If that
> happens, then it would be nice if the 2MB image covered the known
> requirements.

If we change the variable store size in the default 2MB build (at the
expense of FVMAIN_COMPACT, or anything else), without any additional
(non-default) build flags, then all downstreams that rebase / rebuild
edk2 after such an upstream patch will see their preexistent VMs break.

Their existent varstore files will not be compatible with the rebuilt
firmware binary.

So, tieing the varstore structure change to a new build flag is required
in any case, in order not to regress current consumers of the upstream
2MB split build.

> 
>> $ build \
>>   -b DEBUG \
>>   -a IA32 -a X64 \
>>   -p OvmfPkg/OvmfPkgIa32X64.dsc \
>>   -t GCC48 \
>>   -D SMM_REQUIRE \
>>   -D SECURE_BOOT_ENABLE \
>>   -D HTTP_BOOT_ENABLE \
>>   -D NETWORK_IP6_ENABLE \
>>   -D TLS_ENABLE
> 
> Do you enable the last 3 in your production builds? I didn't think it
> was the case, but it would change things...

That's a very good question, and I expected it.

Any sane person being responsible for supporting a package will strive
very hard to minimize the features enabled in the package, in order to
minimize the problem surface / support burden. I tend to consider myself
a sane person, so no, HTTP_BOOT_ENABLE, NETWORK_IP6_ENABLE, and
TLS_ENABLE are not turned on.

(TLS_ENABLE carries even more weight, because it increases the security
attack surface, so turning *that* off is very desirable.)

*But*, I certainly want to keep the *ability* to turn these features on
(and maybe later features, in 2-3 years' time) if a customer or a
partner requests it.

In short, I want to increase my safety margin for supportability as much
as possible, both by minimizing the current feature set and by
maximizing the space reserved for future features.

> 
>>>
>>> Regarding how to 'upgrade' a system from using the smaller storage
>>> size, I don't think there are any good answers. (Which also applies to
>>> the 4MB case.)
>>
>> I agree, on both points.
> 
> How would you plan to support VMs with the old 2MB layout?

That's the point: I don't. I can choose not to support it, beacuse that
layout can be left behind with RHEL-7.3, in which OVMF was still Tech
Preview.

This will likely change in RHEL-7.4 -- this shouldn't be considered a
public promise --, and once that happens, any such change will require
us to:

- add another firmware binary to the OVMF package, built for the new layout,
- add a matching varstore template,
- teach all the layered products (libvirt, prominently) about the new
firmware binary (so that new VMs / new machine types would default to that),
- keep the older fw binaries around and updated forever (until the major
release reaches EOL) beause VMs created earlier depend on them.

I think it's understandable why we don't want to do this. Going to 4MB
does not *guarantee* we won't have to do this, but it does add a lot of
safety.

> 
> I guess it could be easy enough to develop a python script to resize
> the old layout to the new layout, but I'm not sure if it is possible
> for libvirt to handle the need to launch such an upgrade script..

Actually, both issues are hard.

(1) The "resizer" would be able to handle three layers of abstractions
in the varstore file (the QEMU flash drive representation, the FTW
layer, and the variable layer). For example, if you "lose power" (i.e.,
forcibly power down a VM) while it is executing Variable Reclaim, your
varstore would be in an inconsistent / unfinished state (the live area,
the FTW working block, and the spare area would reflect this). Now, the
edk2 driver stack handles this *very* robustly (this is why FTW has been
invented in the first place!), so on next VM restart, everything would
recover. *But*, the resizer would have to handle this situation just as
well -- it would have to resize the varstore file in a manner that the
interrupted reclaim can be resumed / restarted after resizing and VM reboot.

I definitely don't want to write or maintain such a program.

(2) Yes, converting varstore files on OVMF / libvirt / etc package
updates is not viable. Such updates (for both OVMF and libvirt) are
generally permitted while virtual machines are running. The OVMF binary
can be replaced without issues (it is not rewritten in-place, but
deleted and recreated as a new file by RPM, AIUI), plus, even if it were
rewritten in place, I'd verified that the pflash chip for the fw binary
is only loaded at QEMU startup.

However, converting an in-use varstore file is plain impossible
(assuming but not conceding that (1) was covered somehow).

> 
>> Which is why I think it's time to make the big jump now, and be safe for
>> the next several years.
> 
> Why not just go for 8 MB, and give Microsoft, say 1 MB for variables?
> That should be 'future proof', right? :)

I expected this example as well :)

I didn't want to go 8MB all at once because I didn't want to exhaust
those 8MBs presently allowed by QEMU all at once. Using 8MB seems
overkill at this point.

Importantly, I don't claim that your point about overkill is *generally*
invalid. I fully agree that overkill exists. I just put the sweet size
elsewhere than you do. To me, 512KB / 4MB appear a comfortable safety
margin for OVMF, considering the RHEL7 lifecycle, while leaving 4MB free
in QEMU for whatever use case might come up in the future.

But, if you really prefer larger than 4MB, I can do that. :)

> The reality is that there's no good way to tell what Microsoft (or Red
> Hat) may require in the future.

Correct. There's no way to guarantee anything, but we can (and should)
plan for the future.

> Right now we know that Microsoft
> appears to be saying 120k is good for at least the near term.

"Near term" is meaningless for an enterprise distro that's been released
for ~3 years and is about to get ~7 more years of support, and is
planning to introduce full support for a new package.

I linked the message
<http://mid.mail-archive.com/24f63595e68c476eb98cf87e7abfa1bc@BL2PR03MB242.namprd03.prod.outlook.com>
earlier, in which Larry Cleeton @ MS stated, in February 2014, that
Hyper-V still considered 128KB generous.

Fast forward cca. *two and half* years (Sept 2016), and Microsoft's UEFI
plugfest presentation (which you linked, thank you) says,

  Windows Hardware Compatibility Requirements for RS2
  ...
  A total of at least 120 KB of non-volatile NVRAM storage
  memory must be available for NV UEFI variables (authenticated and
  unauthenticated, BS and RT) used by UEFI Secure Boot and Windows.
  ...

The ~120KB varstore size got just demoted, from a "generous" amount to a
required minimum, in the time frame of 2.5 years.

We can't guarantee 512KB will be enough for the next 7 years, but
anything less than that seems short-sighted to me.

> 
> I'm not against us defining a 4MB image size, but it is frustrating to
> be pushed into it by a single test.

Tell me about it.

> You did give an example of a crash
> dump using 10k of variable space, but even then it is not clear to me
> that 56k is insufficient in normal usage.

For normal usage, 56K is okay-ish. Not generous by any means, but sort
of acceptable. However, passing SVVP with OVMF is really-really
important to us, and I reckon other enterprise distros that ship OVMF
might feel the same.

> Regarding your suggested 4MB layout, it seems reasonable. My only
> concern is, if the minimum flash size is bumped.

(ITYM "if the minimum varstore size requirement is bumped".)

> What are the chances
> that 248k will cover it? Unfortunately (or fortunately), no one I've
> asked seem to know of any plans for the requirement to increase beyond
> 120k.

Again, I agree this is a valid concern; there's no way to guarantee
248KB will be enough for 7 years (I keep saying 7 years, that's
important to me, others might care about a different time frame).

Given that 120-128KB has gone from "generous" to "required" in 2.5
years, I feel that providing 248KB for the next 7 years (thereby more
than quadrupling OVMF's current offer) is the low water mark for "prudent".

If you'd like us to go, say, 504KB live varstore, 1MB NV, and 5MB or 6MB
total, I'm game.

Thanks!
Laszlo


  reply	other threads:[~2017-05-01 10:51 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-29 20:14 [PATCH 0/3] OvmfPkg: add FD_SIZE_4MB for Windows HCK SB tests, and for future proofing Laszlo Ersek
2017-04-29 20:14 ` [PATCH 1/3] OvmfPkg/OvmfPkg.fdf.inc: extract VARS_LIVE_SIZE and VARS_SPARE_SIZE macros Laszlo Ersek
2017-04-29 20:14 ` [PATCH 2/3] OvmfPkg: introduce FD_SIZE_4MB (mainly) for Windows HCK Laszlo Ersek
2017-04-30  0:48   ` Jordan Justen
2017-04-30 14:42     ` Laszlo Ersek
2017-04-30 21:16       ` Jordan Justen
2017-05-01 10:51         ` Laszlo Ersek [this message]
2017-05-01 17:15           ` Jordan Justen
2017-05-01 17:23           ` Jordan Justen
2017-05-01 18:40             ` Laszlo Ersek
2017-05-01 19:20               ` Jordan Justen
2017-05-01 23:07                 ` Laszlo Ersek
2017-05-01 23:38                   ` Jordan Justen
2017-05-02 14:39                     ` Laszlo Ersek
2017-05-02 18:22                       ` Jordan Justen
2017-05-02 19:31                         ` Laszlo Ersek
2017-05-02 21:45                           ` Jordan Justen
2017-05-03 13:46                             ` Laszlo Ersek
2017-05-01  0:06     ` Laszlo Ersek
2017-04-29 20:15 ` [PATCH 3/3] OvmfPkg: raise max variable size (auth & non-auth) to 33KB for FD_SIZE_4MB Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c010a543-1d66-05f1-a441-59e2c06c98c6@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox