From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 46F2F21A18AA9 for ; Mon, 1 May 2017 03:51:45 -0700 (PDT) Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id AB09385547; Mon, 1 May 2017 10:51:44 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com AB09385547 Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=lersek@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com AB09385547 Received: from lacos-laptop-7.usersys.redhat.com (ovpn-116-75.phx2.redhat.com [10.3.116.75]) by smtp.corp.redhat.com (Postfix) with ESMTP id 713475C893; Mon, 1 May 2017 10:51:43 +0000 (UTC) To: Jordan Justen , edk2-devel-01 References: <20170429201500.18496-1-lersek@redhat.com> <20170429201500.18496-3-lersek@redhat.com> <149351328512.20670.1563878734495138189@jljusten-skl> <030f8312-35ce-5c86-205c-2ee6c0b5ab8b@redhat.com> <149358697668.23065.6363402854761002239@jljusten-skl> Cc: Gary Ching-Pang Lin From: Laszlo Ersek Message-ID: Date: Mon, 1 May 2017 12:51:42 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <149358697668.23065.6363402854761002239@jljusten-skl> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Mon, 01 May 2017 10:51:44 +0000 (UTC) Subject: Re: [PATCH 2/3] OvmfPkg: introduce FD_SIZE_4MB (mainly) for Windows HCK X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.22 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 01 May 2017 10:51:45 -0000 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit On 04/30/17 23:16, Jordan Justen wrote: > On 2017-04-30 07:42:36, Laszlo Ersek wrote: >> On 04/30/17 02:48, Jordan Justen wrote: >>> >>> I tested a 2MB RELEASE build of OvmfIa32X64 on GCC 4.9 with: >> >> RELEASE builds of virtual UEFI firmware are unsupportable in an >> enterprise distribution. On tenths of occasions I've been able to >> analyze OVMF and ArmVirtQemu error reports from: >> - failed ASSERTs, and >> - DEBUG messages >> that would have been all lost in a RELEASE build. >> >> QEMU (even upstream QEMU) rejects being built with NDEBUG; they consider >> the asserts so important. For example, in "hw/virtio/virtio.c": > > We're discussing space issues, and a RELEASE build is considered > unusable? I don't know what that says about the qemu project, but it > certainly is not how EDK II was designed. I didn't write "unusable". A RELEASE build can certainly be run by end users. I wrote "unsupportable". The norm in an enterprise distro is not that a package is given to end users and they use it happily ever after without encountering bugs. The norm is that users encounter bugs or just plain non-intuitive behavior all the time, and they report them. I have to be able to support them, preferably without asking for access to their systems. DEBUG logs and working ASSERTs are very important for this. > > Nevertheless, I agree that it is very nice to be able to enable DEBUG > mode with the standard features included. It's not "very nice", it is a supportability requirement. Here's a real life analogy. My work laptop is a Lenovo ThinkPad W541. On random occasions it locks up during boot, while in the firmware. It doesn't print anything at all (which is not surprising if the bug hits before the video card is bound by the GOP driver). Now, if the laptop had an actual serial port (which could be printed to even during SEC, via DebugLib / SerialPortLib), *and* the firmware shipped on the machine were a DEBUG build, I could perhaps file a sensible bug report somewhere. It's not the case, so any engineer on the vendor's side would be clueless. I would be instructed to install a firmware upgrade, which I wouldn't do, because it might or might not brick the laptop completely. And this is supposed to be a "mobile workstation" class laptop. The fact is that the vendor doesn't really intend to support this laptop, otherwise they'd have built a serial port into it. I intend to support OVMF in RHEL-7.4+ the best way I can, and for that, I absolutely need DEBUGs and ASSERTs to be in place. BTW, status code reporting is also entirely useless. On my APM Mustang (an aarch64 development box), there used to be serious firmware bugs, and I figured I'd look into some of those. The firmware by default only printed status codes to the serial port, which provided zero value. Only after I rebuilt & reflashed the firmware with DEBUG_VERBOSE enabled, and could correlate those messages with the kernel logs, did I manage to fix some issues. Regarding the design of edk2; I'm not qualified to comment on that (I wasn't there), but the fact remains that in many places ASSERT()s are used to catch common errors. Even if that wasn't the case, and ASSERT()s were only used for catching impossible situations, I do want those situations caught and logged in a production / customer / partner environment too. > Regarding 'standard > features', see my IP6/HTTP/TLS question below. > >>> >>> I also tested a 2MB RELEASE build of OvmfIa32X64 on GCC 4.9 with: >>> >>> * SECURE_BOOT_ENABLE=1 >>> * NETWORK_IP6_ENABLE=1 >>> * HTTP_BOOT_ENABLE=1 >>> * SMM_REQUIRE=1 >>> * TLS_ENABLE=1 >>> >>> I don't think we consider those network items as standard >>> requirements, yet there was still ~373k available. In order to bump >>> the variables to 120k, we need to use 2 * (120 - 56) => 128k. >> >> Do I understand correctly that you suggest adding 64K to the live area, >> 64K to the spare area, and decreasing FVMAIN_COMPACT by 128k? >> >> I think that's a step in the wrong direction. > > It is starting to look less and less like 1MB is a feasible size for > OVMF. Maybe going forward we'll drop 1MB, and support 2/4MB. If that > happens, then it would be nice if the 2MB image covered the known > requirements. If we change the variable store size in the default 2MB build (at the expense of FVMAIN_COMPACT, or anything else), without any additional (non-default) build flags, then all downstreams that rebase / rebuild edk2 after such an upstream patch will see their preexistent VMs break. Their existent varstore files will not be compatible with the rebuilt firmware binary. So, tieing the varstore structure change to a new build flag is required in any case, in order not to regress current consumers of the upstream 2MB split build. > >> $ build \ >> -b DEBUG \ >> -a IA32 -a X64 \ >> -p OvmfPkg/OvmfPkgIa32X64.dsc \ >> -t GCC48 \ >> -D SMM_REQUIRE \ >> -D SECURE_BOOT_ENABLE \ >> -D HTTP_BOOT_ENABLE \ >> -D NETWORK_IP6_ENABLE \ >> -D TLS_ENABLE > > Do you enable the last 3 in your production builds? I didn't think it > was the case, but it would change things... That's a very good question, and I expected it. Any sane person being responsible for supporting a package will strive very hard to minimize the features enabled in the package, in order to minimize the problem surface / support burden. I tend to consider myself a sane person, so no, HTTP_BOOT_ENABLE, NETWORK_IP6_ENABLE, and TLS_ENABLE are not turned on. (TLS_ENABLE carries even more weight, because it increases the security attack surface, so turning *that* off is very desirable.) *But*, I certainly want to keep the *ability* to turn these features on (and maybe later features, in 2-3 years' time) if a customer or a partner requests it. In short, I want to increase my safety margin for supportability as much as possible, both by minimizing the current feature set and by maximizing the space reserved for future features. > >>> >>> Regarding how to 'upgrade' a system from using the smaller storage >>> size, I don't think there are any good answers. (Which also applies to >>> the 4MB case.) >> >> I agree, on both points. > > How would you plan to support VMs with the old 2MB layout? That's the point: I don't. I can choose not to support it, beacuse that layout can be left behind with RHEL-7.3, in which OVMF was still Tech Preview. This will likely change in RHEL-7.4 -- this shouldn't be considered a public promise --, and once that happens, any such change will require us to: - add another firmware binary to the OVMF package, built for the new layout, - add a matching varstore template, - teach all the layered products (libvirt, prominently) about the new firmware binary (so that new VMs / new machine types would default to that), - keep the older fw binaries around and updated forever (until the major release reaches EOL) beause VMs created earlier depend on them. I think it's understandable why we don't want to do this. Going to 4MB does not *guarantee* we won't have to do this, but it does add a lot of safety. > > I guess it could be easy enough to develop a python script to resize > the old layout to the new layout, but I'm not sure if it is possible > for libvirt to handle the need to launch such an upgrade script.. Actually, both issues are hard. (1) The "resizer" would be able to handle three layers of abstractions in the varstore file (the QEMU flash drive representation, the FTW layer, and the variable layer). For example, if you "lose power" (i.e., forcibly power down a VM) while it is executing Variable Reclaim, your varstore would be in an inconsistent / unfinished state (the live area, the FTW working block, and the spare area would reflect this). Now, the edk2 driver stack handles this *very* robustly (this is why FTW has been invented in the first place!), so on next VM restart, everything would recover. *But*, the resizer would have to handle this situation just as well -- it would have to resize the varstore file in a manner that the interrupted reclaim can be resumed / restarted after resizing and VM reboot. I definitely don't want to write or maintain such a program. (2) Yes, converting varstore files on OVMF / libvirt / etc package updates is not viable. Such updates (for both OVMF and libvirt) are generally permitted while virtual machines are running. The OVMF binary can be replaced without issues (it is not rewritten in-place, but deleted and recreated as a new file by RPM, AIUI), plus, even if it were rewritten in place, I'd verified that the pflash chip for the fw binary is only loaded at QEMU startup. However, converting an in-use varstore file is plain impossible (assuming but not conceding that (1) was covered somehow). > >> Which is why I think it's time to make the big jump now, and be safe for >> the next several years. > > Why not just go for 8 MB, and give Microsoft, say 1 MB for variables? > That should be 'future proof', right? :) I expected this example as well :) I didn't want to go 8MB all at once because I didn't want to exhaust those 8MBs presently allowed by QEMU all at once. Using 8MB seems overkill at this point. Importantly, I don't claim that your point about overkill is *generally* invalid. I fully agree that overkill exists. I just put the sweet size elsewhere than you do. To me, 512KB / 4MB appear a comfortable safety margin for OVMF, considering the RHEL7 lifecycle, while leaving 4MB free in QEMU for whatever use case might come up in the future. But, if you really prefer larger than 4MB, I can do that. :) > The reality is that there's no good way to tell what Microsoft (or Red > Hat) may require in the future. Correct. There's no way to guarantee anything, but we can (and should) plan for the future. > Right now we know that Microsoft > appears to be saying 120k is good for at least the near term. "Near term" is meaningless for an enterprise distro that's been released for ~3 years and is about to get ~7 more years of support, and is planning to introduce full support for a new package. I linked the message earlier, in which Larry Cleeton @ MS stated, in February 2014, that Hyper-V still considered 128KB generous. Fast forward cca. *two and half* years (Sept 2016), and Microsoft's UEFI plugfest presentation (which you linked, thank you) says, Windows Hardware Compatibility Requirements for RS2 ... A total of at least 120 KB of non-volatile NVRAM storage memory must be available for NV UEFI variables (authenticated and unauthenticated, BS and RT) used by UEFI Secure Boot and Windows. ... The ~120KB varstore size got just demoted, from a "generous" amount to a required minimum, in the time frame of 2.5 years. We can't guarantee 512KB will be enough for the next 7 years, but anything less than that seems short-sighted to me. > > I'm not against us defining a 4MB image size, but it is frustrating to > be pushed into it by a single test. Tell me about it. > You did give an example of a crash > dump using 10k of variable space, but even then it is not clear to me > that 56k is insufficient in normal usage. For normal usage, 56K is okay-ish. Not generous by any means, but sort of acceptable. However, passing SVVP with OVMF is really-really important to us, and I reckon other enterprise distros that ship OVMF might feel the same. > Regarding your suggested 4MB layout, it seems reasonable. My only > concern is, if the minimum flash size is bumped. (ITYM "if the minimum varstore size requirement is bumped".) > What are the chances > that 248k will cover it? Unfortunately (or fortunately), no one I've > asked seem to know of any plans for the requirement to increase beyond > 120k. Again, I agree this is a valid concern; there's no way to guarantee 248KB will be enough for 7 years (I keep saying 7 years, that's important to me, others might care about a different time frame). Given that 120-128KB has gone from "generous" to "required" in 2.5 years, I feel that providing 248KB for the next 7 years (thereby more than quadrupling OVMF's current offer) is the low water mark for "prudent". If you'd like us to go, say, 504KB live varstore, 1MB NV, and 5MB or 6MB total, I'm game. Thanks! Laszlo