From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: mx.groups.io; dkim=missing; spf=pass (domain: redhat.com, ip: 209.132.183.28, mailfrom: lersek@redhat.com) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by groups.io with SMTP; Sun, 21 Jul 2019 05:26:11 -0700 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2805B30821B3; Sun, 21 Jul 2019 12:26:11 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-116-20.ams2.redhat.com [10.36.116.20]) by smtp.corp.redhat.com (Postfix) with ESMTP id B038A5C8A7; Sun, 21 Jul 2019 12:26:07 +0000 (UTC) Subject: Re: [edk2-devel] [Patch 0/9] Enable multiple process AutoGen To: bob.c.feng@intel.com References: <20190718061423.30612-1-bob.c.feng@intel.com> Cc: devel@edk2.groups.io, =?UTF-8?Q?Philippe_Mathieu-Daud=c3=a9?= From: "Laszlo Ersek" Message-ID: Date: Sun, 21 Jul 2019 14:26:06 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <20190718061423.30612-1-bob.c.feng@intel.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.47]); Sun, 21 Jul 2019 12:26:11 +0000 (UTC) Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Hi Bob, On 07/18/19 08:14, Bob Feng wrote: > BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=1875 > > In order to improve the build performance, we implemented > multiple-processes AutoGen. This change will reduce 20% time > for AutoGen phase. > > The design document can be got from: > https://edk2.groups.io/g/devel/files/Designs/2019/0627/Multiple-thread-AutoGen.pdf > > This patch serial pass the build of Ovmf, MinKabylake, MinPurley, > packages under Edk2 repository and intel client and server platforms. I've done some basic regression-testing with this set, applied on top of current master (= commit 8ff68cd5e4c9, "ShellPkg: acpiview: DBG2: Remove redundant forward declarations", 2019-07-19). (1) The build process now seems to produce files called GlobalVar_37d7e986-f7e9-45c2-8067-e371421a626c_AARCH64.bin GlobalVar_37d7e986-f7e9-45c2-8067-e371421a626c_ARM.bin GlobalVar_5a9e7754-d81b-49ea-85ad-69eaa7b1539b_IA32.bin GlobalVar_5a9e7754-d81b-49ea-85ad-69eaa7b1539b_X64.bin in the project root directory. (The GUIDs match the PLATFORM_GUID values from ArmVirtQemu.dsc and OvmfPkg*.dsc.) They seem to come from patch "BaseTools: Enable Multiple Process AutoGen". Can we place these files under the Build/ directory somewhere, instead? (2) The QEMU project now bundles edk2 (as a git submodule for the edk2 tree, and some firmware binaries built from that). The build scripts carried by QEMU set PYTHON_COMMAND to python2, in order to work around TianoCore BZ#1607. For regression-testing this set, I ran $ make -C roms efi with QEMU's edk2 submodule advanced to commit 8ff68cd5e4c9, plus this set applied (see above), and with QEMU itself being at e2b47666fe15. The build itself passes fine, so that's good. But, some (not all) of the firmware binaries are *misbuilt*. I'll elaborate below (independently of QEMU). (3) Consider the following test. (Note that this is independent of the QEMU project entirely.) (3a) Set up a pristine build environment: # Open a new shell, then: $ git clean -ffdx $ git reset --hard $ git submodule deinit --force --all $ git checkout master $ git submodule update --init $ export PYTHON_COMMAND=python2 $ source edksetup.sh $ nice make -C "$EDK_TOOLS_PATH" \ -j $(getconf _NPROCESSORS_ONLN) (3b) Build OvmfPkgIA32 with one set of features, and capture a checksum of the resultant firmware binary: $ nice build -a IA32 -b NOOPT -p OvmfPkg/OvmfPkgIA32.dsc \ -t GCC48 $ md5sum Build/OvmfIa32/NOOPT_GCC48/FV/OVMF_CODE.fd 8d56575dd3b5c89ac6f1829ae4f41b55 (3c) *Re*build the same platform with a different set of features, and capture a checksum again: $ nice build -a IA32 -b NOOPT -p OvmfPkg/OvmfPkgIA32.dsc \ -t GCC48 -D SECURE_BOOT_ENABLE -D SMM_REQUIRE $ md5sum Build/OvmfIa32/NOOPT_GCC48/FV/OVMF_CODE.fd ed79052e9add242174ccefc78a5bddb3 Okay, now repeat all three steps from zero, but with the present feature patch set applied: (3d) Repeat (3a), but rather than checking out the master branch with "git checkout", check out the topic branch that is the same "master" commit, *plus* the present series applied on top. Don't forget to start a new shell first! (3e) Repeat (3b): $ nice build -a IA32 -b NOOPT -p OvmfPkg/OvmfPkgIA32.dsc \ -t GCC48 $ md5sum Build/OvmfIa32/NOOPT_GCC48/FV/OVMF_CODE.fd 8d56575dd3b5c89ac6f1829ae4f41b55 Notice that the checksum is identical to that from (3b), so all's good. (3f) Repeat (3c): $ nice build -a IA32 -b NOOPT -p OvmfPkg/OvmfPkgIA32.dsc \ -t GCC48 -D SECURE_BOOT_ENABLE -D SMM_REQUIRE $ md5sum Build/OvmfIa32/NOOPT_GCC48/FV/OVMF_CODE.fd baf08c5a9ecb0adc754f5b48410b6476 This is the problem: the checksum does not match the one from (3c). And, the binary from (3f) is in fact mis-built, it crashes immediately when launched on QEMU/KVM, with "emulation failure". Thus: - Without the patch set, if I build a platform, then *rebuild* it with different -D flags, then both builds produce proper outputs. - With the patch series applied however, only the initial build produces good results. The *rebuild*, with different -D flags, is "poisoned" by the artifacts still laying around from the first build. In other words, this is a "stale cache" problem. (4) Not a show stopper, but a divergence from previous behavior: When one presses Ctrl-S (the "stop" character) in a terminal, the terminal output / autoscrolling is suspended, until Ctrl-Q (the "start" character) is pressed. This is generally used for two things: scrolling back the terminal history manually, for reading something that scrolled off the screen, and for briefly pausing whatever processing the program is doing. The idea being, if the program is blocked in a terminal write operation, it cannot do anything else. The present patch set changes the lastly mentioned behavior, because the logging is now decoupled to a separate thread ("BaseTools: Add LogAgent to support multiple process Autogen"). As a result, if the *output* of LogAgent is blocked, that does not block the *input* of LogAgent, and so the queue / buffer in LogAgent grows without bound, and the build continues even though the terminal output is stopped. This is a common initial symptom of programs with internal queues -- it's usually good to implement internal flow control, under which any given queue has a limit, and if that limit is reached, the queue blocks the producer threads feeding it. Again, this is not a show stopper, just a divergence from previous behavior which we should be aware of, and possibly document somewhere. Importantly, Ctrl-Z (the "suspend" character) works just fine, for *both* purposes described above. (And then people can resume the build with "fg", like always.) (5) Two questions out of interest: - Has this series been tested for restarting builds that were interrupted with Ctrl-C? Can the new build "pick up" where the previous build was interrupted? - Do we intend to offer a flag for disabling this feature? I'm not implying that the old code should be preserved as an alternative, but perhaps some internal constants related to parallelization can be tweaked such that the ultimate behavior becomes serial. Is the "-n 1" option supposed to have that effect? The second question is relevant because people might want to disable the new feature until it stabilizes (see issue (3) for example). Thanks, Laszlo