From mboxrd@z Thu Jan 1 00:00:00 1970 Authentication-Results: mx.groups.io; dkim=missing; spf=pass (domain: redhat.com, ip: 209.132.183.28, mailfrom: lersek@redhat.com) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by groups.io with SMTP; Mon, 22 Jul 2019 12:50:14 -0700 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A60DC308222F; Mon, 22 Jul 2019 19:50:13 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-117-23.ams2.redhat.com [10.36.117.23]) by smtp.corp.redhat.com (Postfix) with ESMTP id AB4A0601B7; Mon, 22 Jul 2019 19:50:09 +0000 (UTC) Subject: Re: [edk2-devel] [Patch 0/9] Enable multiple process AutoGen To: "Feng, Bob C" Cc: "devel@edk2.groups.io" , =?UTF-8?Q?Philippe_Mathieu-Daud=c3=a9?= References: <20190718061423.30612-1-bob.c.feng@intel.com> <08650203BA1BD64D8AD9B6D5D74A85D160B454F1@SHSMSX105.ccr.corp.intel.com> From: "Laszlo Ersek" Message-ID: <620b60c8-ac48-e7be-c265-e127b8101be3@redhat.com> Date: Mon, 22 Jul 2019 21:50:08 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <08650203BA1BD64D8AD9B6D5D74A85D160B454F1@SHSMSX105.ccr.corp.intel.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.47]); Mon, 22 Jul 2019 19:50:13 +0000 (UTC) Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: quoted-printable On 07/22/19 11:11, Feng, Bob C wrote: > Hi Laszlo, >=20 > Thanks for your detailed testing and comments. >=20 > I send out patch serial V2 to resolve comment 1#. V2 also fixed some i= ssues found by others and was generated based on master latest version. >=20 > For 3#, I can reproduce the issue, I'll fix it in patch serial V3. My = understanding is that when I fix 3#, 2# will be resolved accordingly. Rig= ht?=20 Yes. I first discovered the problem with (2), but then I wanted to reproduce the issue in isolation from QEMU. That is section (3). So if you fix (3), I expect the build will work just fine as part of the QEMU tree (2) as well. > For 5#-2, When using "-n 1", Autogen should start one worker subproces= s to generate autogen files that will be the same as run in serial. > I'll make the change in patches V3 OK, sounds good. Thank you, Laszlo >=20 > For the keyboard interrupt handling mentioned in 4# and 5#-1, I'll do m= ore testing on it. And do the fix in V3. =20 > Before sending the patches, I tested Ctrl+C to stop the build but not t= he "pick up". >=20 > Thanks, > Bob >=20 > -----Original Message----- > From: Laszlo Ersek [mailto:lersek@redhat.com]=20 > Sent: Sunday, July 21, 2019 8:26 PM > To: Feng, Bob C > Cc: devel@edk2.groups.io; Philippe Mathieu-Daud=C3=A9 > Subject: Re: [edk2-devel] [Patch 0/9] Enable multiple process AutoGen >=20 > Hi Bob, >=20 > On 07/18/19 08:14, Bob Feng wrote: >> BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=3D1875 >> >> In order to improve the build performance, we implemented=20 >> multiple-processes AutoGen. This change will reduce 20% time for=20 >> AutoGen phase. >> >> The design document can be got from: >> https://edk2.groups.io/g/devel/files/Designs/2019/0627/Multiple-thread >> -AutoGen.pdf >> >> This patch serial pass the build of Ovmf, MinKabylake, MinPurley,=20 >> packages under Edk2 repository and intel client and server platforms. >=20 > I've done some basic regression-testing with this set, applied on top o= f current master (=3D commit 8ff68cd5e4c9, "ShellPkg: acpiview: DBG2: Rem= ove redundant forward declarations", 2019-07-19). >=20 > (1) The build process now seems to produce files called >=20 > GlobalVar_37d7e986-f7e9-45c2-8067-e371421a626c_AARCH64.bin > GlobalVar_37d7e986-f7e9-45c2-8067-e371421a626c_ARM.bin > GlobalVar_5a9e7754-d81b-49ea-85ad-69eaa7b1539b_IA32.bin > GlobalVar_5a9e7754-d81b-49ea-85ad-69eaa7b1539b_X64.bin >=20 > in the project root directory. (The GUIDs match the PLATFORM_GUID > values from ArmVirtQemu.dsc and OvmfPkg*.dsc.) >=20 > They seem to come from patch "BaseTools: Enable Multiple Process > AutoGen". >=20 > Can we place these files under the Build/ directory somewhere, > instead? >=20 > (2) The QEMU project now bundles edk2 (as a git submodule for the edk2 > tree, and some firmware binaries built from that). The build script= s > carried by QEMU set PYTHON_COMMAND to python2, in order to work > around TianoCore BZ#1607. >=20 > For regression-testing this set, I ran >=20 > $ make -C roms efi >=20 > with QEMU's edk2 submodule advanced to commit 8ff68cd5e4c9, plus > this set applied (see above), and with QEMU itself being at > e2b47666fe15. >=20 > The build itself passes fine, so that's good. >=20 > But, some (not all) of the firmware binaries are *misbuilt*. I'll > elaborate below (independently of QEMU). >=20 > (3) Consider the following test. (Note that this is independent of the > QEMU project entirely.) >=20 > (3a) Set up a pristine build environment: >=20 > # Open a new shell, then: > $ git clean -ffdx > $ git reset --hard > $ git submodule deinit --force --all > $ git checkout master > $ git submodule update --init > $ export PYTHON_COMMAND=3Dpython2 > $ source edksetup.sh > $ nice make -C "$EDK_TOOLS_PATH" \ > -j $(getconf _NPROCESSORS_ONLN) >=20 > (3b) Build OvmfPkgIA32 with one set of features, and capture a > checksum of the resultant firmware binary: >=20 > $ nice build -a IA32 -b NOOPT -p OvmfPkg/OvmfPkgIA32.dsc \ > -t GCC48 > $ md5sum Build/OvmfIa32/NOOPT_GCC48/FV/OVMF_CODE.fd > 8d56575dd3b5c89ac6f1829ae4f41b55 >=20 > (3c) *Re*build the same platform with a different set of features, > and capture a checksum again: >=20 > $ nice build -a IA32 -b NOOPT -p OvmfPkg/OvmfPkgIA32.dsc \ > -t GCC48 -D SECURE_BOOT_ENABLE -D SMM_REQUIRE > $ md5sum Build/OvmfIa32/NOOPT_GCC48/FV/OVMF_CODE.fd > ed79052e9add242174ccefc78a5bddb3 >=20 > Okay, now repeat all three steps from zero, but with the present > feature patch set applied: >=20 > (3d) Repeat (3a), but rather than checking out the master branch > with "git checkout", check out the topic branch that is the > same "master" commit, *plus* the present series applied on top= . >=20 > Don't forget to start a new shell first! >=20 > (3e) Repeat (3b): >=20 > $ nice build -a IA32 -b NOOPT -p OvmfPkg/OvmfPkgIA32.dsc \ > -t GCC48 > $ md5sum Build/OvmfIa32/NOOPT_GCC48/FV/OVMF_CODE.fd > 8d56575dd3b5c89ac6f1829ae4f41b55 >=20 > Notice that the checksum is identical to that from (3b), so > all's good. >=20 > (3f) Repeat (3c): >=20 > $ nice build -a IA32 -b NOOPT -p OvmfPkg/OvmfPkgIA32.dsc \ > -t GCC48 -D SECURE_BOOT_ENABLE -D SMM_REQUIRE > $ md5sum Build/OvmfIa32/NOOPT_GCC48/FV/OVMF_CODE.fd > baf08c5a9ecb0adc754f5b48410b6476 >=20 > This is the problem: the checksum does not match the one from > (3c). And, the binary from (3f) is in fact mis-built, it > crashes immediately when launched on QEMU/KVM, with "emulation > failure". >=20 > Thus: >=20 > - Without the patch set, if I build a platform, then *rebuild* it > with different -D flags, then both builds produce proper outputs. >=20 > - With the patch series applied however, only the initial build > produces good results. The *rebuild*, with different -D flags, is > "poisoned" by the artifacts still laying around from the first > build. >=20 > In other words, this is a "stale cache" problem. >=20 > (4) Not a show stopper, but a divergence from previous behavior: >=20 > When one presses Ctrl-S (the "stop" character) in a terminal, the > terminal output / autoscrolling is suspended, until Ctrl-Q (the > "start" character) is pressed. This is generally used for two > things: scrolling back the terminal history manually, for reading > something that scrolled off the screen, and for briefly pausing > whatever processing the program is doing. >=20 > The idea being, if the program is blocked in a terminal write > operation, it cannot do anything else. >=20 > The present patch set changes the lastly mentioned behavior, becaus= e > the logging is now decoupled to a separate thread ("BaseTools: Add > LogAgent to support multiple process Autogen"). As a result, if the > *output* of LogAgent is blocked, that does not block the *input* of > LogAgent, and so the queue / buffer in LogAgent grows without bound= , > and the build continues even though the terminal output is stopped. >=20 > This is a common initial symptom of programs with internal queues -= - > it's usually good to implement internal flow control, under which > any given queue has a limit, and if that limit is reached, the queu= e > blocks the producer threads feeding it. >=20 > Again, this is not a show stopper, just a divergence from previous > behavior which we should be aware of, and possibly document > somewhere. >=20 > Importantly, Ctrl-Z (the "suspend" character) works just fine, for > *both* purposes described above. (And then people can resume the > build with "fg", like always.) >=20 > (5) Two questions out of interest: >=20 > - Has this series been tested for restarting builds that were > interrupted with Ctrl-C? Can the new build "pick up" where the > previous build was interrupted? >=20 > - Do we intend to offer a flag for disabling this feature? I'm not > implying that the old code should be preserved as an alternative, > but perhaps some internal constants related to parallelization ca= n > be tweaked such that the ultimate behavior becomes serial. Is the > "-n 1" option supposed to have that effect? >=20 > The second question is relevant because people might want to disabl= e > the new feature until it stabilizes (see issue (3) for example). >=20 > Thanks, > Laszlo >=20