Subject: Re: [edk2-devel] [Patch 00/10 V8] Enable multiple process AutoGen
To: "Feng, Bob C", "devel@edk2.groups.io", "leif.lindholm@linaro.org"
Cc: Andrew Fish, "Kinney, Michael D", "Gao, Liming"
References: <20190807042537.11928-1-bob.c.feng@intel.com>
 <20190808134522.GY25813@bivouac.eciton.net>
 <08650203BA1BD64D8AD9B6D5D74A85D160B559E9@SHSMSX105.ccr.corp.intel.com>
From: "Laszlo Ersek"
Date: Fri, 9 Aug 2019 00:18:17 +0200
In-Reply-To: <08650203BA1BD64D8AD9B6D5D74A85D160B559E9@SHSMSX105.ccr.corp.intel.com>

On 08/08/19 17:38, Feng, Bob C wrote:
> Hi Laszlo and Leif,
>
> Thanks for your detailed testing and comments.
>
> I'd like to explain the failure of the test 3#. I can reproduce the
> failure with your steps and I found this failure can also be reproduced
> without the multiple process autogen patch set.
> I debugged and found this failure is due to the --hash build option. I
> double-checked that if the --hash build option is removed, test 3#
> passes. Would you please double-check test 3# without --hash?
>
> I think we can enter a new BZ for the --hash bug.

Confirmed -- with "--hash" removed from the build command line, the
build is picked up fine after Ctrl-C. (And the firmware binary is
sound.)

So, my ACK stands.

(
And now I remember that in my v3 testing, I also omitted "--hash":

http://mid.mail-archive.com/4ea3d3fa-2210-3642-2337-db525312d312@redhat.com
https://edk2.groups.io/g/devel/message/44246

At the bottom I stated that I didn't test "--hash".
)

Thanks
Laszlo

> -----Original Message-----
> From: devel@edk2.groups.io [mailto:devel@edk2.groups.io] On Behalf Of Leif Lindholm
> Sent: Thursday, August 8, 2019 9:45 PM
> To: Laszlo Ersek
> Cc: Feng, Bob C; devel@edk2.groups.io; Andrew Fish; Kinney, Michael D; Gao, Liming
> Subject: Re: [edk2-devel] [Patch 00/10 V8] Enable multiple process AutoGen
>
> Hi Laszlo,
>
> Thanks for looping me in.
>
> On Thu, Aug 08, 2019 at 03:08:22PM +0200, Laszlo Ersek wrote:
>> (+ Andrew, Leif, Mike; Liming)
>>
>> On 08/07/19 06:25, Bob Feng wrote:
>> (3) In my normal edk2 clone, I cleaned the tree, applied your patches
>> (again on top of commit 96603b4f02b9), and started a build:
>>
>> $ . edksetup.sh
>> $ nice make -C "$EDK_TOOLS_PATH" -j $(getconf _NPROCESSORS_ONLN)
>> $ nice -n 19 build \
>>     -a IA32 \
>>     -p OvmfPkg/OvmfPkgIa32.dsc \
>>     -t GCC48 \
>>     -b NOOPT \
>>     -n 4 \
>>     -D SMM_REQUIRE \
>>     -D SECURE_BOOT_ENABLE \
>>     -D NETWORK_TLS_ENABLE \
>>     -D NETWORK_IP6_ENABLE \
>>     -D NETWORK_HTTP_BOOT_ENABLE \
>>     --report-file=.../build.ovmf.32.report \
>>     --log=.../build.ovmf.32.log \
>>     --cmd-len=65536 \
>>     --hash \
>>     --genfds-multi-thread
>>
>> This command located Python3:
>>
>>> WORKSPACE = .../edk2
>>> EDK_TOOLS_PATH = .../edk2/BaseTools
>>> CONF_PATH = .../edk2/Conf
>>> PYTHON_COMMAND = /usr/bin/python3.6
>>>
>>>
>>> Processing meta-data .
>>> Architecture(s) = IA32
>>> Build target = NOOPT
>>> Toolchain = GCC48
>>
>> The build launched fine.
>>
>> After 10-20 seconds into the build, I interrupted it with Ctrl-C:
>>
>>> build.py...
>>> : error 7000: Failed to execute command
>>> make tbuild [.../edk2/Build/OvmfIa32/NOOPT_GCC48/IA32/ShellPkg/Library/UefiShellDebug1CommandsLib/UefiShellDebug1CommandsLib]
>>>
>>>
>>> build.py...
>>> : error 7000: Failed to execute command
>>> make tbuild [.../edk2/Build/OvmfIa32/NOOPT_GCC48/IA32/ShellPkg/Library/UefiShellDriver1CommandsLib/UefiShellDriver1CommandsLib]
>>>
>>>
>>> build.py...
>>> : error 7000: Failed to execute command
>>> make tbuild [.../edk2/Build/OvmfIa32/NOOPT_GCC48/IA32/CryptoPkg/Library/OpensslLib/OpensslLib]
>>>
>>>
>>> build.py...
>>> : error 7000: Failed to execute command
>>> make tbuild [.../edk2/Build/OvmfIa32/NOOPT_GCC48/IA32/MdePkg/Library/BaseLib/BaseLib]
>>>
>>> - Aborted -
>>> Build end time: 14:05:56, Aug.08 2019
>>> Build total time: 00:00:15
>>
>> As next step, I repeated the same "build" command as above, in order
>> to continue the interrupted build.
>> Unfortunately, this failed:
>>
>>> WORKSPACE = .../edk2
>>> EDK_TOOLS_PATH = .../edk2/BaseTools
>>> CONF_PATH = .../edk2/Conf
>>> PYTHON_COMMAND = /usr/bin/python3.6
>>>
>>>
>>> Processing meta-data
>>> .Architecture(s) = IA32
>>> Build target = NOOPT
>>> Toolchain = GCC48
>>>
>>> Active Platform = .../edk2/OvmfPkg/OvmfPkgIa32.dsc
>>> ..... done!
>>>
>>> Fd File Name:OVMF (.../edk2/Build/OvmfIa32/NOOPT_GCC48/FV/OVMF.fd)
>>>
>>> Generate Region at Offset 0x0
>>> Region Size = 0x40000
>>> Region Name = DATA
>>>
>>> Generate Region at Offset 0x40000
>>> Region Size = 0x1000
>>> Region Name = None
>>>
>>> Generate Region at Offset 0x41000
>>> Region Size = 0x1000
>>> Region Name = DATA
>>>
>>> Generate Region at Offset 0x42000
>>> Region Size = 0x42000
>>> Region Name = None
>>>
>>> Generate Region at Offset 0x84000
>>> Region Size = 0x348000
>>> Region Name = FV
>>>
>>> Generating FVMAIN_COMPACT FV
>>>
>>> Generating PEIFV FV
>>> ###### ['GenFv', '-a',
>>> '.../edk2/Build/OvmfIa32/NOOPT_GCC48/FV/Ffs/PEIFV.inf', '-o',
>>> '.../edk2/Build/OvmfIa32/NOOPT_GCC48/FV/PEIFV.Fv', '-i',
>>> '.../edk2/Build/OvmfIa32/NOOPT_GCC48/FV/PEIFV.inf']
>>> Return Value = 2
>>> GenFv: ERROR 0001: Error opening file
>>>
>>> .../edk2/Build/OvmfIa32/NOOPT_GCC48/FV/Ffs/52C05B14-0B98-496c-BC3B-04B50211D680PeiCore/52C05B14-0B98-496c-BC3B-04B50211D680.ffs
>>>
>>>
>>>
>>>
>>> build.py...
>>> : error 7000: Failed to generate FV
>>>
>>>
>>>
>>> build.py...
>>> : error 7000: Failed to execute command
>>>
>>>
>>> - Failed -
>>> Build end time: 14:06:25, Aug.08 2019
>>> Build total time: 00:00:06
>>
>> To be honest, I'm not sure what to ask for, at this point.
>>
>> - On one hand, this is certainly not ideal. Continuing a manually
>>   interrupted build should preferably work -- that's a form of
>>   incremental build.
>>   And, it did work in my v3 testing; see bullet (5) in:
>>
>>   http://mid.mail-archive.com/4ea3d3fa-2210-3642-2337-db525312d312@redhat.com
>>   https://edk2.groups.io/g/devel/message/44246
>>
>>   (Is this perhaps a regression from the V6 update, which was related to
>>   incremental builds?)
>>
>> - On the other hand, this is not necessarily a show-stopper, and I'm
>>   quite out of capacity for testing further versions of this full patch
>>   set. Perhaps you can work on this issue incrementally -- bugfixes can
>>   be accepted during the freeze periods.
>
> I think there are two (independent) circumstances where I would be
> happy for the support to be included even given this bug:
> 1) The parallel autogen is only invoked (at this point in time) when
>    requested by an explicit command line parameter.
> or
> 2) The failure is detected and its cause clearly printed for the user.
>
> From my reading of the above, neither is true.
>
> At which point, I think we would either make one of those true, or root
> cause and fix the actual error, in order to be able to accept this into
> the tree. Regardless of which side of the stable tag.
>
> I *really* don't want for us to knowingly end up with a build system
> that "sometimes breaks sporadically and you need to git clean the
> repository and try again".
>
>> I don't feel comfortable giving Tested-by or Regression-tested-by in
>> this state, but I also won't block the patch set from being merged.
>>
>> Note that this problem appears repeatable, and it reproduces using
>> Python2 as well. It should be possible for you to reproduce and to
>> debug.
>
> It being reproducible by Python 2 is actually really positive, since it
> suggests Python 3 async i/o is not involved.
>
>> (4) In this test, I repeated (3), but instead of interrupting the
>> build with Ctrl-C, I introduced a syntax error to one of the C source
>> files under OvmfPkg (I simply appended the constant "1" to the end of
>> the file).
>>
>> As expected, the build failed (and correctly stopped, too):
>>
>>> .../edk2/OvmfPkg/VirtioNetDxe/SnpReceive.c:186:1: error: expected
>>> identifier or '(' before numeric constant
>>> 1
>>> ^
>>> make: *** [.../edk2/Build/OvmfIa32/NOOPT_GCC48/IA32/OvmfPkg/VirtioNetDxe/VirtioNet/OUTPUT/SnpReceive.obj] Error 1
>>>
>>>
>>> build.py...
>>> : error 7000: Failed to execute command
>>> make tbuild [.../edk2/Build/OvmfIa32/NOOPT_GCC48/IA32/OvmfPkg/VirtioNetDxe/VirtioNet]
>>>
>>>
>>> build.py...
>>> : error F002: Failed to build module
>>> .../edk2/OvmfPkg/VirtioNetDxe/VirtioNet.inf [IA32, GCC48, NOOPT]
>>>
>>> - Failed -
>>> Build end time: 14:29:18, Aug.08 2019
>>> Build total time: 00:00:38
>>
>> I undid the syntax error, and repeated the "build" command.
>>
>> The build resumed fine, and produced a functional OVMF binary. Good.
>
> Not unexpected, but good to have verified.
>
>> (5) I also verified that changes to C files, made after the build
>> completed successfully for the first time, would cause those files to
>> be re-built, if the "build" command was repeated. So that's OK too.
>>
>> ... All in all, I think the series is mature enough to merge, in order
>> to expose it to wider testing by the community, with the soft feature
>> freeze just around the corner. The main functionality seems to work,
>> there don't seem to be show-stoppers. IMO a BaseTools series doesn't
>> have to be *perfect* -- as long as it doesn't get in the way of people
>> doing their work, it should be possible to improve upon, incrementally.
>> Therefore, from my side, I'm willing to give you a (somewhat reserved)
>>
>> Acked-by: Laszlo Ersek
>>
>> for the series.
>>
>> I suggest seeking feedback from the other stewards as well.
>>
>> To reiterate, the only issue I have found is that the build could not
>> be resumed after I interrupted it with Ctrl-C, in section (3). If
>> there is consensus to push the v8 series with that, I would suggest
>> filing a TianoCore BZ about issue (3) first, and to reference the BZ
>> as a "known issue" in the commit message of patch#4 or patch#5.
>
> I will throw in a transitional
> Nacked-by: Leif Lindholm
> for now.
>
> If it can happen from a Ctrl-C, it can happen from an OOM-event, a lost
> network connection, and a bunch of other things. And we could live with
> a corrupted state causing breakage on next build attempt - but not an
> opaque breakage. At a minimum, it needs to be clear what has caused the
> breakage.
>
> Best Regards,
>
> Leif