public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Michael D Kinney" <michael.d.kinney@intel.com>
To: Ard Biesheuvel <ardb@kernel.org>,
	"devel@edk2.groups.io" <devel@edk2.groups.io>
Cc: Leif Lindholm <quic_llindhol@quicinc.com>,
	Rebecca Cran <rebecca@bsdio.com>,
	Liming Gao <gaoliming@byosoft.com.cn>,
	"Feng, Bob C" <bob.c.feng@intel.com>,
	"Chen, Christine" <yuwei.chen@intel.com>,
	"Michael Kubacki" <mikuback@linux.microsoft.com>,
	"Kinney, Michael D" <michael.d.kinney@intel.com>
Subject: Re: [edk2-devel] [Patch 1/1] BaseTools/Scripts/PatchCheck: Error if commit modifies multiple packages
Date: Sun, 18 Feb 2024 03:36:38 +0000	[thread overview]
Message-ID: <CO1PR11MB4929FCF13B25582BC57D4638D2522@CO1PR11MB4929.namprd11.prod.outlook.com> (raw)
In-Reply-To: <CO1PR11MB4929D64E6B9B90B91AD150AAD24D2@CO1PR11MB4929.namprd11.prod.outlook.com>

Hi Ard,

I did some experiments with the 2 methods discussed earlier in 
this thread to disable the specific check being discussed in this
patch. Details of the experiments and analysis of the results 
are included at the end of the email.

1) Use GitHub PR Label
2) Add a special tags to a commit message

My conclusion is that using a PR Label to control CI actions is
not a good approach due to some behaviors in GitHub and that I
recommend that special tags be used in commit messages as a method
to enable/disable specific CI check options.  There is actually
precedence for use of commit message tags in GitHub to skip CI
workflow runs:

https://docs.github.com/en/actions/managing-workflow-runs/skipping-workflow-runs

The proposal is to define a tag name 'Continuous-integration-options:'
that can appear on one or more lines of the commit message and the tag 
value is a list of one or more CI check options. This defines a mechanism
that can be extended as needed. For the specific check under discussion
here, I am proposing an option value of 'PatchCheck.ignore-multi-package'
An example commit message line to disable the multiple package check
for the one commit would be:

    Continuous-integration-options: PatchCheck.ignore-multi-package

When PatchCheck evaluates the commit message, if this tag/value is
present, then the multiple package check is still run, but the log
shows results as WARNING messages, and no errors are raised and CI 
receives a PASS result.

Please provide feedback on this updated proposal. 

Thanks,

Mike

========================================================

GitHub PR Label Experiments
============================
An experiment was done to use a GitHub PR Label to enable/disable the
multiple package check. PatchCheck was ported to a GitHub Action for
this experiment and a label with name 'ignore-multi-package' is used 
to disable the PatchCheck multiple packages check.  PatchCheck is updated
to support a --ignore-multi-option flag that is passed into PatchCheck 
when the label is set.

Description        Result  Label Set PR Link
------------------ ------- --------- ----------------------------------------
Modify 1 package    PASS       NO    https://github.com/mdkinney/edk2/pull/15
Modify 1 package    PASS      YES    https://github.com/mdkinney/edk2/pull/15
Modify 2 packages   FAIL       NO    https://github.com/mdkinney/edk2/pull/16
Modify 2 packages   PASS      YES    https://github.com/mdkinney/edk2/pull/17

Main Branch:
* https://github.com/mdkinney/edk2/tree/CiEnabled
Patch Check with --ignore-multi-package option:
* https://github.com/mdkinney/edk2/blob/CiEnabled/BaseTools/Scripts/PatchCheck.py
GitHub Action Files:
* https://github.com/mdkinney/edk2/blob/CiEnabled/.github/workflows/patchcheck.yml
* https://github.com/mdkinney/edk2/blob/CiEnabled/.github/workflows/patchcheck_call.yml

The expected results were achieved where the label has no impact on a 
PR with commits that only modifies a single package, and the label being
set can change a PR from FAIL->PASS if one or more of the commits in the
PR modify multiple packages.

The disadvantages of this approach are:
1) The scope of the label is the entire PR for all commits instead of
   the specific commits that require the check to be skipped.
2) The information to skip a specific check is only in the GitHub PR
   in the form of a label being set. The git commit history has no information
   on the label settings. This can cause side effects if downstream consumers
   use some of the edk2 CI checkers and may then see unexpected CI failures.
3) CI is run again when commits are merged with a 'push' action. It was
   not verified, but the label information is not provided when a push 
   is performed, so the result of using labels to change CI behavior
   may be a PASS condition when the PR is being reviewed, and a FAIL
   condition when the commits from the PR are merged. This is not an expected
   or acceptable state of the main branch.
4) The GitHub action must add 'labeled' and 'unlabeled' PR types for
   the GitHub action to run when a label is added or removed. There is
   no mechanism to filter based on which specific label was added or
   removed. This means CI must be invoked each time a label is modified
   in a PR, even for labels that are not related to CI behavior. This is
   not efficient for CI and implies that this mechanism does not scale
   well if several of these CI option labels are added over time.
   * Experiments were run to filter based in label inside the GitHub
     Action itself. This *only* worked if a CI related label is modified.
     If a non-CI related label is modified, the GitHub Action is skipped,
     and the result is PASS even if the result of the last run was FAIL.
     This would allow merges of PRs with known issues.
   * The other option is to remove 'labeled' and 'unlabeled' events from
     the GitHub Action. This can work, but has an unexpected behavior from
     the maintainer perspective. Setting a label does not re-run CI.
     Instead, the label has to be set, then either the PR must be closed
     and re-opened, or additional commits have to be added, or commits
     must be force pushed in order for the CI check to be re-run with 
     the label changes. This also implies that that PASS/FAIL state of
     the PR can be out of sync with the current label settings if those
     additional actions are not performed after the labels are modified.
 
Commit Message Tag Experiments
==============================
An experiment was done to use a commit message tag/value to disable the
multiple package check on only the commits that contain a matching
tag/value.  The tag/value used is:

Continuous-Integration-Options: PatchCheck.ignore-multi-package

Description        Result  Tag Present PR Link
------------------ ------- ----------- ----------------------------------------
Modify 1 package    PASS       NO      https://github.com/mdkinney/edk2/pull/30
Modify 2 packages   FAIL       NO      https://github.com/mdkinney/edk2/pull/28
Modify 2 packages   PASS      YES      https://github.com/mdkinney/edk2/pull/29

Main Branch:
* https://github.com/mdkinney/edk2/tree/CommitMessageCiFilter
Patch Check with Continuous-Integration-Options tag parsing
* https://github.com/mdkinney/edk2/blob/CommitMessageCiFilter/BaseTools/Scripts/PatchCheck.py
GitHub Action File:
* https://github.com/mdkinney/edk2/blob/CommitMessageCiFilter/.github/workflows/patchcheck.yml

The expected results were achieved when adding the tag/value to the 
commit message of a commit that modifies multiple packages changes
the PR from FAIL->PASS.

The disadvantages of this approach are:
1) Adds CI specific tags to commit messages and commit history
2) Requires author or maintainer to update commit message with a
   CI specific tag/value if a multiple package check fails and is 
   considered a false positive.

The advantages of this approach are:
1) The scope of the tag/value is a single commit. Not the entire PR.
2) The information about what CI checks to skip are part of the 
   commit message and the commit history so the information can be
   reused if needed in downstream CI.
3) The information is in the commit message, so it is available for both
   PR CI checks and 'push' CI checks.
4) No new behavior from a maintainer perspective on how/when CI checks
   are run and when the status if CI checks are available.

==================================================================

> -----Original Message-----
> From: Kinney, Michael D <michael.d.kinney@intel.com>
> Sent: Wednesday, February 14, 2024 4:49 PM
> To: Ard Biesheuvel <ardb@kernel.org>; devel@edk2.groups.io
> Cc: Leif Lindholm <quic_llindhol@quicinc.com>; Rebecca Cran
> <rebecca@bsdio.com>; Liming Gao <gaoliming@byosoft.com.cn>; Feng, Bob C
> <bob.c.feng@intel.com>; Chen, Christine <yuwei.chen@intel.com>; Michael
> Kubacki <mikuback@linux.microsoft.com>; Kinney, Michael D
> <michael.d.kinney@intel.com>
> Subject: RE: [edk2-devel] [Patch 1/1] BaseTools/Scripts/PatchCheck:
> Error if commit modifies multiple packages
> 
> Hi Ard,
> 
> I agree we do not want to code to be worse.  Which I interpret
> in this context as introducing extra commits that reduce the
> readability of the commits.
> 
> We also need to consider the ease of review of patches and clear
> responsibility for providing reviews.  Especially when we have
> different reviewers/maintainers for different packages. It makes
> it easier to review a patch when patches do not cross package
> boundaries.  Otherwise, what does a Reviewed-by mean if you
> only have context for a portion of the patch?  Does that mean
> that reviewer is confident in all the changes including those
> they may not have any experience with?
> 
> That means there is some tension between readability of commits
> and code review of commits.
> 
> The other topic brought up in this discussion is bisect.
> 
> I understand the value of bisect to help identify the source
> of a bug or behavior change.
> 
> However, the current EDK II CI system does not test at the
> granularity of single commits.  Instead, it tests the entire
> patch series in a PR.
> 
> I have also observed some maintainers combining multiple
> patch series into a single PR.
> 
> I also suspect that there are many patch series in the edk2
> commit history that would not work if bisect was run across
> them today.
> 
> What this means in practice with the current edk2 repository
> history is that we do not know if every commit can be built
> and run. We only know that the EDK II CI tests that are run
> against PRs have passed.
> 
> We do expect build/run at the granularity of a PR merge today.
> However, there is no information in the git history to know
> where the PR merges occur.  If that extra bit of information
> was available, then bisect could select the commits at PR
> merge boundaries to look for bug/behavior change.  The history
> of merged PRs is in github using the following query.
> 
> https://github.com/tianocore/edk2/pulls?q=is%3Apr+is%3Aclosed+is%3Amerg
> ed
> 
> I imagine the set of sha hashes at PR boundaries could be extracted
> from this query and bisect could select from that set of sha hashes
> that are a subset of the total set of sha hashes.  Biset would
> identify the specific PR merge where the bug or behavior change was
> introduced.
> 
> Given the current state of the edk2 repo history, the concern about
> splitting up commits on package boundaries that may cause a bisect
> failure at a commit boundary within a single patch series may not be
> something that needs to be considered.
> 
> A refinement to the proposal is to require patches to not cross
> package boundaries and authors simply need to split into multiple
> commits based on package boundaries without considering bisect within
> a single patch series. That prevents adding extra patches that make
> the changes hard to understand and has the additional benefit of
> making the patch series easier to review and clear ownership for
> reviewing each patch.
> 
> Best regards,
> 
> Mike
> 
> > -----Original Message-----
> > From: Ard Biesheuvel <ardb@kernel.org>
> > Sent: Wednesday, February 14, 2024 3:48 PM
> > To: devel@edk2.groups.io; Kinney, Michael D
> > <michael.d.kinney@intel.com>
> > Cc: Leif Lindholm <quic_llindhol@quicinc.com>; Rebecca Cran
> > <rebecca@bsdio.com>; Liming Gao <gaoliming@byosoft.com.cn>; Feng, Bob
> C
> > <bob.c.feng@intel.com>; Chen, Christine <yuwei.chen@intel.com>;
> Michael
> > Kubacki <mikuback@linux.microsoft.com>
> > Subject: Re: [edk2-devel] [Patch 1/1] BaseTools/Scripts/PatchCheck:
> > Error if commit modifies multiple packages
> >
> > On Thu, 15 Feb 2024 at 00:27, Michael D Kinney
> > <michael.d.kinney@intel.com> wrote:
> > >
> > > Hi Ard,
> > >
> > > Using commit message does provide granularity down to a single
> > commit,
> > > which may be better than PR label which applies across all commits
> > > in a series.
> > >
> > > However, a flag in the commit message can be set by the author who
> > may
> > > not be a maintainer and maintainers would then need to review
> commit
> > > messages for incorrect use of those flags.
> > >
> > > Only maintainers/admins can set labels, so from a permission
> > management
> > > perspective the PR label has an advantage.
> > >
> > > Additional comments below.  There are ways to support bisect and
> meet
> > > the proposed checks. The suggestion uses techniques that Laszlo
> > helped
> > > me with in the past when working on issues like there. I have seen
> > more
> > > complex scenarios than the examples listed below, and have been
> able
> > to
> > > figure out a path through.
> > >
> > > I would agree it is extra work to think about these when working on
> > > the code changes and extra work to reformulate the patches when
> these
> > > conditions are encountered.
> > >
> > > I just want to be clear on the objections.  It is not about if the
> > patches
> > > can be organized to follow this proposal.  The objection is about
> the
> > > extra work required to reformulate patches.
> > >
> >
> > No.
> >
> > The objection is fundamentally about having to appease the CI even if
> > doing so is unreasonable. I don't mind a bit of extra work. I do mind
> > having to make code changes that make the code worse just to tick a
> CI
> > box. (This is why I disabled uncrustify for the ARM packages: many
> > header files which were perfectly legible before were converted into
> a
> > jumble of alphabet soup because uncrustify removed all of the
> > indentation)
> >
> > It is about having the discretion to deviate from the rules in the
> odd
> > case where the cure is worse than the disease.


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#115566): https://edk2.groups.io/g/devel/message/115566
Mute This Topic: https://groups.io/mt/104345509/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-



  reply	other threads:[~2024-02-18  3:36 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-14  1:17 [edk2-devel] [Patch 1/1] BaseTools/Scripts/PatchCheck: Error if commit modifies multiple packages Michael D Kinney
2024-02-14  7:11 ` Ard Biesheuvel
2024-02-14 15:51   ` Michael D Kinney
2024-02-14 15:59     ` Ard Biesheuvel
2024-02-14 16:32       ` Leif Lindholm
2024-02-14 17:16         ` Michael D Kinney
2024-02-14 21:54           ` Ard Biesheuvel
2024-02-14 23:27             ` Michael D Kinney
2024-02-14 23:47               ` Ard Biesheuvel
2024-02-15  0:49                 ` Michael D Kinney
2024-02-18  3:36                   ` Michael D Kinney [this message]
2024-02-18  9:35                     ` Ard Biesheuvel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CO1PR11MB4929FCF13B25582BC57D4638D2522@CO1PR11MB4929.namprd11.prod.outlook.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox