public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Michael D Kinney" <michael.d.kinney@intel.com>
To: "devel@edk2.groups.io" <devel@edk2.groups.io>,
	"lersek@redhat.com" <lersek@redhat.com>,
	"Justen, Jordan L" <jordan.l.justen@intel.com>,
	"Kinney, Michael D" <michael.d.kinney@intel.com>
Cc: "Feng, Bob C" <bob.c.feng@intel.com>,
	"Gao, Liming" <liming.gao@intel.com>
Subject: Re: [edk2-devel] [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show'
Date: Mon, 5 Aug 2019 22:01:14 +0000	[thread overview]
Message-ID: <E92EE9817A31E24EB0585FDF735412F5B9D83B8A@ORSMSX113.amr.corp.intel.com> (raw)
In-Reply-To: <8d511f89-f2a8-4d29-65e3-cc76fee12017@redhat.com>

Laszlo,

The context of this change is only to the PatchCheck.py tool.
and how that tool uses git show.

I agree with the summary of very flexible capabilities in git
to help developers review different types of files.  All of 
those settings that were added to support UNI files in UTF-16
file format were very valuable when we had to review the
text changes to those binary files.  We should be using UTF-8
now, and we can even extend PatchCheck.py to flag an error if
a UNI file is in UTF-16 format.

I still prefer we make this change only to PatchCheck.py to
prevent false positives and print lines of text that can 
not be found in a developer's working directory.  I prefer
this one time change to PatchCheck.py instead of having to
update .gitattributes whenever the git show features are 
extended to convert more binary files to text files.

My expectation is that EDK II Maintainers need to review 
if a binary file is ok or not for a repo.  I would also be
ok with adding general rules to PatchCheck.py to generate
an error if a binary file is added/updated in one of the
text only repos (edk2, edk2-platforms) and not generate
an error if a binary file is added/updated in a repo that
supports binaries (edk-non-osi).

Best regards,

Mike


> -----Original Message-----
> From: devel@edk2.groups.io
> [mailto:devel@edk2.groups.io] On Behalf Of Laszlo Ersek
> Sent: Friday, August 2, 2019 4:58 PM
> To: Justen, Jordan L <jordan.l.justen@intel.com>;
> Kinney, Michael D <michael.d.kinney@intel.com>;
> devel@edk2.groups.io
> Cc: Feng, Bob C <bob.c.feng@intel.com>; Gao, Liming
> <liming.gao@intel.com>
> Subject: Re: [edk2-devel] [Patch 3/3]
> BaseTools/PatchCheck: Disable text conversion in 'git
> show'
> 
> On 08/02/19 20:29, Jordan Justen wrote:
> > First, I hope we don't add lots of large .pdf files
> into the source
> > tree. I see two duplicated > 200k .pdf files in edk2,
> which seems like
> > a waste of space in the edk2 tree.
> >
> > BaseTools/Source/C/BrotliCompress/docs/brotli-
> comparison-study-2015-09
> > -22.pdf
> >
> MdeModulePkg/Library/BrotliCustomDecompressLib/docs/bro
> tli-comparison-
> > study-2015-09-22.pdf
> >
> > Second, I'm not too sure about this change. It seems
> like it might
> > have unintended consequences.
> >
> > One thought I had is that it might break UTF-16
> unicode files diffs,
> > but luckily I think we've pretty much gotten rid of
> UTF-16 files at
> > this point.
> >
> > I also wonder if adding a .pdf attribute like Laszlo
> recommends for
> > .efi might be an alternative solution.
> >
> >
> https://github.com/tianocore/tianocore.github.io/wiki/L
> aszlo's-unkempt
> > -git-guide-for-edk2-contributors-and-
> maintainers#contrib-09
> >
> > We could actually add these setting in the git tree
> in a
> > .gitattributes file, right? Has this been suggested?
> I think Laszlo
> > documented this before we had converted edk2 to git.
> 
> This is a complex question.
> 
> There are files which are binary for all intents and
> purposes, so you never want to see a "textualized"
> version of them. Git can attempt recognizing these
> files, but it can take hints too.
> 
> For these files, once they are safely classified as
> "binary", it is a separate question whether you want a
> changeset shown (or
> email-formatted) for these files to contain base64-like
> binary patches, or just a statement that "binary files
> differ".
> 
> Then there are files that are binary, but can be
> "textualized" for the purposes of diffing and even
> merging. Automated UCS2<->UTF8 conversion is an
> example, and I used that while edk2 still had UCS2-
> encoded UNI files.
> 
> Some examples (not claiming this is going to be an
> exhaustive list -- for that, see gitattributes(5)), to
> be placed in .git/info/attributes (local clone only) or
> .gitattributes (tracked in the repo):
> 
> *.efi -diff
> 
> This classifies *.efi as binary (turns off the
> heuristics).
> 
> Consider gitattributes(5) -- excerpt:
> 
>        diff
>            Unset
>                A path to which the diff attribute is
> unset will
>                generate Binary files differ (or a
> binary patch, if
>                binary patches are enabled).
>            Unspecified
>                A path to which the diff attribute is
> unspecified first
>                gets its contents inspected, and if it
> looks like text
>                and is smaller than
> core.bigFileThreshold, it is
>                treated as text. Otherwise it would
> generate Binary
>                files differ.
> 
> git-show will only report "binary files differ", unless
> we pass --binary to it; in which case it will offer a
> base64-like binary patch.
> git-format-patch will produce that binary patch by
> default, unless we pass --no-binary to it.
> 
> 
> Another example, in .git/info/attributes or
> .gitattributes:
> 
> *.uni diff=uni merge=uni
> 
> The manual says:
> 
>        diff
>            String
>                Diff is shown using the specified diff
> driver. Each
>                driver may specify one or more options,
> as described in
>                the following section. The options for
> the diff driver
>                "foo" are defined by the configuration
> variables in the
>                "diff.foo" section of the Git config
> file.
> 
>        merge
>            String
>                3-way merge is performed using the
> specified custom
>                merge driver. The built-in 3-way merge
> driver can be
>                explicitly specified by asking for
> "text" driver; the
>                built-in "take the current branch"
> driver can be
>                requested with "binary".
> 
> Accordingly, from .git/config:
> 
> [diff "uni"]
>       textconv = iconv -f UCS-2 -t UTF-8
>       binary = true
> 
> [merge "uni"]
>       name = merge driver for UCS-2 encoded text files
>       driver = uni-merge %O %A %B
> 
> And my uni-merge script:
> 
> ---------
> #!/bin/bash
> 
> # Merge UCS-2 encoded text files.
> # This is a git-merge driver. It will overwrite
> "$MINE".
> #
> # Exit status:
> # 0 -- no conflicts
> # 1 -- conflicts
> # 2 -- trouble
> 
> OLD="$1"
> MINE="$2"
> YOURS="$3"
> 
> # Set safe defaults for the trap handler.
> OLD_UTF8=
> MY_UTF8=
> YOUR_UTF8=
> NEW_UTF8=
> EX=2
> 
> trap 'rm -f -- "$OLD_UTF8" "$MY_UTF8" "$YOUR_UTF8"
> "$NEW_UTF8"; exit $EX' EXIT set -e -u -C
> 
> OLD_UTF8=$(mktemp)
> iconv --from UCS-2 --to UTF-8 <"$OLD" >>"$OLD_UTF8"
> 
> MY_UTF8=$(mktemp)
> iconv --from UCS-2 --to UTF-8 <"$MINE" >>"$MY_UTF8"
> 
> YOUR_UTF8=$(mktemp)
> iconv --from UCS-2 --to UTF-8 <"$YOURS" >>"$YOUR_UTF8"
> 
> NEW_UTF8=$(mktemp)
> set +e
> diff3 --merge --label=mine --label=old --label=yours --
> \
>     "$MY_UTF8" "$OLD_UTF8" "$YOUR_UTF8" >>"$NEW_UTF8"
> DIFFRET=$?
> set -e
> 
> if [ 0 -ne "$DIFFRET" ] && [ 1 -ne "$DIFFRET" ]; then
>   exit
> fi
> 
> iconv --from UTF-8 --to UCS-2 <"$NEW_UTF8" >|"$MINE"
> EX=$DIFFRET
> ---------
> 
> My understanding is that the "--no-textconv" option
> would undo the "textconv" that I set under [diff
> "uni"].
> 
> In other words, it tells git that, even if the
> heuristics suggested
> *some* conversion to text (I can't say what that would
> be, in the general case), git should still treat the
> file as binary. (At which point the --no-binary / --
> binary options decide whether we get a base64-like
> encoding, or just a message saying "binary files
> differ").
> Therefore I think (= agree with Jordan) that the effect
> is the same as marking the affected file types in
> .gitattributes as "-diff".
> 
> I guess the relevant question is scope. "--no-textconv"
> doesn't require us to assign "-diff" to specific file
> suffixes, such as *.pdf, *.efi, and so on. But, it also
> doesn't allow us to specify *some* text conversions,
> even if we wanted to (it would override & disable them,
> such as my "iconv" conversion when diffing UCS2 encoded
> UNI files).
> 
> Adding a .gitattributes file to the root of the
> repository, and updating it whenever necessary (for
> example when we add the first .pdf file, the first .efi
> file...), is likely the best.
> 
> (
> 
> My recommendation at
> <https://github.com/tianocore/tianocore.github.io/wiki/
> Laszlo's-unkempt-git-guide-for-edk2-contributors-and-
> maintainers#contrib-09>
> may have targeted the local ".git/info/attributes" file
> because people can have quite different opinions on how
> git should act on various files. In theory, someone
> having simplistic PDF files might even want:
> 
> In the attributes:
> 
> *.pdf diff=pdf
> 
> In the config:
> 
> [diff "pdf"]
>   textconv = sh -c 'pdftotext -- "$1" -' pdftotext-
> wrapper
>   binary = true
> 
> ("pdftotext" converts PDF to text, but its command line
> is awkward -- the hyphen marking stdout must be
> explicit, and must come as second operand. Hence the
> wrapping into "sh" -- and then we need to spell out
> $0 as well, for which I picked "pdftotext-wrapper". The
> "--" separator is a safety measure against filenames
> starting with "-", which could be mis-interpreted as
> options. "--" terminates the option list and starts the
> operand list.)
> 
> )
> 
> Thanks
> Laszlo
> 
> 
> > On 2019-08-01 17:13:14, Michael D Kinney wrote:
> >> https://bugzilla.tianocore.org/show_bug.cgi?id=2044
> >>
> >> 'git show' is used to extrat the patch contents for
> analysis.
> >> Add the flag '--no-textconv' to the 'git show'
> command to disable the
> >> conversion from some binary file types to text
> content.
> >>
> >> Without this change, binary files such as .pdf files
> are converted to
> >> text in the show command and PatchCheck complains
> that the wrong line
> >> endings are used in the patch.
> >>
> >> Cc: Bob Feng <bob.c.feng@intel.com>
> >> Cc: Liming Gao <liming.gao@intel.com>
> >> Cc: Jordan Justen <jordan.l.justen@intel.com>
> >> Signed-off-by: Michael D Kinney
> <michael.d.kinney@intel.com>
> >> ---
> >>  BaseTools/Scripts/PatchCheck.py | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/BaseTools/Scripts/PatchCheck.py
> >> b/BaseTools/Scripts/PatchCheck.py index
> 6b07241bfe..2a4e6f603e 100755
> >> --- a/BaseTools/Scripts/PatchCheck.py
> >> +++ b/BaseTools/Scripts/PatchCheck.py
> >> @@ -543,7 +543,7 @@ class CheckGitCommits:
> >>
> >>      def read_patch_from_git(self, commit):
> >>          # Run git to get the commit patch
> >> -        return self.run_git('show', '--
> pretty=email', commit)
> >> +        return self.run_git('show', '--
> pretty=email',
> >> + '--no-textconv', commit)
> >>
> >>      def run_git(self, *args):
> >>          cmd = [ 'git' ]
> >> --
> >> 2.21.0.windows.1
> >>
> 
> 
> 


  reply	other threads:[~2019-08-05 22:01 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-02  0:13 [Patch 0/3] BaseTools/PatchCheck: Fix false positives Michael D Kinney
2019-08-02  0:13 ` [Patch 1/3] BaseTools/PatchCheck: Ignore blank lines in diff Michael D Kinney
2019-08-02  0:13 ` [Patch 2/3] BaseTools/PatchCheck: Add copy from/to keywords Michael D Kinney
2019-08-02  0:13 ` [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show' Michael D Kinney
2019-08-02 18:29   ` Jordan Justen
2019-08-02 18:51     ` Michael D Kinney
2019-08-02 23:57     ` Laszlo Ersek
2019-08-05 22:01       ` Michael D Kinney [this message]
2019-08-07 13:26         ` [edk2-devel] " Laszlo Ersek
2019-08-07 19:42         ` Jordan Justen
2019-08-08 20:57           ` Laszlo Ersek
2019-08-08 21:08             ` Michael D Kinney
2019-08-08 21:15               ` Andrew Fish
2019-08-08 22:34                 ` Laszlo Ersek
2019-08-02  7:59 ` [edk2-devel] [Patch 0/3] BaseTools/PatchCheck: Fix false positives Bob Feng
2019-08-02 18:30 ` Jordan Justen
2019-08-07 11:41 ` Liming Gao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=E92EE9817A31E24EB0585FDF735412F5B9D83B8A@ORSMSX113.amr.corp.intel.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox