public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [Patch 0/3] BaseTools/PatchCheck: Fix false positives
@ 2019-08-02  0:13 Michael D Kinney
  2019-08-02  0:13 ` [Patch 1/3] BaseTools/PatchCheck: Ignore blank lines in diff Michael D Kinney
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Michael D Kinney @ 2019-08-02  0:13 UTC (permalink / raw)
  To: devel; +Cc: Bob Feng, Liming Gao, Jordan Justen

https://bugzilla.tianocore.org/show_bug.cgi?id=2044

* Add '--no-textconv' to 'git show' command to disable binary->text conversion.
* Ignore blank lines in patch
* Ignore lines with 'copy from ' and 'copy to ' prefixes

Cc: Bob Feng <bob.c.feng@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Michael D Kinney <michael.d.kinney@intel.com>

Michael D Kinney (3):
  BaseTools/PatchCheck: Ignore blank lines in diff
  BaseTools/PatchCheck: Add copy from/to keywords
  BaseTools/PatchCheck: Disable text conversion in 'git show'

 BaseTools/Scripts/PatchCheck.py | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

-- 
2.21.0.windows.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Patch 1/3] BaseTools/PatchCheck: Ignore blank lines in diff
  2019-08-02  0:13 [Patch 0/3] BaseTools/PatchCheck: Fix false positives Michael D Kinney
@ 2019-08-02  0:13 ` Michael D Kinney
  2019-08-02  0:13 ` [Patch 2/3] BaseTools/PatchCheck: Add copy from/to keywords Michael D Kinney
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Michael D Kinney @ 2019-08-02  0:13 UTC (permalink / raw)
  To: devel; +Cc: Bob Feng, Liming Gao, Jordan Justen

https://bugzilla.tianocore.org/show_bug.cgi?id=2044

When fixes are made for incorrect line endings, there
are cases where the diff contains blank lines.  Ignore
these blank lines.

Cc: Bob Feng <bob.c.feng@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Michael D Kinney <michael.d.kinney@intel.com>
---
 BaseTools/Scripts/PatchCheck.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/BaseTools/Scripts/PatchCheck.py b/BaseTools/Scripts/PatchCheck.py
index 6aec15d0f0..4f2ef81ae7 100755
--- a/BaseTools/Scripts/PatchCheck.py
+++ b/BaseTools/Scripts/PatchCheck.py
@@ -270,6 +270,7 @@ class GitDiffCheck:
             if line.startswith('@@ '):
                 self.state = PRE_PATCH
             elif len(line) >= 1 and line[0] not in ' -+' and \
+                 not line.startswith('\r\n') and  \
                  not line.startswith(r'\ No newline ') and not self.binary:
                 for line in self.lines[self.line_num + 1:]:
                     if line.startswith('diff --git'):
@@ -313,6 +314,8 @@ class GitDiffCheck:
                 pass
             elif line.startswith('+'):
                 self.check_added_line(line[1:])
+            elif line.startswith('\r\n'):
+                pass
             elif line.startswith(r'\ No newline '):
                 pass
             elif not line.startswith(' '):
-- 
2.21.0.windows.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Patch 2/3] BaseTools/PatchCheck: Add copy from/to keywords
  2019-08-02  0:13 [Patch 0/3] BaseTools/PatchCheck: Fix false positives Michael D Kinney
  2019-08-02  0:13 ` [Patch 1/3] BaseTools/PatchCheck: Ignore blank lines in diff Michael D Kinney
@ 2019-08-02  0:13 ` Michael D Kinney
  2019-08-02  0:13 ` [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show' Michael D Kinney
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Michael D Kinney @ 2019-08-02  0:13 UTC (permalink / raw)
  To: devel; +Cc: Bob Feng, Liming Gao, Jordan Justen

https://bugzilla.tianocore.org/show_bug.cgi?id=2044

When files are very similar, git will copy an existing
file to a new location and then apply differences.  This
is operation identified in the diff with 'copy from' and
'copy to' lines that need to be ignored.

Cc: Bob Feng <bob.c.feng@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Michael D Kinney <michael.d.kinney@intel.com>
---
 BaseTools/Scripts/PatchCheck.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/BaseTools/Scripts/PatchCheck.py b/BaseTools/Scripts/PatchCheck.py
index 4f2ef81ae7..6b07241bfe 100755
--- a/BaseTools/Scripts/PatchCheck.py
+++ b/BaseTools/Scripts/PatchCheck.py
@@ -331,6 +331,8 @@ class GitDiffCheck:
         'old mode ',
         'new mode ',
         'similarity index ',
+        'copy from ',
+        'copy to ',
         'rename ',
         )
 
-- 
2.21.0.windows.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show'
  2019-08-02  0:13 [Patch 0/3] BaseTools/PatchCheck: Fix false positives Michael D Kinney
  2019-08-02  0:13 ` [Patch 1/3] BaseTools/PatchCheck: Ignore blank lines in diff Michael D Kinney
  2019-08-02  0:13 ` [Patch 2/3] BaseTools/PatchCheck: Add copy from/to keywords Michael D Kinney
@ 2019-08-02  0:13 ` Michael D Kinney
  2019-08-02 18:29   ` Jordan Justen
  2019-08-02  7:59 ` [edk2-devel] [Patch 0/3] BaseTools/PatchCheck: Fix false positives Bob Feng
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Michael D Kinney @ 2019-08-02  0:13 UTC (permalink / raw)
  To: devel; +Cc: Bob Feng, Liming Gao, Jordan Justen

https://bugzilla.tianocore.org/show_bug.cgi?id=2044

'git show' is used to extrat the patch contents for analysis.
Add the flag '--no-textconv' to the 'git show' command to
disable the conversion from some binary file types to text
content.

Without this change, binary files such as .pdf files are
converted to text in the show command and PatchCheck complains
that the wrong line endings are used in the patch.

Cc: Bob Feng <bob.c.feng@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Michael D Kinney <michael.d.kinney@intel.com>
---
 BaseTools/Scripts/PatchCheck.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/BaseTools/Scripts/PatchCheck.py b/BaseTools/Scripts/PatchCheck.py
index 6b07241bfe..2a4e6f603e 100755
--- a/BaseTools/Scripts/PatchCheck.py
+++ b/BaseTools/Scripts/PatchCheck.py
@@ -543,7 +543,7 @@ class CheckGitCommits:
 
     def read_patch_from_git(self, commit):
         # Run git to get the commit patch
-        return self.run_git('show', '--pretty=email', commit)
+        return self.run_git('show', '--pretty=email', '--no-textconv', commit)
 
     def run_git(self, *args):
         cmd = [ 'git' ]
-- 
2.21.0.windows.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [edk2-devel] [Patch 0/3] BaseTools/PatchCheck: Fix false positives
  2019-08-02  0:13 [Patch 0/3] BaseTools/PatchCheck: Fix false positives Michael D Kinney
                   ` (2 preceding siblings ...)
  2019-08-02  0:13 ` [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show' Michael D Kinney
@ 2019-08-02  7:59 ` Bob Feng
  2019-08-02 18:30 ` Jordan Justen
  2019-08-07 11:41 ` Liming Gao
  5 siblings, 0 replies; 17+ messages in thread
From: Bob Feng @ 2019-08-02  7:59 UTC (permalink / raw)
  To: devel@edk2.groups.io, Kinney, Michael D; +Cc: Gao, Liming, Justen, Jordan L

For the patch set,
Reviewed-by: Bob Feng <bob.c.feng@intel.com>

-----Original Message-----
From: devel@edk2.groups.io [mailto:devel@edk2.groups.io] On Behalf Of Michael D Kinney
Sent: Friday, August 2, 2019 8:13 AM
To: devel@edk2.groups.io
Cc: Feng, Bob C <bob.c.feng@intel.com>; Gao, Liming <liming.gao@intel.com>; Justen, Jordan L <jordan.l.justen@intel.com>
Subject: [edk2-devel] [Patch 0/3] BaseTools/PatchCheck: Fix false positives

https://bugzilla.tianocore.org/show_bug.cgi?id=2044

* Add '--no-textconv' to 'git show' command to disable binary->text conversion.
* Ignore blank lines in patch
* Ignore lines with 'copy from ' and 'copy to ' prefixes

Cc: Bob Feng <bob.c.feng@intel.com>
Cc: Liming Gao <liming.gao@intel.com>
Cc: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Michael D Kinney <michael.d.kinney@intel.com>

Michael D Kinney (3):
  BaseTools/PatchCheck: Ignore blank lines in diff
  BaseTools/PatchCheck: Add copy from/to keywords
  BaseTools/PatchCheck: Disable text conversion in 'git show'

 BaseTools/Scripts/PatchCheck.py | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

-- 
2.21.0.windows.1





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show'
  2019-08-02  0:13 ` [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show' Michael D Kinney
@ 2019-08-02 18:29   ` Jordan Justen
  2019-08-02 18:51     ` Michael D Kinney
  2019-08-02 23:57     ` Laszlo Ersek
  0 siblings, 2 replies; 17+ messages in thread
From: Jordan Justen @ 2019-08-02 18:29 UTC (permalink / raw)
  To: Michael D Kinney, devel; +Cc: Bob Feng, Liming Gao, Laszlo Ersek

First, I hope we don't add lots of large .pdf files into the source
tree. I see two duplicated > 200k .pdf files in edk2, which seems like
a waste of space in the edk2 tree.

BaseTools/Source/C/BrotliCompress/docs/brotli-comparison-study-2015-09-22.pdf
MdeModulePkg/Library/BrotliCustomDecompressLib/docs/brotli-comparison-study-2015-09-22.pdf

Second, I'm not too sure about this change. It seems like it might
have unintended consequences.

One thought I had is that it might break UTF-16 unicode files diffs,
but luckily I think we've pretty much gotten rid of UTF-16 files at
this point.

I also wonder if adding a .pdf attribute like Laszlo recommends for
.efi might be an alternative solution.

https://github.com/tianocore/tianocore.github.io/wiki/Laszlo's-unkempt-git-guide-for-edk2-contributors-and-maintainers#contrib-09

We could actually add these setting in the git tree in a
.gitattributes file, right? Has this been suggested? I think Laszlo
documented this before we had converted edk2 to git.

-Jordan

On 2019-08-01 17:13:14, Michael D Kinney wrote:
> https://bugzilla.tianocore.org/show_bug.cgi?id=2044
> 
> 'git show' is used to extrat the patch contents for analysis.
> Add the flag '--no-textconv' to the 'git show' command to
> disable the conversion from some binary file types to text
> content.
> 
> Without this change, binary files such as .pdf files are
> converted to text in the show command and PatchCheck complains
> that the wrong line endings are used in the patch.
> 
> Cc: Bob Feng <bob.c.feng@intel.com>
> Cc: Liming Gao <liming.gao@intel.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Signed-off-by: Michael D Kinney <michael.d.kinney@intel.com>
> ---
>  BaseTools/Scripts/PatchCheck.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/BaseTools/Scripts/PatchCheck.py b/BaseTools/Scripts/PatchCheck.py
> index 6b07241bfe..2a4e6f603e 100755
> --- a/BaseTools/Scripts/PatchCheck.py
> +++ b/BaseTools/Scripts/PatchCheck.py
> @@ -543,7 +543,7 @@ class CheckGitCommits:
>  
>      def read_patch_from_git(self, commit):
>          # Run git to get the commit patch
> -        return self.run_git('show', '--pretty=email', commit)
> +        return self.run_git('show', '--pretty=email', '--no-textconv', commit)
>  
>      def run_git(self, *args):
>          cmd = [ 'git' ]
> -- 
> 2.21.0.windows.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [edk2-devel] [Patch 0/3] BaseTools/PatchCheck: Fix false positives
  2019-08-02  0:13 [Patch 0/3] BaseTools/PatchCheck: Fix false positives Michael D Kinney
                   ` (3 preceding siblings ...)
  2019-08-02  7:59 ` [edk2-devel] [Patch 0/3] BaseTools/PatchCheck: Fix false positives Bob Feng
@ 2019-08-02 18:30 ` Jordan Justen
  2019-08-07 11:41 ` Liming Gao
  5 siblings, 0 replies; 17+ messages in thread
From: Jordan Justen @ 2019-08-02 18:30 UTC (permalink / raw)
  To: Michael D Kinney, devel; +Cc: Bob Feng, Liming Gao

On 2019-08-01 17:13:11, Michael D Kinney wrote:
> https://bugzilla.tianocore.org/show_bug.cgi?id=2044
> 
> * Add '--no-textconv' to 'git show' command to disable binary->text conversion.
> * Ignore blank lines in patch
> * Ignore lines with 'copy from ' and 'copy to ' prefixes
> 
> Cc: Bob Feng <bob.c.feng@intel.com>
> Cc: Liming Gao <liming.gao@intel.com>
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Signed-off-by: Michael D Kinney <michael.d.kinney@intel.com>
> 
> Michael D Kinney (3):
>   BaseTools/PatchCheck: Ignore blank lines in diff
>   BaseTools/PatchCheck: Add copy from/to keywords

patches 1 and 2: Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

>   BaseTools/PatchCheck: Disable text conversion in 'git show'
> 
>  BaseTools/Scripts/PatchCheck.py | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> -- 
> 2.21.0.windows.1
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show'
  2019-08-02 18:29   ` Jordan Justen
@ 2019-08-02 18:51     ` Michael D Kinney
  2019-08-02 23:57     ` Laszlo Ersek
  1 sibling, 0 replies; 17+ messages in thread
From: Michael D Kinney @ 2019-08-02 18:51 UTC (permalink / raw)
  To: Justen, Jordan L, devel@edk2.groups.io, Kinney, Michael D
  Cc: Feng, Bob C, Gao, Liming, Laszlo Ersek

Jordan,

I agree we should not adding PDFs or other binary files
to the edk2 or edk2-platforms repos.

The PDF was just an example, and it looks like git has some
auto conversion behavior in the show command and that could
impact binary file types other than pdf.  This patch only
removes the false positives that are difficult to understand
because I could not find the text it was complaining about.

I also want PatchCheck to be used to check the commit
Messages to repos like edk2-non-osi that does allow binaries.
So removing false positives is important for that use case.

I think we should address patches that add binaries as its
own topic and should be part of code review for maintainers
for repos that should not have binaries.

Thanks,

Mike

> -----Original Message-----
> From: Justen, Jordan L
> Sent: Friday, August 2, 2019 11:29 AM
> To: Kinney, Michael D <michael.d.kinney@intel.com>;
> devel@edk2.groups.io
> Cc: Feng, Bob C <bob.c.feng@intel.com>; Gao, Liming
> <liming.gao@intel.com>; Laszlo Ersek
> <lersek@redhat.com>
> Subject: Re: [Patch 3/3] BaseTools/PatchCheck: Disable
> text conversion in 'git show'
> 
> First, I hope we don't add lots of large .pdf files
> into the source tree. I see two duplicated > 200k .pdf
> files in edk2, which seems like a waste of space in the
> edk2 tree.
> 
> BaseTools/Source/C/BrotliCompress/docs/brotli-
> comparison-study-2015-09-22.pdf
> MdeModulePkg/Library/BrotliCustomDecompressLib/docs/bro
> tli-comparison-study-2015-09-22.pdf
> 
> Second, I'm not too sure about this change. It seems
> like it might have unintended consequences.
> 
> One thought I had is that it might break UTF-16 unicode
> files diffs, but luckily I think we've pretty much
> gotten rid of UTF-16 files at this point.
> 
> I also wonder if adding a .pdf attribute like Laszlo
> recommends for .efi might be an alternative solution.
> 
> https://github.com/tianocore/tianocore.github.io/wiki/L
> aszlo's-unkempt-git-guide-for-edk2-contributors-and-
> maintainers#contrib-09
> 
> We could actually add these setting in the git tree in
> a .gitattributes file, right? Has this been suggested?
> I think Laszlo documented this before we had converted
> edk2 to git.
> 
> -Jordan
> 
> On 2019-08-01 17:13:14, Michael D Kinney wrote:
> > https://bugzilla.tianocore.org/show_bug.cgi?id=2044
> >
> > 'git show' is used to extrat the patch contents for
> analysis.
> > Add the flag '--no-textconv' to the 'git show'
> command to disable the
> > conversion from some binary file types to text
> content.
> >
> > Without this change, binary files such as .pdf files
> are converted to
> > text in the show command and PatchCheck complains
> that the wrong line
> > endings are used in the patch.
> >
> > Cc: Bob Feng <bob.c.feng@intel.com>
> > Cc: Liming Gao <liming.gao@intel.com>
> > Cc: Jordan Justen <jordan.l.justen@intel.com>
> > Signed-off-by: Michael D Kinney
> <michael.d.kinney@intel.com>
> > ---
> >  BaseTools/Scripts/PatchCheck.py | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/BaseTools/Scripts/PatchCheck.py
> > b/BaseTools/Scripts/PatchCheck.py index
> 6b07241bfe..2a4e6f603e 100755
> > --- a/BaseTools/Scripts/PatchCheck.py
> > +++ b/BaseTools/Scripts/PatchCheck.py
> > @@ -543,7 +543,7 @@ class CheckGitCommits:
> >
> >      def read_patch_from_git(self, commit):
> >          # Run git to get the commit patch
> > -        return self.run_git('show', '--
> pretty=email', commit)
> > +        return self.run_git('show', '--
> pretty=email',
> > + '--no-textconv', commit)
> >
> >      def run_git(self, *args):
> >          cmd = [ 'git' ]
> > --
> > 2.21.0.windows.1
> >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show'
  2019-08-02 18:29   ` Jordan Justen
  2019-08-02 18:51     ` Michael D Kinney
@ 2019-08-02 23:57     ` Laszlo Ersek
  2019-08-05 22:01       ` [edk2-devel] " Michael D Kinney
  1 sibling, 1 reply; 17+ messages in thread
From: Laszlo Ersek @ 2019-08-02 23:57 UTC (permalink / raw)
  To: Jordan Justen, Michael D Kinney, devel; +Cc: Bob Feng, Liming Gao

On 08/02/19 20:29, Jordan Justen wrote:
> First, I hope we don't add lots of large .pdf files into the source
> tree. I see two duplicated > 200k .pdf files in edk2, which seems like
> a waste of space in the edk2 tree.
>
> BaseTools/Source/C/BrotliCompress/docs/brotli-comparison-study-2015-09-22.pdf
> MdeModulePkg/Library/BrotliCustomDecompressLib/docs/brotli-comparison-study-2015-09-22.pdf
>
> Second, I'm not too sure about this change. It seems like it might
> have unintended consequences.
>
> One thought I had is that it might break UTF-16 unicode files diffs,
> but luckily I think we've pretty much gotten rid of UTF-16 files at
> this point.
>
> I also wonder if adding a .pdf attribute like Laszlo recommends for
> .efi might be an alternative solution.
>
> https://github.com/tianocore/tianocore.github.io/wiki/Laszlo's-unkempt-git-guide-for-edk2-contributors-and-maintainers#contrib-09
>
> We could actually add these setting in the git tree in a
> .gitattributes file, right? Has this been suggested? I think Laszlo
> documented this before we had converted edk2 to git.

This is a complex question.

There are files which are binary for all intents and purposes, so you
never want to see a "textualized" version of them. Git can attempt
recognizing these files, but it can take hints too.

For these files, once they are safely classified as "binary", it is a
separate question whether you want a changeset shown (or
email-formatted) for these files to contain base64-like binary patches,
or just a statement that "binary files differ".

Then there are files that are binary, but can be "textualized" for the
purposes of diffing and even merging. Automated UCS2<->UTF8 conversion
is an example, and I used that while edk2 still had UCS2-encoded UNI
files.

Some examples (not claiming this is going to be an exhaustive list --
for that, see gitattributes(5)), to be placed in .git/info/attributes
(local clone only) or .gitattributes (tracked in the repo):

*.efi -diff

This classifies *.efi as binary (turns off the heuristics).

Consider gitattributes(5) -- excerpt:

       diff
           Unset
               A path to which the diff attribute is unset will
               generate Binary files differ (or a binary patch, if
               binary patches are enabled).
           Unspecified
               A path to which the diff attribute is unspecified first
               gets its contents inspected, and if it looks like text
               and is smaller than core.bigFileThreshold, it is
               treated as text. Otherwise it would generate Binary
               files differ.

git-show will only report "binary files differ", unless we pass --binary
to it; in which case it will offer a base64-like binary patch.
git-format-patch will produce that binary patch by default, unless we
pass --no-binary to it.


Another example, in .git/info/attributes or .gitattributes:

*.uni diff=uni merge=uni

The manual says:

       diff
           String
               Diff is shown using the specified diff driver. Each
               driver may specify one or more options, as described in
               the following section. The options for the diff driver
               "foo" are defined by the configuration variables in the
               "diff.foo" section of the Git config file.

       merge
           String
               3-way merge is performed using the specified custom
               merge driver. The built-in 3-way merge driver can be
               explicitly specified by asking for "text" driver; the
               built-in "take the current branch" driver can be
               requested with "binary".

Accordingly, from .git/config:

[diff "uni"]
      textconv = iconv -f UCS-2 -t UTF-8
      binary = true

[merge "uni"]
      name = merge driver for UCS-2 encoded text files
      driver = uni-merge %O %A %B

And my uni-merge script:

---------
#!/bin/bash

# Merge UCS-2 encoded text files.
# This is a git-merge driver. It will overwrite "$MINE".
#
# Exit status:
# 0 -- no conflicts
# 1 -- conflicts
# 2 -- trouble

OLD="$1"
MINE="$2"
YOURS="$3"

# Set safe defaults for the trap handler.
OLD_UTF8=
MY_UTF8=
YOUR_UTF8=
NEW_UTF8=
EX=2

trap 'rm -f -- "$OLD_UTF8" "$MY_UTF8" "$YOUR_UTF8" "$NEW_UTF8"; exit $EX' EXIT
set -e -u -C

OLD_UTF8=$(mktemp)
iconv --from UCS-2 --to UTF-8 <"$OLD" >>"$OLD_UTF8"

MY_UTF8=$(mktemp)
iconv --from UCS-2 --to UTF-8 <"$MINE" >>"$MY_UTF8"

YOUR_UTF8=$(mktemp)
iconv --from UCS-2 --to UTF-8 <"$YOURS" >>"$YOUR_UTF8"

NEW_UTF8=$(mktemp)
set +e
diff3 --merge --label=mine --label=old --label=yours -- \
    "$MY_UTF8" "$OLD_UTF8" "$YOUR_UTF8" >>"$NEW_UTF8"
DIFFRET=$?
set -e

if [ 0 -ne "$DIFFRET" ] && [ 1 -ne "$DIFFRET" ]; then
  exit
fi

iconv --from UTF-8 --to UCS-2 <"$NEW_UTF8" >|"$MINE"
EX=$DIFFRET
---------

My understanding is that the "--no-textconv" option would undo the
"textconv" that I set under [diff "uni"].

In other words, it tells git that, even if the heuristics suggested
*some* conversion to text (I can't say what that would be, in the
general case), git should still treat the file as binary. (At which
point the --no-binary / --binary options decide whether we get a
base64-like encoding, or just a message saying "binary files differ").
Therefore I think (= agree with Jordan) that the effect is the same as
marking the affected file types in .gitattributes as "-diff".

I guess the relevant question is scope. "--no-textconv" doesn't require
us to assign "-diff" to specific file suffixes, such as *.pdf, *.efi,
and so on. But, it also doesn't allow us to specify *some* text
conversions, even if we wanted to (it would override & disable them,
such as my "iconv" conversion when diffing UCS2 encoded UNI files).

Adding a .gitattributes file to the root of the repository, and updating
it whenever necessary (for example when we add the first .pdf file, the
first .efi file...), is likely the best.

(

My recommendation at
<https://github.com/tianocore/tianocore.github.io/wiki/Laszlo's-unkempt-git-guide-for-edk2-contributors-and-maintainers#contrib-09>
may have targeted the local ".git/info/attributes" file because people
can have quite different opinions on how git should act on various
files. In theory, someone having simplistic PDF files might even want:

In the attributes:

*.pdf diff=pdf

In the config:

[diff "pdf"]
  textconv = sh -c 'pdftotext -- "$1" -' pdftotext-wrapper
  binary = true

("pdftotext" converts PDF to text, but its command line is awkward --
the hyphen marking stdout must be explicit, and must come as second
operand. Hence the wrapping into "sh" -- and then we need to spell out
$0 as well, for which I picked "pdftotext-wrapper". The "--" separator
is a safety measure against filenames starting with "-", which could be
mis-interpreted as options. "--" terminates the option list and starts
the operand list.)

)

Thanks
Laszlo


> On 2019-08-01 17:13:14, Michael D Kinney wrote:
>> https://bugzilla.tianocore.org/show_bug.cgi?id=2044
>>
>> 'git show' is used to extrat the patch contents for analysis.
>> Add the flag '--no-textconv' to the 'git show' command to
>> disable the conversion from some binary file types to text
>> content.
>>
>> Without this change, binary files such as .pdf files are
>> converted to text in the show command and PatchCheck complains
>> that the wrong line endings are used in the patch.
>>
>> Cc: Bob Feng <bob.c.feng@intel.com>
>> Cc: Liming Gao <liming.gao@intel.com>
>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>> Signed-off-by: Michael D Kinney <michael.d.kinney@intel.com>
>> ---
>>  BaseTools/Scripts/PatchCheck.py | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/BaseTools/Scripts/PatchCheck.py b/BaseTools/Scripts/PatchCheck.py
>> index 6b07241bfe..2a4e6f603e 100755
>> --- a/BaseTools/Scripts/PatchCheck.py
>> +++ b/BaseTools/Scripts/PatchCheck.py
>> @@ -543,7 +543,7 @@ class CheckGitCommits:
>>
>>      def read_patch_from_git(self, commit):
>>          # Run git to get the commit patch
>> -        return self.run_git('show', '--pretty=email', commit)
>> +        return self.run_git('show', '--pretty=email', '--no-textconv', commit)
>>
>>      def run_git(self, *args):
>>          cmd = [ 'git' ]
>> --
>> 2.21.0.windows.1
>>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [edk2-devel] [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show'
  2019-08-02 23:57     ` Laszlo Ersek
@ 2019-08-05 22:01       ` Michael D Kinney
  2019-08-07 13:26         ` Laszlo Ersek
  2019-08-07 19:42         ` Jordan Justen
  0 siblings, 2 replies; 17+ messages in thread
From: Michael D Kinney @ 2019-08-05 22:01 UTC (permalink / raw)
  To: devel@edk2.groups.io, lersek@redhat.com, Justen, Jordan L,
	Kinney, Michael D
  Cc: Feng, Bob C, Gao, Liming

Laszlo,

The context of this change is only to the PatchCheck.py tool.
and how that tool uses git show.

I agree with the summary of very flexible capabilities in git
to help developers review different types of files.  All of 
those settings that were added to support UNI files in UTF-16
file format were very valuable when we had to review the
text changes to those binary files.  We should be using UTF-8
now, and we can even extend PatchCheck.py to flag an error if
a UNI file is in UTF-16 format.

I still prefer we make this change only to PatchCheck.py to
prevent false positives and print lines of text that can 
not be found in a developer's working directory.  I prefer
this one time change to PatchCheck.py instead of having to
update .gitattributes whenever the git show features are 
extended to convert more binary files to text files.

My expectation is that EDK II Maintainers need to review 
if a binary file is ok or not for a repo.  I would also be
ok with adding general rules to PatchCheck.py to generate
an error if a binary file is added/updated in one of the
text only repos (edk2, edk2-platforms) and not generate
an error if a binary file is added/updated in a repo that
supports binaries (edk-non-osi).

Best regards,

Mike


> -----Original Message-----
> From: devel@edk2.groups.io
> [mailto:devel@edk2.groups.io] On Behalf Of Laszlo Ersek
> Sent: Friday, August 2, 2019 4:58 PM
> To: Justen, Jordan L <jordan.l.justen@intel.com>;
> Kinney, Michael D <michael.d.kinney@intel.com>;
> devel@edk2.groups.io
> Cc: Feng, Bob C <bob.c.feng@intel.com>; Gao, Liming
> <liming.gao@intel.com>
> Subject: Re: [edk2-devel] [Patch 3/3]
> BaseTools/PatchCheck: Disable text conversion in 'git
> show'
> 
> On 08/02/19 20:29, Jordan Justen wrote:
> > First, I hope we don't add lots of large .pdf files
> into the source
> > tree. I see two duplicated > 200k .pdf files in edk2,
> which seems like
> > a waste of space in the edk2 tree.
> >
> > BaseTools/Source/C/BrotliCompress/docs/brotli-
> comparison-study-2015-09
> > -22.pdf
> >
> MdeModulePkg/Library/BrotliCustomDecompressLib/docs/bro
> tli-comparison-
> > study-2015-09-22.pdf
> >
> > Second, I'm not too sure about this change. It seems
> like it might
> > have unintended consequences.
> >
> > One thought I had is that it might break UTF-16
> unicode files diffs,
> > but luckily I think we've pretty much gotten rid of
> UTF-16 files at
> > this point.
> >
> > I also wonder if adding a .pdf attribute like Laszlo
> recommends for
> > .efi might be an alternative solution.
> >
> >
> https://github.com/tianocore/tianocore.github.io/wiki/L
> aszlo's-unkempt
> > -git-guide-for-edk2-contributors-and-
> maintainers#contrib-09
> >
> > We could actually add these setting in the git tree
> in a
> > .gitattributes file, right? Has this been suggested?
> I think Laszlo
> > documented this before we had converted edk2 to git.
> 
> This is a complex question.
> 
> There are files which are binary for all intents and
> purposes, so you never want to see a "textualized"
> version of them. Git can attempt recognizing these
> files, but it can take hints too.
> 
> For these files, once they are safely classified as
> "binary", it is a separate question whether you want a
> changeset shown (or
> email-formatted) for these files to contain base64-like
> binary patches, or just a statement that "binary files
> differ".
> 
> Then there are files that are binary, but can be
> "textualized" for the purposes of diffing and even
> merging. Automated UCS2<->UTF8 conversion is an
> example, and I used that while edk2 still had UCS2-
> encoded UNI files.
> 
> Some examples (not claiming this is going to be an
> exhaustive list -- for that, see gitattributes(5)), to
> be placed in .git/info/attributes (local clone only) or
> .gitattributes (tracked in the repo):
> 
> *.efi -diff
> 
> This classifies *.efi as binary (turns off the
> heuristics).
> 
> Consider gitattributes(5) -- excerpt:
> 
>        diff
>            Unset
>                A path to which the diff attribute is
> unset will
>                generate Binary files differ (or a
> binary patch, if
>                binary patches are enabled).
>            Unspecified
>                A path to which the diff attribute is
> unspecified first
>                gets its contents inspected, and if it
> looks like text
>                and is smaller than
> core.bigFileThreshold, it is
>                treated as text. Otherwise it would
> generate Binary
>                files differ.
> 
> git-show will only report "binary files differ", unless
> we pass --binary to it; in which case it will offer a
> base64-like binary patch.
> git-format-patch will produce that binary patch by
> default, unless we pass --no-binary to it.
> 
> 
> Another example, in .git/info/attributes or
> .gitattributes:
> 
> *.uni diff=uni merge=uni
> 
> The manual says:
> 
>        diff
>            String
>                Diff is shown using the specified diff
> driver. Each
>                driver may specify one or more options,
> as described in
>                the following section. The options for
> the diff driver
>                "foo" are defined by the configuration
> variables in the
>                "diff.foo" section of the Git config
> file.
> 
>        merge
>            String
>                3-way merge is performed using the
> specified custom
>                merge driver. The built-in 3-way merge
> driver can be
>                explicitly specified by asking for
> "text" driver; the
>                built-in "take the current branch"
> driver can be
>                requested with "binary".
> 
> Accordingly, from .git/config:
> 
> [diff "uni"]
>       textconv = iconv -f UCS-2 -t UTF-8
>       binary = true
> 
> [merge "uni"]
>       name = merge driver for UCS-2 encoded text files
>       driver = uni-merge %O %A %B
> 
> And my uni-merge script:
> 
> ---------
> #!/bin/bash
> 
> # Merge UCS-2 encoded text files.
> # This is a git-merge driver. It will overwrite
> "$MINE".
> #
> # Exit status:
> # 0 -- no conflicts
> # 1 -- conflicts
> # 2 -- trouble
> 
> OLD="$1"
> MINE="$2"
> YOURS="$3"
> 
> # Set safe defaults for the trap handler.
> OLD_UTF8=
> MY_UTF8=
> YOUR_UTF8=
> NEW_UTF8=
> EX=2
> 
> trap 'rm -f -- "$OLD_UTF8" "$MY_UTF8" "$YOUR_UTF8"
> "$NEW_UTF8"; exit $EX' EXIT set -e -u -C
> 
> OLD_UTF8=$(mktemp)
> iconv --from UCS-2 --to UTF-8 <"$OLD" >>"$OLD_UTF8"
> 
> MY_UTF8=$(mktemp)
> iconv --from UCS-2 --to UTF-8 <"$MINE" >>"$MY_UTF8"
> 
> YOUR_UTF8=$(mktemp)
> iconv --from UCS-2 --to UTF-8 <"$YOURS" >>"$YOUR_UTF8"
> 
> NEW_UTF8=$(mktemp)
> set +e
> diff3 --merge --label=mine --label=old --label=yours --
> \
>     "$MY_UTF8" "$OLD_UTF8" "$YOUR_UTF8" >>"$NEW_UTF8"
> DIFFRET=$?
> set -e
> 
> if [ 0 -ne "$DIFFRET" ] && [ 1 -ne "$DIFFRET" ]; then
>   exit
> fi
> 
> iconv --from UTF-8 --to UCS-2 <"$NEW_UTF8" >|"$MINE"
> EX=$DIFFRET
> ---------
> 
> My understanding is that the "--no-textconv" option
> would undo the "textconv" that I set under [diff
> "uni"].
> 
> In other words, it tells git that, even if the
> heuristics suggested
> *some* conversion to text (I can't say what that would
> be, in the general case), git should still treat the
> file as binary. (At which point the --no-binary / --
> binary options decide whether we get a base64-like
> encoding, or just a message saying "binary files
> differ").
> Therefore I think (= agree with Jordan) that the effect
> is the same as marking the affected file types in
> .gitattributes as "-diff".
> 
> I guess the relevant question is scope. "--no-textconv"
> doesn't require us to assign "-diff" to specific file
> suffixes, such as *.pdf, *.efi, and so on. But, it also
> doesn't allow us to specify *some* text conversions,
> even if we wanted to (it would override & disable them,
> such as my "iconv" conversion when diffing UCS2 encoded
> UNI files).
> 
> Adding a .gitattributes file to the root of the
> repository, and updating it whenever necessary (for
> example when we add the first .pdf file, the first .efi
> file...), is likely the best.
> 
> (
> 
> My recommendation at
> <https://github.com/tianocore/tianocore.github.io/wiki/
> Laszlo's-unkempt-git-guide-for-edk2-contributors-and-
> maintainers#contrib-09>
> may have targeted the local ".git/info/attributes" file
> because people can have quite different opinions on how
> git should act on various files. In theory, someone
> having simplistic PDF files might even want:
> 
> In the attributes:
> 
> *.pdf diff=pdf
> 
> In the config:
> 
> [diff "pdf"]
>   textconv = sh -c 'pdftotext -- "$1" -' pdftotext-
> wrapper
>   binary = true
> 
> ("pdftotext" converts PDF to text, but its command line
> is awkward -- the hyphen marking stdout must be
> explicit, and must come as second operand. Hence the
> wrapping into "sh" -- and then we need to spell out
> $0 as well, for which I picked "pdftotext-wrapper". The
> "--" separator is a safety measure against filenames
> starting with "-", which could be mis-interpreted as
> options. "--" terminates the option list and starts the
> operand list.)
> 
> )
> 
> Thanks
> Laszlo
> 
> 
> > On 2019-08-01 17:13:14, Michael D Kinney wrote:
> >> https://bugzilla.tianocore.org/show_bug.cgi?id=2044
> >>
> >> 'git show' is used to extrat the patch contents for
> analysis.
> >> Add the flag '--no-textconv' to the 'git show'
> command to disable the
> >> conversion from some binary file types to text
> content.
> >>
> >> Without this change, binary files such as .pdf files
> are converted to
> >> text in the show command and PatchCheck complains
> that the wrong line
> >> endings are used in the patch.
> >>
> >> Cc: Bob Feng <bob.c.feng@intel.com>
> >> Cc: Liming Gao <liming.gao@intel.com>
> >> Cc: Jordan Justen <jordan.l.justen@intel.com>
> >> Signed-off-by: Michael D Kinney
> <michael.d.kinney@intel.com>
> >> ---
> >>  BaseTools/Scripts/PatchCheck.py | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/BaseTools/Scripts/PatchCheck.py
> >> b/BaseTools/Scripts/PatchCheck.py index
> 6b07241bfe..2a4e6f603e 100755
> >> --- a/BaseTools/Scripts/PatchCheck.py
> >> +++ b/BaseTools/Scripts/PatchCheck.py
> >> @@ -543,7 +543,7 @@ class CheckGitCommits:
> >>
> >>      def read_patch_from_git(self, commit):
> >>          # Run git to get the commit patch
> >> -        return self.run_git('show', '--
> pretty=email', commit)
> >> +        return self.run_git('show', '--
> pretty=email',
> >> + '--no-textconv', commit)
> >>
> >>      def run_git(self, *args):
> >>          cmd = [ 'git' ]
> >> --
> >> 2.21.0.windows.1
> >>
> 
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 0/3] BaseTools/PatchCheck: Fix false positives
  2019-08-02  0:13 [Patch 0/3] BaseTools/PatchCheck: Fix false positives Michael D Kinney
                   ` (4 preceding siblings ...)
  2019-08-02 18:30 ` Jordan Justen
@ 2019-08-07 11:41 ` Liming Gao
  5 siblings, 0 replies; 17+ messages in thread
From: Liming Gao @ 2019-08-07 11:41 UTC (permalink / raw)
  To: Kinney, Michael D, devel@edk2.groups.io; +Cc: Feng, Bob C, Justen, Jordan L

Mike:
  For this patch set, I am OK for those changes. Reviewed-by: Liming Gao <liming.gao@intel.com>

  But, I see there are some discussion on patch 3. After we make the conclusion, you can push it. 
  
Thanks
Liming
>-----Original Message-----
>From: Kinney, Michael D
>Sent: Friday, August 02, 2019 8:13 AM
>To: devel@edk2.groups.io
>Cc: Feng, Bob C <bob.c.feng@intel.com>; Gao, Liming
><liming.gao@intel.com>; Justen, Jordan L <jordan.l.justen@intel.com>
>Subject: [Patch 0/3] BaseTools/PatchCheck: Fix false positives
>
>https://bugzilla.tianocore.org/show_bug.cgi?id=2044
>
>* Add '--no-textconv' to 'git show' command to disable binary->text
>conversion.
>* Ignore blank lines in patch
>* Ignore lines with 'copy from ' and 'copy to ' prefixes
>
>Cc: Bob Feng <bob.c.feng@intel.com>
>Cc: Liming Gao <liming.gao@intel.com>
>Cc: Jordan Justen <jordan.l.justen@intel.com>
>Signed-off-by: Michael D Kinney <michael.d.kinney@intel.com>
>
>Michael D Kinney (3):
>  BaseTools/PatchCheck: Ignore blank lines in diff
>  BaseTools/PatchCheck: Add copy from/to keywords
>  BaseTools/PatchCheck: Disable text conversion in 'git show'
>
> BaseTools/Scripts/PatchCheck.py | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
>--
>2.21.0.windows.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [edk2-devel] [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show'
  2019-08-05 22:01       ` [edk2-devel] " Michael D Kinney
@ 2019-08-07 13:26         ` Laszlo Ersek
  2019-08-07 19:42         ` Jordan Justen
  1 sibling, 0 replies; 17+ messages in thread
From: Laszlo Ersek @ 2019-08-07 13:26 UTC (permalink / raw)
  To: Kinney, Michael D, devel@edk2.groups.io, Justen, Jordan L
  Cc: Feng, Bob C, Gao, Liming

On 08/06/19 00:01, Kinney, Michael D wrote:
> Laszlo,
> 
> The context of this change is only to the PatchCheck.py tool.
> and how that tool uses git show.
> 
> I agree with the summary of very flexible capabilities in git
> to help developers review different types of files.  All of 
> those settings that were added to support UNI files in UTF-16
> file format were very valuable when we had to review the
> text changes to those binary files.  We should be using UTF-8
> now, and we can even extend PatchCheck.py to flag an error if
> a UNI file is in UTF-16 format.
> 
> I still prefer we make this change only to PatchCheck.py to
> prevent false positives and print lines of text that can 
> not be found in a developer's working directory.  I prefer
> this one time change to PatchCheck.py instead of having to
> update .gitattributes whenever the git show features are 
> extended to convert more binary files to text files.

More precisely, that would mean that you prefer this one time change to
having to update .gitattributes every time a new type of binary file is
introduced to edk2.

I can see merits in both approaches. From my side, I don't intend to
block this patch.

Acked-by: Laszlo Ersek <lersek@redhat.com>

Thanks
Laszlo

> My expectation is that EDK II Maintainers need to review 
> if a binary file is ok or not for a repo.  I would also be
> ok with adding general rules to PatchCheck.py to generate
> an error if a binary file is added/updated in one of the
> text only repos (edk2, edk2-platforms) and not generate
> an error if a binary file is added/updated in a repo that
> supports binaries (edk-non-osi).
> 
> Best regards,
> 
> Mike
> 
> 
>> -----Original Message-----
>> From: devel@edk2.groups.io
>> [mailto:devel@edk2.groups.io] On Behalf Of Laszlo Ersek
>> Sent: Friday, August 2, 2019 4:58 PM
>> To: Justen, Jordan L <jordan.l.justen@intel.com>;
>> Kinney, Michael D <michael.d.kinney@intel.com>;
>> devel@edk2.groups.io
>> Cc: Feng, Bob C <bob.c.feng@intel.com>; Gao, Liming
>> <liming.gao@intel.com>
>> Subject: Re: [edk2-devel] [Patch 3/3]
>> BaseTools/PatchCheck: Disable text conversion in 'git
>> show'
>>
>> On 08/02/19 20:29, Jordan Justen wrote:
>>> First, I hope we don't add lots of large .pdf files
>> into the source
>>> tree. I see two duplicated > 200k .pdf files in edk2,
>> which seems like
>>> a waste of space in the edk2 tree.
>>>
>>> BaseTools/Source/C/BrotliCompress/docs/brotli-
>> comparison-study-2015-09
>>> -22.pdf
>>>
>> MdeModulePkg/Library/BrotliCustomDecompressLib/docs/bro
>> tli-comparison-
>>> study-2015-09-22.pdf
>>>
>>> Second, I'm not too sure about this change. It seems
>> like it might
>>> have unintended consequences.
>>>
>>> One thought I had is that it might break UTF-16
>> unicode files diffs,
>>> but luckily I think we've pretty much gotten rid of
>> UTF-16 files at
>>> this point.
>>>
>>> I also wonder if adding a .pdf attribute like Laszlo
>> recommends for
>>> .efi might be an alternative solution.
>>>
>>>
>> https://github.com/tianocore/tianocore.github.io/wiki/L
>> aszlo's-unkempt
>>> -git-guide-for-edk2-contributors-and-
>> maintainers#contrib-09
>>>
>>> We could actually add these setting in the git tree
>> in a
>>> .gitattributes file, right? Has this been suggested?
>> I think Laszlo
>>> documented this before we had converted edk2 to git.
>>
>> This is a complex question.
>>
>> There are files which are binary for all intents and
>> purposes, so you never want to see a "textualized"
>> version of them. Git can attempt recognizing these
>> files, but it can take hints too.
>>
>> For these files, once they are safely classified as
>> "binary", it is a separate question whether you want a
>> changeset shown (or
>> email-formatted) for these files to contain base64-like
>> binary patches, or just a statement that "binary files
>> differ".
>>
>> Then there are files that are binary, but can be
>> "textualized" for the purposes of diffing and even
>> merging. Automated UCS2<->UTF8 conversion is an
>> example, and I used that while edk2 still had UCS2-
>> encoded UNI files.
>>
>> Some examples (not claiming this is going to be an
>> exhaustive list -- for that, see gitattributes(5)), to
>> be placed in .git/info/attributes (local clone only) or
>> .gitattributes (tracked in the repo):
>>
>> *.efi -diff
>>
>> This classifies *.efi as binary (turns off the
>> heuristics).
>>
>> Consider gitattributes(5) -- excerpt:
>>
>>        diff
>>            Unset
>>                A path to which the diff attribute is
>> unset will
>>                generate Binary files differ (or a
>> binary patch, if
>>                binary patches are enabled).
>>            Unspecified
>>                A path to which the diff attribute is
>> unspecified first
>>                gets its contents inspected, and if it
>> looks like text
>>                and is smaller than
>> core.bigFileThreshold, it is
>>                treated as text. Otherwise it would
>> generate Binary
>>                files differ.
>>
>> git-show will only report "binary files differ", unless
>> we pass --binary to it; in which case it will offer a
>> base64-like binary patch.
>> git-format-patch will produce that binary patch by
>> default, unless we pass --no-binary to it.
>>
>>
>> Another example, in .git/info/attributes or
>> .gitattributes:
>>
>> *.uni diff=uni merge=uni
>>
>> The manual says:
>>
>>        diff
>>            String
>>                Diff is shown using the specified diff
>> driver. Each
>>                driver may specify one or more options,
>> as described in
>>                the following section. The options for
>> the diff driver
>>                "foo" are defined by the configuration
>> variables in the
>>                "diff.foo" section of the Git config
>> file.
>>
>>        merge
>>            String
>>                3-way merge is performed using the
>> specified custom
>>                merge driver. The built-in 3-way merge
>> driver can be
>>                explicitly specified by asking for
>> "text" driver; the
>>                built-in "take the current branch"
>> driver can be
>>                requested with "binary".
>>
>> Accordingly, from .git/config:
>>
>> [diff "uni"]
>>       textconv = iconv -f UCS-2 -t UTF-8
>>       binary = true
>>
>> [merge "uni"]
>>       name = merge driver for UCS-2 encoded text files
>>       driver = uni-merge %O %A %B
>>
>> And my uni-merge script:
>>
>> ---------
>> #!/bin/bash
>>
>> # Merge UCS-2 encoded text files.
>> # This is a git-merge driver. It will overwrite
>> "$MINE".
>> #
>> # Exit status:
>> # 0 -- no conflicts
>> # 1 -- conflicts
>> # 2 -- trouble
>>
>> OLD="$1"
>> MINE="$2"
>> YOURS="$3"
>>
>> # Set safe defaults for the trap handler.
>> OLD_UTF8=
>> MY_UTF8=
>> YOUR_UTF8=
>> NEW_UTF8=
>> EX=2
>>
>> trap 'rm -f -- "$OLD_UTF8" "$MY_UTF8" "$YOUR_UTF8"
>> "$NEW_UTF8"; exit $EX' EXIT set -e -u -C
>>
>> OLD_UTF8=$(mktemp)
>> iconv --from UCS-2 --to UTF-8 <"$OLD" >>"$OLD_UTF8"
>>
>> MY_UTF8=$(mktemp)
>> iconv --from UCS-2 --to UTF-8 <"$MINE" >>"$MY_UTF8"
>>
>> YOUR_UTF8=$(mktemp)
>> iconv --from UCS-2 --to UTF-8 <"$YOURS" >>"$YOUR_UTF8"
>>
>> NEW_UTF8=$(mktemp)
>> set +e
>> diff3 --merge --label=mine --label=old --label=yours --
>> \
>>     "$MY_UTF8" "$OLD_UTF8" "$YOUR_UTF8" >>"$NEW_UTF8"
>> DIFFRET=$?
>> set -e
>>
>> if [ 0 -ne "$DIFFRET" ] && [ 1 -ne "$DIFFRET" ]; then
>>   exit
>> fi
>>
>> iconv --from UTF-8 --to UCS-2 <"$NEW_UTF8" >|"$MINE"
>> EX=$DIFFRET
>> ---------
>>
>> My understanding is that the "--no-textconv" option
>> would undo the "textconv" that I set under [diff
>> "uni"].
>>
>> In other words, it tells git that, even if the
>> heuristics suggested
>> *some* conversion to text (I can't say what that would
>> be, in the general case), git should still treat the
>> file as binary. (At which point the --no-binary / --
>> binary options decide whether we get a base64-like
>> encoding, or just a message saying "binary files
>> differ").
>> Therefore I think (= agree with Jordan) that the effect
>> is the same as marking the affected file types in
>> .gitattributes as "-diff".
>>
>> I guess the relevant question is scope. "--no-textconv"
>> doesn't require us to assign "-diff" to specific file
>> suffixes, such as *.pdf, *.efi, and so on. But, it also
>> doesn't allow us to specify *some* text conversions,
>> even if we wanted to (it would override & disable them,
>> such as my "iconv" conversion when diffing UCS2 encoded
>> UNI files).
>>
>> Adding a .gitattributes file to the root of the
>> repository, and updating it whenever necessary (for
>> example when we add the first .pdf file, the first .efi
>> file...), is likely the best.
>>
>> (
>>
>> My recommendation at
>> <https://github.com/tianocore/tianocore.github.io/wiki/
>> Laszlo's-unkempt-git-guide-for-edk2-contributors-and-
>> maintainers#contrib-09>
>> may have targeted the local ".git/info/attributes" file
>> because people can have quite different opinions on how
>> git should act on various files. In theory, someone
>> having simplistic PDF files might even want:
>>
>> In the attributes:
>>
>> *.pdf diff=pdf
>>
>> In the config:
>>
>> [diff "pdf"]
>>   textconv = sh -c 'pdftotext -- "$1" -' pdftotext-
>> wrapper
>>   binary = true
>>
>> ("pdftotext" converts PDF to text, but its command line
>> is awkward -- the hyphen marking stdout must be
>> explicit, and must come as second operand. Hence the
>> wrapping into "sh" -- and then we need to spell out
>> $0 as well, for which I picked "pdftotext-wrapper". The
>> "--" separator is a safety measure against filenames
>> starting with "-", which could be mis-interpreted as
>> options. "--" terminates the option list and starts the
>> operand list.)
>>
>> )
>>
>> Thanks
>> Laszlo
>>
>>
>>> On 2019-08-01 17:13:14, Michael D Kinney wrote:
>>>> https://bugzilla.tianocore.org/show_bug.cgi?id=2044
>>>>
>>>> 'git show' is used to extrat the patch contents for
>> analysis.
>>>> Add the flag '--no-textconv' to the 'git show'
>> command to disable the
>>>> conversion from some binary file types to text
>> content.
>>>>
>>>> Without this change, binary files such as .pdf files
>> are converted to
>>>> text in the show command and PatchCheck complains
>> that the wrong line
>>>> endings are used in the patch.
>>>>
>>>> Cc: Bob Feng <bob.c.feng@intel.com>
>>>> Cc: Liming Gao <liming.gao@intel.com>
>>>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>>>> Signed-off-by: Michael D Kinney
>> <michael.d.kinney@intel.com>
>>>> ---
>>>>  BaseTools/Scripts/PatchCheck.py | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/BaseTools/Scripts/PatchCheck.py
>>>> b/BaseTools/Scripts/PatchCheck.py index
>> 6b07241bfe..2a4e6f603e 100755
>>>> --- a/BaseTools/Scripts/PatchCheck.py
>>>> +++ b/BaseTools/Scripts/PatchCheck.py
>>>> @@ -543,7 +543,7 @@ class CheckGitCommits:
>>>>
>>>>      def read_patch_from_git(self, commit):
>>>>          # Run git to get the commit patch
>>>> -        return self.run_git('show', '--
>> pretty=email', commit)
>>>> +        return self.run_git('show', '--
>> pretty=email',
>>>> + '--no-textconv', commit)
>>>>
>>>>      def run_git(self, *args):
>>>>          cmd = [ 'git' ]
>>>> --
>>>> 2.21.0.windows.1
>>>>
>>
>>
>> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [edk2-devel] [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show'
  2019-08-05 22:01       ` [edk2-devel] " Michael D Kinney
  2019-08-07 13:26         ` Laszlo Ersek
@ 2019-08-07 19:42         ` Jordan Justen
  2019-08-08 20:57           ` Laszlo Ersek
  1 sibling, 1 reply; 17+ messages in thread
From: Jordan Justen @ 2019-08-07 19:42 UTC (permalink / raw)
  To: Kinney, Michael D, devel@edk2.groups.io, lersek@redhat.com
  Cc: Feng, Bob C, Gao, Liming

On 2019-08-05 15:01:14, Kinney, Michael D wrote:
> Laszlo,
> 
> The context of this change is only to the PatchCheck.py tool.
> and how that tool uses git show.
> 
> I agree with the summary of very flexible capabilities in git
> to help developers review different types of files.  All of 
> those settings that were added to support UNI files in UTF-16
> file format were very valuable when we had to review the
> text changes to those binary files.  We should be using UTF-8
> now, and we can even extend PatchCheck.py to flag an error if
> a UNI file is in UTF-16 format.

Ah. If we are to the point where we want to actively prevent utf-16 in
the tree, then this sounds like a good idea.

> I still prefer we make this change only to PatchCheck.py to
> prevent false positives and print lines of text that can 
> not be found in a developer's working directory.  I prefer
> this one time change to PatchCheck.py instead of having to
> update .gitattributes whenever the git show features are 
> extended to convert more binary files to text files.

I think it's pretty rare for EDK II to add new binary file types, but
I don't feel too strongly on this. I thought adding Laszlo's settings
to .gitattributes might amount to solving two issues with one change.

Since Laszlo acked this patch, I'll go ahead with:

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

> My expectation is that EDK II Maintainers need to review 
> if a binary file is ok or not for a repo.  I would also be
> ok with adding general rules to PatchCheck.py to generate
> an error if a binary file is added/updated in one of the
> text only repos (edk2, edk2-platforms) and not generate
> an error if a binary file is added/updated in a repo that
> supports binaries (edk-non-osi).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [edk2-devel] [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show'
  2019-08-07 19:42         ` Jordan Justen
@ 2019-08-08 20:57           ` Laszlo Ersek
  2019-08-08 21:08             ` Michael D Kinney
  0 siblings, 1 reply; 17+ messages in thread
From: Laszlo Ersek @ 2019-08-08 20:57 UTC (permalink / raw)
  To: devel, jordan.l.justen, Kinney, Michael D; +Cc: Feng, Bob C, Gao, Liming

On 08/07/19 21:42, Jordan Justen wrote:

> I thought adding Laszlo's settings
> to .gitattributes might amount to solving two issues with one change.

Independently of the present patch, I would be very much in favor of
tracking a .gitattributes file in the project root dir, *if* we could
also automate the following setting:

git config diff.ini.xfuncname     '^\[[A-Za-z0-9_., ]+]'

For me as a reviewer, that setting (on the submitter side) makes a huge
difference. I keep asking people to do it manually, and that gets old
really quick. Unfortunately, I don't know how to automate that config
knob from within the source tree. :(

Thanks,
Laszlo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [edk2-devel] [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show'
  2019-08-08 20:57           ` Laszlo Ersek
@ 2019-08-08 21:08             ` Michael D Kinney
  2019-08-08 21:15               ` Andrew Fish
  0 siblings, 1 reply; 17+ messages in thread
From: Michael D Kinney @ 2019-08-08 21:08 UTC (permalink / raw)
  To: devel@edk2.groups.io, lersek@redhat.com, Justen, Jordan L,
	Kinney, Michael D
  Cc: Feng, Bob C, Gao, Liming

Laszlo,

This sounds like a good improvement.

Do you have a recommended set of settings you would like
to see checked in to the root of each EDK II related 
repository that would be required settings for all EDK II
developers.

Do you want to enter a BZ and prepare a patch with this
new file for the edk2 repo.  Once it is approved and added
to edk2 repo, we can propagate to the other EDK II related
repos.

Thanks,

Mike

> -----Original Message-----
> From: devel@edk2.groups.io
> [mailto:devel@edk2.groups.io] On Behalf Of Laszlo Ersek
> Sent: Thursday, August 8, 2019 1:58 PM
> To: devel@edk2.groups.io; Justen, Jordan L
> <jordan.l.justen@intel.com>; Kinney, Michael D
> <michael.d.kinney@intel.com>
> Cc: Feng, Bob C <bob.c.feng@intel.com>; Gao, Liming
> <liming.gao@intel.com>
> Subject: Re: [edk2-devel] [Patch 3/3]
> BaseTools/PatchCheck: Disable text conversion in 'git
> show'
> 
> On 08/07/19 21:42, Jordan Justen wrote:
> 
> > I thought adding Laszlo's settings
> > to .gitattributes might amount to solving two issues
> with one change.
> 
> Independently of the present patch, I would be very
> much in favor of tracking a .gitattributes file in the
> project root dir, *if* we could also automate the
> following setting:
> 
> git config diff.ini.xfuncname     '^\[[A-Za-z0-9_.,
> ]+]'
> 
> For me as a reviewer, that setting (on the submitter
> side) makes a huge difference. I keep asking people to
> do it manually, and that gets old really quick.
> Unfortunately, I don't know how to automate that config
> knob from within the source tree. :(
> 
> Thanks,
> Laszlo
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [edk2-devel] [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show'
  2019-08-08 21:08             ` Michael D Kinney
@ 2019-08-08 21:15               ` Andrew Fish
  2019-08-08 22:34                 ` Laszlo Ersek
  0 siblings, 1 reply; 17+ messages in thread
From: Andrew Fish @ 2019-08-08 21:15 UTC (permalink / raw)
  To: devel, Mike Kinney
  Cc: lersek@redhat.com, Jordan Justen, Feng, Bob C, Gao, Liming

[-- Attachment #1: Type: text/plain, Size: 2176 bytes --]



> On Aug 8, 2019, at 2:08 PM, Michael D Kinney <michael.d.kinney@intel.com> wrote:
> 
> Laszlo,
> 
> This sounds like a good improvement.
> 
> Do you have a recommended set of settings you would like
> to see checked in to the root of each EDK II related 
> repository that would be required settings for all EDK II
> developers.
> 

Crazy idea allert ..... 

Could edksetup.sh have a --git option to set preferred git defaults?

Thanks,

Andrew Fish

> Do you want to enter a BZ and prepare a patch with this
> new file for the edk2 repo.  Once it is approved and added
> to edk2 repo, we can propagate to the other EDK II related
> repos.
> 
> Thanks,
> 
> Mike
> 
>> -----Original Message-----
>> From: devel@edk2.groups.io <mailto:devel@edk2.groups.io>
>> [mailto:devel@edk2.groups.io <mailto:devel@edk2.groups.io>] On Behalf Of Laszlo Ersek
>> Sent: Thursday, August 8, 2019 1:58 PM
>> To: devel@edk2.groups.io <mailto:devel@edk2.groups.io>; Justen, Jordan L
>> <jordan.l.justen@intel.com <mailto:jordan.l.justen@intel.com>>; Kinney, Michael D
>> <michael.d.kinney@intel.com <mailto:michael.d.kinney@intel.com>>
>> Cc: Feng, Bob C <bob.c.feng@intel.com <mailto:bob.c.feng@intel.com>>; Gao, Liming
>> <liming.gao@intel.com <mailto:liming.gao@intel.com>>
>> Subject: Re: [edk2-devel] [Patch 3/3]
>> BaseTools/PatchCheck: Disable text conversion in 'git
>> show'
>> 
>> On 08/07/19 21:42, Jordan Justen wrote:
>> 
>>> I thought adding Laszlo's settings
>>> to .gitattributes might amount to solving two issues
>> with one change.
>> 
>> Independently of the present patch, I would be very
>> much in favor of tracking a .gitattributes file in the
>> project root dir, *if* we could also automate the
>> following setting:
>> 
>> git config diff.ini.xfuncname     '^\[[A-Za-z0-9_.,
>> ]+]'
>> 
>> For me as a reviewer, that setting (on the submitter
>> side) makes a huge difference. I keep asking people to
>> do it manually, and that gets old really quick.
>> Unfortunately, I don't know how to automate that config
>> knob from within the source tree. :(
>> 
>> Thanks,
>> Laszlo
>> 
>> 
> 
> 
> 


[-- Attachment #2: Type: text/html, Size: 15516 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [edk2-devel] [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show'
  2019-08-08 21:15               ` Andrew Fish
@ 2019-08-08 22:34                 ` Laszlo Ersek
  0 siblings, 0 replies; 17+ messages in thread
From: Laszlo Ersek @ 2019-08-08 22:34 UTC (permalink / raw)
  To: Andrew Fish, devel, Mike Kinney, Leif Lindholm
  Cc: Jordan Justen, Feng, Bob C, Gao, Liming

On 08/08/19 23:15, Andrew Fish wrote:
> 
> 
>> On Aug 8, 2019, at 2:08 PM, Michael D Kinney <michael.d.kinney@intel.com> wrote:
>>
>> Laszlo,
>>
>> This sounds like a good improvement.
>>
>> Do you have a recommended set of settings you would like
>> to see checked in to the root of each EDK II related 
>> repository that would be required settings for all EDK II
>> developers.
>>
> 
> Crazy idea allert ..... 
> 
> Could edksetup.sh have a --git option to set preferred git defaults?

Again, here I have to apologize to Leif: Leif has already contributed
the SetupGit.py script, which needs to be run only once, and then the
right settings are put in place permanently.

See the following two commits:

- 5b3e695d8ac5 ("BaseTools: add centralized location for git config
files", 2019-06-12)

- 4eb0acb1e2be ("BaseTools: add script to configure local git options",
2019-06-12)

The first commit adds:

  BaseTools/Conf/diff.order
  BaseTools/Conf/gitattributes

and the second commit adds:

  BaseTools/Scripts/SetupGit.py

Running the "SetupGit.py" script points configuration knobs permanently
to the files added in the preceding commit, plus puts other settings in
place (also permanently).

Again, my (somewhat) lame excuse for forgetting about these commits is
that I've never been *forced* to run the script myself. :)

Thanks
Laszlo

>> Do you want to enter a BZ and prepare a patch with this
>> new file for the edk2 repo.  Once it is approved and added
>> to edk2 repo, we can propagate to the other EDK II related
>> repos.
>>
>> Thanks,
>>
>> Mike
>>
>>> -----Original Message-----
>>> From: devel@edk2.groups.io <mailto:devel@edk2.groups.io>
>>> [mailto:devel@edk2.groups.io <mailto:devel@edk2.groups.io>] On Behalf Of Laszlo Ersek
>>> Sent: Thursday, August 8, 2019 1:58 PM
>>> To: devel@edk2.groups.io <mailto:devel@edk2.groups.io>; Justen, Jordan L
>>> <jordan.l.justen@intel.com <mailto:jordan.l.justen@intel.com>>; Kinney, Michael D
>>> <michael.d.kinney@intel.com <mailto:michael.d.kinney@intel.com>>
>>> Cc: Feng, Bob C <bob.c.feng@intel.com <mailto:bob.c.feng@intel.com>>; Gao, Liming
>>> <liming.gao@intel.com <mailto:liming.gao@intel.com>>
>>> Subject: Re: [edk2-devel] [Patch 3/3]
>>> BaseTools/PatchCheck: Disable text conversion in 'git
>>> show'
>>>
>>> On 08/07/19 21:42, Jordan Justen wrote:
>>>
>>>> I thought adding Laszlo's settings
>>>> to .gitattributes might amount to solving two issues
>>> with one change.
>>>
>>> Independently of the present patch, I would be very
>>> much in favor of tracking a .gitattributes file in the
>>> project root dir, *if* we could also automate the
>>> following setting:
>>>
>>> git config diff.ini.xfuncname     '^\[[A-Za-z0-9_.,
>>> ]+]'
>>>
>>> For me as a reviewer, that setting (on the submitter
>>> side) makes a huge difference. I keep asking people to
>>> do it manually, and that gets old really quick.
>>> Unfortunately, I don't know how to automate that config
>>> knob from within the source tree. :(
>>>
>>> Thanks,
>>> Laszlo
>>>
>>>
>>
>>
>> 
> 
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-08-08 22:34 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-08-02  0:13 [Patch 0/3] BaseTools/PatchCheck: Fix false positives Michael D Kinney
2019-08-02  0:13 ` [Patch 1/3] BaseTools/PatchCheck: Ignore blank lines in diff Michael D Kinney
2019-08-02  0:13 ` [Patch 2/3] BaseTools/PatchCheck: Add copy from/to keywords Michael D Kinney
2019-08-02  0:13 ` [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show' Michael D Kinney
2019-08-02 18:29   ` Jordan Justen
2019-08-02 18:51     ` Michael D Kinney
2019-08-02 23:57     ` Laszlo Ersek
2019-08-05 22:01       ` [edk2-devel] " Michael D Kinney
2019-08-07 13:26         ` Laszlo Ersek
2019-08-07 19:42         ` Jordan Justen
2019-08-08 20:57           ` Laszlo Ersek
2019-08-08 21:08             ` Michael D Kinney
2019-08-08 21:15               ` Andrew Fish
2019-08-08 22:34                 ` Laszlo Ersek
2019-08-02  7:59 ` [edk2-devel] [Patch 0/3] BaseTools/PatchCheck: Fix false positives Bob Feng
2019-08-02 18:30 ` Jordan Justen
2019-08-07 11:41 ` Liming Gao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox