From: "Laszlo Ersek" <lersek@redhat.com>
To: "Kinney, Michael D" <michael.d.kinney@intel.com>,
"devel@edk2.groups.io" <devel@edk2.groups.io>,
"Justen, Jordan L" <jordan.l.justen@intel.com>
Cc: "Feng, Bob C" <bob.c.feng@intel.com>,
"Gao, Liming" <liming.gao@intel.com>
Subject: Re: [edk2-devel] [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show'
Date: Wed, 7 Aug 2019 15:26:27 +0200 [thread overview]
Message-ID: <944f7f77-5500-aa76-824a-8e207e404ed0@redhat.com> (raw)
In-Reply-To: <E92EE9817A31E24EB0585FDF735412F5B9D83B8A@ORSMSX113.amr.corp.intel.com>
On 08/06/19 00:01, Kinney, Michael D wrote:
> Laszlo,
>
> The context of this change is only to the PatchCheck.py tool.
> and how that tool uses git show.
>
> I agree with the summary of very flexible capabilities in git
> to help developers review different types of files. All of
> those settings that were added to support UNI files in UTF-16
> file format were very valuable when we had to review the
> text changes to those binary files. We should be using UTF-8
> now, and we can even extend PatchCheck.py to flag an error if
> a UNI file is in UTF-16 format.
>
> I still prefer we make this change only to PatchCheck.py to
> prevent false positives and print lines of text that can
> not be found in a developer's working directory. I prefer
> this one time change to PatchCheck.py instead of having to
> update .gitattributes whenever the git show features are
> extended to convert more binary files to text files.
More precisely, that would mean that you prefer this one time change to
having to update .gitattributes every time a new type of binary file is
introduced to edk2.
I can see merits in both approaches. From my side, I don't intend to
block this patch.
Acked-by: Laszlo Ersek <lersek@redhat.com>
Thanks
Laszlo
> My expectation is that EDK II Maintainers need to review
> if a binary file is ok or not for a repo. I would also be
> ok with adding general rules to PatchCheck.py to generate
> an error if a binary file is added/updated in one of the
> text only repos (edk2, edk2-platforms) and not generate
> an error if a binary file is added/updated in a repo that
> supports binaries (edk-non-osi).
>
> Best regards,
>
> Mike
>
>
>> -----Original Message-----
>> From: devel@edk2.groups.io
>> [mailto:devel@edk2.groups.io] On Behalf Of Laszlo Ersek
>> Sent: Friday, August 2, 2019 4:58 PM
>> To: Justen, Jordan L <jordan.l.justen@intel.com>;
>> Kinney, Michael D <michael.d.kinney@intel.com>;
>> devel@edk2.groups.io
>> Cc: Feng, Bob C <bob.c.feng@intel.com>; Gao, Liming
>> <liming.gao@intel.com>
>> Subject: Re: [edk2-devel] [Patch 3/3]
>> BaseTools/PatchCheck: Disable text conversion in 'git
>> show'
>>
>> On 08/02/19 20:29, Jordan Justen wrote:
>>> First, I hope we don't add lots of large .pdf files
>> into the source
>>> tree. I see two duplicated > 200k .pdf files in edk2,
>> which seems like
>>> a waste of space in the edk2 tree.
>>>
>>> BaseTools/Source/C/BrotliCompress/docs/brotli-
>> comparison-study-2015-09
>>> -22.pdf
>>>
>> MdeModulePkg/Library/BrotliCustomDecompressLib/docs/bro
>> tli-comparison-
>>> study-2015-09-22.pdf
>>>
>>> Second, I'm not too sure about this change. It seems
>> like it might
>>> have unintended consequences.
>>>
>>> One thought I had is that it might break UTF-16
>> unicode files diffs,
>>> but luckily I think we've pretty much gotten rid of
>> UTF-16 files at
>>> this point.
>>>
>>> I also wonder if adding a .pdf attribute like Laszlo
>> recommends for
>>> .efi might be an alternative solution.
>>>
>>>
>> https://github.com/tianocore/tianocore.github.io/wiki/L
>> aszlo's-unkempt
>>> -git-guide-for-edk2-contributors-and-
>> maintainers#contrib-09
>>>
>>> We could actually add these setting in the git tree
>> in a
>>> .gitattributes file, right? Has this been suggested?
>> I think Laszlo
>>> documented this before we had converted edk2 to git.
>>
>> This is a complex question.
>>
>> There are files which are binary for all intents and
>> purposes, so you never want to see a "textualized"
>> version of them. Git can attempt recognizing these
>> files, but it can take hints too.
>>
>> For these files, once they are safely classified as
>> "binary", it is a separate question whether you want a
>> changeset shown (or
>> email-formatted) for these files to contain base64-like
>> binary patches, or just a statement that "binary files
>> differ".
>>
>> Then there are files that are binary, but can be
>> "textualized" for the purposes of diffing and even
>> merging. Automated UCS2<->UTF8 conversion is an
>> example, and I used that while edk2 still had UCS2-
>> encoded UNI files.
>>
>> Some examples (not claiming this is going to be an
>> exhaustive list -- for that, see gitattributes(5)), to
>> be placed in .git/info/attributes (local clone only) or
>> .gitattributes (tracked in the repo):
>>
>> *.efi -diff
>>
>> This classifies *.efi as binary (turns off the
>> heuristics).
>>
>> Consider gitattributes(5) -- excerpt:
>>
>> diff
>> Unset
>> A path to which the diff attribute is
>> unset will
>> generate Binary files differ (or a
>> binary patch, if
>> binary patches are enabled).
>> Unspecified
>> A path to which the diff attribute is
>> unspecified first
>> gets its contents inspected, and if it
>> looks like text
>> and is smaller than
>> core.bigFileThreshold, it is
>> treated as text. Otherwise it would
>> generate Binary
>> files differ.
>>
>> git-show will only report "binary files differ", unless
>> we pass --binary to it; in which case it will offer a
>> base64-like binary patch.
>> git-format-patch will produce that binary patch by
>> default, unless we pass --no-binary to it.
>>
>>
>> Another example, in .git/info/attributes or
>> .gitattributes:
>>
>> *.uni diff=uni merge=uni
>>
>> The manual says:
>>
>> diff
>> String
>> Diff is shown using the specified diff
>> driver. Each
>> driver may specify one or more options,
>> as described in
>> the following section. The options for
>> the diff driver
>> "foo" are defined by the configuration
>> variables in the
>> "diff.foo" section of the Git config
>> file.
>>
>> merge
>> String
>> 3-way merge is performed using the
>> specified custom
>> merge driver. The built-in 3-way merge
>> driver can be
>> explicitly specified by asking for
>> "text" driver; the
>> built-in "take the current branch"
>> driver can be
>> requested with "binary".
>>
>> Accordingly, from .git/config:
>>
>> [diff "uni"]
>> textconv = iconv -f UCS-2 -t UTF-8
>> binary = true
>>
>> [merge "uni"]
>> name = merge driver for UCS-2 encoded text files
>> driver = uni-merge %O %A %B
>>
>> And my uni-merge script:
>>
>> ---------
>> #!/bin/bash
>>
>> # Merge UCS-2 encoded text files.
>> # This is a git-merge driver. It will overwrite
>> "$MINE".
>> #
>> # Exit status:
>> # 0 -- no conflicts
>> # 1 -- conflicts
>> # 2 -- trouble
>>
>> OLD="$1"
>> MINE="$2"
>> YOURS="$3"
>>
>> # Set safe defaults for the trap handler.
>> OLD_UTF8=
>> MY_UTF8=
>> YOUR_UTF8=
>> NEW_UTF8=
>> EX=2
>>
>> trap 'rm -f -- "$OLD_UTF8" "$MY_UTF8" "$YOUR_UTF8"
>> "$NEW_UTF8"; exit $EX' EXIT set -e -u -C
>>
>> OLD_UTF8=$(mktemp)
>> iconv --from UCS-2 --to UTF-8 <"$OLD" >>"$OLD_UTF8"
>>
>> MY_UTF8=$(mktemp)
>> iconv --from UCS-2 --to UTF-8 <"$MINE" >>"$MY_UTF8"
>>
>> YOUR_UTF8=$(mktemp)
>> iconv --from UCS-2 --to UTF-8 <"$YOURS" >>"$YOUR_UTF8"
>>
>> NEW_UTF8=$(mktemp)
>> set +e
>> diff3 --merge --label=mine --label=old --label=yours --
>> \
>> "$MY_UTF8" "$OLD_UTF8" "$YOUR_UTF8" >>"$NEW_UTF8"
>> DIFFRET=$?
>> set -e
>>
>> if [ 0 -ne "$DIFFRET" ] && [ 1 -ne "$DIFFRET" ]; then
>> exit
>> fi
>>
>> iconv --from UTF-8 --to UCS-2 <"$NEW_UTF8" >|"$MINE"
>> EX=$DIFFRET
>> ---------
>>
>> My understanding is that the "--no-textconv" option
>> would undo the "textconv" that I set under [diff
>> "uni"].
>>
>> In other words, it tells git that, even if the
>> heuristics suggested
>> *some* conversion to text (I can't say what that would
>> be, in the general case), git should still treat the
>> file as binary. (At which point the --no-binary / --
>> binary options decide whether we get a base64-like
>> encoding, or just a message saying "binary files
>> differ").
>> Therefore I think (= agree with Jordan) that the effect
>> is the same as marking the affected file types in
>> .gitattributes as "-diff".
>>
>> I guess the relevant question is scope. "--no-textconv"
>> doesn't require us to assign "-diff" to specific file
>> suffixes, such as *.pdf, *.efi, and so on. But, it also
>> doesn't allow us to specify *some* text conversions,
>> even if we wanted to (it would override & disable them,
>> such as my "iconv" conversion when diffing UCS2 encoded
>> UNI files).
>>
>> Adding a .gitattributes file to the root of the
>> repository, and updating it whenever necessary (for
>> example when we add the first .pdf file, the first .efi
>> file...), is likely the best.
>>
>> (
>>
>> My recommendation at
>> <https://github.com/tianocore/tianocore.github.io/wiki/
>> Laszlo's-unkempt-git-guide-for-edk2-contributors-and-
>> maintainers#contrib-09>
>> may have targeted the local ".git/info/attributes" file
>> because people can have quite different opinions on how
>> git should act on various files. In theory, someone
>> having simplistic PDF files might even want:
>>
>> In the attributes:
>>
>> *.pdf diff=pdf
>>
>> In the config:
>>
>> [diff "pdf"]
>> textconv = sh -c 'pdftotext -- "$1" -' pdftotext-
>> wrapper
>> binary = true
>>
>> ("pdftotext" converts PDF to text, but its command line
>> is awkward -- the hyphen marking stdout must be
>> explicit, and must come as second operand. Hence the
>> wrapping into "sh" -- and then we need to spell out
>> $0 as well, for which I picked "pdftotext-wrapper". The
>> "--" separator is a safety measure against filenames
>> starting with "-", which could be mis-interpreted as
>> options. "--" terminates the option list and starts the
>> operand list.)
>>
>> )
>>
>> Thanks
>> Laszlo
>>
>>
>>> On 2019-08-01 17:13:14, Michael D Kinney wrote:
>>>> https://bugzilla.tianocore.org/show_bug.cgi?id=2044
>>>>
>>>> 'git show' is used to extrat the patch contents for
>> analysis.
>>>> Add the flag '--no-textconv' to the 'git show'
>> command to disable the
>>>> conversion from some binary file types to text
>> content.
>>>>
>>>> Without this change, binary files such as .pdf files
>> are converted to
>>>> text in the show command and PatchCheck complains
>> that the wrong line
>>>> endings are used in the patch.
>>>>
>>>> Cc: Bob Feng <bob.c.feng@intel.com>
>>>> Cc: Liming Gao <liming.gao@intel.com>
>>>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>>>> Signed-off-by: Michael D Kinney
>> <michael.d.kinney@intel.com>
>>>> ---
>>>> BaseTools/Scripts/PatchCheck.py | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/BaseTools/Scripts/PatchCheck.py
>>>> b/BaseTools/Scripts/PatchCheck.py index
>> 6b07241bfe..2a4e6f603e 100755
>>>> --- a/BaseTools/Scripts/PatchCheck.py
>>>> +++ b/BaseTools/Scripts/PatchCheck.py
>>>> @@ -543,7 +543,7 @@ class CheckGitCommits:
>>>>
>>>> def read_patch_from_git(self, commit):
>>>> # Run git to get the commit patch
>>>> - return self.run_git('show', '--
>> pretty=email', commit)
>>>> + return self.run_git('show', '--
>> pretty=email',
>>>> + '--no-textconv', commit)
>>>>
>>>> def run_git(self, *args):
>>>> cmd = [ 'git' ]
>>>> --
>>>> 2.21.0.windows.1
>>>>
>>
>>
>>
>
next prev parent reply other threads:[~2019-08-07 13:26 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-02 0:13 [Patch 0/3] BaseTools/PatchCheck: Fix false positives Michael D Kinney
2019-08-02 0:13 ` [Patch 1/3] BaseTools/PatchCheck: Ignore blank lines in diff Michael D Kinney
2019-08-02 0:13 ` [Patch 2/3] BaseTools/PatchCheck: Add copy from/to keywords Michael D Kinney
2019-08-02 0:13 ` [Patch 3/3] BaseTools/PatchCheck: Disable text conversion in 'git show' Michael D Kinney
2019-08-02 18:29 ` Jordan Justen
2019-08-02 18:51 ` Michael D Kinney
2019-08-02 23:57 ` Laszlo Ersek
2019-08-05 22:01 ` [edk2-devel] " Michael D Kinney
2019-08-07 13:26 ` Laszlo Ersek [this message]
2019-08-07 19:42 ` Jordan Justen
2019-08-08 20:57 ` Laszlo Ersek
2019-08-08 21:08 ` Michael D Kinney
2019-08-08 21:15 ` Andrew Fish
2019-08-08 22:34 ` Laszlo Ersek
2019-08-02 7:59 ` [edk2-devel] [Patch 0/3] BaseTools/PatchCheck: Fix false positives Bob Feng
2019-08-02 18:30 ` Jordan Justen
2019-08-07 11:41 ` Liming Gao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-list from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=944f7f77-5500-aa76-824a-8e207e404ed0@redhat.com \
--to=devel@edk2.groups.io \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox