public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Laszlo Ersek" <lersek@redhat.com>
To: "devel@edk2.groups.io" <devel@edk2.groups.io>,
	"zhijux.fan@intel.com" <zhijux.fan@intel.com>
Cc: "Gao, Liming" <liming.gao@intel.com>,
	"Feng, Bob C" <bob.c.feng@intel.com>,
	Leif Lindholm <leif.lindholm@linaro.org>
Subject: Re: [edk2-devel] [PATCH] BaseTools:fixed Build failed issue for Non-English OS
Date: Thu, 21 Nov 2019 18:53:20 +0100	[thread overview]
Message-ID: <95a79c71-2eb4-af97-a738-170a2668599d@redhat.com> (raw)
In-Reply-To: <FAD0D7E0AE0FA54D987F6E72435CAFD50AFE1622@SHSMSX101.ccr.corp.intel.com>

(+Leif)

On 11/20/19 12:27, Fan, ZhijuX wrote:
> BZ:https://bugzilla.tianocore.org/show_bug.cgi?id=2365
>
> Build failed on Non-English OS if structurePcd is used in platform
> dsc file.
> When the output of some functions is converted to code,
> Because different OS Character encoding form differently,
> there may be problems with some functions
>
> The patch is going to fixed this issue
>
> Cc: Liming Gao <liming.gao@intel.com>
> Cc: Bob Feng <bob.c.feng@intel.com>
> Signed-off-by: Zhiju.Fan <zhijux.fan@intel.com>
> ---
>  BaseTools/Source/Python/Workspace/DscBuildData.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/BaseTools/Source/Python/Workspace/DscBuildData.py b/BaseTools/Source/Python/Workspace/DscBuildData.py
> index 9192077f90..901d95a413 100644
> --- a/BaseTools/Source/Python/Workspace/DscBuildData.py
> +++ b/BaseTools/Source/Python/Workspace/DscBuildData.py
> @@ -1752,7 +1752,7 @@ class DscBuildData(PlatformBuildClassObject):
>          except:
>              EdkLogger.error('Build', COMMAND_FAILURE, 'Can not execute command: %s' % Command)
>          Result = Process.communicate()
> -        return Process.returncode, Result[0].decode(), Result[1].decode()
> +        return Process.returncode, Result[0].decode(errors='ignore'), Result[1].decode(errors='ignore')
>
>      @staticmethod
>      def IntToCString(Value, ValueSize):

Let me quote the full method (pre-patch):


    @staticmethod
    def ExecuteCommand (Command):
        try:
            Process = subprocess.Popen(Command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
        except:
            EdkLogger.error('Build', COMMAND_FAILURE, 'Can not execute command: %s' % Command)
        Result = Process.communicate()
        return Process.returncode, Result[0].decode(), Result[1].decode()

Relevant documentation:

- https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate
- https://docs.python.org/3/library/stdtypes.html#bytes.decode
- https://docs.python.org/3/library/codecs.html#error-handlers

So we launch a subprocess, read its stdout and stderr as byte arrays,
and then decode those byte arrays (interpreted in UTF-8) to string (=
sequences of unicode code points).

If the byte arrays turn out to be invalid (per UTF-8), we currently
fail.

With the patch, we don't fail, but return garbage (or I guess "partial
data" -- see the docs: "malformed data is ignored and encoding or
decoding is continued without further notice").

I think this patch is wrong. Here's why:

- I don't see why the symptom is inherently tied to structured PCDs.

- "Non-English OS" is not precise; it comes down to the encoding of the
  byte stream that the subprocess produces. Even if the current locale
  specifies English-language messages, the problem can surface if the
  subprocess prints *symbols*, with non-UTF-8 encoding. Conversely, if
  the locale specifies non-English messages, but the subprocess prints
  them in correct UTF-8 encoding, the problem will not arise.

- I think the code should attempt to decode the stdout / stderr byte
  arrays using the LC_CTYPE locale category (not fixed UTF-8), and if
  *even that* fails, we should use errors='replace' (or another
  replacement method).


... Finally, please compare commit 8ddec24dea74 ("BaseTools:Updata the
output encoding of the Popen function", 2019-08-01). At this point, I'm
completely confused. Consider:

(a) Commit 8ddec24dea74 removed the "encoding='utf-8'" parameter --
    which has no effect, as "encoding='utf-8'" is the default.

(b) It also removed "errors='ignore'". In effect, that change set
    'errors="strict"'.

(c) Okay... so the change in said commit actually means "be strict about
    UTF-8 decoding failures".  But then the *message* on commit
    8ddec24dea74, "Not all output works in utf-8, so change the encoding
    to the default", makes no sense at all!

(d) Given the above, the present patch, in effect, *reverts* the last
    hunk of commit 8ddec24dea74. The "encoding='utf-8'" parameter
    remains the default, and the patch basically says, "oh yea, you know
    what, I think I'd still like to ignore UTF-8 decoding errors, like I
    used to do before 8ddec24dea74".

In short... 8ddec24dea74 was wrong, at least in part, and now we're
reverting it (in part). The resultant code is still quite problematic in
my opinion (because we shouldn't *completely* ignore invalid UTF-8
sequences), but even if we do what the present patch proposes, can we at
least point out that we are *reverting* part of 8ddec24dea74? And drop
the "structure PCD" and "non-English OS" language?

Thanks
Laszlo


  parent reply	other threads:[~2019-11-21 17:53 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-20 11:27 [PATCH] BaseTools:fixed Build failed issue for Non-English OS Fan, ZhijuX
2019-11-21  7:08 ` Bob Feng
2019-11-21 17:53 ` Laszlo Ersek [this message]
2019-11-22  6:39   ` [edk2-devel] " Bob Feng
2019-11-22 16:47     ` Laszlo Ersek
2019-11-25  1:37       ` Bob Feng
2019-11-25 12:12         ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=95a79c71-2eb4-af97-a738-170a2668599d@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox