From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail.byosoft.com.cn (mail.byosoft.com.cn [58.240.74.242])
 by mx.groups.io with SMTP id smtpd.web09.9604.1603097718508300281
 for <devel@edk2.groups.io>;
 Mon, 19 Oct 2020 01:55:19 -0700
Authentication-Results: mx.groups.io;
 dkim=missing; spf=none, err=permanent DNS error (domain: byosoft.com.cn, ip: 58.240.74.242, mailfrom: fengyunhua@byosoft.com.cn)
Received: from LAPTOP2AECFQIA ([58.246.60.130])
	(envelope-sender <fengyunhua@byosoft.com.cn>)
	by 192.168.6.13 with ESMTP
	for <bob.c.feng@intel.com>; Mon, 19 Oct 2020 16:55:15 +0800
X-WM-Sender: fengyunhua@byosoft.com.cn
X-WM-AuthFlag: YES
X-WM-AuthUser: fengyunhua@byosoft.com.cn
From: "fengyunhua" <fengyunhua@byosoft.com.cn>
To: <devel@edk2.groups.io>,
	<jian.j.wang@intel.com>
Cc: "'Bob Feng'" <bob.c.feng@intel.com>,
	"'Liming Gao'" <gaoliming@byosoft.com.cn>,
	"'Yuwei Chen'" <yuwei.chen@intel.com>
References: <20201016074124.831-1-jian.j.wang@intel.com>
In-Reply-To: <20201016074124.831-1-jian.j.wang@intel.com>
Subject: Re: [edk2-devel] [PATCH] BaseTools: fix decoding issue in file operation
Date: Mon, 19 Oct 2020 16:55:15 +0800
Message-ID: <000001d6a5f5$90086c50$b01944f0$@byosoft.com.cn>
MIME-Version: 1.0
X-Mailer: Microsoft Outlook 16.0
Thread-Index: AQF5cbjZQyuTn0qE6hqlr6YwESuFG6pZBygQ
Content-Type: text/plain;
	charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Language: zh-cn

Tested-by: Yunhua Feng <fengyunhua@byosoft.com.cn>


-----Original Message-----
From: bounce+27952+66316+5049190+8953120@groups.io <bounce+27952+66316+5049190+8953120@groups.io> On Behalf Of Wang, Jian J
Sent: October 16, 2020 15:41
To: devel@edk2.groups.io
Cc: Bob Feng <bob.c.feng@intel.com>; Liming Gao <gaoliming@byosoft.com.cn>; Yuwei Chen <yuwei.chen@intel.com>
Subject: [edk2-devel] [PATCH] BaseTools: fix decoding issue in file operation

The build tools report a failure when reading a file (for example, when
calling Trim to clean preprocessed source files) if the tool runs on an
OS with a non-Western code page and the source file contains non-ASCII
characters.
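As a minimal illustration, assuming a Python 3 interpreter that is not running in UTF-8 mode: open() in text mode without an encoding argument uses the encoding reported by locale.getpreferredencoding(False), so bytes that decode fine under a Western code page can fail under another one. The sketch simulates a Chinese (GBK) code page by decoding explicitly rather than changing the system locale; the sample text and the try_decode() helper are illustrative only.

```python
import locale

# open() in text mode without encoding= falls back to this value.
print("default text encoding:", locale.getpreferredencoding(False))

# "café" as saved by a Western editor in cp1252: 0xE9 is 'é'.
data = "café".encode("cp1252")  # b'caf\xe9'


def try_decode(raw: bytes, codec: str) -> str:
    """Return the decoded text, or a short description of the error."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError as exc:
        return "error: " + exc.reason


# On a Western code page the read succeeds...
print("cp1252:", try_decode(data, "cp1252"))
# ...but under GBK, 0xE9 starts a two-byte sequence that never completes.
print("gbk:   ", try_decode(data, "gbk"))
```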

Even UTF-8 has problems with some characters encoded in cp1252 (such as
0x92, 0x96, 0xa0, etc.), because those bytes are not valid UTF-8 on their own.
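A short sketch of the failure: each character below is stored by cp1252 as a single byte in the 0x80-0xBF range, which UTF-8 reserves for continuation bytes, so none of them can start a character. The sample characters are chosen for illustration; only the byte values come from the text above.

```python
# Characters that cp1252 stores as single bytes in the 0x80-0xFF range.
samples = {
    "\u2019": "right single quotation mark",  # cp1252 0x92
    "\u2013": "en dash",                      # cp1252 0x96
    "\u00a0": "no-break space",               # cp1252 0xA0
}

failures = []
for char, name in samples.items():
    raw = char.encode("cp1252")  # one byte each
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # A lone continuation byte (0x80-0xBF) is invalid UTF-8.
        failures.append(raw[0])
        print("utf-8 cannot decode 0x%02x (%s)" % (raw[0], name))
```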

Currently, the safest way to read a file in Python code is to use
'latin-1' (ISO-8859-1), because it maps every byte value from 0x00 to
0xFF to a character and therefore never causes an encoding/decoding
error. It behaves almost the same as reading the file in binary mode.
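The sketch below checks the latin-1 property directly: byte N decodes to code point U+00NN and encodes back to the same byte. It also shows one reason the behaviour is only "almost" binary, assuming open() is called with its default newline handling: text mode still translates "\r\n" to "\n" on read. The temporary file is just a scratch fixture.

```python
import os
import tempfile

every_byte = bytes(range(0x100))

# latin-1 maps byte N to code point U+00NN, so decoding never fails...
text = every_byte.decode("latin-1")
assert all(ord(ch) == b for ch, b in zip(text, every_byte))
# ...and encoding the text again restores the original bytes exactly.
assert text.encode("latin-1") == every_byte

# Text mode still applies universal-newline translation on read.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"line1\r\nline2\x92\n")

with open(path, "rb") as f:
    raw = f.read()
with open(path, "r", encoding="latin-1") as f:
    decoded = f.read()
os.remove(path)

print(raw)            # b'line1\r\nline2\x92\n'
print(repr(decoded))  # 'line1\nline2\x92\n'
```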

cp1252 is similar to latin-1, but it cannot encode the characters '\x80'
to '\x9f' (cp1252 assigns those byte values to other characters) and it
cannot decode the following bytes at all:

  '\x81', '\x8d', '\x8f', '\x90', '\x9d'

So if a file contains UTF-8 or UTF-16 encoded characters, decoding it as
cp1252 fails whenever one of these byte values appears.
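For a concrete case: the UTF-8 encoding of 'Á' (U+00C1) ends with 0x81, one of the bytes cp1252 leaves undefined, so a cp1252 read fails, while a latin-1 read succeeds and round-trips. The character is an illustrative pick, not one taken from the bug reports.

```python
# 'Á' (U+00C1) is stored in UTF-8 as two bytes: 0xC3 0x81.
utf8_bytes = "Á".encode("utf-8")
assert utf8_bytes == b"\xc3\x81"

try:
    utf8_bytes.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError:
    # 0x81 is one of the five byte values cp1252 leaves undefined.
    cp1252_ok = False

# latin-1 accepts the same bytes; the text is mojibake, but no byte is lost.
latin1_text = utf8_bytes.decode("latin-1")
print(cp1252_ok)                                    # False
print(repr(latin1_text))                            # 'Ã\x81'
print(latin1_text.encode("latin-1") == utf8_bytes)  # True
```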


Refer to the following links for details:

  https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)
  https://en.wikipedia.org/wiki/Windows-1252
  https://kb.iu.edu/d/aepu
  https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html


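The following Python code can be used to verify this. latin-1 prints
nothing; cp1252 reports '\x80'-'\x9f' as unencodable and the five bytes
above as undecodable.

for codec in ('latin-1', 'cp1252'):
    # Every code point U+0000-U+00FF should encode to a single byte.
    for i in range(0x100):
        try:
            chr(i).encode(codec)
        except UnicodeEncodeError:
            print("    %s cannot encode %02x" % (codec, i))

    # Every single byte 0x00-0xFF should decode to a character.
    for i in range(0x100):
        try:
            bytes([i]).decode(codec)
        except UnicodeDecodeError:
            print("    %s cannot decode %02x" % (codec, i))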

This patch makes OpenLongFilePath() pass 'latin-1' as the encoding
argument of open() whenever the file is opened in text mode; binary modes
are left unchanged. Because latin-1 can decode any byte sequence, this
solves the file decoding issue completely.
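A self-contained sketch of the patched wrapper, exercised on a file that mixes UTF-8 and cp1252 bytes. The LongFilePath() body here is a hypothetical no-op stand-in; the real BaseTools helper's implementation is not shown in this patch. The point is the mode check: open() must not receive an encoding in binary mode, and in text mode latin-1 makes the read lossless.

```python
import os
import tempfile


def LongFilePath(FileName):
    # Hypothetical stand-in: the real helper may rewrite long paths.
    return FileName


def OpenLongFilePath(FileName, Mode='r', Buffer=-1):
    # Binary modes take no encoding; text modes decode every byte via latin-1.
    Encoding = None if 'b' in Mode else 'latin-1'
    # open(file, mode, buffering, encoding) in positional order.
    return open(LongFilePath(FileName), Mode, Buffer, Encoding)


fd, path = tempfile.mkstemp(suffix=".i")
with os.fdopen(fd, "wb") as f:
    # UTF-8 'Á' (C3 81) next to a cp1252 right quote (92).
    f.write(b"/* \xc3\x81 \x92 */\n")

with OpenLongFilePath(path) as f:        # text mode -> latin-1
    text = f.read()
with OpenLongFilePath(path, 'rb') as f:  # binary mode -> no encoding
    raw = f.read()
os.remove(path)

print(text.encode('latin-1') == raw)  # True
```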


Possibly related BZs:

    https://bugzilla.tianocore.org/show_bug.cgi?id=1434
    https://bugzilla.tianocore.org/show_bug.cgi?id=1637
    https://bugzilla.tianocore.org/show_bug.cgi?id=2578
    https://bugzilla.tianocore.org/show_bug.cgi?id=2709
    https://bugzilla.tianocore.org/show_bug.cgi?id=2829


Cc: Bob Feng <bob.c.feng@intel.com>
Cc: Liming Gao <gaoliming@byosoft.com.cn>
Cc: Yuwei Chen <yuwei.chen@intel.com>
Signed-off-by: Jian J Wang <jian.j.wang@intel.com>
---
 BaseTools/Source/Python/Common/LongFilePathSupport.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/BaseTools/Source/Python/Common/LongFilePathSupport.py b/BaseTools/Source/Python/Common/LongFilePathSupport.py
index 38c4396544..c8dce077f2 100644
--- a/BaseTools/Source/Python/Common/LongFilePathSupport.py
+++ b/BaseTools/Source/Python/Common/LongFilePathSupport.py
@@ -30,7 +30,8 @@ def LongFilePath(FileName):
 # wrap open to support opening a long file path
 #
 def OpenLongFilePath(FileName, Mode='r', Buffer= -1):
-    return open(LongFilePath(FileName), Mode, Buffer)
+    Encoding = None if 'b' in Mode else 'latin-1'
+    return open(LongFilePath(FileName), Mode, Buffer, Encoding)
 
 def CodecOpenLongFilePath(Filename, Mode='rb', Encoding=None, Errors='strict', Buffering=1):
     return codecs.open(LongFilePath(Filename), Mode, Encoding, Errors, Buffering)

-- 
2.24.0.windows.2

