From: fengyunhua
Cc: Bob Feng, Liming Gao, Yuwei Chen
Subject: Re: [edk2-devel] [PATCH] BaseTools: fix decoding issue in file operation
Date: Mon, 19 Oct 2020 16:55:15 +0800
Message-ID: <000001d6a5f5$90086c50$b01944f0$@byosoft.com.cn>
In-Reply-To: <20201016074124.831-1-jian.j.wang@intel.com>
References: <20201016074124.831-1-jian.j.wang@intel.com>

Tested-by: Yunhua Feng

-----Original Message-----
From: bounce+27952+66316+5049190+8953120@groups.io on behalf of Wang, Jian J
Sent: 16 October 2020 15:41
To: devel@edk2.groups.io
Cc: Bob Feng; Liming Gao; Yuwei Chen
Subject: [edk2-devel] [PATCH] BaseTools: fix decoding issue in file operation

The build tool reports a failure on file read (for example, when calling
trim to clean preprocessed source files) if the tool is running on an OS
with a non-Western code page and the source file contains non-ASCII
characters.
Even utf-8 has problems when it encounters some characters encoded in
cp1252 (such as 0x92, 0x96, 0xa0, etc.).

Currently, the safest way to read a file in Python code is to use
'latin-1' (iso-8859-1), because it maps every byte in 00-FF and therefore
cannot cause encoding/decoding errors. It behaves almost the same as
reading the file in binary mode.

cp1252 is similar to latin-1, but it doesn't support encoding '\x80' to
'\x9f' and doesn't support decoding the following bytes:

    '\x81', '\x8d', '\x8f', '\x90', '\x9d'

So if there are utf-8/16 encoded characters in a file, reading it will
sometimes fail.

Refer to the following links for details:

    https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)
    https://en.wikipedia.org/wiki/Windows-1252
    https://kb.iu.edu/d/aepu
    https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html

One can use the following Python code to verify this:

    for i in range(0x100):
        try:
            chr(i).encode('latin-1')
        except UnicodeError:
            print("  %s cannot encode %02x" % ('latin-1', i))

    for i in range(0x100):
        try:
            b = bytes([i])
            b.decode('latin-1')
        except UnicodeError:
            print("  %s cannot decode %02x" % ('latin-1', i))

This patch adds code that enforces 'latin-1' as the encoding argument of
open() in the function OpenLongFilePath() when the open mode is a text
mode. This can solve the file decoding issue completely.
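
The codec comparison in the commit message can be checked side by side. The following is a minimal sketch, not part of the patch: it generalizes the verification loops to any codec name, and the helper names undecodable_bytes and unencodable_code_points are hypothetical. It assumes CPython's built-in 'latin-1', 'cp1252' and 'utf-8' codecs with the default strict error handling.

```python
def undecodable_bytes(codec):
    """Return the single byte values that `codec` cannot decode on their own."""
    failed = []
    for i in range(0x100):
        try:
            bytes([i]).decode(codec)
        except UnicodeDecodeError:
            failed.append(i)
    return failed


def unencodable_code_points(codec):
    """Return the code points in U+0000..U+00FF that `codec` cannot encode."""
    failed = []
    for i in range(0x100):
        try:
            chr(i).encode(codec)
        except UnicodeEncodeError:
            failed.append(i)
    return failed


if __name__ == "__main__":
    # Print one summary line per codec: how many of the 256 values fail.
    for codec in ("latin-1", "cp1252", "utf-8"):
        bad_decode = undecodable_bytes(codec)
        bad_encode = unencodable_code_points(codec)
        print("%-8s undecodable: %3d  unencodable: %3d"
              % (codec, len(bad_decode), len(bad_encode)))
```

Only latin-1 returns two empty lists. Note that the utf-8 decode check feeds one byte at a time, so it shows which bytes are invalid in isolation, not that utf-8 text cannot contain them as part of a multi-byte sequence.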

The possible related BZs:

    https://bugzilla.tianocore.org/show_bug.cgi?id=1434
    https://bugzilla.tianocore.org/show_bug.cgi?id=1637
    https://bugzilla.tianocore.org/show_bug.cgi?id=2578
    https://bugzilla.tianocore.org/show_bug.cgi?id=2709
    https://bugzilla.tianocore.org/show_bug.cgi?id=2829

Cc: Bob Feng
Cc: Liming Gao
Cc: Yuwei Chen
Signed-off-by: Jian J Wang
---
 BaseTools/Source/Python/Common/LongFilePathSupport.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/BaseTools/Source/Python/Common/LongFilePathSupport.py b/BaseTools/Source/Python/Common/LongFilePathSupport.py
index 38c4396544..c8dce077f2 100644
--- a/BaseTools/Source/Python/Common/LongFilePathSupport.py
+++ b/BaseTools/Source/Python/Common/LongFilePathSupport.py
@@ -30,7 +30,8 @@ def LongFilePath(FileName):
 # wrap open to support opening a long file path
 #
 def OpenLongFilePath(FileName, Mode='r', Buffer= -1):
-    return open(LongFilePath(FileName), Mode, Buffer)
+    Encoding = None if 'b' in Mode else 'latin-1'
+    return open(LongFilePath(FileName), Mode, Buffer, Encoding)
 
 def CodecOpenLongFilePath(Filename, Mode='rb', Encoding=None, Errors='strict', Buffering=1):
     return codecs.open(LongFilePath(Filename), Mode, Encoding, Errors, Buffering)
-- 
2.24.0.windows.2

View/Reply Online (#66316): https://edk2.groups.io/g/devel/message/66316
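
To illustrate what the patched wrapper does at run time, here is a small self-contained sketch. It is an assumption-laden stand-in, not the BaseTools code: open_with_latin1 and roundtrip_demo are hypothetical names, and the LongFilePath() path conversion is left out so the example runs anywhere.

```python
import os
import tempfile


def open_with_latin1(file_name, mode="r", buffering=-1):
    # Hypothetical stand-in for the patched OpenLongFilePath(); the real
    # function also passes the name through LongFilePath().
    # Binary modes must not receive an encoding, so pass None for them.
    encoding = None if "b" in mode else "latin-1"
    return open(file_name, mode, buffering, encoding)


def roundtrip_demo():
    # Bytes that include cp1252 punctuation (0x92) and two values that
    # cp1252 cannot decode at all (0x81, 0x8d).
    raw = b"caf\xe9 \x92quoted\x92 \x81\x8d\n"
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # Text mode: decoded as latin-1, so no byte value can raise.
        with open_with_latin1(path, "r") as f:
            text = f.read()
        # Binary mode: encoding stays None and bytes come back untouched.
        with open_with_latin1(path, "rb") as f:
            data = f.read()
    finally:
        os.remove(path)
    return raw, text, data


if __name__ == "__main__":
    raw, text, data = roundtrip_demo()
    # Re-encoding the decoded text as latin-1 recovers the original bytes.
    print(text.encode("latin-1") == raw, data == raw)
```

One design consequence worth keeping in mind: latin-1 never fails, but it also never interprets multi-byte encodings, so a utf-8 encoded character is read as two or more latin-1 characters. That is harmless as long as the consumer only inspects ASCII syntax and writes the text back with the same encoding.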