From: "Bob Feng" <bob.c.feng@intel.com>
To: fengyunhua <fengyunhua@byosoft.com.cn>,
"devel@edk2.groups.io" <devel@edk2.groups.io>,
"Wang, Jian J" <jian.j.wang@intel.com>
Cc: 'Liming Gao' <gaoliming@byosoft.com.cn>,
"Chen, Christine" <yuwei.chen@intel.com>
Subject: Re: [edk2-devel] [PATCH] BaseTools: fix decoding issue in file operation
Date: Tue, 20 Oct 2020 04:35:00 +0000
Message-ID: <DM6PR11MB4073F921BEE6A4644E34C04EC91F0@DM6PR11MB4073.namprd11.prod.outlook.com>
In-Reply-To: <000001d6a5f5$90086c50$b01944f0$@byosoft.com.cn>
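This patch is incompatible with Python 2.
https://docs.python.org/2.7/library/functions.html#open
open(name[, mode[, buffering]])
In Python 2, open() has no encoding argument, so the new four-argument call open(LongFilePath(FileName), Mode, Buffer, Encoding) raises a TypeError there.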
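For reference (this is not from the thread): io.open() accepts an encoding argument in both Python 2.6+ and Python 3, where it is the same object as the built-in open(). Below is a minimal sketch of a portable variant, assuming that approach were acceptable for BaseTools; the helper name open_text_latin1 is hypothetical.

```python
import io
import os
import tempfile


def open_text_latin1(file_name, mode='r', buffering=-1):
    """Hypothetical helper: pass 'latin-1' only for text-mode opens.

    io.open() takes (file, mode, buffering, encoding, ...) on Python 2.6+
    and Python 3, unlike Python 2's built-in open(name[, mode[, buffering]]).
    """
    encoding = None if 'b' in mode else 'latin-1'
    return io.open(file_name, mode, buffering, encoding)


# Usage: write one cp1252 byte (0x92, a right single quote) and read it back.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with io.open(path, 'wb') as f:
    f.write(b'It\x92s')
with open_text_latin1(path) as f:
    text = f.read()
print(repr(text))
```

The binary-mode path still passes encoding=None, which io.open() requires for 'b' modes.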
Thanks,
Bob
-----Original Message-----
From: fengyunhua <fengyunhua@byosoft.com.cn>
Sent: Monday, October 19, 2020 4:55 PM
To: devel@edk2.groups.io; Wang, Jian J <jian.j.wang@intel.com>
Cc: Feng, Bob C <bob.c.feng@intel.com>; 'Liming Gao' <gaoliming@byosoft.com.cn>; Chen, Christine <yuwei.chen@intel.com>
Subject: Re: [edk2-devel] [PATCH] BaseTools: fix decoding issue in file operation
Tested-by: Yunhua Feng <fengyunhua@byosoft.com.cn>
-----Original Message-----
From: bounce+27952+66316+5049190+8953120@groups.io <bounce+27952+66316+5049190+8953120@groups.io> On Behalf Of Wang, Jian J
Sent: October 16, 2020 15:41
To: devel@edk2.groups.io
Cc: Bob Feng <bob.c.feng@intel.com>; Liming Gao <gaoliming@byosoft.com.cn>; Yuwei Chen <yuwei.chen@intel.com>
Subject: [edk2-devel] [PATCH] BaseTools: fix decoding issue in file operation
The build tools fail to read files (for example, when calling Trim to clean preprocessed source files) if they run on an OS with a non-Western code page and the source file contains non-ASCII characters, because open() without an explicit encoding decodes text with the OS default code page.
Even utf-8 has problems with some characters encoded in cp1252 (such as 0x92, 0x96, 0xa0, etc.), because those bytes are not valid UTF-8 on their own.
Currently, the safest way to read a file in Python code is to use 'latin-1' (ISO-8859-1), because it maps every byte value from 0x00 to 0xFF to a character and therefore never causes an encoding/decoding error. It behaves almost the same as reading the file in binary mode.
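The following Python 3 sketch (not part of the patch) illustrates the two claims above with a made-up cp1252 string containing the bytes 0x92, 0x96 and 0xa0: UTF-8 rejects it, while latin-1 decodes it and re-encodes it to the identical bytes.

```python
# cp1252 text containing a right single quote (0x92), an en dash (0x96)
# and a no-break space (0xa0), i.e. the bytes mentioned above.
data = 'It\u2019s 10\u201320\u00a0kB'.encode('cp1252')
print(data)  # b'It\x92s 10\x9620\xa0kB'

# Decoding as UTF-8 fails: 0x92 is a continuation byte with no lead byte.
try:
    data.decode('utf-8')
except UnicodeDecodeError as exc:
    print('utf-8 failed:', exc.reason)

# latin-1 maps every byte 0x00-0xFF to exactly one code point, so decoding
# never fails and re-encoding restores the original bytes unchanged.
text = data.decode('latin-1')
assert text.encode('latin-1') == data

# The same holds for a byte string covering all 256 values.
all_bytes = bytes(range(256))
assert all_bytes.decode('latin-1').encode('latin-1') == all_bytes
```

Note that the decoded text is not the "right" characters (0x92 becomes U+0092, not U+2019); latin-1 only guarantees a lossless byte-to-character mapping, which is what tools like Trim need.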
cp1252 is similar to latin-1, but it cannot encode the characters '\x80'
to '\x9f' and cannot decode the following bytes:
'\x81', '\x8d', '\x8f', '\x90', '\x9d'
So if a file contains UTF-8 or UTF-16 encoded characters, reading it as
cp1252 will sometimes fail.
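A Python 3 sketch (not from the patch) that lists the cp1252 gaps and shows how ordinary UTF-8 text can hit one of them; the function name cp1252_gaps is hypothetical.

```python
def cp1252_gaps():
    """Return the byte values cp1252 cannot decode and the code points
    U+0000..U+00FF it cannot encode."""
    bad_decode = []
    bad_encode = []
    for i in range(0x100):
        try:
            bytes([i]).decode('cp1252')
        except UnicodeDecodeError:
            bad_decode.append(i)
        try:
            chr(i).encode('cp1252')
        except UnicodeEncodeError:
            bad_encode.append(i)
    return bad_decode, bad_encode


bad_decode, bad_encode = cp1252_gaps()
print(['%02x' % i for i in bad_decode])  # ['81', '8d', '8f', '90', '9d']
print('%02x-%02x' % (bad_encode[0], bad_encode[-1]), len(bad_encode))  # 80-9f 32

# UTF-8 text can therefore fail as cp1252: U+00DD encodes to b'\xc3\x9d',
# and 0x9d is one of the undefined bytes.
try:
    '\u00dd'.encode('utf-8').decode('cp1252')
except UnicodeDecodeError as exc:
    print('cp1252 cannot decode byte', hex(exc.object[exc.start]))
```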
Refer to the following links for details:
https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)
https://en.wikipedia.org/wiki/Windows-1252
https://kb.iu.edu/d/aepu
https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html
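One can use the following Python 3 code to verify this:

    # Neither loop prints anything: latin-1 can encode every code point
    # U+0000..U+00FF and decode every byte value 0x00..0xFF.
    for i in range(0x100):
        try:
            chr(i).encode('latin-1')
        except UnicodeEncodeError:
            print(" %s cannot encode %02x" % ('latin-1', i))
    for i in range(0x100):
        try:
            b = bytes([i])
            b.decode('latin-1')
        except UnicodeDecodeError:
            print(" %s cannot decode %02x" % ('latin-1', i))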
This patch changes OpenLongFilePath() to pass 'latin-1' as the encoding argument of open() whenever the file is opened in text mode (no 'b' in the mode string). This solves the file decoding issue completely.
Possibly related BZs:
https://bugzilla.tianocore.org/show_bug.cgi?id=1434
https://bugzilla.tianocore.org/show_bug.cgi?id=1637
https://bugzilla.tianocore.org/show_bug.cgi?id=2578
https://bugzilla.tianocore.org/show_bug.cgi?id=2709
https://bugzilla.tianocore.org/show_bug.cgi?id=2829
Cc: Bob Feng <bob.c.feng@intel.com>
Cc: Liming Gao <gaoliming@byosoft.com.cn>
Cc: Yuwei Chen <yuwei.chen@intel.com>
Signed-off-by: Jian J Wang <jian.j.wang@intel.com>
---
BaseTools/Source/Python/Common/LongFilePathSupport.py | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/BaseTools/Source/Python/Common/LongFilePathSupport.py b/BaseTools/Source/Python/Common/LongFilePathSupport.py
index 38c4396544..c8dce077f2 100644
--- a/BaseTools/Source/Python/Common/LongFilePathSupport.py
+++ b/BaseTools/Source/Python/Common/LongFilePathSupport.py
@@ -30,7 +30,8 @@ def LongFilePath(FileName):
 # wrap open to support opening a long file path
 #
 def OpenLongFilePath(FileName, Mode='r', Buffer= -1):
-    return open(LongFilePath(FileName), Mode, Buffer)
+    Encoding = None if 'b' in Mode else 'latin-1'
+    return open(LongFilePath(FileName), Mode, Buffer, Encoding)
 
 def CodecOpenLongFilePath(Filename, Mode='rb', Encoding=None, Errors='strict', Buffering=1):
     return codecs.open(LongFilePath(Filename), Mode, Encoding, Errors, Buffering)
--
2.24.0.windows.2
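A self-contained Python 3 sketch of how the patched OpenLongFilePath() behaves. LongFilePath() below is a hypothetical identity stand-in for the real helper in LongFilePathSupport.py, which handles long Windows paths; the file name and contents are made up.

```python
import os
import tempfile


def LongFilePath(FileName):
    # Hypothetical stand-in: returns the path unchanged.
    return FileName


def OpenLongFilePath(FileName, Mode='r', Buffer=-1):
    # Same logic as the patched function: text mode gets 'latin-1',
    # binary mode ('b' in Mode) keeps encoding=None as open() requires.
    Encoding = None if 'b' in Mode else 'latin-1'
    return open(LongFilePath(FileName), Mode, Buffer, Encoding)


path = os.path.join(tempfile.mkdtemp(), 'Sample.c')
# Mixed UTF-8 (c3 a9), cp1252 (92) and cp1252-undefined (81) bytes.
# No newline on purpose: text mode still translates newlines on write,
# which is one way latin-1 text mode differs from binary mode.
raw = b'// caf\xc3\xa9 \x92 \x81'

with OpenLongFilePath(path, 'wb') as f:
    f.write(raw)

# Text mode: decoding cannot fail, whatever the bytes are.
with OpenLongFilePath(path) as f:
    text = f.read()

# Writing the text back in text mode reproduces the original bytes.
with OpenLongFilePath(path, 'w') as f:
    f.write(text)
with OpenLongFilePath(path, 'rb') as f:
    assert f.read() == raw
```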
Thread overview: 3+ messages
2020-10-16  7:41 [PATCH] BaseTools: fix decoding issue in file operation Wang, Jian J
2020-10-19  8:55 ` Re: [edk2-devel] " fengyunhua
2020-10-20 4:35 ` Bob Feng [this message]