From: "Wang, Jian J" <jian.j.wang@intel.com>
To: devel@edk2.groups.io
Cc: Bob Feng <bob.c.feng@intel.com>,
Liming Gao <gaoliming@byosoft.com.cn>,
Yuwei Chen <yuwei.chen@intel.com>
Subject: [PATCH] BaseTools: fix decoding issue in file operation
Date: Fri, 16 Oct 2020 15:41:24 +0800
Message-ID: <20201016074124.831-1-jian.j.wang@intel.com>

The build tool fails when reading a file, for example when calling trim
to clean preprocessed source files, if it runs on an OS with a
non-Western code page and the source file contains non-ASCII characters.
Even utf-8 has problems with some characters encoded in cp1252
(such as 0x92, 0x96, 0xa0, etc.).

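A minimal Python 3 sketch of the utf-8 failure described above: three
characters are encoded with cp1252 to produce the bytes 0x92, 0x96 and
0xa0, which are then decoded with utf-8 and with latin-1. The helper
name try_decode is made up for this illustration.

```python
def try_decode(data, encoding):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

# Right single quote, en dash and no-break space as a cp1252 editor saves them.
cp1252_bytes = "\u2019\u2013\u00a0".encode("cp1252")   # b'\x92\x96\xa0'

print(try_decode(cp1252_bytes, "utf-8"))          # None: 0x92 is not a valid utf-8 start byte
print(len(try_decode(cp1252_bytes, "latin-1")))   # 3: latin-1 yields one character per byte
```
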
Currently, the safest way to read a file in Python code is to use
'latin-1' (iso-8859-1), because it maps every byte from 0x00 to 0xFF
to a character. Decoding therefore never fails, and text decoded this
way encodes back to the original bytes. It behaves almost the same as
reading the file in binary mode.

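The sketch below checks the claim that latin-1 behaves almost like
binary mode. It writes every byte value to a temporary file, reads it
back in text mode with 'latin-1', and compares the result with a binary
read. One difference, presumably part of the "almost", is Python's
universal-newline translation, which rewrites '\r' unless newline='' is
passed to open(). The helper name and file name are hypothetical.

```python
import os
import tempfile

def text_roundtrip_equals_binary(path, newline):
    """Read path in text mode with latin-1 and compare with a binary read."""
    with open(path, "rb") as f:
        raw = f.read()
    with open(path, "r", encoding="latin-1", newline=newline) as f:
        text = f.read()
    # latin-1 maps byte N to code point U+00NN, so encoding back restores the bytes.
    return text.encode("latin-1") == raw

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "all_bytes.dat")
    with open(path, "wb") as f:
        f.write(bytes(range(0x100)))  # every byte value 0x00-0xFF
    exact = text_roundtrip_equals_binary(path, newline="")         # no newline translation
    translated = text_roundtrip_equals_binary(path, newline=None)  # lone '\r' becomes '\n'

print(exact, translated)  # True False
```
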
cp1252 is similar to latin-1, but it cannot encode the characters '\x80'
to '\x9f' and cannot decode the following bytes:
'\x81', '\x8d', '\x8f', '\x90', '\x9d'
So reading a file that contains utf-8/16 encoded characters with cp1252
fails sometimes.

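The cp1252 gaps listed above can be checked the same way as the latin-1
snippet in this mail. This sketch collects the failing values, then
shows a utf-8 encoded character ('ā', U+0101, stored as c4 81) that
cp1252 cannot decode. The helper name cp1252_gaps is illustrative only.

```python
def cp1252_gaps():
    """Return (bytes cp1252 cannot decode, code points it cannot encode) in 0x00-0xFF."""
    undecodable, unencodable = [], []
    for i in range(0x100):
        try:
            bytes([i]).decode("cp1252")
        except UnicodeDecodeError:
            undecodable.append(i)
        try:
            chr(i).encode("cp1252")
        except UnicodeEncodeError:
            unencodable.append(i)
    return undecodable, unencodable

undecodable, unencodable = cp1252_gaps()
print(" ".join("%02x" % i for i in undecodable))                    # 81 8d 8f 90 9d
print("%02x-%02x" % (unencodable[0], unencodable[-1]), len(unencodable))  # 80-9f 32

# 'ā' in a utf-8 file is stored as c4 81, and 0x81 is one of the holes.
utf8_data = "\u0101".encode("utf-8")
try:
    utf8_data.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True
print(cp1252_failed)  # True
```
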
Refer to the following links for details:
https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)
https://en.wikipedia.org/wiki/Windows-1252
https://kb.iu.edu/d/aepu
https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html

One can use the following Python code to verify this. It prints nothing,
because 'latin-1' can encode and decode every value from 0x00 to 0xFF.

    for i in range(0x100):
        try:
            chr(i).encode('latin-1')
        except UnicodeEncodeError:
            print(" %s cannot encode %02x" % ('latin-1', i))
    for i in range(0x100):
        try:
            b = bytes([i])
            b.decode('latin-1')
        except UnicodeDecodeError:
            print(" %s cannot decode %02x" % ('latin-1', i))

This patch makes OpenLongFilePath() pass 'latin-1' as the encoding
argument of open() whenever the file is opened in text mode; binary
modes are left unchanged. This solves the file decoding issue completely
for files opened through this function.

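A self-contained sketch of the patched OpenLongFilePath(). LongFilePath()
here is a hypothetical stand-in that returns the path unchanged; the real
helper in LongFilePathSupport.py may rewrite long Windows paths. The
sample file name and contents are also made up.

```python
import os
import tempfile

def LongFilePath(FileName):
    # Hypothetical stand-in: returns the path unchanged.
    return FileName

def OpenLongFilePath(FileName, Mode='r', Buffer=-1):
    # Text mode: force latin-1 so every byte 0x00-0xFF decodes.
    # Binary mode: open() rejects an encoding argument, so pass None.
    Encoding = None if 'b' in Mode else 'latin-1'
    return open(LongFilePath(FileName), Mode, Buffer, Encoding)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.i")
    with open(path, "wb") as f:
        f.write(b"It\x92s \xc4\x81\n")       # cp1252 quote plus a utf-8 sequence
    with OpenLongFilePath(path) as f:        # text mode: decoded as latin-1
        text = f.read()
    with OpenLongFilePath(path, "rb") as f:  # binary mode: no encoding
        raw = f.read()

print(text.encode("latin-1") == raw)  # True
```

Because latin-1 maps each byte to one character, the utf-8 sequence c4 81
is read as two characters rather than 'ā', but encoding the text back
with latin-1 restores the original bytes.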
Possibly related BZs:
https://bugzilla.tianocore.org/show_bug.cgi?id=1434
https://bugzilla.tianocore.org/show_bug.cgi?id=1637
https://bugzilla.tianocore.org/show_bug.cgi?id=2578
https://bugzilla.tianocore.org/show_bug.cgi?id=2709
https://bugzilla.tianocore.org/show_bug.cgi?id=2829

Cc: Bob Feng <bob.c.feng@intel.com>
Cc: Liming Gao <gaoliming@byosoft.com.cn>
Cc: Yuwei Chen <yuwei.chen@intel.com>
Signed-off-by: Jian J Wang <jian.j.wang@intel.com>
---
BaseTools/Source/Python/Common/LongFilePathSupport.py | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/BaseTools/Source/Python/Common/LongFilePathSupport.py b/BaseTools/Source/Python/Common/LongFilePathSupport.py
index 38c4396544..c8dce077f2 100644
--- a/BaseTools/Source/Python/Common/LongFilePathSupport.py
+++ b/BaseTools/Source/Python/Common/LongFilePathSupport.py
@@ -30,7 +30,8 @@ def LongFilePath(FileName):
# wrap open to support opening a long file path
#
def OpenLongFilePath(FileName, Mode='r', Buffer= -1):
- return open(LongFilePath(FileName), Mode, Buffer)
+ Encoding = None if 'b' in Mode else 'latin-1'
+ return open(LongFilePath(FileName), Mode, Buffer, Encoding)
def CodecOpenLongFilePath(Filename, Mode='rb', Encoding=None, Errors='strict', Buffering=1):
return codecs.open(LongFilePath(Filename), Mode, Encoding, Errors, Buffering)
--
2.24.0.windows.2