public inbox for devel@edk2.groups.io
From: "fengyunhua" <fengyunhua@byosoft.com.cn>
To: <devel@edk2.groups.io>, <jian.j.wang@intel.com>
Cc: "'Bob Feng'" <bob.c.feng@intel.com>,
	"'Liming Gao'" <gaoliming@byosoft.com.cn>,
	"'Yuwei Chen'" <yuwei.chen@intel.com>
Subject: Re: [edk2-devel] [PATCH] BaseTools: fix decoding issue in file operation
Date: Mon, 19 Oct 2020 16:55:15 +0800
Message-ID: <000001d6a5f5$90086c50$b01944f0$@byosoft.com.cn>
In-Reply-To: <20201016074124.831-1-jian.j.wang@intel.com>

Tested-by: Yunhua Feng <fengyunhua@byosoft.com.cn>


-----Original Message-----
From: bounce+27952+66316+5049190+8953120@groups.io <bounce+27952+66316+5049190+8953120@groups.io> On Behalf Of Wang, Jian J
Sent: October 16, 2020 15:41
To: devel@edk2.groups.io
Cc: Bob Feng <bob.c.feng@intel.com>; Liming Gao <gaoliming@byosoft.com.cn>; Yuwei Chen <yuwei.chen@intel.com>
Subject: [edk2-devel] [PATCH] BaseTools: fix decoding issue in file operation

The build tool reports a failure on file read, for example when calling
trim to clean preprocessed source files, if the tool is running on an OS
with a non-western code page and the source file has non-ASCII characters.

Even utf-8 has problems when it encounters some characters
encoded in cp1252 (such as 0x92, 0x96, 0xa0, etc).
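
As a minimal illustration (not part of the patch), the sketch below checks the three example bytes under both codecs. The helper name can_decode is hypothetical and exists only for this example.

```python
def can_decode(data: bytes, codec: str) -> bool:
    """Return True if `data` decodes under `codec` without raising."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False

# Single bytes named in the commit message as cp1252-encoded characters.
cp1252_bytes = [b"\x92", b"\x96", b"\xa0"]

# On their own, these are UTF-8 continuation bytes, so utf-8 rejects them.
utf8_results = [can_decode(b, "utf-8") for b in cp1252_bytes]
# latin-1 assigns a character to every byte value, so it accepts them.
latin1_results = [can_decode(b, "latin-1") for b in cp1252_bytes]

print("utf-8:  ", utf8_results)
print("latin-1:", latin1_results)
```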

Currently, the safest way to read a file in Python code is to use
'latin-1' (iso-8859-1), because it maps every byte in 00-FF to a
character and so cannot cause an encoding/decoding error. It behaves
almost the same as reading the file in binary mode.
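
The round-trip property can be checked with a short sketch like the following (illustrative only; the variable names are not from the patch):

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(0x100))

# Decoding never fails: byte N becomes the character with code point N.
text = all_bytes.decode("latin-1")
code_points = [ord(c) for c in text]

# Encoding the result gives back exactly the original bytes.
round_trip = text.encode("latin-1")

print(code_points == list(range(0x100)))
print(round_trip == all_bytes)
```

One difference from binary mode remains even with latin-1: a file opened in text mode is still subject to Python's newline translation, which is presumably why the behavior is described as "almost" the same.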



cp1252 is similar to latin-1, but it doesn't support encoding the
characters '\x80' to '\x9f' and doesn't support decoding the following
bytes:

  '\x81', '\x8d', '\x8f', '\x90', '\x9d'

So if there are utf-8/16 encoded characters in the file, decoding it as
cp1252 will sometimes fail.
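
A small sketch of the "sometimes" case, assuming a file that really contains UTF-8 text but is decoded as cp1252. The helper name try_cp1252 and the two sample characters are chosen for illustration and do not come from the patch.

```python
from typing import Optional

def try_cp1252(ch: str) -> Optional[str]:
    """Encode `ch` as UTF-8, then decode the bytes as cp1252.

    Returns the (possibly wrong) decoded text, or None if decoding fails.
    """
    data = ch.encode("utf-8")
    try:
        return data.decode("cp1252")
    except UnicodeDecodeError:
        return None

# U+00E9 is b'\xc3\xa9' in UTF-8: both bytes exist in cp1252, so decoding
# succeeds but yields two unrelated characters.
misread = try_cp1252("\u00e9")

# U+00C1 is b'\xc3\x81' in UTF-8: byte 0x81 is undefined in cp1252, so
# decoding raises and the helper returns None.
failed = try_cp1252("\u00c1")

print(repr(misread), repr(failed))
```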



Refer to the following links for details:

  https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)
  https://en.wikipedia.org/wiki/Windows-1252
  https://kb.iu.edu/d/aepu
  https://www.i18nqa.com/debug/table-iso8859-1-vs-windows-1252.html


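The following Python code can be used to verify this.

# Try to encode every code point in 00-FF; nothing is printed for latin-1.
for i in range(0x100):
    try:
        chr(i).encode('latin-1')
    except UnicodeEncodeError:
        print("    %s cannot encode %02x" % ('latin-1', i))

# Try to decode every byte value in 00-FF; nothing is printed for latin-1.
for i in range(0x100):
    try:
        b = bytes([i])
        b.decode('latin-1')
    except UnicodeDecodeError:
        print("    %s cannot decode %02x" % ('latin-1', i))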
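
To compare codecs side by side, the same check can be wrapped in a function and run for both 'latin-1' and 'cp1252'. This is a sketch only; the function name unsupported is hypothetical.

```python
def unsupported(codec):
    """Return (cannot_encode, cannot_decode) for values 0x00-0xFF."""
    cannot_encode = []
    cannot_decode = []
    for i in range(0x100):
        try:
            chr(i).encode(codec)
        except UnicodeEncodeError:
            cannot_encode.append(i)
        try:
            bytes([i]).decode(codec)
        except UnicodeDecodeError:
            cannot_decode.append(i)
    return cannot_encode, cannot_decode

for codec in ("latin-1", "cp1252"):
    enc, dec = unsupported(codec)
    print("%s cannot encode: %s" % (codec, ["%02x" % i for i in enc]))
    print("%s cannot decode: %s" % (codec, ["%02x" % i for i in dec]))
```

With CPython's built-in codecs, both lists should be empty for 'latin-1', while 'cp1252' reports the five undecodable bytes listed earlier.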

This patch adds code to enforce 'latin-1' as the encoding argument
of open() in the function OpenLongFilePath() when the file is opened
in text mode. Because latin-1 can decode every byte, this removes the
file decoding errors completely.
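
The effect of the change can be sketched outside BaseTools as follows. The function open_with_latin1 is a hypothetical stand-in for OpenLongFilePath(): it mirrors the encoding selection from the patch but omits the LongFilePath() conversion, and the sample bytes are made up for the example.

```python
import os
import tempfile

def open_with_latin1(file_name, mode="r", buffering=-1):
    # Hypothetical stand-in for OpenLongFilePath(): binary modes get no
    # encoding, every text mode is forced to latin-1.
    encoding = None if "b" in mode else "latin-1"
    return open(file_name, mode, buffering, encoding)

# Sample content with bytes that utf-8 and/or cp1252 would reject.
sample = b"caf\xe9 \x92quoted\x92 \x81\n"

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open_with_latin1(path, "wb") as f:
        f.write(sample)
    with open_with_latin1(path, "r") as f:
        text = f.read()   # no UnicodeDecodeError, whatever the locale
    with open_with_latin1(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

print(repr(text))
```

Note that the string read this way holds one character per byte. It is safe for tools that only pass the content through, but it is not a faithful decoding of text that was actually written as utf-8 or cp1252.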


Possibly related BZs:

    https://bugzilla.tianocore.org/show_bug.cgi?id=1434
    https://bugzilla.tianocore.org/show_bug.cgi?id=1637
    https://bugzilla.tianocore.org/show_bug.cgi?id=2578
    https://bugzilla.tianocore.org/show_bug.cgi?id=2709
    https://bugzilla.tianocore.org/show_bug.cgi?id=2829


Cc: Bob Feng <bob.c.feng@intel.com>
Cc: Liming Gao <gaoliming@byosoft.com.cn>
Cc: Yuwei Chen <yuwei.chen@intel.com>
Signed-off-by: Jian J Wang <jian.j.wang@intel.com>
---
 BaseTools/Source/Python/Common/LongFilePathSupport.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/BaseTools/Source/Python/Common/LongFilePathSupport.py b/BaseTools/Source/Python/Common/LongFilePathSupport.py
index 38c4396544..c8dce077f2 100644
--- a/BaseTools/Source/Python/Common/LongFilePathSupport.py
+++ b/BaseTools/Source/Python/Common/LongFilePathSupport.py
@@ -30,7 +30,8 @@ def LongFilePath(FileName):
 # wrap open to support opening a long file path
 #
 def OpenLongFilePath(FileName, Mode='r', Buffer= -1):
-    return open(LongFilePath(FileName), Mode, Buffer)
+    Encoding = None if 'b' in Mode else 'latin-1'
+    return open(LongFilePath(FileName), Mode, Buffer, Encoding)
 
 def CodecOpenLongFilePath(Filename, Mode='rb', Encoding=None, Errors='strict', Buffering=1):
     return codecs.open(LongFilePath(Filename), Mode, Encoding, Errors, Buffering)

-- 
2.24.0.windows.2







