Subject: Re: [edk2-devel] [RFC PATCH] BaseTools: Fix Python3 encoding issue in TestTools
To: devel@edk2.groups.io, philmd@redhat.com
Cc: Zhiju Fan, Bob Feng, Liming Gao
References: <20191204213825.27644-1-philmd@redhat.com> <828306f1-755e-da8a-96f2-af85828e56a4@redhat.com>
From: "Laszlo Ersek"
Message-ID:
 <6d91d3ff-91c2-4d2f-1937-6ba09ca0c2af@redhat.com>
Date: Thu, 5 Dec 2019 21:09:01 +0100
In-Reply-To: <828306f1-755e-da8a-96f2-af85828e56a4@redhat.com>

On 12/05/19 19:36, Philippe Mathieu-Daudé wrote:
> On 12/4/19 10:38 PM, Philippe Mathieu-Daude wrote:
>> Under CentOS 7.7 we get:
>>
>>    Build environment: Linux-3.10.0-1062.7.1.el7.x86_64-x86_64-with-centos-7.7.1908-Core
>>    [...]
>>    ======================================================================
>>    ERROR: testRandomDataCycles (TianoCompress.Tests)
>>    ----------------------------------------------------------------------
>>    Traceback (most recent call last):
>>      File "edk2/BaseTools/Tests/TianoCompress.py", line 60, in testRandomDataCycles
>>        self.compressionTestCycle(data)
>>      File "edk2/BaseTools/Tests/TianoCompress.py", line 46, in compressionTestCycle
>>        start = self.ReadTmpFile('input')
>>      File "edk2/BaseTools/Tests/TestTools.py", line 139, in ReadTmpFile
>>        data = f.read()
>>      File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
>>        return codecs.ascii_decode(input, self.errors)[0]
>>    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
>>
>>    ----------------------------------------------------------------------
>>
>> Fix by specifying the UTF-8 encoding.
>>
>> Cc: Bob Feng
>> Cc: Liming Gao
>> Signed-off-by: Philippe Mathieu-Daude
>> ---
>> RFC because I'm not sure this is the best way to fix this, but
>> this is similar to commit 31e3eeb5e3d2d.
>> ---
>>  BaseTools/Tests/TestTools.py | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/BaseTools/Tests/TestTools.py b/BaseTools/Tests/TestTools.py
>> index 1099fd4eeaea..41cdb28b0c8c 100644
>> --- a/BaseTools/Tests/TestTools.py
>> +++ b/BaseTools/Tests/TestTools.py
>> @@ -135,7 +135,7 @@ class BaseToolsTest(unittest.TestCase):
>>          return open(os.path.join(self.testDir, fileName), mode)
>>
>>      def ReadTmpFile(self, fileName):
>> -        f = open(self.GetTmpFilePath(fileName), 'r')
>> +        f = codecs.open(self.GetTmpFilePath(fileName), 'r', encoding='utf-8')
>>          data = f.read()
>>          f.close()
>>          return data
>>
>
> While this fixes Python3, this also breaks Python2 :)
>
> ======================================================================
> ERROR: testRandomDataCycles (TianoCompress.Tests)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "edk2/BaseTools/Tests/TianoCompress.py", line 60, in testRandomDataCycles
>     self.compressionTestCycle(data)
>   File "edk2/BaseTools/Tests/TianoCompress.py", line 46, in compressionTestCycle
>     start = self.ReadTmpFile('input')
>   File "edk2/BaseTools/Tests/TestTools.py", line 139, in ReadTmpFile
>     data = f.read()
>   File "/usr/lib/python2.7/codecs.py", line 688, in read
>     return self.reader.read(size)
>   File "/usr/lib/python2.7/codecs.py", line 494, in read
>     newchars, decodedbytes = self.decode(data, self.errors)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x85 in position 0: invalid start byte
>
> This old thread recommends using io.open:
> https://web.archive.org/web/20180715024113/https://mail.python.org/pipermail/python-list/2015-March/687124.html
>
> And it works with both Python 2 and 3, so I'll respin.

I didn't ask before (because, "commit 31e3eeb5e3d2d must have been
right, right?"), but now I can't resist anymore: *why* do we have any
character that is not pure ASCII in a *temporary* file's pathname?

It seems wrong.

Thanks
Laszlo
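
For reference, a minimal Python sketch of the two failure modes quoted
above. It rests on assumptions: the helper names (read_text, read_bytes,
demo) and the sample payloads are hypothetical and are not taken from
TestTools.py. Both quoted tracebacks are raised from f.read(), so the
sketch decodes file contents. It shows that io.open() accepts an explicit
encoding, that UTF-8 encoded text fails under the 'ascii' codec, and that
a byte such as 0x85 fails under 'utf-8' as well, whereas binary mode
involves no decoding at all. Whether binary mode would suit the test
suite is not something this thread settles.

```python
import io
import os
import tempfile


def read_text(path, encoding):
    # io.open() takes an 'encoding' argument on Python 2.6+ and is the
    # same function as the builtin open() on Python 3.
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read()


def read_bytes(path):
    # Binary mode returns the raw bytes; no codec is involved.
    with io.open(path, 'rb') as f:
        return f.read()


def _write_tmp(payload):
    # Hypothetical helper: store 'payload' in a fresh temporary file.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as f:
        f.write(payload)
    return path


def _try_read(path, encoding):
    # Return (text, None) on success or (None, error) on decode failure.
    try:
        return read_text(path, encoding), None
    except UnicodeDecodeError as e:
        return None, e


def demo():
    utf8_path = _write_tmp(b'abc\xc3\xa9')   # "abc" + U+00E9 in UTF-8
    random_path = _write_tmp(b'\x85\x00\x01')  # not valid UTF-8
    try:
        return {
            'ascii_on_utf8': _try_read(utf8_path, 'ascii'),
            'utf8_on_utf8': _try_read(utf8_path, 'utf-8'),
            'utf8_on_random': _try_read(random_path, 'utf-8'),
            'bytes_on_random': read_bytes(random_path),
        }
    finally:
        os.remove(utf8_path)
        os.remove(random_path)
```

Calling demo() returns a dictionary whose 'ascii_on_utf8' and
'utf8_on_random' entries carry a UnicodeDecodeError, while the other two
reads succeed.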