From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ml01.01.org (Postfix) with ESMTPS id 36A6021A04804
 for ; Fri, 28 Apr 2017 10:22:57 -0700 (PDT)
Received: from orsmga004.jf.intel.com ([10.7.209.38])
 by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 28 Apr 2017 10:22:56 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.37,388,1488873600"; d="scan'208";a="81888990"
Received: from orsmsx102.amr.corp.intel.com ([10.22.225.129])
 by orsmga004.jf.intel.com with ESMTP; 28 Apr 2017 10:22:56 -0700
Received: from orsmsx159.amr.corp.intel.com (10.22.240.24)
 by ORSMSX102.amr.corp.intel.com (10.22.225.129)
 with Microsoft SMTP Server (TLS) id 14.3.319.2;
 Fri, 28 Apr 2017 10:22:55 -0700
Received: from orsmsx113.amr.corp.intel.com ([169.254.9.59])
 by ORSMSX159.amr.corp.intel.com ([169.254.11.110])
 with mapi id 14.03.0319.002; Fri, 28 Apr 2017 10:22:55 -0700
From: "Kinney, Michael D"
To: Tim Lewis , "edk2-devel@lists.01.org" , "Kinney, Michael D"
CC: "Carsey, Jaben" , "Shaw, Kevin W"
Thread-Topic: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
Thread-Index: AQHSvimGOHKniHBBck6SNVtE4IZySaHXS+yAgAD9tQD//59LUIAAh4iA//+MdrCAAHj7AP//kJzwABb66wAADXOrsP//qPcAgAKkcoCAAGvqUA==
Date: Fri, 28 Apr 2017 17:22:54 +0000
Message-ID:
References: <1493168839-11708-1-git-send-email-michael.d.kinney@intel.com>
 <1493168839-11708-2-git-send-email-michael.d.kinney@intel.com>
 <7236196A5DF6C040855A6D96F556A53F576347@msmail.insydesw.com.tw>
 <7236196A5DF6C040855A6D96F556A53F576544@msmail.insydesw.com.tw>
 <7236196A5DF6C040855A6D96F556A53F576599@msmail.insydesw.com.tw>
 <7236196A5DF6C040855A6D96F556A53F57683A@msmail.insydesw.com.tw>
 <7236196A5DF6C040855A6D96F556A53F576917@msmail.insydesw.com.tw>
 <7236196A5DF6C040855A6D96F556A53F5773D0@msmail.insydesw.com.tw>
In-Reply-To:
 <7236196A5DF6C040855A6D96F556A53F5773D0@msmail.insydesw.com.tw>
Accept-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-ctpclassification: CTP_IC
x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiYTMxMWVmYzAtNDMyNi00OGRhLTg0ZmUtZDVkMzg0Y2ZlNmMxIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE1LjkuNi42IiwiVHJ1c3RlZExhYmVsSGFzaCI6InJlU2ZvUW1LdHc0aEtUVU5PTlM1aVdBZ1g0aVhSRGdNQTVRWFBldDVcLzRNPSJ9
dlp-product: dlpe-windows
dlp-version: 10.0.102.7
dlp-reaction: no-action
x-originating-ip: [10.22.254.138]
MIME-Version: 1.0
Subject: Re: [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
X-BeenThere: edk2-devel@lists.01.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: EDK II Development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 28 Apr 2017 17:22:57 -0000
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

Tim,

Thanks for the additional review on this topic.

I will push the UNI spec update.

Mike

> -----Original Message-----
> From: Tim Lewis [mailto:tim.lewis@insyde.com]
> Sent: Friday, April 28, 2017 9:48 AM
> To: Tim Lewis ; Kinney, Michael D ;
> edk2-devel@lists.01.org
> Cc: Carsey, Jaben ; Shaw, Kevin W
> Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk
> to be UTF-8 without a BOM
>
> Mike --
>
> After an internal review, we have found that there are fewer files than
> previously thought affected by this change.
>
> So we have no objections to updating the UNI Spec to match the current EDK2
> tool behavior?
>
> Thanks,
>
> Tim
>
> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Tim Lewis
> Sent: Wednesday, April 26, 2017 5:27 PM
> To: Kinney, Michael D ; edk2-devel@lists.01.org
> Cc: Carsey, Jaben ; Shaw, Kevin W
> Subject: Re: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk
> to be UTF-8 without a BOM
>
> Mike --
>
> No, the meta-data (in this case, file extension .uni) was used by tools to
> determine the format of the file contents, as described in section 2.6.
> Little-endian, UCS-2 was assumed.
>
> "When a higher-level protocol supplies mechanisms for handling the
> endianness of integral data types, it is not necessary to use Unicode
> encoding schemes or the byte order mark. In those cases Unicode text is
> simply a sequence of integral data types."
>
> Of course, the tools had to be updated to accommodate different build
> systems, and even alternate encodings. But this doesn't remove the previous
> behavior.
>
> Tim
>
>
>
> -----Original Message-----
> From: Kinney, Michael D [mailto:michael.d.kinney@intel.com]
> Sent: Wednesday, April 26, 2017 5:02 PM
> To: Tim Lewis ; edk2-devel@lists.01.org; Kinney, Michael D
> Cc: Carsey, Jaben ; Shaw, Kevin W
> Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk
> to be UTF-8 without a BOM
>
> Hi Tim,
>
> For UTF-16 files on disk with no BOM, do you follow the big-endian
> assumption as documented in the Unicode Specification Section 3.10, D98?
>
> http://www.unicode.org/versions/Unicode9.0.0/ch03.pdf
>
> Mike
>
> > -----Original Message-----
> > From: Tim Lewis [mailto:tim.lewis@insyde.com]
> > Sent: Wednesday, April 26, 2017 4:13 PM
> > To: Kinney, Michael D ;
> > edk2-devel@lists.01.org
> > Cc: Carsey, Jaben ; Shaw, Kevin W
> > Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on
> > disk to be UTF-8 without a BOM
> >
> > Mike --
> >
> > I would prefer to update the docs to match actual industry practice.
> > EDK2 is not the universe.
> >
> > Insyde has been using UNI files well before my time here (> 5 years).
> > The fact that recent specifications or EDK2 tools (2 years) added BOM
> > support it does not remove the backward compatibility issue.
> >
> > The Unicode specification usage of "not recommended" is referring
> > specifically to its usage for byte-order. The full sentence (from 2.6)
> > is: "Use of a BOM is neither required nor recommended [for byte order
> > determination] for UTF-8, but may be encountered in contexts where
> > UTF-8 data is converted from other encoding forms that use a BOM or
> > where the BOM is used as a UTF-8 signature" Editorial comment mine.
> > In this case, the BOM marker would appear as a UTF-8 signature.
> > This would distinguish it from ASCII or any of the multi-byte encoding
> > schemes used.
> >
> > Tim
> >
> > -----Original Message-----
> > From: Kinney, Michael D [mailto:michael.d.kinney@intel.com]
> > Sent: Wednesday, April 26, 2017 3:47 PM
> > To: Tim Lewis ; edk2-devel@lists.01.org; Kinney,
> > Michael D
> > Cc: Carsey, Jaben ; Shaw, Kevin W
> > Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on
> > disk to be UTF-8 without a BOM
> >
> > Hi Tim,
> >
> > The recommendation for UTF-8 usage is to not use a BOM, which is why
> > no BOM for UTF-8 was selected for EDK II.
> >
> > The current task is to update docs to match the current tool behavior.
> >
> > The EDK II repos on GitHub have .uni files in UTF-8 format without a
> > BOM to support easier patch review.
> >
> > There are ways to use GIT features to auto-convert .uni files when
> > pulling content from EDK II repos and pushing commits.
> > That may or may not help with the specific issue you are raising.
> >
> > If you have ideas on a tool change request to EDK II that would
> > provide compatibility with current EDK II tool behavior and support
> > UTF-16LE without a BOM, then let's work that through in a Bugzilla
> > feature request. If we find a solution, we can update the docs and
> > tools again.
> >
> > Do you have any objections to updating the UNI Spec to match the
> > current tool behavior?
> >
> > Thanks,
> >
> > Mike
> >
> > > -----Original Message-----
> > > From: Tim Lewis [mailto:tim.lewis@insyde.com]
> > > Sent: Wednesday, April 26, 2017 11:54 AM
> > > To: Kinney, Michael D ;
> > > edk2-devel@lists.01.org
> > > Cc: Carsey, Jaben ; Shaw, Kevin W
> > > Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files
> > > on disk to be UTF-8 without a BOM
> > >
> > > Mike --
> > >
> > > This is not about files in the EDK II repository. This is about
> > > files created based on the spec, and created with other sets of
> > > tools. Go back to early 2015, to the Build spec (1.22, etc.),
> > > Appendix G, which is where the UNI stuff used to live.
> > >
> > > The point is: files which worked before, and, at worst, generated a
> > > warning before, now are interpreted incorrectly even though they
> > > have correct data.
> > >
> > > Making ASCII (or UTF-8) the default without a BOM is the breaking
> > > change.
> > >
> > > Tim
> > >
> > > -----Original Message-----
> > > From: Kinney, Michael D [mailto:michael.d.kinney@intel.com]
> > > Sent: Wednesday, April 26, 2017 11:47 AM
> > > To: Tim Lewis ; edk2-devel@lists.01.org;
> > > Kinney, Michael D
> > > Cc: Carsey, Jaben ; Shaw, Kevin W
> > > Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files
> > > on disk to be UTF-8 without a BOM
> > >
> > > Tim,
> > >
> > > If you look at the entire file history of the EDK II, you will see
> > > that the BOM has always been present in the UTF-16LE formatted files.
> > >
> > > The build tools were updated in 2015 to *add* support for UTF-8 file.
> > > The .uni files in the EDK II project were then converted from
> > > UTF-16LE with a BOM to UTF-8 without a BOM. This provided an easier
> > > developer experience when using GIT to do email patch review of .uni
> > > files.
> > >
> > > It is possible I am missing something here. Can you please provide
> > > a pointer to the EDK II commit(s) where BOMs were added to UTF-16LE
> > > .uni files.
> > >
> > > Thanks,
> > >
> > > Mike
> > >
> > > > -----Original Message-----
> > > > From: Tim Lewis [mailto:tim.lewis@insyde.com]
> > > > Sent: Wednesday, April 26, 2017 11:34 AM
> > > > To: Kinney, Michael D ;
> > > > edk2-devel@lists.01.org
> > > > Cc: Carsey, Jaben ; Shaw, Kevin W
> > > > Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files
> > > > on disk to be UTF-8 without a BOM
> > > >
> > > > Mike --
> > > >
> > > > I understand that EDK2 has decided to add BOM markers two years ago.
> > > > Adding a BOM didn't change the default. The problem is (a) there
> > > > are still hundreds of files extant in our codebase which were
> > > > created prior to the 2015 changes and still in use, and (b) this
> > > > change is not backward compatible for these files.
> > > >
> > > > Tim
> > > >
> > > > -----Original Message-----
> > > > From: Kinney, Michael D [mailto:michael.d.kinney@intel.com]
> > > > Sent: Wednesday, April 26, 2017 11:11 AM
> > > > To: Tim Lewis ; edk2-devel@lists.01.org;
> > > > Kinney, Michael D
> > > > Cc: Carsey, Jaben ; Shaw, Kevin W
> > > > Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files
> > > > on disk to be UTF-8 without a BOM
> > > >
> > > > Hi Tim,
> > > >
> > > > This is not a request for a new change. Instead, the intent of
> > > > this document change is to update the document to reflect the
> > > > implemented behavior of the EDK II tools. The EDK II tool updates
> > > > to add UTF-8 file support were completed with the patches listed
> > > > below. Notice that the main one for normal build support was
> > > > checked in almost 2 years ago.
> > > >
> > > > BaseTools - UniClassObject - 6/23/2015
> > > > * https://github.com/tianocore/edk2/commit/d80e451b187c9d33cbd771253fbd5119670f75c6
> > > > * https://github.com/tianocore/edk2/commit/be264422c95c781a345978f17b7e80b91f816eda
> > > >
> > > > BaseTools - ECC - 12/29/2015
> > > > * https://github.com/tianocore/edk2/commit/975889279df2eb3d3338cb88afb3faa71ddde4d6
> > > >
> > > > BaseTools - UPT - 4/25/2016
> > > > * https://github.com/tianocore/edk2/commit/4a21fb3b67a0ef1655b43e9368b6b697bbf327af
> > > >
> > > > This was intended to be a 100% backwards compatible change.
> > > >
> > > > All .uni files in the EDK II project in UTF-16LE format have
> > > > always use a BOM.
> > > > Please checkout UDK2015 or older UDKs and you will see all .uni
> > > > files start with 0xff 0xfe.
> > > >
> > > > Thanks,
> > > >
> > > > Mike
> > > >
> > > > > -----Original Message-----
> > > > > From: Tim Lewis [mailto:tim.lewis@insyde.com]
> > > > > Sent: Wednesday, April 26, 2017 9:15 AM
> > > > > To: Kinney, Michael D ;
> > > > > edk2-devel@lists.01.org
> > > > > Cc: Carsey, Jaben ; Shaw, Kevin W
> > > > > Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni
> > > > > files on disk to be UTF-8 without a BOM
> > > > >
> > > > > Mike --
> > > > >
> > > > > This breaks our existing build tools, which assume that a file
> > > > > without a BOM is UTF-16.
> > > > >
> > > > > Tim
> > > > >
> > > > > -----Original Message-----
> > > > > From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On
> > > > > Behalf Of Michael Kinney
> > > > > Sent: Tuesday, April 25, 2017 6:07 PM
> > > > > To: edk2-devel@lists.01.org
> > > > > Cc: Jaben Carsey ; Kevin W Shaw
> > > > > Subject: [edk2] [edk2-UniSpecification PATCH] Allow .uni files
> > > > > on disk to be UTF-8 without a BOM
> > > > >
> > > > > https://bugzilla.tianocore.org/show_bug.cgi?id=507
> > > > >
> > > > > Cc: Jaben Carsey
> > > > > Cc: Yonghong Zhu
> > > > > Cc: Kevin W Shaw
> > > > > Contributed-under: TianoCore Contribution Agreement 1.1
> > > > > Signed-off-by: Michael Kinney
> > > > > ---
> > > > >  2_unicode_strings_file_format.md |  9 ++++++---
> > > > >  README.md                        | 27 ++++++++++++++-------------
> > > > >  2 files changed, 20 insertions(+), 16 deletions(-)
> > > > >
> > > > > diff --git a/2_unicode_strings_file_format.md b/2_unicode_strings_file_format.md
> > > > > index 0150c85..7a4a019 100644
> > > > > --- a/2_unicode_strings_file_format.md
> > > > > +++ b/2_unicode_strings_file_format.md
> > > > > @@ -33,7 +33,8 @@
> > > > >
> > > > >  EDK II Unicode files are used for mapping token names to localized strings
> > > > >  that are identified by an RFC4646 language code. The format for storing EDK II
> > > > > -Unicode files is UTF-16LE. The character content must be UCS-2.
> > > > > +Unicode files on disk is UTF-8 (without a BOM character) or
> > > > > +UTF-16LE (with a BOM character). The character content must be UCS-2.
> > > > >
> > > > >  Strings ends are determined by the first of the following items found:
> > > > >
> > > > > @@ -44,11 +45,13 @@ Strings ends are determined by the first of the following items found:
> > > > >
> > > > >  Comments may appear anywhere within the string file.
> > > > >
> > > > > -All the files must begin with a Unicode BOM character.
> > > > > +All UTF-16LE files must begin with a Unicode BOM character.
> > > > > +All UTF-8 files must not begin with a Unicode BOM character.
> > > > >
> > > > >  **********
> > > > >  **NOTE:** Please make sure you select an editor that supports UCS-2 characters
> > > > > -that can be stored in a UTF-16LE file.
> > > > > +that can be stored in either a UTF-8 (without a BOM character)
> > > > > +or a UTF-16LE file (with a BOM character).
> > > > >  **********
> > > > >
> > > > >  ## 2.1 Common EBNF
> > > > > diff --git a/README.md b/README.md
> > > > > index 63842a1..015aef1 100644
> > > > > --- a/README.md
> > > > > +++ b/README.md
> > > > > @@ -77,16 +77,17 @@ Copyright (c) 2016-2017, Intel Corporation. All rights reserved.
> > > > >
> > > > >  ### Revision History
> > > > >
> > > > > -| Revision | Description | Date |
> > > > > -| ----------------- | ---------------------------------------------------------------------------------------- | --------------- |
> > > > > -| 1.0 | Initial Release. | February 2014 |
> > > > > -| 1.1 | Updated EBNF to follow syntax specified in EBNF by the ANTLR project. | August 2014 |
> > > > > -| | Added content related to EDK II Meta-Data Unicode files. | |
> > > > > -| | Restructured document. | |
> > > > > -| | Removed security and C format GUID definitions, not required for HII or other UNI files. | |
> > > > > -| | Removed invalid escape code sequences. | |
> > > > > -| 1.2 | Added optional font formatting | September 2014 |
> > > > > -| 1.2 Errata A | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME` | April 2015 |
> > > > > -| 1.3 | Added: Syntax for non-ascii characters inside quoted strings. | March 2016 |
> > > > > -| | Removed: Info on specific consumers (.INF & .DEC) removed. | |
> > > > > -| 1.4 | Convert to GitBook format | March 2017 |
> > > > > +| Revision | Description | Date |
> > > > > +| ----------------- | ---------------------------------------------------------------------------------------------------------------------- | --------------- |
> > > > > +| 1.0 | Initial Release. | February 2014 |
> > > > > +| 1.1 | Updated EBNF to follow syntax specified in EBNF by the ANTLR project. | August 2014 |
> > > > > +| | Added content related to EDK II Meta-Data Unicode files. | |
> > > > > +| | Restructured document. | |
> > > > > +| | Removed security and C format GUID definitions, not required for HII or other UNI files. | |
> > > > > +| | Removed invalid escape code sequences. | |
> > > > > +| 1.2 | Added optional font formatting | September 2014 |
> > > > > +| 1.2 Errata A | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME` | April 2015 |
> > > > > +| 1.3 | Added: Syntax for non-ascii characters inside quoted strings. | March 2016 |
> > > > > +| | Removed: Info on specific consumers (.INF & .DEC) removed. | |
> > > > > +| 1.4 | Convert to GitBook format | April 2017 |
> > > > > +| | [#507](https://bugzilla.tianocore.org/show_bug.cgi?id=507) UNI Spec: Clarify that .uni files maybe UTF-8 without a BOM | |
> > > > > --
> > > > > 2.6.3.windows.1
> > > > >
> > > > > _______________________________________________
> > > > > edk2-devel mailing list
> > > > > edk2-devel@lists.01.org
> > > > > https://lists.01.org/mailman/listinfo/edk2-devel
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel
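For readers following the thread, the on-disk rule in the quoted patch (UTF-16LE must start with a BOM, UTF-8 must not, content limited to UCS-2) can be sketched roughly as below. This is only an illustration under stated assumptions, not the EDK II BaseTools code: the function name `decode_uni` and its error messages are hypothetical, and the real tools may behave differently in edge cases.

```python
import codecs


def decode_uni(data: bytes) -> str:
    """Decode raw .uni bytes using the rule from the quoted patch.

    Hypothetical helper for illustration; not the BaseTools implementation.
    """
    if data.startswith(codecs.BOM_UTF16_LE):      # bytes 0xFF 0xFE
        # UTF-16LE files must begin with a BOM; strip it, then decode.
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF8):        # bytes 0xEF 0xBB 0xBF
        # The patch says UTF-8 files must not begin with a BOM.
        raise ValueError("UTF-8 .uni file must not begin with a BOM")
    else:
        # No BOM: treated as UTF-8 (ASCII is a subset of UTF-8).
        text = data.decode("utf-8")

    # "The character content must be UCS-2": reject anything outside the BMP.
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"character U+{ord(ch):06X} is not UCS-2")
    return text
```

Under this rule a UTF-16LE file written without a BOM is not rejected; its bytes are read as UTF-8 and every second character comes out as NUL, which appears to be the "interpreted incorrectly" case raised earlier in the thread.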