From: Tim Lewis
To: "Kinney, Michael D", "edk2-devel@lists.01.org"
CC: "Carsey, Jaben", "Shaw, Kevin W"
Date: Wed, 26 Apr 2017 18:53:48 +0000
Message-ID: <7236196A5DF6C040855A6D96F556A53F576599@msmail.insydesw.com.tw>
Subject: Re: [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM

Mike --

This is not about files in the EDK II repository. This is about files created
based on the spec, and created with other sets of tools. Go back to early 2015,
to the Build spec (1.22, etc.), Appendix G, which is where the UNI stuff used
to live.

The point is: files which worked before, and, at worst, generated a warning
before, now are interpreted incorrectly even though they have correct data.
Making ASCII (or UTF-8) the default without a BOM is the breaking change.

Tim

-----Original Message-----
From: Kinney, Michael D [mailto:michael.d.kinney@intel.com]
Sent: Wednesday, April 26, 2017 11:47 AM
To: Tim Lewis; edk2-devel@lists.01.org; Kinney, Michael D
Cc: Carsey, Jaben; Shaw, Kevin W
Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM

Tim,

If you look at the entire file history of the EDK II, you will see that the
BOM has always been present in the UTF-16LE formatted files.

The build tools were updated in 2015 to *add* support for UTF-8 files.

The .uni files in the EDK II project were then converted from UTF-16LE with
a BOM to UTF-8 without a BOM. This provided an easier developer experience
when using GIT to do email patch review of .uni files.

It is possible I am missing something here. Can you please provide a pointer
to the EDK II commit(s) where BOMs were added to UTF-16LE .uni files?

Thanks,

Mike

> -----Original Message-----
> From: Tim Lewis [mailto:tim.lewis@insyde.com]
> Sent: Wednesday, April 26, 2017 11:34 AM
> To: Kinney, Michael D; edk2-devel@lists.01.org
> Cc: Carsey, Jaben; Shaw, Kevin W
> Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
>
> Mike --
>
> I understand that EDK2 decided to add BOM markers two years ago.
> Adding a BOM didn't change the default. The problem is (a) there are
> still hundreds of files extant in our codebase which were created
> prior to the 2015 changes and are still in use, and (b) this change is not
> backward compatible for these files.
>
> Tim
>
> -----Original Message-----
> From: Kinney, Michael D [mailto:michael.d.kinney@intel.com]
> Sent: Wednesday, April 26, 2017 11:11 AM
> To: Tim Lewis; edk2-devel@lists.01.org; Kinney, Michael D
> Cc: Carsey, Jaben; Shaw, Kevin W
> Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
>
> Hi Tim,
>
> This is not a request for a new change. Instead, the intent of this
> document change is to update the document to reflect the implemented
> behavior of the EDK II tools. The EDK II tool updates to add UTF-8
> file support were completed with the patches listed below. Notice
> that the main one for normal build support was checked in almost 2 years ago.
>
> BaseTools - UniClassObject - 6/23/2015
> * https://github.com/tianocore/edk2/commit/d80e451b187c9d33cbd771253fbd5119670f75c6
> * https://github.com/tianocore/edk2/commit/be264422c95c781a345978f17b7e80b91f816eda
>
> BaseTools - ECC - 12/29/2015
> * https://github.com/tianocore/edk2/commit/975889279df2eb3d3338cb88afb3faa71ddde4d6
>
> BaseTools - UPT - 4/25/2016
> * https://github.com/tianocore/edk2/commit/4a21fb3b67a0ef1655b43e9368b6b697bbf327af
>
> This was intended to be a 100% backwards compatible change.
>
> All .uni files in the EDK II project in UTF-16LE format have always used a BOM.
> Please check out UDK2015 or older UDKs and you will see all .uni files
> start with 0xff 0xfe.
>
> Thanks,
>
> Mike
>
> > -----Original Message-----
> > From: Tim Lewis [mailto:tim.lewis@insyde.com]
> > Sent: Wednesday, April 26, 2017 9:15 AM
> > To: Kinney, Michael D; edk2-devel@lists.01.org
> > Cc: Carsey, Jaben; Shaw, Kevin W
> > Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
> >
> > Mike --
> >
> > This breaks our existing build tools, which assume that a file
> > without a BOM is UTF-16.
> >
> > Tim
> >
> > -----Original Message-----
> > From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Michael Kinney
> > Sent: Tuesday, April 25, 2017 6:07 PM
> > To: edk2-devel@lists.01.org
> > Cc: Jaben Carsey; Kevin W Shaw
> > Subject: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
> >
> > https://bugzilla.tianocore.org/show_bug.cgi?id=507
> >
> > Cc: Jaben Carsey
> > Cc: Yonghong Zhu
> > Cc: Kevin W Shaw
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Michael Kinney
> > ---
> >  2_unicode_strings_file_format.md |  9 ++++++---
> >  README.md                        | 27 ++++++++++++++-------------
> >  2 files changed, 20 insertions(+), 16 deletions(-)
> >
> > diff --git a/2_unicode_strings_file_format.md b/2_unicode_strings_file_format.md
> > index 0150c85..7a4a019 100644
> > --- a/2_unicode_strings_file_format.md
> > +++ b/2_unicode_strings_file_format.md
> > @@ -33,7 +33,8 @@
> >
> >  EDK II Unicode files are used for mapping token names to localized strings that
> >  are identified by an RFC4646 language code. The format for storing EDK II
> > -Unicode files is UTF-16LE. The character content must be UCS-2.
> > +Unicode files on disk is UTF-8 (without a BOM character) or
> > +UTF-16LE (with a BOM character). The character content must be UCS-2.
> >
> >  Strings ends are determined by the first of the following items found:
> >
> > @@ -44,11 +45,13 @@ Strings ends are determined by the first of the following items found:
> >
> >  Comments may appear anywhere within the string file.
> >
> > -All the files must begin with a Unicode BOM character.
> > +All UTF-16LE files must begin with a Unicode BOM character.
> > +All UTF-8 files must not begin with a Unicode BOM character.
> >
> >  **********
> >  **NOTE:** Please make sure you select an editor that supports UCS-2 characters
> > -that can be stored in a UTF-16LE file.
> > +that can be stored in either a UTF-8 (without a BOM character) or a
> > +UTF-16LE file (with a BOM character).
> >  **********
> >
> >  ## 2.1 Common EBNF
> > diff --git a/README.md b/README.md
> > index 63842a1..015aef1 100644
> > --- a/README.md
> > +++ b/README.md
> > @@ -77,16 +77,17 @@ Copyright (c) 2016-2017, Intel Corporation. All rights reserved.
> >
> >  ### Revision History
> >
> > -| Revision          | Description                                                                              | Date            |
> > -| ----------------- | ---------------------------------------------------------------------------------------- | --------------- |
> > -| 1.0               | Initial Release.                                                                         | February 2014   |
> > -| 1.1               | Updated EBNF to follow syntax specified in EBNF by the ANTLR project.                    | August 2014     |
> > -|                   | Added content related to EDK II Meta-Data Unicode files.                                 |                 |
> > -|                   | Restructured document.                                                                   |                 |
> > -|                   | Removed security and C format GUID definitions, not required for HII or other UNI files. |                 |
> > -|                   | Removed invalid escape code sequences.                                                   |                 |
> > -| 1.2               | Added optional font formatting                                                           | September 2014  |
> > -| 1.2 Errata A      | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME`                                     | April 2015      |
> > -| 1.3               | Added: Syntax for non-ascii characters inside quoted strings.                            | March 2016      |
> > -|                   | Removed: Info on specific consumers (.INF & .DEC) removed.                               |                 |
> > -| 1.4               | Convert to GitBook format                                                                | March 2017      |
> > +| Revision          | Description                                                                                                            | Date            |
> > +| ----------------- | ---------------------------------------------------------------------------------------------------------------------- | --------------- |
> > +| 1.0               | Initial Release.                                                                                                       | February 2014   |
> > +| 1.1               | Updated EBNF to follow syntax specified in EBNF by the ANTLR project.                                                  | August 2014     |
> > +|                   | Added content related to EDK II Meta-Data Unicode files.                                                               |                 |
> > +|                   | Restructured document.                                                                                                 |                 |
> > +|                   | Removed security and C format GUID definitions, not required for HII or other UNI files.                               |                 |
> > +|                   | Removed invalid escape code sequences.                                                                                 |                 |
> > +| 1.2               | Added optional font formatting                                                                                         | September 2014  |
> > +| 1.2 Errata A      | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME`                                                                   | April 2015      |
> > +| 1.3               | Added: Syntax for non-ascii characters inside quoted strings.                                                          | March 2016      |
> > +|                   | Removed: Info on specific consumers (.INF & .DEC) removed.                                                             |                 |
> > +| 1.4               | Convert to GitBook format                                                                                              | April 2017      |
> > +|                   | [#507](https://bugzilla.tianocore.org/show_bug.cgi?id=507) UNI Spec: Clarify that .uni files maybe UTF-8 without a BOM |                 |
> > --
> > 2.6.3.windows.1
> >
> > _______________________________________________
> > edk2-devel mailing list
> > edk2-devel@lists.01.org
> > https://lists.01.org/mailman/listinfo/edk2-devel
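
The encoding rule debated in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BaseTools implementation: the names `detect_uni_encoding` and `decode_uni` are hypothetical, and the sketch assumes the rule as worded in the patch (a UTF-16LE file starts with the BOM bytes 0xFF 0xFE; a file without a BOM is treated as UTF-8). It does not check the UCS-2 content restriction. The sample data at the end shows the compatibility concern raised above: a UTF-16LE file without a BOM is not rejected under this rule, it is decoded as UTF-8 and yields different text.

```python
import codecs


def detect_uni_encoding(data: bytes) -> str:
    """Pick a codec for raw .uni bytes (hypothetical helper, patch rule assumed)."""
    if data.startswith(codecs.BOM_UTF16_LE):   # 0xFF 0xFE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF8):       # the patch forbids a UTF-8 BOM
        raise ValueError("UTF-8 .uni files must not begin with a BOM")
    return "utf-8"                             # no BOM: assumed to be UTF-8


def decode_uni(data: bytes) -> str:
    """Decode raw .uni bytes, dropping the UTF-16LE BOM when present."""
    encoding = detect_uni_encoding(data)
    if encoding == "utf-16-le":
        data = data[len(codecs.BOM_UTF16_LE):]
    return data.decode(encoding)


# Made-up sample content, used only to exercise the three cases.
text = '#string STR_HELLO #language en-US "Hello"\n'
with_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
utf8_no_bom = text.encode("utf-8")
legacy_no_bom = text.encode("utf-16-le")       # UTF-16LE written without a BOM

ok_utf16 = decode_uni(with_bom) == text
ok_utf8 = decode_uni(utf8_no_bom) == text
# ASCII-range UTF-16LE bytes are also valid UTF-8, so decoding "succeeds"
# but every character is followed by a NUL.
legacy_misread = decode_uni(legacy_no_bom)
```

A tool that instead treats a file without a BOM as UTF-16LE, as the legacy tools mentioned in the thread are said to do, would return `"utf-16-le"` from the final branch. That single default is the point of disagreement.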