From: "Kinney, Michael D"
To: Tim Lewis, "Carsey, Jaben", "edk2-devel@lists.01.org", "Kinney, Michael D"
CC: "Shaw, Kevin W"
Date: Wed, 26 Apr 2017 18:25:34 +0000
Subject: Re: [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM

Tim,

The document change request under review here is against this 1.3 spec.
Here is the document history on this topic I have been able to find.

The Multi-String .UNI File Format Specification Version 1.3, March 2016
https://github.com/tianocore/tianocore.github.io/wiki/EDK%20II%20Specifications
has the following 2 statements in CH 2:

* The format for storing EDK II Unicode files is UTF-16LE
* All the files must begin with a Unicode BOM character.

The Multi-String .UNI File Format Specification Version 1.2 Errata A, April 2015
https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-Specifications-Archived
has the same 2 statements in CH 2:

* The format for storing EDK II Unicode files is UTF-16LE
* All the files must begin with a Unicode BOM character.

The Multi-String .UNI File Format Specification, Revision 1.0, February 2014
http://cran.org.uk/edk2/docs/specs/UNI_File_Spec_1_0.pdf
has the following statements in CH2:

* All the files must begin with the binary character, 0xFEFF (big-endian).

I do not see any versions of the .UNI spec that do not require a BOM.

Mike

> -----Original Message-----
> From: Tim Lewis [mailto:tim.lewis@insyde.com]
> Sent: Wednesday, April 26, 2017 10:53 AM
> To: Carsey, Jaben; Kinney, Michael D; edk2-devel@lists.01.org
> Cc: Shaw, Kevin W
> Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
>
> The original UNI specifications (for example, the Multi-String .UNI File Format
> Specification, February 2014, Revision 1.0) did not require it, and the fact is
> that tools accept files without the BOM happily today.
>
> I believe that requiring the BOM is a good step forward, but assuming UTF-8 when
> one is not present won't help the vast quantities of existing UNI files out
> there.
>
> Tim
>
> -----Original Message-----
> From: Carsey, Jaben [mailto:jaben.carsey@intel.com]
> Sent: Wednesday, April 26, 2017 10:45 AM
> To: Tim Lewis; Kinney, Michael D; edk2-devel@lists.01.org
> Cc: Shaw, Kevin W
> Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
>
> Tim,
>
> Doesn't that assumption/behavior violate the current spec?
> "All the files must begin with a Unicode BOM character."
>
> -Jaben
>
> > -----Original Message-----
> > From: Tim Lewis [mailto:tim.lewis@insyde.com]
> > Sent: Wednesday, April 26, 2017 9:15 AM
> > To: Kinney, Michael D; edk2-devel@lists.01.org
> > Cc: Carsey, Jaben; Shaw, Kevin W
> > Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
> > Importance: High
> >
> > Mike --
> >
> > This breaks our existing build tools, which assume that a file without
> > a BOM is UTF-16.
> >
> > Tim
> >
> > -----Original Message-----
> > From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Michael Kinney
> > Sent: Tuesday, April 25, 2017 6:07 PM
> > To: edk2-devel@lists.01.org
> > Cc: Jaben Carsey; Kevin W Shaw
> > Subject: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
> >
> > https://bugzilla.tianocore.org/show_bug.cgi?id=507
> >
> > Cc: Jaben Carsey
> > Cc: Yonghong Zhu
> > Cc: Kevin W Shaw
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Michael Kinney
> > ---
> >  2_unicode_strings_file_format.md |  9 ++++++---
> >  README.md                        | 27 ++++++++++++++-------------
> >  2 files changed, 20 insertions(+), 16 deletions(-)
> >
> > diff --git a/2_unicode_strings_file_format.md b/2_unicode_strings_file_format.md
> > index 0150c85..7a4a019 100644
> > --- a/2_unicode_strings_file_format.md
> > +++ b/2_unicode_strings_file_format.md
> > @@ -33,7 +33,8 @@
> >
> >  EDK II Unicode files are used for mapping token names to localized strings that
> >  are identified by an RFC4646 language code. The format for storing EDK II
> > -Unicode files is UTF-16LE. The character content must be UCS-2.
> > +Unicode files on disk is UTF-8 (without a BOM character) or UTF-16LE
> > +(with a BOM character). The character content must be UCS-2.
> >
> >  Strings ends are determined by the first of the following items found:
> >
> > @@ -44,11 +45,13 @@ Strings ends are determined by the first of the following items found:
> >
> >  Comments may appear anywhere within the string file.
> >
> > -All the files must begin with a Unicode BOM character.
> > +All UTF-16LE files must begin with a Unicode BOM character.
> > +All UTF-8 files must not begin with a Unicode BOM character.
> >
> >  **********
> >  **NOTE:** Please make sure you select an editor that supports UCS-2 characters
> > -that can be stored in a UTF-16LE file.
> > +that can be stored in either a UTF-8 (without a BOM character) or a
> > +UTF-16LE file (with a BOM character).
> >  **********
> >
> >  ## 2.1 Common EBNF
> > diff --git a/README.md b/README.md
> > index 63842a1..015aef1 100644
> > --- a/README.md
> > +++ b/README.md
> > @@ -77,16 +77,17 @@ Copyright (c) 2016-2017, Intel Corporation. All rights reserved.
> >
> >  ### Revision History
> >
> > -| Revision          | Description                                                                              | Date            |
> > -| ----------------- | ---------------------------------------------------------------------------------------- | --------------- |
> > -| 1.0               | Initial Release.                                                                         | February 2014   |
> > -| 1.1               | Updated EBNF to follow syntax specified in EBNF by the ANTLR project.                    | August 2014     |
> > -|                   | Added content related to EDK II Meta-Data Unicode files.                                 |                 |
> > -|                   | Restructured document.                                                                   |                 |
> > -|                   | Removed security and C format GUID definitions, not required for HII or other UNI files. |                 |
> > -|                   | Removed invalid escape code sequences.                                                   |                 |
> > -| 1.2               | Added optional font formatting                                                           | September 2014  |
> > -| 1.2 Errata A      | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME`                                     | April 2015      |
> > -| 1.3               | Added: Syntax for non-ascii characters inside quoted strings.                            | March 2016      |
> > -|                   | Removed: Info on specific consumers (.INF & .DEC) removed.                               |                 |
> > -| 1.4               | Convert to GitBook format                                                                | March 2017      |
> > +| Revision          | Description                                                                                                            | Date            |
> > +| ----------------- | ---------------------------------------------------------------------------------------------------------------------- | --------------- |
> > +| 1.0               | Initial Release.                                                                                                       | February 2014   |
> > +| 1.1               | Updated EBNF to follow syntax specified in EBNF by the ANTLR project.                                                  | August 2014     |
> > +|                   | Added content related to EDK II Meta-Data Unicode files.                                                               |                 |
> > +|                   | Restructured document.                                                                                                 |                 |
> > +|                   | Removed security and C format GUID definitions, not required for HII or other UNI files.                               |                 |
> > +|                   | Removed invalid escape code sequences.                                                                                 |                 |
> > +| 1.2               | Added optional font formatting                                                                                         | September 2014  |
> > +| 1.2 Errata A      | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME`                                                                   | April 2015      |
> > +| 1.3               | Added: Syntax for non-ascii characters inside quoted strings.                                                          | March 2016      |
> > +|                   | Removed: Info on specific consumers (.INF & .DEC) removed.                                                             |                 |
> > +| 1.4               | Convert to GitBook format                                                                                              | April 2017      |
> > +|                   | [#507](https://bugzilla.tianocore.org/show_bug.cgi?id=507) UNI Spec: Clarify that .uni files maybe UTF-8 without a BOM |                 |
> > --
> > 2.6.3.windows.1
> >
> > _______________________________________________
> > edk2-devel mailing list
> > edk2-devel@lists.01.org
> > https://lists.01.org/mailman/listinfo/edk2-devel
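
For illustration only, the encoding rule proposed in the quoted patch could be sketched roughly as follows. This is not the logic of any existing EDK II build tool; the function name `decode_uni` and its error messages are hypothetical, and it assumes "UCS-2 content" means every decoded character lies in the Basic Multilingual Plane (code point at or below 0xFFFF).

```python
import codecs


def decode_uni(data: bytes) -> str:
    """Decode the raw bytes of a .uni file under the rules proposed in the patch.

    Hypothetical helper, not part of any EDK II tool:
      * a leading UTF-16LE BOM (FF FE) selects UTF-16LE,
      * a leading UTF-8 BOM (EF BB BF) is rejected,
      * anything else is decoded as UTF-8.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        # Strip the BOM, then decode the remainder as little-endian UTF-16.
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF8):
        raise ValueError("UTF-8 .uni files must not begin with a BOM")
    else:
        # No BOM: the patch proposes treating the file as UTF-8.
        text = data.decode("utf-8")

    # Assumed reading of "character content must be UCS-2": BMP only.
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(f"character U+{ord(ch):06X} is outside UCS-2")
    return text
```

Under this sketch, a UTF-16LE file that lacks a BOM and contains only ASCII characters would not be rejected: its bytes are also valid UTF-8, so it would decode into text with NUL characters interleaved. That is one way the compatibility concern raised earlier in the thread about existing BOM-less UTF-16 files could show up in practice.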