From: Tim Lewis
To: "Carsey, Jaben", "Kinney, Michael D", "edk2-devel@lists.01.org"
CC: "Shaw, Kevin W"
Date: Wed, 26 Apr 2017 17:53:19 +0000
Subject: Re: [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM

The original UNI specifications (for example, the Multi-String .UNI File Format Specification, February 2014, Revision 1.0) did not require it, and the fact is that tools accept files without the BOM happily today.
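
The following minimal Python sketch makes the compatibility concern concrete. It shows the two ways a tool could treat a .uni file that has no BOM. It is not taken from any EDK II or Insyde tool, and the function decode_uni and its assume_utf8_without_bom flag are hypothetical. With the flag off, it models the existing behavior described in this thread: no BOM means UTF-16, assumed here to be little-endian. With the flag on, it models the rule in the patch: no BOM means UTF-8. For simplicity, the sketch rejects a UTF-8 BOM under both rules.

```python
import codecs


def decode_uni(data: bytes, assume_utf8_without_bom: bool) -> str:
    """Hypothetical helper (not an EDK II or Insyde API) that decodes .uni bytes.

    assume_utf8_without_bom=False: existing behavior described in this thread,
    where a file without a BOM is treated as UTF-16 (assumed little-endian here).
    assume_utf8_without_bom=True: the rule proposed by the patch, where a file
    without a BOM is UTF-8.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        # UTF-16LE with a BOM (bytes FF FE) is accepted under both rules.
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF8):
        # The patch says UTF-8 files must not begin with a BOM; this sketch
        # rejects a UTF-8 BOM under both rules for simplicity.
        raise ValueError("UTF-8 .uni files must not begin with a BOM")
    if assume_utf8_without_bom:
        # Proposed rule: no BOM means UTF-8.
        return data.decode("utf-8")
    # Existing behavior: no BOM means UTF-16 (little-endian assumed).
    return data.decode("utf-16-le")


if __name__ == "__main__":
    # An existing .uni file saved as UTF-16LE without a BOM.
    text = '#string STR_A #language en-US "A"'
    legacy_file = text.encode("utf-16-le")
    print(decode_uni(legacy_file, assume_utf8_without_bom=False) == text)
    print(repr(decode_uni(legacy_file, assume_utf8_without_bom=True)[:6]))
```

Under the proposed rule, the bytes of an existing BOM-less UTF-16LE file with ASCII-only content do not cause an error. UTF-8 decoding accepts the interleaved zero bytes and silently returns text with embedded NUL characters. Files with other content may fail to decode instead. Either way, changing the default interpretation affects files already in the field.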

I believe that requiring the BOM is a good step forward, but assuming UTF-8 when one is not present won't help the vast quantities of existing UNI files out there.

Tim

-----Original Message-----
From: Carsey, Jaben [mailto:jaben.carsey@intel.com]
Sent: Wednesday, April 26, 2017 10:45 AM
To: Tim Lewis; Kinney, Michael D; edk2-devel@lists.01.org
Cc: Shaw, Kevin W
Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM

Tim,

Doesn't that assumption/behavior violate the current spec?

"All the files must begin with a Unicode BOM character."

-Jaben

> -----Original Message-----
> From: Tim Lewis [mailto:tim.lewis@insyde.com]
> Sent: Wednesday, April 26, 2017 9:15 AM
> To: Kinney, Michael D; edk2-devel@lists.01.org
> Cc: Carsey, Jaben; Shaw, Kevin W
> Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
> Importance: High
>
> Mike --
>
> This breaks our existing build tools, which assume that a file without
> a BOM is UTF-16.
>
> Tim
>
> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Michael Kinney
> Sent: Tuesday, April 25, 2017 6:07 PM
> To: edk2-devel@lists.01.org
> Cc: Jaben Carsey; Kevin W Shaw
> Subject: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
>
> https://bugzilla.tianocore.org/show_bug.cgi?id=507
>
> Cc: Jaben Carsey
> Cc: Yonghong Zhu
> Cc: Kevin W Shaw
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Michael Kinney
> ---
>  2_unicode_strings_file_format.md |  9 ++++++---
>  README.md                        | 27 ++++++++++++++-------------
>  2 files changed, 20 insertions(+), 16 deletions(-)
>
> diff --git a/2_unicode_strings_file_format.md b/2_unicode_strings_file_format.md
> index 0150c85..7a4a019 100644
> --- a/2_unicode_strings_file_format.md
> +++ b/2_unicode_strings_file_format.md
> @@ -33,7 +33,8 @@
>
>  EDK II Unicode files are used for mapping token names to localized strings
>  that are identified by an RFC4646 language code. The format for storing EDK II
> -Unicode files is UTF-16LE. The character content must be UCS-2.
> +Unicode files on disk is UTF-8 (without a BOM character) or UTF-16LE
> +(with a BOM character). The character content must be UCS-2.
>
>  Strings ends are determined by the first of the following items found:
>
> @@ -44,11 +45,13 @@ Strings ends are determined by the first of the following items found:
>
>  Comments may appear anywhere within the string file.
>
> -All the files must begin with a Unicode BOM character.
> +All UTF-16LE files must begin with a Unicode BOM character.
> +All UTF-8 files must not begin with a Unicode BOM character.
>
>  **********
>  **NOTE:** Please make sure you select an editor that supports UCS-2 characters
> -that can be stored in a UTF-16LE file.
> +that can be stored in either a UTF-8 (without a BOM character) or a
> +UTF-16LE file (with a BOM character).
>  **********
>
>  ## 2.1 Common EBNF
> diff --git a/README.md b/README.md
> index 63842a1..015aef1 100644
> --- a/README.md
> +++ b/README.md
> @@ -77,16 +77,17 @@ Copyright (c) 2016-2017, Intel Corporation. All rights reserved.
>
>  ### Revision History
>
> -| Revision | Description | Date |
> -| ----------------- | ---------------------------------------------------------------------------------------- | --------------- |
> -| 1.0 | Initial Release. | February 2014 |
> -| 1.1 | Updated EBNF to follow syntax specified in EBNF by the ANTLR project. | August 2014 |
> -| | Added content related to EDK II Meta-Data Unicode files. | |
> -| | Restructured document. | |
> -| | Removed security and C format GUID definitions, not required for HII or other UNI files. | |
> -| | Removed invalid escape code sequences. | |
> -| 1.2 | Added optional font formatting | September 2014 |
> -| 1.2 Errata A | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME` | April 2015 |
> -| 1.3 | Added: Syntax for non-ascii characters inside quoted strings. | March 2016 |
> -| | Removed: Info on specific consumers (.INF & .DEC) removed. | |
> -| 1.4 | Convert to GitBook format | March 2017 |
> +| Revision | Description | Date |
> +| ----------------- | ---------------------------------------------------------------------------------------------------------------------- | --------------- |
> +| 1.0 | Initial Release. | February 2014 |
> +| 1.1 | Updated EBNF to follow syntax specified in EBNF by the ANTLR project. | August 2014 |
> +| | Added content related to EDK II Meta-Data Unicode files. | |
> +| | Restructured document. | |
> +| | Removed security and C format GUID definitions, not required for HII or other UNI files. | |
> +| | Removed invalid escape code sequences. | |
> +| 1.2 | Added optional font formatting | September 2014 |
> +| 1.2 Errata A | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME` | April 2015 |
> +| 1.3 | Added: Syntax for non-ascii characters inside quoted strings. | March 2016 |
> +| | Removed: Info on specific consumers (.INF & .DEC) removed. | |
> +| 1.4 | Convert to GitBook format | April 2017 |
> +| | [#507](https://bugzilla.tianocore.org/show_bug.cgi?id=507) UNI Spec: Clarify that .uni files maybe UTF-8 without a BOM | |
> --
> 2.6.3.windows.1
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel