From: "Kinney, Michael D" <michael.d.kinney@intel.com>
To: Tim Lewis <tim.lewis@insyde.com>,
"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>,
"Kinney, Michael D" <michael.d.kinney@intel.com>
Cc: "Carsey, Jaben" <jaben.carsey@intel.com>,
"Shaw, Kevin W" <kevin.w.shaw@intel.com>
Subject: Re: [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
Date: Wed, 26 Apr 2017 18:46:51 +0000
Message-ID: <E92EE9817A31E24EB0585FDF735412F57D16D5B0@ORSMSX113.amr.corp.intel.com>
In-Reply-To: <7236196A5DF6C040855A6D96F556A53F576544@msmail.insydesw.com.tw>

Tim,

If you look at the entire file history of the EDK II, you will see
that the BOM has always been present in the UTF-16LE formatted files.
The build tools were updated in 2015 to *add* support for UTF-8 files.
The .uni files in the EDK II project were then converted from UTF-16LE
with a BOM to UTF-8 without a BOM. This provided an easier developer
experience when using Git to do email patch review of .uni files.
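
To illustrate the rule being documented (UTF-16LE files begin with a BOM, UTF-8 files do not), here is a minimal Python sketch. It is not the BaseTools implementation; the function names `detect_uni_encoding` and `read_uni_text` are hypothetical, and the sketch does not check the UCS-2 restriction on character content.

```python
import codecs


def detect_uni_encoding(data: bytes) -> str:
    """Pick a Python codec name for the raw bytes of a .uni file.

    Assumed rule (taken from the spec text under discussion):
      * starts with 0xFF 0xFE  -> UTF-16LE with a BOM
      * anything else          -> UTF-8, which must not start with a BOM
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        # The "utf-16" codec consumes the BOM and uses it to pick endianness.
        return "utf-16"
    if data.startswith(codecs.BOM_UTF8):
        raise ValueError("UTF-8 .uni files must not begin with a BOM")
    return "utf-8"


def read_uni_text(data: bytes) -> str:
    """Decode .uni file bytes to text using the rule above."""
    # Raises UnicodeDecodeError (a ValueError subclass) for invalid input,
    # for example a BOM-less UTF-16 file being read as UTF-8.
    return data.decode(detect_uni_encoding(data))
```

Under this rule a UTF-16 file without a BOM is not recognized as UTF-16; whether it fails to decode or is misread depends on its contents.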

It is possible I am missing something here. Can you please provide
a pointer to the EDK II commit(s) where BOMs were added to UTF-16LE
.uni files?

Thanks,

Mike

> -----Original Message-----
> From: Tim Lewis [mailto:tim.lewis@insyde.com]
> Sent: Wednesday, April 26, 2017 11:34 AM
> To: Kinney, Michael D <michael.d.kinney@intel.com>; edk2-devel@lists.01.org
> Cc: Carsey, Jaben <jaben.carsey@intel.com>; Shaw, Kevin W
> <kevin.w.shaw@intel.com>
> Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be
> UTF-8 without a BOM
>
> Mike --
>
> I understand that EDK2 has decided to add BOM markers two years ago. Adding a BOM
> didn't change the default. The problem is (a) there are still hundreds of files
> extant in our codebase which were created prior to the 2015 changes and still in
> use, and (b) this change is not backward compatible for these files.
>
> Tim
>
> -----Original Message-----
> From: Kinney, Michael D [mailto:michael.d.kinney@intel.com]
> Sent: Wednesday, April 26, 2017 11:11 AM
> To: Tim Lewis <tim.lewis@insyde.com>; edk2-devel@lists.01.org; Kinney, Michael D
> <michael.d.kinney@intel.com>
> Cc: Carsey, Jaben <jaben.carsey@intel.com>; Shaw, Kevin W
> <kevin.w.shaw@intel.com>
> Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be
> UTF-8 without a BOM
>
> Hi Tim,
>
> This is not a request for a new change. Instead, the intent of this document
> change is to update the document to reflect the implemented behavior of the EDK
> II tools. The EDK II tool updates to add UTF-8 file support were completed with
> the patches listed below. Notice that the main one for normal build support was
> checked in almost 2 years ago.
>
> BaseTools - UniClassObject - 6/23/2015
> *
> https://github.com/tianocore/edk2/commit/d80e451b187c9d33cbd771253fbd5119670f75c6
> *
> https://github.com/tianocore/edk2/commit/be264422c95c781a345978f17b7e80b91f816eda
>
> BaseTools - ECC - 12/29/2015
> *
> https://github.com/tianocore/edk2/commit/975889279df2eb3d3338cb88afb3faa71ddde4d6
>
> BaseTools - UPT - 4/25/2016
> *
> https://github.com/tianocore/edk2/commit/4a21fb3b67a0ef1655b43e9368b6b697bbf327af
>
> This was intended to be a 100% backwards compatible change.
>
> All .uni files in the EDK II project in UTF-16LE format have always used a BOM.
> Please check out UDK2015 or older UDKs and you will see all .uni files start with
> 0xff 0xfe.
>
> Thanks,
>
> Mike
>
> > -----Original Message-----
> > From: Tim Lewis [mailto:tim.lewis@insyde.com]
> > Sent: Wednesday, April 26, 2017 9:15 AM
> > To: Kinney, Michael D <michael.d.kinney@intel.com>;
> > edk2-devel@lists.01.org
> > Cc: Carsey, Jaben <jaben.carsey@intel.com>; Shaw, Kevin W
> > <kevin.w.shaw@intel.com>
> > Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on
> > disk to be UTF-8 without a BOM
> >
> > Mike --
> >
> > This breaks our existing build tools, which assume that a file without
> > a BOM is UTF-16.
> >
> > Tim
> >
> > -----Original Message-----
> > From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> > Michael Kinney
> > Sent: Tuesday, April 25, 2017 6:07 PM
> > To: edk2-devel@lists.01.org
> > Cc: Jaben Carsey <jaben.carsey@intel.com>; Kevin W Shaw
> > <kevin.w.shaw@intel.com>
> > Subject: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk
> > to be UTF-8 without a BOM
> >
> > https://bugzilla.tianocore.org/show_bug.cgi?id=507
> >
> > Cc: Jaben Carsey <jaben.carsey@intel.com>
> > Cc: Yonghong Zhu <yonghong.zhu@intel.com>
> > Cc: Kevin W Shaw <kevin.w.shaw@intel.com>
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Michael Kinney <michael.d.kinney@intel.com>
> > ---
> > 2_unicode_strings_file_format.md | 9 ++++++---
> > README.md | 27 ++++++++++++++-------------
> > 2 files changed, 20 insertions(+), 16 deletions(-)
> >
> > diff --git a/2_unicode_strings_file_format.md b/2_unicode_strings_file_format.md
> > index 0150c85..7a4a019 100644
> > --- a/2_unicode_strings_file_format.md
> > +++ b/2_unicode_strings_file_format.md
> > @@ -33,7 +33,8 @@
> >
> > EDK II Unicode files are used for mapping token names to localized
> > strings that are identified by an RFC4646 language code. The format
> > for storing EDK II
> > -Unicode files is UTF-16LE. The character content must be UCS-2.
> > +Unicode files on disk is UTF-8 (without a BOM character) or UTF-16LE
> > +(with a BOM character). The character content must be UCS-2.
> >
> > Strings ends are determined by the first of the following items found:
> >
> > @@ -44,11 +45,13 @@ Strings ends are determined by the first of the following items found:
> >
> > Comments may appear anywhere within the string file.
> >
> > -All the files must begin with a Unicode BOM character.
> > +All UTF-16LE files must begin with a Unicode BOM character.
> > +All UTF-8 files must not begin with a Unicode BOM character.
> >
> > **********
> > **NOTE:** Please make sure you select an editor that supports UCS-2
> > characters
> > -that can be stored in a UTF-16LE file.
> > +that can be stored in either a UTF-8 (without a BOM character) or a
> > +UTF-16LE file (with a BOM character).
> > **********
> >
> > ## 2.1 Common EBNF
> > diff --git a/README.md b/README.md
> > index 63842a1..015aef1 100644
> > --- a/README.md
> > +++ b/README.md
> > @@ -77,16 +77,17 @@ Copyright (c) 2016-2017, Intel Corporation. All rights reserved.
> >
> > ### Revision History
> >
> > -| Revision | Description | Date |
> > -| ----------------- | ---------------------------------------------------------------------------------------- | --------------- |
> > -| 1.0 | Initial Release. | February 2014 |
> > -| 1.1 | Updated EBNF to follow syntax specified in EBNF by the ANTLR project. | August 2014 |
> > -| | Added content related to EDK II Meta-Data Unicode files. | |
> > -| | Restructured document. | |
> > -| | Removed security and C format GUID definitions, not required for HII or other UNI files. | |
> > -| | Removed invalid escape code sequences. | |
> > -| 1.2 | Added optional font formatting | September 2014 |
> > -| 1.2 Errata A | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME` | April 2015 |
> > -| 1.3 | Added: Syntax for non-ascii characters inside quoted strings. | March 2016 |
> > -| | Removed: Info on specific consumers (.INF & .DEC) removed. | |
> > -| 1.4 | Convert to GitBook format | March 2017 |
> > +| Revision | Description | Date |
> > +| ----------------- | ---------------------------------------------------------------------------------------------------------------------- | --------------- |
> > +| 1.0 | Initial Release. | February 2014 |
> > +| 1.1 | Updated EBNF to follow syntax specified in EBNF by the ANTLR project. | August 2014 |
> > +| | Added content related to EDK II Meta-Data Unicode files. | |
> > +| | Restructured document. | |
> > +| | Removed security and C format GUID definitions, not required for HII or other UNI files. | |
> > +| | Removed invalid escape code sequences. | |
> > +| 1.2 | Added optional font formatting | September 2014 |
> > +| 1.2 Errata A | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME` | April 2015 |
> > +| 1.3 | Added: Syntax for non-ascii characters inside quoted strings. | March 2016 |
> > +| | Removed: Info on specific consumers (.INF & .DEC) removed. | |
> > +| 1.4 | Convert to GitBook format | April 2017 |
> > +| | [#507](https://bugzilla.tianocore.org/show_bug.cgi?id=507) UNI Spec: Clarify that .uni files maybe UTF-8 without a BOM | |
> > --
> > 2.6.3.windows.1
> >
> > _______________________________________________
> > edk2-devel mailing list
> > edk2-devel@lists.01.org
> > https://lists.01.org/mailman/listinfo/edk2-devel