public inbox for devel@edk2.groups.io
From: Tim Lewis <tim.lewis@insyde.com>
To: "Kinney, Michael D" <michael.d.kinney@intel.com>,
	"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>
Cc: "Carsey, Jaben" <jaben.carsey@intel.com>,
	"Shaw, Kevin W" <kevin.w.shaw@intel.com>
Subject: Re: [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
Date: Wed, 26 Apr 2017 18:34:19 +0000
Message-ID: <7236196A5DF6C040855A6D96F556A53F576544@msmail.insydesw.com.tw>
In-Reply-To: <E92EE9817A31E24EB0585FDF735412F57D16D545@ORSMSX113.amr.corp.intel.com>

Mike --

I understand that EDK2 decided to add BOM markers two years ago. Adding a BOM didn't change the default. The problem is that (a) hundreds of files created before the 2015 changes are still in use in our codebase, and (b) this change is not backward compatible for those files.
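
To make the compatibility concern concrete, here is a minimal Python sketch, assuming a legacy .uni file stored as UTF-16LE with no BOM. The function name decode_legacy_tolerant and the fallback heuristic are hypothetical illustrations; neither is taken from the EDK II tools or from any existing build tool.

```python
def decode_legacy_tolerant(data: bytes) -> str:
    """Hypothetical reader that still accepts BOM-less UTF-16LE files."""
    # A leading FF FE is the UTF-16LE BOM: strip it and decode.
    if data.startswith(b"\xff\xfe"):
        return data[2:].decode("utf-16-le")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the pre-2015 convention (assumption).
        return data.decode("utf-16-le")
    # ASCII text stored as UTF-16LE is valid UTF-8 but is full of NULs,
    # so treat embedded NULs as a sign of a legacy file (heuristic only).
    if "\x00" in text:
        return data.decode("utf-16-le")
    return text


legacy = "#string STR_X".encode("utf-16-le")   # no BOM, old-style file
print(repr(legacy.decode("utf-8")))            # decodes, but with NULs
print(decode_legacy_tolerant(legacy))          # recovers the text
```

The point of the sketch is that a strict "no BOM means UTF-8" reader does not necessarily fail loudly on such a file: pure-ASCII content decodes without error but yields NUL-interleaved text.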

Tim

-----Original Message-----
From: Kinney, Michael D [mailto:michael.d.kinney@intel.com] 
Sent: Wednesday, April 26, 2017 11:11 AM
To: Tim Lewis <tim.lewis@insyde.com>; edk2-devel@lists.01.org; Kinney, Michael D <michael.d.kinney@intel.com>
Cc: Carsey, Jaben <jaben.carsey@intel.com>; Shaw, Kevin W <kevin.w.shaw@intel.com>
Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM

Hi Tim,

This is not a request for a new change.  Instead, the intent of this document change is to update the document to reflect the implemented behavior of the EDK II tools.  The EDK II tool updates to add UTF-8 file support were completed with the patches listed below.  Notice that the main one for normal build support was checked in almost 2 years ago. 

BaseTools - UniClassObject - 6/23/2015
* https://github.com/tianocore/edk2/commit/d80e451b187c9d33cbd771253fbd5119670f75c6
* https://github.com/tianocore/edk2/commit/be264422c95c781a345978f17b7e80b91f816eda

BaseTools - ECC - 12/29/2015
* https://github.com/tianocore/edk2/commit/975889279df2eb3d3338cb88afb3faa71ddde4d6

BaseTools - UPT - 4/25/2016
* https://github.com/tianocore/edk2/commit/4a21fb3b67a0ef1655b43e9368b6b697bbf327af

This was intended to be a 100% backwards compatible change.

All .uni files in the EDK II project in UTF-16LE format have always used a BOM.
Please check out UDK2015 or older UDKs and you will see that all .uni files start with 0xff 0xfe.
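
As an illustration only (this is not the BaseTools code), the rule described by the document change can be sketched in Python: a file that starts with 0xff 0xfe is read as UTF-16LE, and a file without a BOM is read as UTF-8. The names classify_uni and decode_uni are hypothetical.

```python
import codecs


def classify_uni(data: bytes) -> str:
    """Return the codec implied by the first bytes of a .uni file."""
    # codecs.BOM_UTF16_LE is b"\xff\xfe".
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # No BOM: treated as UTF-8 under the rule sketched here.
    return "utf-8"


def decode_uni(data: bytes) -> str:
    """Decode .uni file content according to classify_uni()."""
    if classify_uni(data) == "utf-16-le":
        # Drop the two BOM bytes before decoding.
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    return data.decode("utf-8")


with_bom = codecs.BOM_UTF16_LE + "#string STR_X".encode("utf-16-le")
print(classify_uni(with_bom))                    # utf-16-le
print(decode_uni("#string STR_X".encode("utf-8")))
```

The sketch does not check that the decoded characters are limited to UCS-2, which the specification text also requires.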

Thanks,

Mike

> -----Original Message-----
> From: Tim Lewis [mailto:tim.lewis@insyde.com]
> Sent: Wednesday, April 26, 2017 9:15 AM
> To: Kinney, Michael D <michael.d.kinney@intel.com>; edk2-devel@lists.01.org
> Cc: Carsey, Jaben <jaben.carsey@intel.com>; Shaw, Kevin W <kevin.w.shaw@intel.com>
> Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
> 
> Mike --
> 
> This breaks our existing build tools, which assume that a file without 
> a BOM is UTF-16.
> 
> Tim
> 
> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Michael Kinney
> Sent: Tuesday, April 25, 2017 6:07 PM
> To: edk2-devel@lists.01.org
> Cc: Jaben Carsey <jaben.carsey@intel.com>; Kevin W Shaw <kevin.w.shaw@intel.com>
> Subject: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
> 
> https://bugzilla.tianocore.org/show_bug.cgi?id=507
> 
> Cc: Jaben Carsey <jaben.carsey@intel.com>
> Cc: Yonghong Zhu <yonghong.zhu@intel.com>
> Cc: Kevin W Shaw <kevin.w.shaw@intel.com>
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Michael Kinney <michael.d.kinney@intel.com>
> ---
>  2_unicode_strings_file_format.md |  9 ++++++---
>  README.md                        | 27 ++++++++++++++-------------
>  2 files changed, 20 insertions(+), 16 deletions(-)
> 
> diff --git a/2_unicode_strings_file_format.md b/2_unicode_strings_file_format.md
> index 0150c85..7a4a019 100644
> --- a/2_unicode_strings_file_format.md
> +++ b/2_unicode_strings_file_format.md
> @@ -33,7 +33,8 @@
> 
>  EDK II Unicode files are used for mapping token names to localized strings that
>  are identified by an RFC4646 language code. The format for storing EDK II
> -Unicode files is UTF-16LE. The character content must be UCS-2.
> +Unicode files on disk is UTF-8 (without a BOM character) or UTF-16LE 
> +(with a BOM character). The character content must be UCS-2.
> 
>  Strings ends are determined by the first of the following items found:
> 
> @@ -44,11 +45,13 @@ Strings ends are determined by the first of the following items found:
> 
>  Comments may appear anywhere within the string file.
> 
> -All the files must begin with a Unicode BOM character.
> +All UTF-16LE files must begin with a Unicode BOM character.
> +All UTF-8 files must not begin with a Unicode BOM character.
> 
>  **********
>  **NOTE:** Please make sure you select an editor that supports UCS-2 characters
> -that can be stored in a UTF-16LE file.
> +that can be stored in either a UTF-8 (without a BOM character) or a 
> +UTF-16LE file (with a BOM character).
>  **********
> 
>  ## 2.1 Common EBNF
> diff --git a/README.md b/README.md
> index 63842a1..015aef1 100644
> --- a/README.md
> +++ b/README.md
> @@ -77,16 +77,17 @@ Copyright (c) 2016-2017, Intel Corporation. All rights reserved.
> 
>  ### Revision History
> 
> -| Revision          | Description                                                                              | Date            |
> -| ----------------- | ---------------------------------------------------------------------------------------- | --------------- |
> -| 1.0               | Initial Release.                                                                         | February 2014   |
> -| 1.1               | Updated EBNF to follow syntax specified in EBNF by the ANTLR project.                    | August 2014     |
> -|                   | Added content related to EDK II Meta-Data Unicode files.                                 |                 |
> -|                   | Restructured document.                                                                   |                 |
> -|                   | Removed security and C format GUID definitions, not required for HII or other UNI files. |                 |
> -|                   | Removed invalid escape code sequences.                                                   |                 |
> -| 1.2               | Added optional font formatting                                                           | September 2014  |
> -| 1.2 Errata A      | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME`                                     | April 2015      |
> -| 1.3               | Added: Syntax for non-ascii characters inside quoted strings.                            | March 2016      |
> -|                   | Removed: Info on specific consumers (.INF & .DEC) removed.                               |                 |
> -| 1.4               | Convert to GitBook format                                                                | March 2017      |
> +| Revision          | Description                                                                                                            | Date            |
> +| ----------------- | ---------------------------------------------------------------------------------------------------------------------- | --------------- |
> +| 1.0               | Initial Release.                                                                                                       | February 2014   |
> +| 1.1               | Updated EBNF to follow syntax specified in EBNF by the ANTLR project.                                                  | August 2014     |
> +|                   | Added content related to EDK II Meta-Data Unicode files.                                                               |                 |
> +|                   | Restructured document.                                                                                                 |                 |
> +|                   | Removed security and C format GUID definitions, not required for HII or other UNI files.                               |                 |
> +|                   | Removed invalid escape code sequences.                                                                                 |                 |
> +| 1.2               | Added optional font formatting                                                                                         | September 2014  |
> +| 1.2 Errata A      | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME`                                                                   | April 2015      |
> +| 1.3               | Added: Syntax for non-ascii characters inside quoted strings.                                                          | March 2016      |
> +|                   | Removed: Info on specific consumers (.INF & .DEC) removed.                                                             |                 |
> +| 1.4               | Convert to GitBook format                                                                                              | April 2017      |
> +|                   | [#507](https://bugzilla.tianocore.org/show_bug.cgi?id=507) UNI Spec: Clarify that .uni files maybe UTF-8 without a BOM |                 |
> --
> 2.6.3.windows.1
> 


