public inbox for devel@edk2.groups.io
From: "Kinney, Michael D" <michael.d.kinney@intel.com>
To: Tim Lewis <tim.lewis@insyde.com>,
	"Carsey, Jaben" <jaben.carsey@intel.com>,
	"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>,
	"Kinney, Michael D" <michael.d.kinney@intel.com>
Cc: "Shaw, Kevin W" <kevin.w.shaw@intel.com>
Subject: Re: [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
Date: Wed, 26 Apr 2017 18:25:34 +0000
Message-ID: <E92EE9817A31E24EB0585FDF735412F57D16D578@ORSMSX113.amr.corp.intel.com>
In-Reply-To: <7236196A5DF6C040855A6D96F556A53F5764A8@msmail.insydesw.com.tw>

Tim,

The document change request under review here is against this 1.3 spec.

Here is the document history on this topic I have been able to find. 

The Multi-String .UNI File Format Specification Version 1.3, March 2016
https://github.com/tianocore/tianocore.github.io/wiki/EDK%20II%20Specifications
has the following 2 statements in CH 2:

* The format for storing EDK II Unicode files is UTF-16LE
* All the files must begin with a Unicode BOM character.

The Multi-String .UNI File Format Specification Version 1.2 Errata A, April 2015
https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-Specifications-Archived
has the same 2 statements in CH 2:

* The format for storing EDK II Unicode files is UTF-16LE
* All the files must begin with a Unicode BOM character.


The Multi-String .UNI File Format Specification, Revision 1.0, February 2014
http://cran.org.uk/edk2/docs/specs/UNI_File_Spec_1_0.pdf
has the following statements in CH 2:

* All the files must begin with the binary character, 0xFEFF (big-endian).

I do not see any versions of the .UNI spec that do not require a BOM.
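
For comparing the two rule sets side by side, here is a minimal Python sketch. It is illustrative only: the function name `decode_uni` and its `allow_utf8_without_bom` flag are hypothetical and are not taken from the EDK II BaseTools. With the flag off, it follows the 1.3 text quoted above (UTF-16LE content, file must begin with a BOM). With the flag on, it follows the rule proposed in the patch for Bugzilla #507 (UTF-16LE with a BOM, or UTF-8 without one). In both modes it rejects characters outside UCS-2.

```python
import codecs


def decode_uni(data: bytes, allow_utf8_without_bom: bool = False) -> tuple[str, str]:
    """Return (encoding, text) for the raw bytes of a .uni file.

    Hypothetical helper. allow_utf8_without_bom=False models the 1.3 rule
    (UTF-16LE, file must begin with a BOM); True models the proposed rule
    (UTF-16LE with a BOM, or UTF-8 without a BOM). Raises ValueError otherwise.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        encoding, body = "utf-16-le", data[len(codecs.BOM_UTF16_LE):]
    elif data.startswith(codecs.BOM_UTF8) or data.startswith(codecs.BOM_UTF16_BE):
        # Neither rule allows a UTF-8 BOM or a big-endian UTF-16 BOM.
        raise ValueError("unsupported BOM")
    elif allow_utf8_without_bom:
        encoding, body = "utf-8", data
    else:
        raise ValueError("missing UTF-16LE BOM")

    # UnicodeDecodeError is a subclass of ValueError, so malformed input
    # surfaces through the same exception type.
    text = body.decode(encoding)

    # "The character content must be UCS-2": nothing beyond U+FFFF.
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside UCS-2")
    return encoding, text
```

One consequence relevant to this thread: a BOM-less UTF-16LE file that contains only ASCII characters is also valid UTF-8 (each character becomes itself followed by a NUL byte). The permissive mode would therefore decode such a file without error but with the wrong content, rather than rejecting it.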

Mike

> -----Original Message-----
> From: Tim Lewis [mailto:tim.lewis@insyde.com]
> Sent: Wednesday, April 26, 2017 10:53 AM
> To: Carsey, Jaben <jaben.carsey@intel.com>; Kinney, Michael D
> <michael.d.kinney@intel.com>; edk2-devel@lists.01.org
> Cc: Shaw, Kevin W <kevin.w.shaw@intel.com>
> Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be
> UTF-8 without a BOM
> 
> The original UNI specifications (for example, the Multi-String .UNI File Format
> Specification, February 2014, Revision 1.0) did not require it, and the fact is
> that tools accept files without the BOM happily today.
> 
> I believe that requiring the BOM is a good step forward, but assuming UTF-8 when
> one is not present won't help the vast quantities of existing UNI files out
> there.
> 
> Tim
> 
> -----Original Message-----
> From: Carsey, Jaben [mailto:jaben.carsey@intel.com]
> Sent: Wednesday, April 26, 2017 10:45 AM
> To: Tim Lewis <tim.lewis@insyde.com>; Kinney, Michael D
> <michael.d.kinney@intel.com>; edk2-devel@lists.01.org
> Cc: Shaw, Kevin W <kevin.w.shaw@intel.com>
> Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk to be
> UTF-8 without a BOM
> 
> Tim,
> 
> Doesn't that assumption/behavior violate the current spec?
> "All the files must begin with a Unicode BOM character."
> 
> -Jaben
> 
> > -----Original Message-----
> > From: Tim Lewis [mailto:tim.lewis@insyde.com]
> > Sent: Wednesday, April 26, 2017 9:15 AM
> > To: Kinney, Michael D <michael.d.kinney@intel.com>; edk2-devel@lists.01.org
> > Cc: Carsey, Jaben <jaben.carsey@intel.com>; Shaw, Kevin W
> > <kevin.w.shaw@intel.com>
> > Subject: RE: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on
> > disk to be UTF-8 without a BOM
> > Importance: High
> >
> > Mike --
> >
> > This breaks our existing build tools, which assume that a file without
> > a BOM is UTF-16.
> >
> > Tim
> >
> > -----Original Message-----
> > From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> > Michael Kinney
> > Sent: Tuesday, April 25, 2017 6:07 PM
> > To: edk2-devel@lists.01.org
> > Cc: Jaben Carsey <jaben.carsey@intel.com>; Kevin W Shaw
> > <kevin.w.shaw@intel.com>
> > Subject: [edk2] [edk2-UniSpecification PATCH] Allow .uni files on disk
> > to be UTF-8 without a BOM
> >
> > https://bugzilla.tianocore.org/show_bug.cgi?id=507
> >
> > Cc: Jaben Carsey <jaben.carsey@intel.com>
> > Cc: Yonghong Zhu <yonghong.zhu@intel.com>
> > Cc: Kevin W Shaw <kevin.w.shaw@intel.com>
> > Contributed-under: TianoCore Contribution Agreement 1.1
> > Signed-off-by: Michael Kinney <michael.d.kinney@intel.com>
> > ---
> >  2_unicode_strings_file_format.md |  9 ++++++---
> >  README.md                        | 27 ++++++++++++++-------------
> >  2 files changed, 20 insertions(+), 16 deletions(-)
> >
> > diff --git a/2_unicode_strings_file_format.md b/2_unicode_strings_file_format.md
> > index 0150c85..7a4a019 100644
> > --- a/2_unicode_strings_file_format.md
> > +++ b/2_unicode_strings_file_format.md
> > @@ -33,7 +33,8 @@
> >
> >  EDK II Unicode files are used for mapping token names to localized strings that
> >  are identified by an RFC4646 language code. The format for storing EDK II
> > -Unicode files is UTF-16LE. The character content must be UCS-2.
> > +Unicode files on disk is UTF-8 (without a BOM character) or UTF-16LE
> > +(with a BOM character). The character content must be UCS-2.
> >
> >  Strings ends are determined by the first of the following items found:
> >
> > @@ -44,11 +45,13 @@ Strings ends are determined by the first of the following items found:
> >
> >  Comments may appear anywhere within the string file.
> >
> > -All the files must begin with a Unicode BOM character.
> > +All UTF-16LE files must begin with a Unicode BOM character.
> > +All UTF-8 files must not begin with a Unicode BOM character.
> >
> >  **********
> >  **NOTE:** Please make sure you select an editor that supports UCS-2 characters
> > -that can be stored in a UTF-16LE file.
> > +that can be stored in either a UTF-8 (without a BOM character) or a
> > +UTF-16LE file (with a BOM character).
> >  **********
> >
> >  ## 2.1 Common EBNF
> > diff --git a/README.md b/README.md
> > index 63842a1..015aef1 100644
> > --- a/README.md
> > +++ b/README.md
> > @@ -77,16 +77,17 @@ Copyright (c) 2016-2017, Intel Corporation. All rights reserved.
> >
> >  ### Revision History
> >
> > -| Revision          | Description                                                                              | Date            |
> > -| ----------------- | ---------------------------------------------------------------------------------------- | --------------- |
> > -| 1.0               | Initial Release.                                                                         | February 2014   |
> > -| 1.1               | Updated EBNF to follow syntax specified in EBNF by the ANTLR project.                    | August 2014     |
> > -|                   | Added content related to EDK II Meta-Data Unicode files.                                 |                 |
> > -|                   | Restructured document.                                                                   |                 |
> > -|                   | Removed security and C format GUID definitions, not required for HII or other UNI files. |                 |
> > -|                   | Removed invalid escape code sequences.                                                   |                 |
> > -| 1.2               | Added optional font formatting                                                           | September 2014  |
> > -| 1.2 Errata A      | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME`                                     | April 2015      |
> > -| 1.3               | Added: Syntax for non-ascii characters inside quoted strings.                            | March 2016      |
> > -|                   | Removed: Info on specific consumers (.INF & .DEC) removed.                               |                 |
> > -| 1.4               | Convert to GitBook format                                                                | March 2017      |
> > +| Revision          | Description                                                                                                            | Date            |
> > +| ----------------- | ---------------------------------------------------------------------------------------------------------------------- | --------------- |
> > +| 1.0               | Initial Release.                                                                                                       | February 2014   |
> > +| 1.1               | Updated EBNF to follow syntax specified in EBNF by the ANTLR project.                                                  | August 2014     |
> > +|                   | Added content related to EDK II Meta-Data Unicode files.                                                               |                 |
> > +|                   | Restructured document.                                                                                                 |                 |
> > +|                   | Removed security and C format GUID definitions, not required for HII or other UNI files.                               |                 |
> > +|                   | Removed invalid escape code sequences.                                                                                 |                 |
> > +| 1.2               | Added optional font formatting                                                                                         | September 2014  |
> > +| 1.2 Errata A      | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME`                                                                   | April 2015      |
> > +| 1.3               | Added: Syntax for non-ascii characters inside quoted strings.                                                          | March 2016      |
> > +|                   | Removed: Info on specific consumers (.INF & .DEC) removed.                                                             |                 |
> > +| 1.4               | Convert to GitBook format                                                                                              | April 2017      |
> > +|                   | [#507](https://bugzilla.tianocore.org/show_bug.cgi?id=507) UNI Spec: Clarify that .uni files maybe UTF-8 without a BOM |                 |
> > --
> > 2.6.3.windows.1
> >
> > _______________________________________________
> > edk2-devel mailing list
> > edk2-devel@lists.01.org
> > https://lists.01.org/mailman/listinfo/edk2-devel



Thread overview: 17+ messages
2017-04-26  1:07 [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM Michael Kinney
2017-04-26  1:07 ` Michael Kinney
2017-04-26 16:15   ` Tim Lewis
2017-04-26 17:44     ` Carsey, Jaben
2017-04-26 17:53       ` Tim Lewis
2017-04-26 18:25         ` Kinney, Michael D [this message]
2017-04-26 18:11     ` Kinney, Michael D
2017-04-26 18:34       ` Tim Lewis
2017-04-26 18:46         ` Kinney, Michael D
2017-04-26 18:53           ` Tim Lewis
2017-04-26 22:47             ` Kinney, Michael D
2017-04-26 23:13               ` Tim Lewis
2017-04-27  0:02                 ` Kinney, Michael D
2017-04-27  0:26                   ` Tim Lewis
2017-04-28 16:47                     ` Tim Lewis
2017-04-28 17:22                       ` Kinney, Michael D
2017-04-26  2:10 ` Zhu, Yonghong
