From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by ml01.01.org (Postfix) with ESMTPS id 3D47721951C94
	for ; Tue, 25 Apr 2017 18:07:23 -0700 (PDT)
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
	by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 25 Apr 2017 18:07:23 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.37,252,1488873600"; d="scan'208";a="1160978922"
Received: from ewhitney-mobl.amr.corp.intel.com (HELO mdkinney-MOBL.amr.corp.intel.com) ([10.255.230.13])
	by fmsmga002.fm.intel.com with ESMTP; 25 Apr 2017 18:07:22 -0700
From: Michael Kinney
To: edk2-devel@lists.01.org
Cc: Jaben Carsey , Yonghong Zhu , Kevin W Shaw
Date: Tue, 25 Apr 2017 18:07:19 -0700
Message-Id: <1493168839-11708-2-git-send-email-michael.d.kinney@intel.com>
X-Mailer: git-send-email 2.6.3.windows.1
In-Reply-To: <1493168839-11708-1-git-send-email-michael.d.kinney@intel.com>
References: <1493168839-11708-1-git-send-email-michael.d.kinney@intel.com>
Subject: [edk2-UniSpecification PATCH] Allow .uni files on disk to be UTF-8 without a BOM
X-BeenThere: edk2-devel@lists.01.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: EDK II Development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 26 Apr 2017 01:07:23 -0000

https://bugzilla.tianocore.org/show_bug.cgi?id=507

Cc: Jaben Carsey
Cc: Yonghong Zhu
Cc: Kevin W Shaw
Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Michael Kinney
---
 2_unicode_strings_file_format.md | 9 ++++++---
 README.md | 27 ++++++++++++++-------------
 2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/2_unicode_strings_file_format.md b/2_unicode_strings_file_format.md
index 0150c85..7a4a019 100644
--- a/2_unicode_strings_file_format.md
+++ b/2_unicode_strings_file_format.md
@@ -33,7 +33,8 @@
 EDK II Unicode files are used for mapping token names to localized strings that
 are identified by an RFC4646 language code. The format for storing EDK II
-Unicode files is UTF-16LE. The character content must be UCS-2.
+Unicode files on disk is UTF-8 (without a BOM character) or UTF-16LE (with a BOM
+character). The character content must be UCS-2.

 Strings ends are determined by the first of the following items found:

@@ -44,11 +45,13 @@ Strings ends are determined by the first of the following items found:

 Comments may appear anywhere within the string file.

-All the files must begin with a Unicode BOM character.
+All UTF-16LE files must begin with a Unicode BOM character.
+All UTF-8 files must not begin with a Unicode BOM character.

 **********
 **NOTE:** Please make sure you select an editor that supports UCS-2 characters
-that can be stored in a UTF-16LE file.
+that can be stored in either a UTF-8 (without a BOM character) or a UTF-16LE
+file (with a BOM character).
 **********

 ## 2.1 Common EBNF
diff --git a/README.md b/README.md
index 63842a1..015aef1 100644
--- a/README.md
+++ b/README.md
@@ -77,16 +77,17 @@ Copyright (c) 2016-2017, Intel Corporation. All rights reserved.

 ### Revision History

-| Revision | Description | Date |
-| ----------------- | ---------------------------------------------------------------------------------------- | --------------- |
-| 1.0 | Initial Release. | February 2014 |
-| 1.1 | Updated EBNF to follow syntax specified in EBNF by the ANTLR project. | August 2014 |
-| | Added content related to EDK II Meta-Data Unicode files. | |
-| | Restructured document. | |
-| | Removed security and C format GUID definitions, not required for HII or other UNI files. | |
-| | Removed invalid escape code sequences. | |
-| 1.2 | Added optional font formatting | September 2014 |
-| 1.2 Errata A | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME` | April 2015 |
-| 1.3 | Added: Syntax for non-ascii characters inside quoted strings. | March 2016 |
-| | Removed: Info on specific consumers (.INF & .DEC) removed. | |
-| 1.4 | Convert to GitBook format | March 2017 |
+| Revision | Description | Date |
+| ----------------- | ---------------------------------------------------------------------------------------------------------------------- | --------------- |
+| 1.0 | Initial Release. | February 2014 |
+| 1.1 | Updated EBNF to follow syntax specified in EBNF by the ANTLR project. | August 2014 |
+| | Added content related to EDK II Meta-Data Unicode files. | |
+| | Restructured document. | |
+| | Removed security and C format GUID definitions, not required for HII or other UNI files. | |
+| | Removed invalid escape code sequences. | |
+| 1.2 | Added optional font formatting | September 2014 |
+| 1.2 Errata A | Correct misspelling of: `STR_PROPERTIES_MODULE_NAME` | April 2015 |
+| 1.3 | Added: Syntax for non-ascii characters inside quoted strings. | March 2016 |
+| | Removed: Info on specific consumers (.INF & .DEC) removed. | |
+| 1.4 | Convert to GitBook format | April 2017 |
+| | [#507](https://bugzilla.tianocore.org/show_bug.cgi?id=507) UNI Spec: Clarify that .uni files maybe UTF-8 without a BOM | |
--
2.6.3.windows.1
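
The BOM rules this patch adds to the specification can be checked mechanically. The following is a minimal sketch in Python, not code from BaseTools or any EDK II tool: the helper names detect_uni_encoding and decode_uni are hypothetical, and the sketch assumes the file is handed over as raw bytes. It treats a leading UTF-16LE BOM as UTF-16LE, rejects a leading UTF-8 BOM, decodes anything else as UTF-8, and then enforces the UCS-2 restriction by refusing code points above U+FFFF.

```python
import codecs


def detect_uni_encoding(data: bytes) -> str:
    """Pick a codec for raw .uni bytes using only the two BOM rules.

    Hypothetical helper for illustration; not part of any EDK II tool.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        # UTF-16LE files must begin with a BOM.
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF8):
        # UTF-8 files must not begin with a BOM.
        raise ValueError("UTF-8 .uni file must not begin with a BOM")
    # No BOM at all: the only remaining allowed encoding is UTF-8.
    return "utf-8"


def decode_uni(data: bytes) -> str:
    """Decode .uni bytes and enforce UCS-2 character content."""
    encoding = detect_uni_encoding(data)
    if encoding == "utf-16-le":
        # Drop the 2-byte BOM before decoding the payload.
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    else:
        # Raises UnicodeDecodeError if the bytes are not valid UTF-8.
        text = data.decode("utf-8")
    for ch in text:
        # UCS-2 covers only code points that fit in 16 bits.
        if ord(ch) > 0xFFFF:
            raise ValueError(f"U+{ord(ch):04X} is outside UCS-2")
    return text
```

One known limitation of this sketch: a UTF-32LE BOM also begins with the bytes FF FE, so such a file would be classified as UTF-16LE here. Since the specification allows only the two encodings above, that case is left unhandled.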