From: "Kinney, Michael D"
To: Sean Brogan, "Gao, Liming", "edk2-devel@lists.01.org"
Date: Thu, 8 Nov 2018 16:46:00 +0000
Subject: Re: Edk2 uni file encoding
References: <4A89E2EF3DFEDB4C8BFDE51014F606A14E366631@SHSMSX104.ccr.corp.intel.com>
List-Id: EDK II Development

Sean,

As a clarification: the UNI spec does list two on-disk formats. This was
done so that tools could support both during the transition from UTF-16LE
with a BOM to UTF-8 without a BOM. The strong recommendation is for all
EDK II open source packages to use UTF-8 without a BOM. Since platform
packages not maintained in EDK II could be pulling forward UNI files in
UTF-16LE, we have not changed the UNI spec or tools to treat UTF-16LE as
unsupported.

Reviewing UNI files in UTF-16LE in patch emails is a challenge, so
requiring UTF-8 without a BOM makes this much easier.

The conversion of the EDK II open source packages to UTF-8 without a BOM
was performed in late 2015. Here is one example:

https://github.com/tianocore/edk2/commit/3f5287971ffdb5c42e3325a3a94c101f08d3a02a#diff-14d2171dacfcac1fd2e1b1f7b885e530

A helper Python script was added to perform these conversions:

https://github.com/tianocore/edk2/blob/master/BaseTools/Scripts/ConvertUni.py

At some point, it may make sense to *require* UTF-8 without a BOM for all
UNI files and all tools, and for tools to reject UNI files that are in any
other format.

Mike

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Sean Brogan via edk2-devel
> Sent: Wednesday, November 7, 2018 11:11 PM
> To: Gao, Liming
> Cc: edk2-devel@lists.01.org
> Subject: Re: [edk2] Edk2 uni file encoding
>
> Liming,
> That was exactly what I was looking for.
>
> Thanks
> Sean
>
> -----Original Message-----
> From: Gao, Liming
> Sent: Wednesday, November 7, 2018 10:01 PM
> To: Sean Brogan
> Cc: edk2-devel@lists.01.org
> Subject: RE: Edk2 uni file encoding
>
> Sean:
> Chapter 2 of the EDK II UNI spec
> (https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-Specifications)
> defines the UNI file format. EdkCompatibilityPkg is obsolete. BZ
> https://bugzilla.tianocore.org/show_bug.cgi?id=1103 has been submitted to
> delete EdkCompatibilityPkg from edk2/master. We will work on it.
>
> EDK II Unicode files are used for mapping token names to localized
> strings that are identified by an RFC4646 language code. The format for
> storing EDK II Unicode files on disk is UTF-8 (without a BOM character)
> or UTF-16LE (with a BOM character). The character content must be UCS-2.
>
> Thanks
> Liming
> >-----Original Message-----
> >From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Sean Brogan via edk2-devel
> >Sent: Thursday, November 08, 2018 7:00 AM
> >To: edk2-devel@lists.01.org
> >Subject: [edk2] Edk2 uni file encoding
> >
> >Is there a definitive answer for the file encoding for all UNI files in
> >edk2? If not, I would like to propose one. Incorrect encoding causes
> >tool issues and is something we can easily check for and fix.
> >
> >Proposal: All UNI files in edk2 should be
> >
> >  1. UTF-8
> >Or
> >  1. Use a BOM and be UTF-16
> >
> >https://en.wikipedia.org/wiki/Byte_order_mark
> >
> >Results from searching edk2:
> >1 - UTF-16 LE BOM file:
> >EdkCompatibilityPkg\Compatibility\FrameworkHiiOnUefiHiiThunk\Strings.uni
> >919 - Without BOM and decoded as UTF-8
> >
> >Thoughts?
> >
> >Future question: Can we make a rule for all other standard file types
> >(c, h, dec, dsc, fdf, inf)?
> >
> >Thanks
> >Sean
> >
> >_______________________________________________
> >edk2-devel mailing list
> >edk2-devel@lists.01.org
> >https://lists.01.org/mailman/listinfo/edk2-devel
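
The two on-disk formats described in this thread (UTF-8 without a BOM, or
UTF-16LE with a BOM, with character content limited to UCS-2) can be checked
mechanically. What follows is a minimal Python sketch of such a check, plus a
conversion to UTF-8 without a BOM. It is not the code in
BaseTools/Scripts/ConvertUni.py. The function names decode_uni and
to_utf8_without_bom are hypothetical, and rejecting UTF-8 files that carry a
BOM is an assumption based on the spec wording quoted above.

```python
import codecs
from typing import Tuple

# Highest code point representable in UCS-2.
UCS2_MAX = 0xFFFF


def decode_uni(data: bytes) -> Tuple[str, str]:
    """Decode raw UNI file bytes.

    Returns (format_name, text), where format_name is "utf-16le-bom" or
    "utf-8". Raises ValueError for anything else. Hypothetical helper,
    not part of BaseTools.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        fmt = "utf-16le-bom"
        payload = data[len(codecs.BOM_UTF16_LE):]
        codec = "utf-16-le"
    elif data.startswith(codecs.BOM_UTF8):
        # Assumption: UTF-8 is only accepted without a BOM.
        raise ValueError("UTF-8 with a BOM is not one of the two listed formats")
    else:
        fmt, payload, codec = "utf-8", data, "utf-8"

    try:
        text = payload.decode(codec)
    except UnicodeDecodeError as exc:
        raise ValueError(f"not valid {codec}: {exc}") from exc

    # Code points above U+FFFF cannot be represented in UCS-2.
    if any(ord(ch) > UCS2_MAX for ch in text):
        raise ValueError("character content is outside UCS-2")
    return fmt, text


def to_utf8_without_bom(data: bytes) -> bytes:
    """Re-encode accepted UNI bytes as UTF-8 without a BOM."""
    _, text = decode_uni(data)
    return text.encode("utf-8")
```

A tree-wide check along the lines Sean describes could read each .uni file
in binary mode, pass the bytes to decode_uni, and report any file that raises
ValueError or comes back as "utf-16le-bom".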