From: "Oram, Isaac W"
To: "Kinney, Michael D", Sean Brogan, "Gao, Liming", "edk2-devel@lists.01.org"
Date: Thu, 8 Nov 2018 16:55:27 +0000
Message-ID: <3155A53C14BABF45A364D10949B7414C8A7967BB@ORSMSX116.amr.corp.intel.com>
References: <4A89E2EF3DFEDB4C8BFDE51014F606A14E366631@SHSMSX104.ccr.corp.intel.com>
Subject: Re: Edk2 uni file encoding
List-Id: EDK II Development

This info is also somewhat stated in the coding standards.
https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-Specifications

5.1.3 Files may only contain the ASCII characters 0x0A, 0x0D, and 0x20
through 0x7E
Files should be saved using either ASCII or UTF8 encoding.

It would be good for one of you who knows the detailed differences to
clarify that text and link to the UNI spec as appropriate.

Regards,
Isaac

-----Original Message-----
From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Kinney, Michael D
Sent: Thursday, November 8, 2018 8:46 AM
To: Sean Brogan; Gao, Liming; edk2-devel@lists.01.org
Subject: Re: [edk2] Edk2 uni file encoding

Sean,

As a clarification, the UNI spec does list 2 on-disk formats.
This was done so tools could support both in the transition from
UTF-16LE with BOM to UTF-8 without BOM.

The strong recommendation is for all EDK II open source packages to use
UTF-8 without a BOM. Since platform packages not maintained in EDK II
could be pulling forward UNI files in UTF-16LE, we have not changed the
UNI spec or tools to consider UTF-16LE as unsupported.

Doing patch email reviews of UNI files in UTF-16LE is a challenge, so
requiring UTF-8 without a BOM makes this much easier.

The EDK II open source package conversion to UTF-8 without a BOM was
performed in late 2015.
Here is one example:

https://github.com/tianocore/edk2/commit/3f5287971ffdb5c42e3325a3a94c101f08d3a02a#diff-14d2171dacfcac1fd2e1b1f7b885e530

A helper python script was added to help perform these conversions:

https://github.com/tianocore/edk2/blob/master/BaseTools/Scripts/ConvertUni.py

At some point, it may make sense to *require* UTF-8 without a BOM for
all UNI files and all tools and for tools to reject UNI files that are
not in UTF-8 without a BOM format.

Mike

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Sean Brogan via edk2-devel
> Sent: Wednesday, November 7, 2018 11:11 PM
> To: Gao, Liming
> Cc: edk2-devel@lists.01.org
> Subject: Re: [edk2] Edk2 uni file encoding
>
> Liming,
> That was exactly what I was looking for.
>
> Thanks
> Sean
>
> -----Original Message-----
> From: Gao, Liming
> Sent: Wednesday, November 7, 2018 10:01 PM
> To: Sean Brogan
> Cc: edk2-devel@lists.01.org
> Subject: RE: Edk2 uni file encoding
>
> Sean:
> EDKII UNI spec
> (https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Ftianocore%2Ftianocore.github.io%2Fwiki%2FEDK-II-Specifications&data=02%7C01%7Csean.brogan%40microsoft.com%7C5ffeb105737e4c00150208d6453fa46a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636772536983024335&sdata=veov60rbEtr3ub7RcreuFuqJvc4%2BdtAowph7kBGXW54%3D&reserved=0)
> Chapter 2 defines UNI file format.
> EdkCompatibilityPkg is obsolete. BZ
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.tianocore.org%2Fshow_bug.cgi%3Fid%3D1103&data=02%7C01%7Csean.brogan%40microsoft.com%7C5ffeb105737e4c00150208d6453fa46a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636772536983024335&sdata=LOLezJzuK9kwu8QK78UM5nnCD%2FZEY5fxr1VQzk8sqY8%3D&reserved=0
> is submitted to delete EdkCompatibilityPkg from edk2/master. We
> will work on it.
>
> EDK II Unicode files are used for mapping token names to localized
> strings that are identified by an RFC4646 language code. The format
> for storing EDK II Unicode files on disk is UTF-8 (without a BOM
> character) or UTF-16LE (with a BOM character). The character content
> must be UCS-2.
>
> Thanks
> Liming
> >-----Original Message-----
> >From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Sean Brogan via edk2-devel
> >Sent: Thursday, November 08, 2018 7:00 AM
> >To: edk2-devel@lists.01.org
> >Subject: [edk2] Edk2 uni file encoding
> >
> >Is there a definitive answer for the file encoding for all UNI files
> >in edk2? If not I would like to propose one. Incorrect encoding
> >causes tool issues and is something we can easily check for and fix.
> >
> >Proposal: All UNI files in edk2 should be
> >
> >  1. UTF-8
> >Or
> >  1. Use a BOM and be UTF-16
> >
> >https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FByte_order_mark&data=02%7C01%7Csean.brogan%40microsoft.com%7C5ffeb105737e4c00150208d6453fa46a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636772536983024335&sdata=1IET4LN5l9FfMscffzgk0t7IqYGyYNU9IrZafvi9osU%3D&reserved=0
> >
> >Results from searching edk2:
> >1 - UTF-16 LE BOM file:
> >EdkCompatibilityPkg\Compatibility\FrameworkHiiOnUefiHiiThunk\Strings.uni
> >919 - Without BOM and decoded as UTF-8
> >
> >Thoughts?
> >
> >Future question: Can we make rule for all other standard file types
> >(c, h, dec, dsc, fdf, inf,)?
> >
> >Thanks
> >Sean
> >
> >_______________________________________________
> >edk2-devel mailing list
> >edk2-devel@lists.01.org
> >https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.01.org%2Fmailman%2Flistinfo%2Fedk2-devel&data=02%7C01%7Csean.brogan%40microsoft.com%7C5ffeb105737e4c00150208d6453fa46a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636772536983024335&sdata=HhfPaCyS0sKHu1fFGkfh%2FQ4pm34X68YKiaM6IN7%2Fzj0%3D&reserved=0
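
The encoding rules quoted in this thread (UTF-8 without a BOM or
UTF-16LE with a BOM, character content limited to UCS-2, and the
5.1.3 byte range for other source files) are easy to check
mechanically. The following is only a minimal sketch of such a check,
assuming the rules exactly as stated above. It is not the BaseTools
ConvertUni.py script, and the names is_plain_ascii_source, decode_uni
and to_utf8_no_bom are hypothetical.

```python
import codecs
from typing import Tuple

# Byte values allowed by the quoted coding-standard rule 5.1.3:
# 0x0A, 0x0D, and 0x20 through 0x7E.
ALLOWED_ASCII = frozenset({0x0A, 0x0D}) | frozenset(range(0x20, 0x7F))


def is_plain_ascii_source(data: bytes) -> bool:
    """Return True if every byte is within the 5.1.3 range."""
    return all(b in ALLOWED_ASCII for b in data)


def decode_uni(data: bytes) -> Tuple[str, str]:
    """Return (format, text) for UNI file bytes, or raise ValueError.

    format is 'utf-16le-bom' or 'utf-8' (the latter meaning no BOM).
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        fmt = "utf-16le-bom"
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF8):
        raise ValueError("UTF-8 with a BOM is not one of the two formats")
    else:
        fmt = "utf-8"
        # UnicodeDecodeError is a subclass of ValueError.
        text = data.decode("utf-8")
    # UCS-2 cannot represent code points above U+FFFF.
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError("character U+%04X is outside UCS-2" % ord(ch))
    return fmt, text


def to_utf8_no_bom(data: bytes) -> bytes:
    """Re-encode accepted UNI bytes as UTF-8 without a BOM."""
    return decode_uni(data)[1].encode("utf-8")
```

A repository scan along the lines of the search results above could
call decode_uni on the bytes of each .uni file and report any file that
raises, or that comes back as 'utf-16le-bom'.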