From: "Oram, Isaac W" <isaac.w.oram@intel.com>
To: "Kinney, Michael D" <michael.d.kinney@intel.com>,
	Sean Brogan <sean.brogan@microsoft.com>,
	"Gao, Liming" <liming.gao@intel.com>,
	"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>
Subject: Re: Edk2 uni file encoding
Date: Thu, 8 Nov 2018 16:55:27 +0000
Message-ID: <3155A53C14BABF45A364D10949B7414C8A7967BB@ORSMSX116.amr.corp.intel.com>
In-Reply-To: <E92EE9817A31E24EB0585FDF735412F5B8B2C266@ORSMSX113.amr.corp.intel.com>

This information is also partly covered in the coding standards:  https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-Specifications

5.1.3 Files may only contain the ASCII characters 0x0A, 0x0D, and 0x20 through 0x7E
	Files should be saved using either ASCII or UTF8 encoding.
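
As an illustration of the rule quoted above, here is a minimal Python sketch that reports any byte outside the permitted set (0x0A, 0x0D, and 0x20 through 0x7E). The function name find_disallowed_bytes is hypothetical and not part of BaseTools; this only shows how a check could be written, not how any existing tool does it.

```python
# Bytes permitted by rule 5.1.3 as quoted: LF, CR, and printable ASCII 0x20..0x7E.
ALLOWED = frozenset({0x0A, 0x0D}) | frozenset(range(0x20, 0x7F))


def find_disallowed_bytes(data: bytes):
    """Return (offset, byte) pairs for every byte outside the allowed set.

    Hypothetical helper for illustration only.
    """
    return [(offset, byte) for offset, byte in enumerate(data) if byte not in ALLOWED]
```

Note that under the rule as quoted, a tab character (0x09) would be reported, as would every byte of a multi-byte UTF-8 sequence.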

It would be good for one of you who knows the detailed differences to clarify that text and link to the UNI spec as appropriate.

Regards,
Isaac

-----Original Message-----
From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Kinney, Michael D
Sent: Thursday, November 8, 2018 8:46 AM
To: Sean Brogan <sean.brogan@microsoft.com>; Gao, Liming <liming.gao@intel.com>; edk2-devel@lists.01.org
Subject: Re: [edk2] Edk2 uni file encoding

Sean,

As a clarification, the UNI spec does list two on-disk formats.
This was done so tools could support both during the transition from UTF-16LE with a BOM to UTF-8 without a BOM.

The strong recommendation is for all EDK II open source packages to use UTF-8 without a BOM.  Since platform packages not maintained in EDK II could be pulling forward UNI files in UTF-16LE, we have not changed the UNI spec or tools to consider UTF-16LE as unsupported.

Doing patch email reviews of UNI files in UTF-16LE is a challenge, so requiring UTF-8 without a BOM makes this much easier.

The EDK II open source package conversion to UTF-8 without a BOM was performed in late 2015.  Here is one example:

https://github.com/tianocore/edk2/commit/3f5287971ffdb5c42e3325a3a94c101f08d3a02a#diff-14d2171dacfcac1fd2e1b1f7b885e530

A helper python script was added to help perform these conversions:

https://github.com/tianocore/edk2/blob/master/BaseTools/Scripts/ConvertUni.py

At some point, it may make sense to *require* UTF-8 without a BOM for all UNI files and all tools and for tools to reject UNI files that are not in UTF-8 without a BOM format.
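
To make the distinction concrete, here is a rough Python sketch of how a tool might classify a UNI file's on-disk encoding by its leading bytes. The function name classify_uni_encoding and the returned labels are hypothetical; this is not the logic of ConvertUni.py or of any BaseTools component, only an illustration of the two formats discussed above.

```python
import codecs


def classify_uni_encoding(data: bytes) -> str:
    """Classify raw file bytes by BOM and decodability.

    Hypothetical helper; the returned labels are made up for this sketch.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        try:
            # Decode everything after the 2-byte BOM as UTF-16LE.
            data[2:].decode("utf-16-le")
        except UnicodeDecodeError:
            return "invalid"
        return "utf-16le-bom"
    if data.startswith(codecs.BOM_UTF8):
        # UTF-8 with a BOM is not one of the two formats listed in the thread.
        return "utf-8-bom"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16be-bom"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid"
    return "utf-8"
```

A stricter tool could accept only the "utf-8" result and reject everything else. This sketch does not check that the character content is limited to UCS-2, and it does not distinguish a UTF-32LE BOM, which begins with the same two bytes as the UTF-16LE BOM.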

Mike

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Sean Brogan via edk2-devel
> Sent: Wednesday, November 7, 2018 11:11 PM
> To: Gao, Liming <liming.gao@intel.com>
> Cc: edk2-devel@lists.01.org
> Subject: Re: [edk2] Edk2 uni file encoding
> 
> Liming,
> That was exactly what I was looking for.
> 
> Thanks
> Sean
> 
> 
> 
> 
> -----Original Message-----
> From: Gao, Liming <liming.gao@intel.com>
> Sent: Wednesday, November 7, 2018 10:01 PM
> To: Sean Brogan <sean.brogan@microsoft.com>
> Cc: edk2-devel@lists.01.org
> Subject: RE: Edk2 uni file encoding
> 
> Sean:
>   EDKII UNI spec
> (https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-Specifications)
> Chapter 2 defines UNI file format.
> EdkCompatibilityPkg is obsolete. BZ
> https://bugzilla.tianocore.org/show_bug.cgi?id=1103
> is submitted to delete EdkCompatibilityPkg from edk2/master. We
> will work on it.
> 
> EDK II Unicode files are used for mapping token names to localized 
> strings that are identified by an RFC4646 language code. The format 
> for storing EDK II Unicode files on disk is UTF-8 (without a BOM 
> character) or UTF-16LE (with a BOM character). The character content 
> must be UCS-2.
> 
> Thanks
> Liming
> >-----Original Message-----
> >From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> >Sean Brogan via edk2-devel
> >Sent: Thursday, November 08, 2018 7:00 AM
> >To: edk2-devel@lists.01.org
> >Subject: [edk2] Edk2 uni file encoding
> >
> >Is there a definitive answer for the file encoding for all UNI files
> >in edk2?  If not I would like to propose one.  Incorrect encoding
> >causes tool issues and is something we can easily check for and fix.
> >
> >Proposal: All UNI files in edk2 should be
> >
> >
> >  1.  UTF-8
> >Or
> >
> >  1.  Use a BOM and be UTF-16
> >
> >https://en.wikipedia.org/wiki/Byte_order_mark
> >
> >Results from searching edk2:
> >1 - UTF-16 LE BOM file:
> >EdkCompatibilityPkg\Compatibility\FrameworkHiiOnUefiHiiThunk\Strings.uni
> >919 - Without BOM and decoded as UTF-8
> >
> >Thoughts?
> >
> >Future question:  Can we make a rule for all other standard file
> >types (c, h, dec, dsc, fdf, inf)?
> >
> >Thanks
> >Sean
> >
> >
> >
> >_______________________________________________
> >edk2-devel mailing list
> >edk2-devel@lists.01.org
> >https://lists.01.org/mailman/listinfo/edk2-devel

