From: "Kinney, Michael D"
To: "Carsey, Jaben", "Kinney, Michael D"
CC: "Gao, Liming", "edk2-devel@lists.01.org"
Date: Mon, 21 May 2018 22:58:45 +0000
Subject: Re: [RFC] Formalize source files to follow DOS format
In-Reply-To: <05BFD767-DC75-401E-B651-6333815FDDFF@intel.com>
References: <1526878301-13892-1-git-send-email-liming.gao@intel.com>, <05BFD767-DC75-401E-B651-6333815FDDFF@intel.com>
List-Id: EDK II Development

Jaben,

Yes.  By default the tool would use the default set, and
specifying one or more extensions overrides the
default set.

Mike

> -----Original Message-----
> From: Carsey, Jaben
> Sent: Monday, May 21, 2018 3:43 PM
> To: Kinney, Michael D
> Cc: Gao, Liming; edk2-devel@lists.01.org
> Subject: Re: [edk2] [RFC] Formalize source files to follow DOS format
>
> Mike,
>
> Perhaps a default set of file extensions that can be overridden?
>
> -Jaben
>
>
> > On May 21, 2018, at 3:41 PM, Kinney, Michael D wrote:
> >
> > Liming,
> >
> > We have a set of standard flags for tools that
> > should always be present.
> >
> >   --help
> >   -v
> >   -q
> >   --debug
> >
> > We should also always have the program name,
> > description, version, and copyright.
> >
> > Please see BaseTools/Scripts/BinToPcd.py as
> > an example.
> >
> > It might be useful to have a way to run this tool
> > on a single file when BaseTools/Scripts/PatchCheck.py
> > reports an issue.
> >
> > Do you think it would be good to have one option to
> > scan a path for the file extensions that are documented as
> > using DOS line endings, so the extensions do not have to be
> > entered?
> >
> > Mike
> >
> >
> >> -----Original Message-----
> >> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Carsey, Jaben
> >> Sent: Monday, May 21, 2018 7:50 AM
> >> To: Gao, Liming; edk2-devel@lists.01.org
> >> Subject: Re: [edk2] [RFC] Formalize source files to follow DOS format
> >>
> >> Liming,
> >>
> >> One PEP 8 thing.
> >> Can you change to use the with statement for the file read/write?
> >>
> >> Other small thoughts.
> >> I think that FileList should be changed to a set as order is not important.
> >> Maybe wrap the re.sub function with your own so all the .encode() calls are in one location? As we move to Python 3 we will have fewer changes to make.
> >>
> >>
> >>> -----Original Message-----
> >>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> >>> Liming Gao
> >>> Sent: Sunday, May 20, 2018 9:52 PM
> >>> To: edk2-devel@lists.01.org
> >>> Subject: [edk2] [RFC] Formalize source files to follow DOS format
> >>>
> >>> FormatDosFiles.py is added to clean up dos source files. It is based on
> >>> the rules defined in EDKII C Coding Standards Specification.
> >>> 5.1.2 Do not use tab characters
> >>> 5.1.6 Only use CRLF (Carriage Return Line Feed) line endings.
> >>> 5.1.7 All files must end with CRLF
> >>> No trailing white space in one line. (To be added in spec)
> >>>
> >>> The source files in edk2 project with the below postfix are dos format.
> >>> .h .c .nasm .nasmb .asm .S .inf .dec .dsc .fdf .uni .asl .aslc .vfr .idf
> >>> .txt .bat .py
> >>>
> >>> The package maintainer can use this script to clean up all files in his
> >>> package. The preferred way is to create one patch per package.
> >>>
> >>> Contributed-under: TianoCore Contribution Agreement 1.1
> >>> Signed-off-by: Liming Gao
> >>> ---
> >>>  BaseTools/Scripts/FormatDosFiles.py | 93 +++++++++++++++++++++++++++++++++++++
> >>>  1 file changed, 93 insertions(+)
> >>>  create mode 100644 BaseTools/Scripts/FormatDosFiles.py
> >>>
> >>> diff --git a/BaseTools/Scripts/FormatDosFiles.py b/BaseTools/Scripts/FormatDosFiles.py
> >>> new file mode 100644
> >>> index 0000000..c3a5476
> >>> --- /dev/null
> >>> +++ b/BaseTools/Scripts/FormatDosFiles.py
> >>> @@ -0,0 +1,93 @@
> >>> +# @file FormatDosFiles.py
> >>> +# This script format the source files to follow dos style.
> >>> +# It supports Python2.x and Python3.x both.
> >>> +#
> >>> +# Copyright (c) 2018, Intel Corporation. All rights reserved.
> >>> +#
> >>> +# This program and the accompanying materials
> >>> +# are licensed and made available under the terms and conditions of the BSD License
> >>> +# which accompanies this distribution.  The full text of the license may be found at
> >>> +# http://opensource.org/licenses/bsd-license.php
> >>> +#
> >>> +# THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
> >>> +# WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
> >>> +#
> >>> +
> >>> +#
> >>> +# Import Modules
> >>> +#
> >>> +import argparse
> >>> +import os
> >>> +import os.path
> >>> +import re
> >>> +import sys
> >>> +
> >>> +"""
> >>> +difference of string between python2 and python3:
> >>> +
> >>> +there is a large difference of string in python2 and python3.
> >>> +
> >>> +in python2,there are two type string,unicode string (unicode type) and 8-bit string (str type).
> >>> +    us = u"abcd",
> >>> +    unicode string,which is internally stored as unicode code point.
> >>> +    s = "abcd",s = b"abcd",s = r"abcd",
> >>> +    all of them are 8-bit string,which is internally stored as bytes.
> >>> +
> >>> +in python3,a new type called bytes replace 8-bit string,and str type is regarded as unicode string.
> >>> +    s = "abcd", s = u"abcd", s = r"abcd",
> >>> +    all of them are str type,which is internally stored unicode code point.
> >>> +    bs = b"abcd",
> >>> +    bytes type,which is internally stored as bytes
> >>> +
> >>> +in python2 ,the both type string can be mixed use,but in python3 it could not,
> >>> +which means the pattern and content in re match should be the same type in python3.
> >>> +in function FormatFile,it read file in binary mode so that the content is bytes type,so the pattern should also be bytes type.
> >>> +As a result,I add encode() to make it compatible among python2 and python3.
> >>> +
> >>> +difference of encode,decode in python2 and python3:
> >>> +the builtin function str.encode(encoding) and str.decode(encoding) are used for convert between 8-bit string and unicode string.
> >>> +
> >>> +in python2
> >>> +    encode convert unicode type to str type.decode vice versa.default encoding is ascii.
> >>> +    for example: s = us.encode()
> >>> +    but if the us is str type,the code will also work.it will be firstly convert to unicode type,
> >>> +    in this situation,the call equals s = us.decode().encode().
> >>> +
> >>> +in python3
> >>> +    encode convert str type to bytes type,decode vice versa.default encoding is utf8.
> >>> +    for example:
> >>> +    bs = s.encode(),only str type has encode method,so that won't be used wrongly.decode is the same.
> >>> +
> >>> +in conclusion:
> >>> +    this code could work the same in python27 and python36 environment as far as the re pattern satisfy ascii character set.
> >>> +
> >>> +"""
> >>> +def FormatFiles():
> >>> +    parser = argparse.ArgumentParser()
> >>> +    parser.add_argument('path', nargs=1, help='The path for files to be converted.')
> >>> +    parser.add_argument('extensions', nargs='+', help='File extensions filter. (Example: .txt .c .h)')
> >>> +    args = parser.parse_args()
> >>> +    filelist = []
> >>> +    for dirpath, dirnames, filenames in os.walk(args.path[0]):
> >>> +        for filename in [f for f in filenames if any(f.endswith(ext) for ext in args.extensions)]:
> >>> +            filelist.append(os.path.join(dirpath, filename))
> >>> +    for file in filelist:
> >>> +        fd = open(file, 'rb')
> >>> +        content = fd.read()
> >>> +        fd.close()
> >>> +        # Convert the line endings to CRLF
> >>> +        content = re.sub(r'([^\r])\n'.encode(), r'\1\r\n'.encode(), content)
> >>> +        content = re.sub(r'^\n'.encode(), r'\r\n'.encode(), content, flags = re.MULTILINE)
> >>> +        # Add a new empty line if the file is not end with one
> >>> +        content = re.sub(r'([^\r\n])$'.encode(), r'\1\r\n'.encode(), content)
> >>> +        # Remove trailing white spaces
> >>> +        content = re.sub(r'[ \t]+(\r\n)'.encode(), r'\1'.encode(), content, flags = re.MULTILINE)
> >>> +        # Replace '\t' with two spaces
> >>> +        content = re.sub('\t'.encode(), '  '.encode(), content)
> >>> +        fd = open(file, 'wb')
> >>> +        fd.write(content)
> >>> +        fd.close()
> >>> +        print(file)
> >>> +
> >>> +if __name__ == "__main__":
> >>> +    sys.exit(FormatFiles())
> >>> \ No newline at end of file
> >>> --
> >>> 2.8.0.windows.1
> >>>
> >>> _______________________________________________
> >>> edk2-devel mailing list
> >>> edk2-devel@lists.01.org
> >>> https://lists.01.org/mailman/listinfo/edk2-devel
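
For reference, here is a rough sketch of how the suggestions in this thread
might fit together: a default extension set that explicit arguments override,
the standard --help/-v/-q/--debug flags, a single-file mode, "with" for file
I/O, a set for the file list, and bytes patterns kept in one place so no
.encode() calls are needed. This is not the patch and not code taken from
BinToPcd.py. The names DEFAULT_EXTENSIONS, format_content, collect_files,
build_parser and main, the version string, and the order of the clean-up
steps are all assumptions made for illustration.

```python
import argparse
import os
import re

# Default set: the extensions the RFC lists as DOS-format files.
DEFAULT_EXTENSIONS = (
    '.h', '.c', '.nasm', '.nasmb', '.asm', '.S', '.inf', '.dec', '.dsc',
    '.fdf', '.uni', '.asl', '.aslc', '.vfr', '.idf', '.txt', '.bat', '.py',
)

# Bytes patterns defined once (hypothetical names). The br'' prefix is
# accepted by both Python 2.7 and Python 3.
_TRAILING_BLANKS = re.compile(br'[ \t]+(?=\r?\n|\Z)')
_LF_OR_CRLF = re.compile(br'\r?\n')


def format_content(content):
    """Return content (bytes) with the RFC's four rules applied."""
    content = content.replace(b'\t', b'  ')         # no tab characters
    content = _TRAILING_BLANKS.sub(b'', content)    # no trailing white space
    content = _LF_OR_CRLF.sub(b'\r\n', content)     # CRLF line endings only
    if content and not content.endswith(b'\r\n'):   # file must end with CRLF
        content += b'\r\n'
    return content


def collect_files(path, extensions):
    """Return a set of files: path itself, or matching files beneath it."""
    if os.path.isfile(path):
        return {path}                               # single-file mode
    extensions = tuple(extensions)
    return {
        os.path.join(dirpath, name)
        for dirpath, _dirnames, filenames in os.walk(path)
        for name in filenames
        if name.endswith(extensions)
    }


def build_parser():
    parser = argparse.ArgumentParser(
        prog='FormatDosFiles',
        description='Convert source files to DOS format.')
    parser.add_argument('--version', action='version',
                        version='%(prog)s 0.1')     # placeholder version
    parser.add_argument('path', help='File or directory to convert.')
    # nargs='*' with a default: naming extensions overrides the default set.
    parser.add_argument('extensions', nargs='*', default=DEFAULT_EXTENSIONS,
                        help='File extensions filter. (Example: .txt .c .h)')
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='Report whether each file was changed.')
    parser.add_argument('-q', '--quiet', action='store_true',
                        help='Print nothing except errors.')
    parser.add_argument('--debug', type=int, default=0,
                        help='Set debug level.')
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    for name in sorted(collect_files(args.path, args.extensions)):
        with open(name, 'rb') as fd:
            content = fd.read()
        formatted = format_content(content)
        changed = formatted != content
        if changed:
            with open(name, 'wb') as fd:
                fd.write(formatted)
        if args.verbose:
            print('%s: %s' % (name, 'updated' if changed else 'unchanged'))
        elif not args.quiet:
            print(name)
    return 0
```

With this layout, main(['MdePkg']) would walk MdePkg using the default set,
main(['MdePkg', '.c', '.h']) would restrict the run to those two extensions,
and passing the path of one file would convert just that file, which could
help when PatchCheck.py flags a single file.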