From: "Gao, Liming"
To: "Carsey, Jaben", "edk2-devel@lists.01.org"
Date: Thu, 24 May 2018 08:31:16 +0000
Message-ID: <4A89E2EF3DFEDB4C8BFDE51014F606A14E230C9B@SHSMSX104.ccr.corp.intel.com>
References: <1526878301-13892-1-git-send-email-liming.gao@intel.com>
Subject: Re: [RFC] Formalize source files to follow DOS format

Jaben:
  What difference does the with statement make for file read/write?

  Besides, we use .encode() here to support Python 3. After we move to Python 3, this script will not need to change.

Thanks
Liming
>-----Original Message-----
>From: Carsey, Jaben
>Sent: Monday, May 21, 2018 10:50 PM
>To: Gao, Liming; edk2-devel@lists.01.org
>Subject: RE: [edk2] [RFC] Formalize source files to follow DOS format
>
>Liming,
>
>One Pep8 thing.
>Can you change to use the with statement for the file read/write?
>
>Other small thoughts.
>I think that FileList should be changed to a set as order is not important.
>Maybe wrap the re.sub function with your own so all the .encode() are in
>one location? As we move to python 3 we will have fewer changes to make.
>
>
>> -----Original Message-----
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
>> Liming Gao
>> Sent: Sunday, May 20, 2018 9:52 PM
>> To: edk2-devel@lists.01.org
>> Subject: [edk2] [RFC] Formalize source files to follow DOS format
>>
>> FormatDosFiles.py is added to clean up dos source files. It is based on
>> the rules defined in EDKII C Coding Standards Specification.
>> 5.1.2 Do not use tab characters
>> 5.1.6 Only use CRLF (Carriage Return Line Feed) line endings.
>> 5.1.7 All files must end with CRLF
>> No trailing white space in one line. (To be added in spec)
>>
>> The source files in edk2 project with the below postfix are dos format.
>> .h .c .nasm .nasmb .asm .S .inf .dec .dsc .fdf .uni .asl .aslc .vfr .idf
>> .txt .bat .py
>>
>> The package maintainer can use this script to clean up all files in his
>> package. The preferred way is to create one patch per package.
>>
>> Contributed-under: TianoCore Contribution Agreement 1.1
>> Signed-off-by: Liming Gao
>> ---
>>  BaseTools/Scripts/FormatDosFiles.py | 93 +++++++++++++++++++++++++++++++++++++
>>  1 file changed, 93 insertions(+)
>>  create mode 100644 BaseTools/Scripts/FormatDosFiles.py
>>
>> diff --git a/BaseTools/Scripts/FormatDosFiles.py b/BaseTools/Scripts/FormatDosFiles.py
>> new file mode 100644
>> index 0000000..c3a5476
>> --- /dev/null
>> +++ b/BaseTools/Scripts/FormatDosFiles.py
>> @@ -0,0 +1,93 @@
>> +# @file FormatDosFiles.py
>> +# This script formats the source files to follow dos style.
>> +# It supports both Python2.x and Python3.x.
>> +#
>> +# Copyright (c) 2018, Intel Corporation. All rights reserved.
>> +#
>> +# This program and the accompanying materials
>> +# are licensed and made available under the terms and conditions of the BSD License
>> +# which accompanies this distribution. The full text of the license may be found at
>> +# http://opensource.org/licenses/bsd-license.php
>> +#
>> +# THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
>> +# WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
>> +#
>> +
>> +#
>> +# Import Modules
>> +#
>> +import argparse
>> +import os
>> +import os.path
>> +import re
>> +import sys
>> +
>> +"""
>> +difference of string between python2 and python3:
>> +
>> +there is a large difference of string in python2 and python3.
>> +
>> +in python2, there are two types of string: unicode string (unicode type) and 8-bit string (str type).
>> +    us = u"abcd",
>> +    unicode string, which is internally stored as unicode code points.
>> +    s = "abcd", s = b"abcd", s = r"abcd",
>> +    all of them are 8-bit strings, which are internally stored as bytes.
>> +
>> +in python3, a new type called bytes replaces the 8-bit string, and the str type is regarded as unicode string.
>> +    s = "abcd", s = u"abcd", s = r"abcd",
>> +    all of them are str type, which is internally stored as unicode code points.
>> +    bs = b"abcd",
>> +    bytes type, which is internally stored as bytes
>> +
>> +in python2, the two string types can be mixed, but in python3 they cannot,
>> +which means the pattern and content in re match should be the same type in python3.
>> +in function FormatFiles, the file is read in binary mode so the content is bytes type, so the pattern should also be bytes type.
>> +As a result, I add encode() to make it compatible with both python2 and python3.
>> +
>> +difference of encode, decode in python2 and python3:
>> +the builtin functions str.encode(encoding) and str.decode(encoding) are used to convert between 8-bit string and unicode string.
>> +
>> +in python2
>> +    encode converts unicode type to str type; decode does the reverse. default encoding is ascii.
>> +    for example: s = us.encode()
>> +    but if us is str type, the code will also work. it will first be converted to unicode type,
>> +    in this situation, the call equals s = us.decode().encode().
>> +
>> +in python3
>> +    encode converts str type to bytes type; decode does the reverse. default encoding is utf8.
>> +    for example:
>> +    bs = s.encode(), only str type has encode method, so that won't be used wrongly. decode is the same.
>> +
>> +in conclusion:
>> +    this code works the same in python27 and python36 environments as long as the re patterns stay within the ascii character set.
>> +
>> +"""
>> +def FormatFiles():
>> +    parser = argparse.ArgumentParser()
>> +    parser.add_argument('path', nargs=1, help='The path for files to be converted.')
>> +    parser.add_argument('extensions', nargs='+', help='File extensions filter. (Example: .txt .c .h)')
>> +    args = parser.parse_args()
>> +    filelist = []
>> +    for dirpath, dirnames, filenames in os.walk(args.path[0]):
>> +        for filename in [f for f in filenames if any(f.endswith(ext) for ext in args.extensions)]:
>> +            filelist.append(os.path.join(dirpath, filename))
>> +    for file in filelist:
>> +        fd = open(file, 'rb')
>> +        content = fd.read()
>> +        fd.close()
>> +        # Convert the line endings to CRLF
>> +        content = re.sub(r'([^\r])\n'.encode(), r'\1\r\n'.encode(), content)
>> +        content = re.sub(r'^\n'.encode(), r'\r\n'.encode(), content, flags = re.MULTILINE)
>> +        # Add a new empty line if the file does not end with one
>> +        content = re.sub(r'([^\r\n])$'.encode(), r'\1\r\n'.encode(), content)
>> +        # Remove trailing white spaces
>> +        content = re.sub(r'[ \t]+(\r\n)'.encode(), r'\1'.encode(), content, flags = re.MULTILINE)
>> +        # Replace '\t' with two spaces
>> +        content = re.sub('\t'.encode(), '  '.encode(), content)
>> +        fd = open(file, 'wb')
>> +        fd.write(content)
>> +        fd.close()
>> +        print(file)
>> +
>> +if __name__ == "__main__":
>> +    sys.exit(FormatFiles())
>> \ No newline at end of file
>> --
>> 2.8.0.windows.1
>>
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel
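
For reference, the review suggestions in this thread (the with statement for file read/write, a set for the file list, and one wrapper around re.sub) could be sketched roughly as below. This is only an illustration under stated assumptions, not the submitted patch: the names sub_bytes, to_dos_format, collect_files and format_file are hypothetical, and the substitutions are copied from the patch as posted. The practical difference of the with statement is that the file is closed even if read() or write() raises.

```python
import os
import re


def sub_bytes(pattern, repl, content, flags=0):
    # Hypothetical wrapper: the only place where str patterns are
    # encoded to bytes, so Python 2 and Python 3 share one code path.
    return re.sub(pattern.encode(), repl.encode(), content, flags=flags)


def to_dos_format(content):
    # Same substitutions as the posted patch, applied to a bytes buffer.
    # Convert the line endings to CRLF
    content = sub_bytes(r'([^\r])\n', r'\1\r\n', content)
    content = sub_bytes(r'^\n', r'\r\n', content, flags=re.MULTILINE)
    # Append CRLF if the content does not end with a line ending
    content = sub_bytes(r'([^\r\n])$', r'\1\r\n', content)
    # Remove trailing white space
    content = sub_bytes(r'[ \t]+(\r\n)', r'\1', content, flags=re.MULTILINE)
    # Replace each tab with two spaces
    content = sub_bytes('\t', '  ', content)
    return content


def collect_files(root, extensions):
    # A set is enough because the processing order does not matter.
    # str.endswith accepts a tuple, so all extensions are tested at once.
    return {
        os.path.join(dirpath, name)
        for dirpath, _dirnames, filenames in os.walk(root)
        for name in filenames
        if name.endswith(tuple(extensions))
    }


def format_file(path):
    # Each with block closes the file on exit, including on an exception.
    with open(path, 'rb') as fd:
        content = fd.read()
    with open(path, 'wb') as fd:
        fd.write(to_dos_format(content))
```

With this split, a caller would loop over collect_files(path, extensions) and call format_file on each entry; to_dos_format can be checked on plain bytes without touching the file system.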