Re: [RFC] Formalize source files to follow DOS format

From: "Carsey, Jaben" <jaben.carsey@intel.com>
To: "Gao, Liming" <liming.gao@intel.com>,
	"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>
Subject: Re: [RFC] Formalize source files to follow DOS format
Date: Mon, 21 May 2018 14:50:22 +0000
Message-ID: <CB6E33457884FA40993F35157061515CA3D002A4@FMSMSX103.amr.corp.intel.com>
In-Reply-To: <1526878301-13892-1-git-send-email-liming.gao@intel.com>

Liming,

One PEP 8 thing.
Can you change the file read/write to use the with statement?
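
Something along these lines, as a rough sketch only. The names rewrite_file and transform are made up for illustration and are not part of the patch; transform stands in for the chain of re.sub calls.

```python
def rewrite_file(path, transform):
    """Read a file as bytes, apply transform, and write the result back.

    Hypothetical helper for illustration; 'transform' is any callable
    that takes bytes and returns bytes.
    """
    # The with block closes the file even if read() raises.
    with open(path, 'rb') as fd:
        content = fd.read()
    # Same for the write: no explicit fd.close() is needed.
    with open(path, 'wb') as fd:
        fd.write(transform(content))
```

This behaves the same in Python 2.7 and Python 3, since both support the with statement on file objects.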

Other small thoughts.
I think that filelist should be changed to a set, as order is not important.
Maybe wrap the re.sub function with your own so all the .encode() calls are in one location? As we move to Python 3 we will have fewer changes to make.
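
To make both ideas concrete, here is a minimal sketch. The helper names collect_files and sub_bytes are my own invention, not anything in the patch, and the sample content is only an example.

```python
import os
import re

def collect_files(root, extensions):
    """Return a set of paths under root whose names end with any extension.

    Hypothetical helper; a set is enough because processing order
    does not matter.
    """
    suffixes = tuple(extensions)  # str.endswith accepts a tuple
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                found.add(os.path.join(dirpath, name))
    return found

def sub_bytes(pattern, repl, content, flags=0):
    """Hypothetical wrapper: the only place that calls .encode().

    pattern and repl are plain str literals; content is bytes read
    from a file opened in binary mode.
    """
    return re.sub(pattern.encode(), repl.encode(), content, flags=flags)

# Example: convert bare LF to CRLF without touching existing CRLF.
content = b"line one \nline two\t\r\n"
content = sub_bytes(r'([^\r])\n', r'\1\r\n', content)
```

With the wrapper in place, each rule in the script becomes one call with ordinary string literals, and any later change to the encoding handling touches a single function.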

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
> Liming Gao
> Sent: Sunday, May 20, 2018 9:52 PM
> To: edk2-devel@lists.01.org
> Subject: [edk2] [RFC] Formalize source files to follow DOS format
> 
> FormatDosFiles.py is added to clean up DOS source files. It is based on
> the rules defined in EDKII C Coding Standards Specification.
> 5.1.2 Do not use tab characters
> 5.1.6 Only use CRLF (Carriage Return Line Feed) line endings.
> 5.1.7 All files must end with CRLF
> No trailing white space in one line. (To be added in spec)
> 
> The source files in edk2 project with the below postfix are dos format.
> .h .c .nasm .nasmb .asm .S .inf .dec .dsc .fdf .uni .asl .aslc .vfr .idf
> .txt .bat .py
> 
> The package maintainer can use this script to clean up all files in his
> package. The preferred way is to create one patch per package.
> 
> Contributed-under: TianoCore Contribution Agreement 1.1
> Signed-off-by: Liming Gao <liming.gao@intel.com>
> ---
>  BaseTools/Scripts/FormatDosFiles.py | 93 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 93 insertions(+)
>  create mode 100644 BaseTools/Scripts/FormatDosFiles.py
> 
> diff --git a/BaseTools/Scripts/FormatDosFiles.py b/BaseTools/Scripts/FormatDosFiles.py
> new file mode 100644
> index 0000000..c3a5476
> --- /dev/null
> +++ b/BaseTools/Scripts/FormatDosFiles.py
> @@ -0,0 +1,93 @@
> +# @file FormatDosFiles.py
> +# This script format the source files to follow dos style.
> +# It supports Python2.x and Python3.x both.
> +#
> +#  Copyright (c) 2018, Intel Corporation. All rights reserved.<BR>
> +#
> +#  This program and the accompanying materials
> +#  are licensed and made available under the terms and conditions of the BSD License
> +#  which accompanies this distribution.  The full text of the license may be found at
> +#  http://opensource.org/licenses/bsd-license.php
> +#
> +#  THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
> +#  WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
> +#
> +
> +#
> +# Import Modules
> +#
> +import argparse
> +import os
> +import os.path
> +import re
> +import sys
> +
> +"""
> +difference of string between python2 and python3:
> +
> +there is a large difference of string in python2 and python3.
> +
> +in python2, there are two string types: unicode string (unicode type) and 8-bit string (str type).
> +	us = u"abcd",
> +	unicode string,which is internally stored as unicode code point.
> +	s = "abcd",s = b"abcd",s = r"abcd",
> +	all of them are 8-bit string,which is internally stored as bytes.
> +
> +in python3, a new type called bytes replaces the 8-bit string, and the str type is regarded as a unicode string.
> +	s = "abcd", s = u"abcd", s = r"abcd",
> +	all of them are str type, which is internally stored as unicode code points.
> +	bs = b"abcd",
> +	bytes type, which is internally stored as bytes
> +
> +in python2, both string types can be mixed, but in python3 they cannot,
> +which means the pattern and content in re match should be the same type in python3.
> +in function FormatFiles, the file is read in binary mode so that the content is bytes type, so the pattern should also be bytes type.
> +As a result, I add encode() to make it compatible with both python2 and python3.
> +
> +difference of encode,decode in python2 and python3:
> +the builtin methods str.encode(encoding) and str.decode(encoding) are used to convert between 8-bit string and unicode string.
> +
> +in python2
> +	encode converts unicode type to str type; decode does the reverse. The default encoding is ascii.
> +	for example: s = us.encode()
> +	but if us is str type, the code will also work. It is first converted to unicode type,
> +	so in this situation the call equals s = us.decode().encode().
> +
> +in python3
> +	encode converts str type to bytes type; decode does the reverse. The default encoding is utf8.
> +	for example:
> +	bs = s.encode(); only the str type has an encode method, so it cannot be used wrongly. decode is the same.
> +
> +in conclusion:
> +	this code works the same in python27 and python36 environments as long as the re patterns contain only ascii characters.
> +
> +"""
> +def FormatFiles():
> +    parser = argparse.ArgumentParser()
> +    parser.add_argument('path', nargs=1, help='The path for files to be converted.')
> +    parser.add_argument('extensions', nargs='+', help='File extensions filter. (Example: .txt .c .h)')
> +    args = parser.parse_args()
> +    filelist = []
> +    for dirpath, dirnames, filenames in os.walk(args.path[0]):
> +        for filename in [f for f in filenames if any(f.endswith(ext) for ext in args.extensions)]:
> +            filelist.append(os.path.join(dirpath, filename))
> +    for file in filelist:
> +        fd = open(file, 'rb')
> +        content = fd.read()
> +        fd.close()
> +        # Convert the line endings to CRLF
> +        content = re.sub(r'([^\r])\n'.encode(), r'\1\r\n'.encode(), content)
> +        content = re.sub(r'^\n'.encode(), r'\r\n'.encode(), content, flags = re.MULTILINE)
> +        # Add a new empty line if the file is not end with one
> +        content = re.sub(r'([^\r\n])$'.encode(), r'\1\r\n'.encode(), content)
> +        # Remove trailing white spaces
> +        content = re.sub(r'[ \t]+(\r\n)'.encode(), r'\1'.encode(), content, flags = re.MULTILINE)
> +        # Replace '\t' with two spaces
> +        content = re.sub('\t'.encode(), '  '.encode(), content)
> +        fd = open(file, 'wb')
> +        fd.write(content)
> +        fd.close()
> +        print(file)
> +
> +if __name__ == "__main__":
> +    sys.exit(FormatFiles())
> \ No newline at end of file
> --
> 2.8.0.windows.1