public inbox for devel@edk2.groups.io
From: "Gao, Liming" <liming.gao@intel.com>
To: "Carsey, Jaben" <jaben.carsey@intel.com>,
	"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>
Subject: Re: [RFC] Formalize source files to follow DOS format
Date: Thu, 24 May 2018 08:31:16 +0000
Message-ID: <4A89E2EF3DFEDB4C8BFDE51014F606A14E230C9B@SHSMSX104.ccr.corp.intel.com>
In-Reply-To: <CB6E33457884FA40993F35157061515CA3D002A4@FMSMSX103.amr.corp.intel.com>

Jaben:
  What is the difference if we use the with statement for the file read/write?

  Besides, we use .encode() here to support Python 3. After we move to Python 3, this script will not need to change.

Thanks
Liming
>-----Original Message-----
>From: Carsey, Jaben
>Sent: Monday, May 21, 2018 10:50 PM
>To: Gao, Liming <liming.gao@intel.com>; edk2-devel@lists.01.org
>Subject: RE: [edk2] [RFC] Formalize source files to follow DOS format
>
>Liming,
>
>One PEP 8 item.
>Can you change the file read/write to use the with statement?
>
>Other small thoughts.
>I think that filelist should be changed to a set, as order is not important.
>Maybe wrap the re.sub function with your own so all the .encode() calls are in
>one location?  As we move to Python 3 we will have fewer changes to make.
>
>
>> -----Original Message-----
>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of
>> Liming Gao
>> Sent: Sunday, May 20, 2018 9:52 PM
>> To: edk2-devel@lists.01.org
>> Subject: [edk2] [RFC] Formalize source files to follow DOS format
>>
>> FormatDosFiles.py is added to clean up DOS-format source files. It is based on
>> the rules defined in the EDK II C Coding Standards Specification:
>> 5.1.2 Do not use tab characters
>> 5.1.6 Only use CRLF (Carriage Return Line Feed) line endings.
>> 5.1.7 All files must end with CRLF
>> No trailing white space on any line. (To be added to the spec)
>>
>> The source files in the edk2 project with the extensions below are in DOS format:
>> .h .c .nasm .nasmb .asm .S .inf .dec .dsc .fdf .uni .asl .aslc .vfr .idf
>> .txt .bat .py
>>
>> Package maintainers can use this script to clean up all files in their
>> package. The preferred way is to create one patch per package.
>>
>> Contributed-under: TianoCore Contribution Agreement 1.1
>> Signed-off-by: Liming Gao <liming.gao@intel.com>
>> ---
>>  BaseTools/Scripts/FormatDosFiles.py | 93 +++++++++++++++++++++++++++++++++++++
>>  1 file changed, 93 insertions(+)
>>  create mode 100644 BaseTools/Scripts/FormatDosFiles.py
>>
>> diff --git a/BaseTools/Scripts/FormatDosFiles.py b/BaseTools/Scripts/FormatDosFiles.py
>> new file mode 100644
>> index 0000000..c3a5476
>> --- /dev/null
>> +++ b/BaseTools/Scripts/FormatDosFiles.py
>> @@ -0,0 +1,93 @@
>> +# @file FormatDosFiles.py
>> +# This script formats the source files to follow DOS style.
>> +# It supports both Python 2.x and Python 3.x.
>> +#
>> +#  Copyright (c) 2018, Intel Corporation. All rights reserved.<BR>
>> +#
>> +#  This program and the accompanying materials
>> +#  are licensed and made available under the terms and conditions of the BSD License
>> +#  which accompanies this distribution.  The full text of the license may be found at
>> +#  http://opensource.org/licenses/bsd-license.php
>> +#
>> +#  THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
>> +#  WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
>> +#
>> +
>> +#
>> +# Import Modules
>> +#
>> +import argparse
>> +import os
>> +import os.path
>> +import re
>> +import sys
>> +
>> +"""
>> +Differences in strings between Python 2 and Python 3:
>> +
>> +Strings differ significantly between Python 2 and Python 3.
>> +
>> +In Python 2, there are two string types: unicode strings (unicode type) and 8-bit strings (str type).
>> +  us = u"abcd",
>> +  a unicode string, which is stored internally as unicode code points.
>> +  s = "abcd", s = b"abcd", s = r"abcd",
>> +  all of them are 8-bit strings, which are stored internally as bytes.
>> +
>> +In Python 3, a new type called bytes replaces the 8-bit string, and the str type is the unicode string.
>> +  s = "abcd", s = u"abcd", s = r"abcd",
>> +  all of them are str type, which is stored internally as unicode code points.
>> +  bs = b"abcd",
>> +  bytes type, which is stored internally as bytes.
>> +
>> +In Python 2, the two string types can be mixed, but in Python 3 they cannot,
>> +which means the pattern and the content in a re match must be the same type in Python 3.
>> +Function FormatFiles reads each file in binary mode, so the content is bytes type and the pattern must also be bytes type.
>> +As a result, encode() is added to make the code compatible with both Python 2 and Python 3.
>> +
>> +Differences in encode and decode between Python 2 and Python 3:
>> +The encode(encoding) and decode(encoding) methods convert between 8-bit strings (bytes) and unicode strings.
>> +
>> +In Python 2:
>> +  encode converts unicode type to str type; decode does the reverse. The default encoding is ascii.
>> +  For example: s = us.encode()
>> +  If us is already str type, the code still works: it is first converted to unicode type,
>> +  so in this situation the call is equivalent to s = us.decode().encode().
>> +
>> +In Python 3:
>> +  encode converts str type to bytes type; decode does the reverse. The default encoding is utf-8.
>> +  For example:
>> +  bs = s.encode(). Only str has an encode method (and only bytes has a decode method), so neither can be applied to the wrong type.
>> +
>> +In conclusion:
>> +  this code works the same in Python 2.7 and Python 3.6 environments as long as the re patterns use only ASCII characters.
>> +
>> +"""
>> +def FormatFiles():
>> +    parser = argparse.ArgumentParser()
>> +    parser.add_argument('path', nargs=1, help='The path for files to be converted.')
>> +    parser.add_argument('extensions', nargs='+', help='File extensions filter. (Example: .txt .c .h)')
>> +    args = parser.parse_args()
>> +    filelist = []
>> +    for dirpath, dirnames, filenames in os.walk(args.path[0]):
>> +        for filename in [f for f in filenames if any(f.endswith(ext) for ext in args.extensions)]:
>> +            filelist.append(os.path.join(dirpath, filename))
>> +    for file in filelist:
>> +        fd = open(file, 'rb')
>> +        content = fd.read()
>> +        fd.close()
>> +        # Convert the line endings to CRLF
>> +        content = re.sub(r'([^\r])\n'.encode(), r'\1\r\n'.encode(), content)
>> +        content = re.sub(r'^\n'.encode(), r'\r\n'.encode(), content, flags = re.MULTILINE)
>> +        # Terminate the last line with CRLF if the file does not already end with one
>> +        content = re.sub(r'([^\r\n])$'.encode(), r'\1\r\n'.encode(), content)
>> +        # Remove trailing white spaces
>> +        content = re.sub(r'[ \t]+(\r\n)'.encode(), r'\1'.encode(), content, flags = re.MULTILINE)
>> +        # Replace '\t' with two spaces
>> +        content = re.sub('\t'.encode(), '  '.encode(), content)
>> +        fd = open(file, 'wb')
>> +        fd.write(content)
>> +        fd.close()
>> +        print(file)
>> +
>> +if __name__ == "__main__":
>> +    sys.exit(FormatFiles())
>> \ No newline at end of file
>> --
>> 2.8.0.windows.1
>>
>> _______________________________________________
>> edk2-devel mailing list
>> edk2-devel@lists.01.org
>> https://lists.01.org/mailman/listinfo/edk2-devel
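A companion check could confirm that a file already satisfies the rules listed in the RFC (5.1.2, 5.1.6, 5.1.7 and no trailing white space) without rewriting it. The sketch below is hypothetical and not part of the patch; `check_dos_format()` and its messages are illustrative. It uses a negative lookbehind, `(?<!\r)\n`, to find LF endings without a CR in a single pass, which is the same condition the script's first two `re.sub()` calls repair.

```python
import re

# Hypothetical checker (not part of the patch). Patterns are bytes literals,
# so they match content read in binary mode on both Python 2.7 and Python 3.
_BARE_LF = re.compile(br'(?<!\r)\n')                      # LF not preceded by CR
_TRAILING_WS = re.compile(br'[ \t]+\r?$', re.MULTILINE)   # blanks before a line end


def check_dos_format(content):
    """Return a list of rule violations found in the raw bytes of a file."""
    problems = []
    if b'\t' in content:
        problems.append('5.1.2: tab character')
    if _BARE_LF.search(content):
        problems.append('5.1.6: LF line ending without CR')
    # An empty file is left alone, as the script also leaves it alone.
    if content and not content.endswith(b'\r\n'):
        problems.append('5.1.7: file does not end with CRLF')
    if _TRAILING_WS.search(content):
        problems.append('trailing white space')
    return problems
```

For a typical source file that FormatDosFiles.py has already processed, the returned list should be empty.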



Thread overview: 10+ messages
2018-05-21  4:51 [RFC] Formalize source files to follow DOS format Liming Gao
2018-05-21 14:50 ` Carsey, Jaben
2018-05-21 22:41   ` Kinney, Michael D
2018-05-21 22:43     ` Carsey, Jaben
2018-05-21 22:58       ` Kinney, Michael D
2018-05-24  8:35         ` Gao, Liming
2018-05-24 14:13           ` Carsey, Jaben
2018-05-25  2:24             ` Gao, Liming
2018-05-24  8:31   ` Gao, Liming [this message]
2018-05-24 14:13     ` Carsey, Jaben
