public inbox for devel@edk2.groups.io
* [RFC] Formalize source files to follow DOS format
@ 2018-05-21  4:51 Liming Gao
From: Liming Gao @ 2018-05-21  4:51 UTC
  To: edk2-devel

FormatDosFiles.py is added to clean up DOS-format source files. It is based on
the rules defined in the EDK II C Coding Standards Specification:
5.1.2 Do not use tab characters
5.1.6 Only use CRLF (Carriage Return Line Feed) line endings.
5.1.7 All files must end with CRLF
No trailing white space at the end of a line. (To be added to the spec)

Source files in the edk2 project with the following extensions use DOS format:
.h .c .nasm .nasmb .asm .S .inf .dec .dsc .fdf .uni .asl .aslc .vfr .idf
.txt .bat .py

A package maintainer can use this script to clean up all files in their
package. The preferred way is to create one patch per package.
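
As a rough illustration of the rules listed above, here is a minimal checker
sketch. It is not part of this patch: check_dos_format and its message strings
are hypothetical names chosen for the example, and it only reports problems
without modifying anything.

```python
import re

# Hypothetical helper, not part of FormatDosFiles.py: list the rules that a
# file's raw bytes violate. An empty list means no problem was detected.
def check_dos_format(content):
    problems = []
    # 5.1.2 Do not use tab characters
    if b"\t" in content:
        problems.append("tab character (5.1.2)")
    # 5.1.6 Only CRLF line endings: flag a bare LF or a bare CR
    if re.search(br"(?<!\r)\n|\r(?!\n)", content):
        problems.append("non-CRLF line ending (5.1.6)")
    # 5.1.7 All files must end with CRLF (an empty file is not flagged here)
    if content and not content.endswith(b"\r\n"):
        problems.append("file does not end with CRLF (5.1.7)")
    # Proposed rule: no white space directly before a line ending or end of file
    if re.search(br"[ \t]+(?:\r|\n|\Z)", content):
        problems.append("trailing white space")
    return problems

print(check_dos_format(b"\tint a; \n"))
```

Running such a check before and after the script could help a maintainer
confirm that a clean-up patch only touches files that actually need it.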

Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Liming Gao <liming.gao@intel.com>
---
 BaseTools/Scripts/FormatDosFiles.py | 93 +++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 BaseTools/Scripts/FormatDosFiles.py

diff --git a/BaseTools/Scripts/FormatDosFiles.py b/BaseTools/Scripts/FormatDosFiles.py
new file mode 100644
index 0000000..c3a5476
--- /dev/null
+++ b/BaseTools/Scripts/FormatDosFiles.py
@@ -0,0 +1,93 @@
+# @file FormatDosFiles.py
+# This script formats the source files to follow DOS style.
+# It supports both Python 2.x and Python 3.x.
+#
+#  Copyright (c) 2018, Intel Corporation. All rights reserved.<BR>
+#
+#  This program and the accompanying materials
+#  are licensed and made available under the terms and conditions of the BSD License
+#  which accompanies this distribution.  The full text of the license may be found at
+#  http://opensource.org/licenses/bsd-license.php
+#
+#  THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
+#
+
+#
+# Import Modules
+#
+import argparse
+import os
+import os.path
+import re
+import sys
+
+"""
+String differences between Python 2 and Python 3:
+
+Strings are handled very differently in Python 2 and Python 3.
+
+Python 2 has two string types: Unicode strings (unicode type) and 8-bit strings (str type).
+    us = u"abcd"
+    is a Unicode string, stored internally as Unicode code points.
+    s = "abcd", s = b"abcd", s = r"abcd"
+    are all 8-bit strings, stored internally as bytes.
+
+In Python 3, a new bytes type replaces the 8-bit string, and the str type is a Unicode string.
+    s = "abcd", s = u"abcd", s = r"abcd"
+    are all of str type, stored internally as Unicode code points.
+    bs = b"abcd"
+    is of bytes type, stored internally as bytes.
+
+In Python 2 the two string types can be mixed, but in Python 3 they cannot,
+which means the pattern and the content passed to re must be of the same type in Python 3.
+FormatFiles reads each file in binary mode, so the content is bytes and the pattern must also be bytes.
+Therefore encode() is applied to each pattern to keep the code compatible with both Python 2 and Python 3.
+
+Differences between encode and decode in Python 2 and Python 3:
+The built-in methods encode(encoding) and decode(encoding) convert between 8-bit strings and Unicode strings.
+
+In Python 2:
+    encode converts the unicode type to the str type, and decode does the reverse. The default encoding is ASCII.
+    For example: s = us.encode()
+    If us is of str type, the code still works: us is first converted to the unicode type,
+    so the call is equivalent to s = us.decode().encode().
+
+In Python 3:
+    encode converts the str type to the bytes type, and decode does the reverse. The default encoding is UTF-8.
+    For example:
+    bs = s.encode(). Only the str type has an encode method, so it cannot be misused; the same holds for decode on bytes.
+
+In conclusion:
+    This code behaves the same under Python 2.7 and Python 3.6 as long as the re patterns contain only ASCII characters.
+
+"""
+def FormatFiles():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('path', nargs=1, help='The path for files to be converted.')
+    parser.add_argument('extensions', nargs='+', help='File extensions filter. (Example: .txt .c .h)')
+    args = parser.parse_args()
+    filelist = []
+    for dirpath, dirnames, filenames in os.walk(args.path[0]):
+        for filename in [f for f in filenames if any(f.endswith(ext) for ext in args.extensions)]:
+            filelist.append(os.path.join(dirpath, filename))
+    for file in filelist:
+        fd = open(file, 'rb')
+        content = fd.read()
+        fd.close()
+        # Convert the line endings to CRLF
+        content = re.sub(r'([^\r])\n'.encode(), r'\1\r\n'.encode(), content)
+        content = re.sub(r'^\n'.encode(), r'\r\n'.encode(), content, flags = re.MULTILINE)
+        # Append CRLF if the file does not already end with a line ending
+        content = re.sub(r'([^\r\n])$'.encode(), r'\1\r\n'.encode(), content)
+        # Remove trailing white spaces
+        content = re.sub(r'[ \t]+(\r\n)'.encode(), r'\1'.encode(), content, flags = re.MULTILINE)
+        # Replace '\t' with two spaces
+        content = re.sub('\t'.encode(), '  '.encode(), content)
+        fd = open(file, 'wb')
+        fd.write(content)
+        fd.close()
+        print(file)
+
+if __name__ == "__main__":
+    sys.exit(FormatFiles())
\ No newline at end of file
-- 
2.8.0.windows.1
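
The comment block in the patch explains why each pattern is passed through
encode(). The following sketch, which is not part of the patch, shows the
behaviour it describes: the sample content and the names mixed_ok and fixed
are assumptions made for the example, while the pattern is the one used for
the first line-ending substitution in FormatFiles.

```python
import re
import sys

# Stand-in for what open(path, 'rb').read() returns: raw bytes with LF endings.
content = b"line one\nline two\n"

# Mixing a str pattern with bytes content: accepted by Python 2, where both
# are 8-bit strings, but rejected with TypeError by Python 3.
try:
    re.sub(r"\n", "\r\n", content)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Encoding the pattern and the replacement gives bytes on Python 3 and an
# 8-bit str on Python 2, so the same call works on both.
fixed = re.sub(r'([^\r])\n'.encode(), r'\1\r\n'.encode(), content)

print("Python %d: str pattern on bytes content %s"
      % (sys.version_info[0], "works" if mixed_ok else "raises TypeError"))
print(repr(fixed))
```

Because the patterns contain only ASCII characters, the default encoding
(ASCII on Python 2, UTF-8 on Python 3) produces the same bytes on both.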

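To make the effect of the five substitutions easier to follow, here is a
sketch that applies the same patterns, in the same order, to an in-memory
byte string instead of files on disk. The function name to_dos and the sample
input are hypothetical; the patterns are copied from FormatFiles and written
as bytes literals rather than with encode().

```python
import re

# Hypothetical wrapper around the substitutions used in FormatFiles.
def to_dos(content):
    # Convert LF that does not follow CR into CRLF
    content = re.sub(br'([^\r])\n', br'\1\r\n', content)
    # Convert LF at the very start of a line, which the pattern above cannot match
    content = re.sub(br'^\n', br'\r\n', content, flags=re.MULTILINE)
    # Append CRLF if the content does not already end with a line ending
    content = re.sub(br'([^\r\n])$', br'\1\r\n', content)
    # Remove trailing spaces and tabs before each CRLF
    content = re.sub(br'[ \t]+(\r\n)', br'\1', content, flags=re.MULTILINE)
    # Replace every remaining tab with two spaces
    content = re.sub(b'\t', b'  ', content)
    return content

sample = b"int a;\t\nint b;  \r\n\n\tret"
result = to_dos(sample)
print(repr(result))
```

For this sample the result is b"int a;\r\nint b;\r\n\r\n  ret\r\n", and
applying to_dos to that result again leaves it unchanged, which suggests the
script can be rerun safely on files that are already clean.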



Thread overview: 10+ messages
2018-05-21  4:51 [RFC] Formalize source files to follow DOS format Liming Gao
2018-05-21 14:50 ` Carsey, Jaben
2018-05-21 22:41   ` Kinney, Michael D
2018-05-21 22:43     ` Carsey, Jaben
2018-05-21 22:58       ` Kinney, Michael D
2018-05-24  8:35         ` Gao, Liming
2018-05-24 14:13           ` Carsey, Jaben
2018-05-25  2:24             ` Gao, Liming
2018-05-24  8:31   ` Gao, Liming
2018-05-24 14:13     ` Carsey, Jaben
