From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <liming.gao@intel.com>
Received-SPF: Pass (sender SPF authorized) identity=mailfrom;
 client-ip=134.134.136.31; helo=mga06.intel.com;
 envelope-from=liming.gao@intel.com; receiver=edk2-devel@lists.01.org 
Received: from mga06.intel.com (mga06.intel.com [134.134.136.31])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by ml01.01.org (Postfix) with ESMTPS id 61372207E53E2
 for <edk2-devel@lists.01.org>; Sun, 20 May 2018 21:52:00 -0700 (PDT)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga005.jf.intel.com ([10.7.209.41])
 by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 20 May 2018 21:52:00 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.49,425,1520924400"; d="scan'208";a="225852639"
Received: from shwde7172.ccr.corp.intel.com ([10.239.158.42])
 by orsmga005.jf.intel.com with ESMTP; 20 May 2018 21:51:59 -0700
From: Liming Gao <liming.gao@intel.com>
To: edk2-devel@lists.01.org
Date: Mon, 21 May 2018 12:51:41 +0800
Message-Id: <1526878301-13892-1-git-send-email-liming.gao@intel.com>
X-Mailer: git-send-email 2.8.0.windows.1
Subject: [RFC] Formalize source files to follow DOS format
X-BeenThere: edk2-devel@lists.01.org
X-Mailman-Version: 2.1.26
Precedence: list
List-Id: EDK II Development  <edk2-devel.lists.01.org>
List-Unsubscribe: <https://lists.01.org/mailman/options/edk2-devel>,
 <mailto:edk2-devel-request@lists.01.org?subject=unsubscribe>
List-Archive: <http://lists.01.org/pipermail/edk2-devel/>
List-Post: <mailto:edk2-devel@lists.01.org>
List-Help: <mailto:edk2-devel-request@lists.01.org?subject=help>
List-Subscribe: <https://lists.01.org/mailman/listinfo/edk2-devel>,
 <mailto:edk2-devel-request@lists.01.org?subject=subscribe>
X-List-Received-Date: Mon, 21 May 2018 04:52:01 -0000

FormatDosFiles.py is added to clean up DOS-format source files. It is based
on the following rules defined in the EDK II C Coding Standards Specification:
5.1.2 Do not use tab characters
5.1.6 Only use CRLF (Carriage Return Line Feed) line endings.
5.1.7 All files must end with CRLF
No trailing white space on any line. (To be added to the spec)
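
As an illustration of the four rules above, the raw bytes of a file can be
checked as follows. This is a minimal sketch and not part of the patch; the
helper name find_violations and the rule labels it returns are hypothetical.

```python
import re

def find_violations(content):
    """Return labels for the rules that the given bytes content breaks."""
    violations = []
    # 5.1.2: no tab characters anywhere in the file.
    if b'\t' in content:
        violations.append('5.1.2 tab character')
    # 5.1.6: every line ending must be CRLF, so a bare LF or a bare CR fails.
    if re.search(br'(?<!\r)\n|\r(?!\n)', content):
        violations.append('5.1.6 non-CRLF line ending')
    # 5.1.7: a non-empty file must end with CRLF.
    if content and not content.endswith(b'\r\n'):
        violations.append('5.1.7 missing final CRLF')
    # Proposed rule: no spaces or tabs directly before a line ending or EOF.
    if re.search(br'[ \t]+(\r?\n|\Z)', content):
        violations.append('trailing white space')
    return violations
```

A clean buffer such as b'int a;\r\n' yields an empty list, so the check can
be run before and after the clean-up to confirm nothing is left.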

The source files in the edk2 project with the following extensions are in
DOS format:
.h .c .nasm .nasmb .asm .S .inf .dec .dsc .fdf .uni .asl .aslc .vfr .idf
.txt .bat .py

Package maintainers can use this script to clean up all files in their
packages. The preferred way is to create one patch per package.
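
The script takes one path followed by one or more extensions. The sketch
below mirrors its argument parser to show how a command line is split. The
wrapper name parse_arguments is hypothetical, and 'MdePkg' only stands in
for whichever package directory is being cleaned.

```python
import argparse

def parse_arguments(argv):
    # Same two positional arguments as declared in FormatDosFiles.py.
    parser = argparse.ArgumentParser()
    parser.add_argument('path', nargs=1,
                        help='The path for files to be converted.')
    parser.add_argument('extensions', nargs='+',
                        help='File extensions filter. (Example: .txt .c .h)')
    return parser.parse_args(argv)

# Example arguments only; they correspond to a command line of the form
#   FormatDosFiles.py MdePkg .c .h .inf
args = parse_arguments(['MdePkg', '.c', '.h', '.inf'])
print('%s %s' % (args.path[0], args.extensions))
```

Because path is declared with nargs=1, it is parsed as a one-element list,
which is why the script reads args.path[0].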

Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Liming Gao <liming.gao@intel.com>
---
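Note for reviewers: the substitutions in FormatFiles can be tried on an
in-memory buffer without touching any file. The sketch below is not part of
the patch. It assumes a hypothetical helper name, to_dos_format, and applies
the same regular expressions in the same order as the script, written as
bytes literals instead of str.encode() calls.

```python
import re

def to_dos_format(content):
    """Apply the clean-up steps of FormatFiles to a bytes buffer."""
    # Convert LF to CRLF; the second pass catches the LF of empty lines.
    content = re.sub(br'([^\r])\n', br'\1\r\n', content)
    content = re.sub(br'^\n', b'\r\n', content, flags=re.MULTILINE)
    # Append CRLF if the buffer does not end with a line ending.
    content = re.sub(br'([^\r\n])$', br'\1\r\n', content)
    # Remove spaces and tabs directly before a CRLF.
    content = re.sub(br'[ \t]+(\r\n)', br'\1', content, flags=re.MULTILINE)
    # Replace each remaining tab with two spaces.
    content = re.sub(b'\t', b'  ', content)
    return content
```

For example, to_dos_format(b'a\n\n\tb \nc') returns b'a\r\n\r\n  b\r\nc\r\n',
and running the function on that result again leaves it unchanged.
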
 BaseTools/Scripts/FormatDosFiles.py | 93 +++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 BaseTools/Scripts/FormatDosFiles.py
diff --git a/BaseTools/Scripts/FormatDosFiles.py b/BaseTools/Scripts/FormatDosFiles.py
new file mode 100644
index 0000000..c3a5476
--- /dev/null
+++ b/BaseTools/Scripts/FormatDosFiles.py
@@ -0,0 +1,93 @@
+# @file FormatDosFiles.py
+# This script formats source files to follow DOS style.
+# It supports both Python 2.x and Python 3.x.
+#
+#  Copyright (c) 2018, Intel Corporation. All rights reserved.<BR>
+#
+#  This program and the accompanying materials
+#  are licensed and made available under the terms and conditions of the BSD License
+#  which accompanies this distribution.  The full text of the license may be found at
+#  http://opensource.org/licenses/bsd-license.php
+#
+#  THE PROGRAM IS DISTRIBUTED UNDER THE BSD LICENSE ON AN "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR REPRESENTATIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED.
+#
+
+#
+# Import Modules
+#
+import argparse
+import os
+import os.path
+import re
+import sys
+
+"""
+String differences between Python 2 and Python 3:
+
+String handling differs significantly between Python 2 and Python 3.
+
+Python 2 has two string types: unicode strings (unicode type) and 8-bit strings (str type).
+    us = u"abcd"
+    is a unicode string, stored internally as Unicode code points.
+    s = "abcd", s = b"abcd", s = r"abcd"
+    are all 8-bit strings, stored internally as bytes.
+
+In Python 3, a new bytes type replaces the 8-bit string, and the str type is a unicode string.
+    s = "abcd", s = u"abcd", s = r"abcd"
+    are all str type, stored internally as Unicode code points.
+    bs = b"abcd"
+    is bytes type, stored internally as bytes.
+
+In Python 2 the two string types can be mixed, but in Python 3 they cannot,
+so the pattern and the content passed to re must be of the same type in Python 3.
+FormatFiles reads each file in binary mode, so the content is bytes and the pattern must be bytes too.
+Therefore encode() is applied to each pattern to keep the code compatible with both Python 2 and Python 3.
+
+Differences of encode and decode between Python 2 and Python 3:
+str.encode(encoding) and str.decode(encoding) convert between 8-bit strings and unicode strings.
+
+In Python 2
+    encode converts unicode type to str type, and decode does the reverse. The default encoding is ascii.
+    For example: s = us.encode()
+    If us is str type, the call still works: us is first converted to unicode type,
+    so the call is equivalent to s = us.decode().encode().
+
+In Python 3
+    encode converts str type to bytes type, and decode does the reverse. The default encoding is utf8.
+    For example:
+    bs = s.encode(). Only str has an encode method and only bytes has a decode method, so they cannot be misused.
+
+In conclusion:
+    This code works the same under Python 2.7 and Python 3.6 as long as the re patterns contain only ASCII characters.
+
+"""
+def FormatFiles():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('path', nargs=1, help='The path for files to be converted.')
+    parser.add_argument('extensions', nargs='+', help='File extensions filter. (Example: .txt .c .h)')
+    args = parser.parse_args()
+    filelist = []
+    for dirpath, dirnames, filenames in os.walk(args.path[0]):
+        for filename in [f for f in filenames if any(f.endswith(ext) for ext in args.extensions)]:
+            filelist.append(os.path.join(dirpath, filename))
+    for file in filelist:
+        fd = open(file, 'rb')
+        content = fd.read()
+        fd.close()
+        # Convert the line endings to CRLF
+        content = re.sub(r'([^\r])\n'.encode(), r'\1\r\n'.encode(), content)
+        content = re.sub(r'^\n'.encode(), r'\r\n'.encode(), content, flags = re.MULTILINE)
+        # Append CRLF if the file does not end with a line ending
+        content = re.sub(r'([^\r\n])$'.encode(), r'\1\r\n'.encode(), content)
+        # Remove trailing white spaces
+        content = re.sub(r'[ \t]+(\r\n)'.encode(), r'\1'.encode(), content, flags = re.MULTILINE)
+        # Replace '\t' with two spaces
+        content = re.sub('\t'.encode(), '  '.encode(), content)
+        fd = open(file, 'wb')
+        fd.write(content)
+        fd.close()
+        print(file)
+
+if __name__ == "__main__":
+    sys.exit(FormatFiles())
\ No newline at end of file
-- 
2.8.0.windows.1