public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs
@ 2024-11-28  7:55 davidr via groups.io
  2024-12-04 10:22 ` Ard Biesheuvel via groups.io
  0 siblings, 1 reply; 6+ messages in thread
From: davidr via groups.io @ 2024-11-28  7:55 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 3150 bytes --]

Hi,

I was testing out dumping a raw OVMF_VARS.fd into an FDF data section and noticed that my rebuilds of OVMF with no code changes went from 15 seconds to over 1 minute. The only change was the data in the NV_VARIABLE_STORE data section in OvmfPkg/Include/Fdf/VarStore.fdf.inc which changed the size of the file from about 4 KiB to about 256 KiB. I was rather curious as to why my build times changed so much and profiled the build process with cProfile. Specifically https://github.com/tianocore/edk2/blob/master/BaseTools/Source/Python/GenFds/FdfParser.py#L279 takes the vast majority of the time.

You can reproduce this by adding 256 KiB of "#" characters to the end of OvmfPkg/Include/Fdf/VarStore.fdf.inc and building OVMF, which produced this result for me:
Build total time: 00:01:31
34883442 function calls (34368868 primitive calls) in 91.191 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
18769   73.501    0.004   75.942    0.004 FdfParser.py:276(_SkipWhiteSpace)
728    5.738    0.008    5.738    0.008 {method 'acquire' of '_thread.lock' objects}
12    1.569    0.131    1.569    0.131 {method 'poll' of 'select.poll' objects}
2849255    1.248    0.000    1.439    0.000 FdfParser.py:354(_GetOneChar)
2305959    0.873    0.000    1.131    0.000 FdfParser.py:293(_EndOfFile)
5374641    0.831    0.000    0.831    0.000 FdfParser.py:368(_CurrentChar)
2    0.526    0.263    1.278    0.639 FdfParser.py:497(PreprocessFile)

Changing _SkippedChars from a string to StringIO reduced my build time to 19 seconds:
Build total time: 00:00:19
36552029 function calls (36037563 primitive calls) in 18.618 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
728    5.391    0.007    5.391    0.007 {method 'acquire' of '_thread.lock' objects}
18769    1.593    0.000    3.452    0.000 FdfParser.py:277(_SkipWhiteSpace)
12    1.551    0.129    1.551    0.129 {method 'poll' of 'select.poll' objects}
2849255    0.850    0.000    0.979    0.000 FdfParser.py:355(_GetOneChar)
5374641    0.742    0.000    0.742    0.000 FdfParser.py:369(_CurrentChar)
2305959    0.741    0.000    0.951    0.000 FdfParser.py:294(_EndOfFile)
2    0.511    0.256    1.237    0.618 FdfParser.py:498(PreprocessFile)

All of these tests were run using the python3 binary provided in the docker container created by https://github.com/tianocore/containers/tree/main/Ubuntu-22/Dockerfile ( https://github.com/tianocore/containers/blob/main/Ubuntu-22/Dockerfile ).

This seems like an easy change to make builds a tiny bit faster.

Thanks,
David


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#120865): https://edk2.groups.io/g/devel/message/120865
Mute This Topic: https://groups.io/mt/109914552/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-



[-- Attachment #2: Type: text/html, Size: 4883 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs
  2024-11-28  7:55 [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs davidr via groups.io
@ 2024-12-04 10:22 ` Ard Biesheuvel via groups.io
  2024-12-04 18:00   ` davidr via groups.io
  0 siblings, 1 reply; 6+ messages in thread
From: Ard Biesheuvel via groups.io @ 2024-12-04 10:22 UTC (permalink / raw)
  To: devel, davidr, Rebecca Cran

On Wed, 4 Dec 2024 at 05:20, davidr via groups.io
<davidr=ghs.com@groups.io> wrote:
>
> Hi,
>
> I was testing out dumping a raw OVMF_VARS.fd into an FDF data section and noticed that my rebuilds of OVMF with no code changes went from 15 seconds to over 1 minute. The only change was the data in the NV_VARIABLE_STORE data section in OvmfPkg/Include/Fdf/VarStore.fdf.inc which changed the size of the file from about 4 KiB to about 256 KiB. I was rather curious as to why my build times changed so much and profiled the build process with cProfile. Specifically https://github.com/tianocore/edk2/blob/master/BaseTools/Source/Python/GenFds/FdfParser.py#L279 takes the vast majority of the time.
>
> You can reproduce this by adding 256 KiB of "#" characters to the end of OvmfPkg/Include/Fdf/VarStore.fdf.inc and building OVMF, which produced this result for me:
> Build total time: 00:01:31
>          34883442 function calls (34368868 primitive calls) in 91.191 seconds
>    Ordered by: internal time
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>     18769   73.501    0.004   75.942    0.004 FdfParser.py:276(_SkipWhiteSpace)
>       728    5.738    0.008    5.738    0.008 {method 'acquire' of '_thread.lock' objects}
>        12    1.569    0.131    1.569    0.131 {method 'poll' of 'select.poll' objects}
>   2849255    1.248    0.000    1.439    0.000 FdfParser.py:354(_GetOneChar)
>   2305959    0.873    0.000    1.131    0.000 FdfParser.py:293(_EndOfFile)
>   5374641    0.831    0.000    0.831    0.000 FdfParser.py:368(_CurrentChar)
>         2    0.526    0.263    1.278    0.639 FdfParser.py:497(PreprocessFile)
>
> Changing _SkippedChars from a string to StringIO reduced my build time to 19 seconds:
> Build total time: 00:00:19
>          36552029 function calls (36037563 primitive calls) in 18.618 seconds
>    Ordered by: internal time
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>       728    5.391    0.007    5.391    0.007 {method 'acquire' of '_thread.lock' objects}
>     18769    1.593    0.000    3.452    0.000 FdfParser.py:277(_SkipWhiteSpace)
>        12    1.551    0.129    1.551    0.129 {method 'poll' of 'select.poll' objects}
>   2849255    0.850    0.000    0.979    0.000 FdfParser.py:355(_GetOneChar)
>   5374641    0.742    0.000    0.742    0.000 FdfParser.py:369(_CurrentChar)
>   2305959    0.741    0.000    0.951    0.000 FdfParser.py:294(_EndOfFile)
>         2    0.511    0.256    1.237    0.618 FdfParser.py:498(PreprocessFile)
>
> All of these tests were run using the python3 binary provided in the docker container created by https://github.com/tianocore/containers/tree/main/Ubuntu-22/Dockerfile.
>
> This seems like an easy change to make builds a tiny bit faster.
>

Thanks for the report. Mind sending a patch / PR ?


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#120866): https://edk2.groups.io/g/devel/message/120866
Mute This Topic: https://groups.io/mt/109914552/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs
  2024-12-04 10:22 ` Ard Biesheuvel via groups.io
@ 2024-12-04 18:00   ` davidr via groups.io
  2024-12-05  0:50     ` 回复: " gaoliming via groups.io
  0 siblings, 1 reply; 6+ messages in thread
From: davidr via groups.io @ 2024-12-04 18:00 UTC (permalink / raw)
  To: Ard Biesheuvel, devel

[-- Attachment #1: Type: text/plain, Size: 423 bytes --]

Sure, once I have time to read up on the patch process.


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#120873): https://edk2.groups.io/g/devel/message/120873
Mute This Topic: https://groups.io/mt/109914552/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-



[-- Attachment #2: Type: text/html, Size: 846 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* 回复: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs
  2024-12-04 18:00   ` davidr via groups.io
@ 2024-12-05  0:50     ` gaoliming via groups.io
  2024-12-06  0:54       ` [edk2-devel] " davidr via groups.io
  0 siblings, 1 reply; 6+ messages in thread
From: gaoliming via groups.io @ 2024-12-05  0:50 UTC (permalink / raw)
  To: devel, davidr, 'Ard Biesheuvel'

[-- Attachment #1: Type: text/plain, Size: 819 bytes --]

Can you show your code change first? I would like to try your change. 

 

Thanks

Liming

发件人: devel@edk2.groups.io <devel@edk2.groups.io> 代表 davidr via groups.io
发送时间: 2024年12月5日 2:00
收件人: Ard Biesheuvel <ardb@kernel.org>; devel@edk2.groups.io
主题: Re: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs

 

Sure, once I have time to read up on the patch process.





-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#120876): https://edk2.groups.io/g/devel/message/120876
Mute This Topic: https://groups.io/mt/109931827/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-



[-- Attachment #2: Type: text/html, Size: 4536 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [edk2-devel] 回复: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs
  2024-12-05  0:50     ` 回复: " gaoliming via groups.io
@ 2024-12-06  0:54       ` davidr via groups.io
  2024-12-06  1:41         ` 回复: " gaoliming via groups.io
  0 siblings, 1 reply; 6+ messages in thread
From: davidr via groups.io @ 2024-12-06  0:54 UTC (permalink / raw)
  To: gaoliming, devel

[-- Attachment #1: Type: text/plain, Size: 4837 bytes --]

Here is the code I was testing with. You can probably reduce the changes down to just _SkipWhiteSpace(), but I haven't tried that yet.

diff --git a/BaseTools/Source/Python/GenFds/FdfParser.py b/BaseTools/Source/Python/GenFds/FdfParser.py
index feb4c72779..8c57720116 100644
--- a/BaseTools/Source/Python/GenFds/FdfParser.py
+++ b/BaseTools/Source/Python/GenFds/FdfParser.py
@@ -12,6 +12,7 @@
#
from __future__ import print_function
from __future__ import absolute_import
+from io import StringIO
from re import compile, DOTALL
from string import hexdigits
from uuid import UUID
@@ -253,7 +254,7 @@ class FdfParser:
self.CurrentFdName = None
self.CurrentFvName = None
self._Token = ""
-        self._SkippedChars = ""
+        self._SkippedChars = StringIO()
GlobalData.gFdfParser = self

# Used to section info
@@ -276,7 +277,7 @@ class FdfParser:
def _SkipWhiteSpace(self):
while not self._EndOfFile():
if self._CurrentChar() in {TAB_PRINTCHAR_NUL, T_CHAR_CR, TAB_LINE_BREAK, TAB_SPACE_SPLIT, T_CHAR_TAB}:
-                self._SkippedChars += str(self._CurrentChar())
+                self._SkippedChars.write(str(self._CurrentChar()))
self._GetOneChar()
else:
return
@@ -696,7 +697,7 @@ class FdfParser:
Header = self._Token
if not self._Token.endswith(TAB_SECTION_END):
self._SkipToToken(TAB_SECTION_END)
-                        Header += self._SkippedChars
+                        Header += self._SkippedChars.getvalue()
if Header.find('$(') != -1:
raise Warning("macro cannot be used in section header", self.FileName, self.CurrentLineNumber)
self._SectionHeaderParser(Header)
@@ -1226,7 +1227,7 @@ class FdfParser:
raise Warning(QuoteToUse, self.FileName, self.CurrentLineNumber)
if currentLineNumber != self.CurrentLineNumber:
raise Warning(QuoteToUse, self.FileName, self.CurrentLineNumber)
-        self._Token = self._SkippedChars.rstrip(QuoteToUse)
+        self._Token = self._SkippedChars.getvalue().rstrip(QuoteToUse)
return True

## _SkipToToken() method
@@ -1243,7 +1244,7 @@ class FdfParser:
def _SkipToToken(self, String, IgnoreCase = False):
StartPos = self.GetFileBufferPos()

-        self._SkippedChars = ""
+        self._SkippedChars = StringIO()
while not self._EndOfFile():
index = -1
if IgnoreCase:
@@ -1252,13 +1253,13 @@ class FdfParser:
index = self._CurrentLine()[self.CurrentOffsetWithinLine: ].find(String)
if index == 0:
self.CurrentOffsetWithinLine += len(String)
-                self._SkippedChars += String
+                self._SkippedChars.write(String)
return True
-            self._SkippedChars += str(self._CurrentChar())
+            self._SkippedChars.write(str(self._CurrentChar()))
self._GetOneChar()

self.SetFileBufferPos(StartPos)
-        self._SkippedChars = ""
+        self._SkippedChars = StringIO()
return False

## GetFileBufferPos() method
@@ -2890,7 +2891,7 @@ class FdfParser:
if not self._SkipToToken(T_CHAR_BRACE_R):
raise Warning.Expected("Depex expression ending '}'", self.FileName, self.CurrentLineNumber)

-            DepexSectionObj.Expression = self._SkippedChars.rstrip(T_CHAR_BRACE_R)
+            DepexSectionObj.Expression = self._SkippedChars.getvalue().rstrip(T_CHAR_BRACE_R)
Obj.SectionList.append(DepexSectionObj)

elif self._IsKeyword("SUBTYPE_GUID"):
@@ -3525,7 +3526,7 @@ class FdfParser:
if not self._SkipToToken(TAB_SPLIT):
raise Warning.Expected("'.'", self.FileName, self.CurrentLineNumber)

-        Arch = self._SkippedChars.rstrip(TAB_SPLIT)
+        Arch = self._SkippedChars.getvalue().rstrip(TAB_SPLIT)

ModuleType = self._GetModuleType()

Also, if you would like to profile the build process you can apply this patch.

diff --git a/BaseTools/Source/Python/build/build.py b/BaseTools/Source/Python/build/build.py
index 51fb1f433e..396729efd9 100755
--- a/BaseTools/Source/Python/build/build.py
+++ b/BaseTools/Source/Python/build/build.py
@@ -2778,12 +2778,18 @@ def Main():
Log_Agent.join()
return ReturnCode

+import cProfile
+
if __name__ == '__main__':
try:
mp.set_start_method('spawn')
except:
pass
-    r = Main()
+
+    with cProfile.Profile() as pr:
+        r = Main()
+        pr.print_stats('tottime')
+
## 0-127 is a safe return range, and 1 is a standard default error
if r < 0 or r > 127: r = 1
sys.exit(r)


-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#120884): https://edk2.groups.io/g/devel/message/120884
Mute This Topic: https://groups.io/mt/109931827/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-



[-- Attachment #2: Type: text/html, Size: 8645 bytes --]

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* 回复: [edk2-devel] 回复: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs
  2024-12-06  0:54       ` [edk2-devel] " davidr via groups.io
@ 2024-12-06  1:41         ` gaoliming via groups.io
  0 siblings, 0 replies; 6+ messages in thread
From: gaoliming via groups.io @ 2024-12-06  1:41 UTC (permalink / raw)
  To: davidr, devel

[-- Attachment #1: Type: text/plain, Size: 5652 bytes --]

Thanks. I will try it. 

 

发件人: via groups.io <davidr=ghs.com@groups.io> 
发送时间: 2024年12月6日 8:55
收件人: gaoliming <gaoliming@byosoft.com.cn>; devel@edk2.groups.io
主题: Re: [edk2-devel] 回复: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs

 

Here is the code I was testing with. You can probably reduce the changes down to just _SkipWhiteSpace(), but I haven't tried that yet.

 

diff --git a/BaseTools/Source/Python/GenFds/FdfParser.py b/BaseTools/Source/Python/GenFds/FdfParser.py
index feb4c72779..8c57720116 100644
--- a/BaseTools/Source/Python/GenFds/FdfParser.py
+++ b/BaseTools/Source/Python/GenFds/FdfParser.py
@@ -12,6 +12,7 @@
 #
 from __future__ import print_function
 from __future__ import absolute_import
+from io import StringIO
 from re import compile, DOTALL
 from string import hexdigits
 from uuid import UUID
@@ -253,7 +254,7 @@ class FdfParser:
         self.CurrentFdName = None
         self.CurrentFvName = None
         self._Token = ""
-        self._SkippedChars = ""
+        self._SkippedChars = StringIO()
         GlobalData.gFdfParser = self
 
         # Used to section info
@@ -276,7 +277,7 @@ class FdfParser:
     def _SkipWhiteSpace(self):
         while not self._EndOfFile():
             if self._CurrentChar() in {TAB_PRINTCHAR_NUL, T_CHAR_CR, TAB_LINE_BREAK, TAB_SPACE_SPLIT, T_CHAR_TAB}:
-                self._SkippedChars += str(self._CurrentChar())
+                self._SkippedChars.write(str(self._CurrentChar()))
                 self._GetOneChar()
             else:
                 return
@@ -696,7 +697,7 @@ class FdfParser:
                     Header = self._Token
                     if not self._Token.endswith(TAB_SECTION_END):
                         self._SkipToToken(TAB_SECTION_END)
-                        Header += self._SkippedChars
+                        Header += self._SkippedChars.getvalue()
                     if Header.find('$(') != -1:
                         raise Warning("macro cannot be used in section header", self.FileName, self.CurrentLineNumber)
                     self._SectionHeaderParser(Header)
@@ -1226,7 +1227,7 @@ class FdfParser:
             raise Warning(QuoteToUse, self.FileName, self.CurrentLineNumber)
         if currentLineNumber != self.CurrentLineNumber:
             raise Warning(QuoteToUse, self.FileName, self.CurrentLineNumber)
-        self._Token = self._SkippedChars.rstrip(QuoteToUse)
+        self._Token = self._SkippedChars.getvalue().rstrip(QuoteToUse)
         return True
 
     ## _SkipToToken() method
@@ -1243,7 +1244,7 @@ class FdfParser:
     def _SkipToToken(self, String, IgnoreCase = False):
         StartPos = self.GetFileBufferPos()
 
-        self._SkippedChars = ""
+        self._SkippedChars = StringIO()
         while not self._EndOfFile():
             index = -1
             if IgnoreCase:
@@ -1252,13 +1253,13 @@ class FdfParser:
                 index = self._CurrentLine()[self.CurrentOffsetWithinLine: ].find(String)
             if index == 0:
                 self.CurrentOffsetWithinLine += len(String)
-                self._SkippedChars += String
+                self._SkippedChars.write(String)
                 return True
-            self._SkippedChars += str(self._CurrentChar())
+            self._SkippedChars.write(str(self._CurrentChar()))
             self._GetOneChar()
 
         self.SetFileBufferPos(StartPos)
-        self._SkippedChars = ""
+        self._SkippedChars = StringIO()
         return False
 
     ## GetFileBufferPos() method
@@ -2890,7 +2891,7 @@ class FdfParser:
             if not self._SkipToToken(T_CHAR_BRACE_R):
                 raise Warning.Expected("Depex expression ending '}'", self.FileName, self.CurrentLineNumber)
 
-            DepexSectionObj.Expression = self._SkippedChars.rstrip(T_CHAR_BRACE_R)
+            DepexSectionObj.Expression = self._SkippedChars.getvalue().rstrip(T_CHAR_BRACE_R)
             Obj.SectionList.append(DepexSectionObj)
 
         elif self._IsKeyword("SUBTYPE_GUID"):
@@ -3525,7 +3526,7 @@ class FdfParser:
         if not self._SkipToToken(TAB_SPLIT):
             raise Warning.Expected("'.'", self.FileName, self.CurrentLineNumber)
 
-        Arch = self._SkippedChars.rstrip(TAB_SPLIT)
+        Arch = self._SkippedChars.getvalue().rstrip(TAB_SPLIT)
 
         ModuleType = self._GetModuleType()
 

 

 

Also, if you would like to profile the build process you can apply this patch.

 

diff --git a/BaseTools/Source/Python/build/build.py b/BaseTools/Source/Python/build/build.py
index 51fb1f433e..396729efd9 100755
--- a/BaseTools/Source/Python/build/build.py
+++ b/BaseTools/Source/Python/build/build.py
@@ -2778,12 +2778,18 @@ def Main():
     Log_Agent.join()
     return ReturnCode
 
+import cProfile
+
 if __name__ == '__main__':
     try:
         mp.set_start_method('spawn')
     except:
         pass
-    r = Main()
+
+    with cProfile.Profile() as pr:
+        r = Main()
+        pr.print_stats('tottime')
+
     ## 0-127 is a safe return range, and 1 is a standard default error
     if r < 0 or r > 127: r = 1
     sys.exit(r)



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#120885): https://edk2.groups.io/g/devel/message/120885
Mute This Topic: https://groups.io/mt/109951333/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-



[-- Attachment #2: Type: text/html, Size: 11880 bytes --]

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-12-06  1:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-28  7:55 [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs davidr via groups.io
2024-12-04 10:22 ` Ard Biesheuvel via groups.io
2024-12-04 18:00   ` davidr via groups.io
2024-12-05  0:50     ` 回复: " gaoliming via groups.io
2024-12-06  0:54       ` [edk2-devel] " davidr via groups.io
2024-12-06  1:41         ` 回复: " gaoliming via groups.io

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox