* [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs
@ 2024-11-28 7:55 davidr via groups.io
2024-12-04 10:22 ` Ard Biesheuvel via groups.io
0 siblings, 1 reply; 6+ messages in thread
From: davidr via groups.io @ 2024-11-28 7:55 UTC (permalink / raw)
To: devel
[-- Attachment #1: Type: text/plain, Size: 3150 bytes --]
Hi,
I was testing out dumping a raw OVMF_VARS.fd into an FDF data section and noticed that my rebuilds of OVMF with no code changes went from 15 seconds to over 1 minute. The only change was the data in the NV_VARIABLE_STORE data section in OvmfPkg/Include/Fdf/VarStore.fdf.inc, which grew the file from about 4 KiB to about 256 KiB. Curious why the build time changed so much, I profiled the build process with cProfile. The line at https://github.com/tianocore/edk2/blob/master/BaseTools/Source/Python/GenFds/FdfParser.py#L279, inside _SkipWhiteSpace(), takes the vast majority of the time.
You can reproduce this by adding 256 KiB of "#" characters to the end of OvmfPkg/Include/Fdf/VarStore.fdf.inc and building OVMF, which produced this result for me:
Build total time: 00:01:31
         34883442 function calls (34368868 primitive calls) in 91.191 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    18769   73.501    0.004   75.942    0.004 FdfParser.py:276(_SkipWhiteSpace)
      728    5.738    0.008    5.738    0.008 {method 'acquire' of '_thread.lock' objects}
       12    1.569    0.131    1.569    0.131 {method 'poll' of 'select.poll' objects}
  2849255    1.248    0.000    1.439    0.000 FdfParser.py:354(_GetOneChar)
  2305959    0.873    0.000    1.131    0.000 FdfParser.py:293(_EndOfFile)
  5374641    0.831    0.000    0.831    0.000 FdfParser.py:368(_CurrentChar)
        2    0.526    0.263    1.278    0.639 FdfParser.py:497(PreprocessFile)
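The reproduction step can be scripted. This is a minimal sketch, assuming it is run from the root of an edk2 checkout; the helper name `append_padding` is hypothetical, and the padding is written as one long line of "#" characters followed by a newline.

```python
from pathlib import Path


def append_padding(path, size=256 * 1024, char="#"):
    """Append `size` copies of `char` plus a newline to `path`.

    Returns the resulting file size in bytes.
    """
    target = Path(path)
    # newline="" keeps the "\n" from being translated on any platform.
    with target.open("a", encoding="ascii", newline="") as handle:
        handle.write(char * size + "\n")
    return target.stat().st_size


if __name__ == "__main__":
    # Path taken from the report; nothing happens if the file is absent.
    fdf_include = Path("OvmfPkg/Include/Fdf/VarStore.fdf.inc")
    if fdf_include.is_file():
        print(append_padding(fdf_include))
```

Remember to revert the file (for example with `git checkout`) after measuring.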
Changing _SkippedChars from a string to StringIO reduced my build time to 19 seconds:
Build total time: 00:00:19
         36552029 function calls (36037563 primitive calls) in 18.618 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      728    5.391    0.007    5.391    0.007 {method 'acquire' of '_thread.lock' objects}
    18769    1.593    0.000    3.452    0.000 FdfParser.py:277(_SkipWhiteSpace)
       12    1.551    0.129    1.551    0.129 {method 'poll' of 'select.poll' objects}
  2849255    0.850    0.000    0.979    0.000 FdfParser.py:355(_GetOneChar)
  5374641    0.742    0.000    0.742    0.000 FdfParser.py:369(_CurrentChar)
  2305959    0.741    0.000    0.951    0.000 FdfParser.py:294(_EndOfFile)
        2    0.511    0.256    1.237    0.618 FdfParser.py:498(PreprocessFile)
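The two profiles differ only in how the skipped characters are accumulated. Below is a minimal sketch of the two patterns outside of FdfParser. The class names `StrAccumulator` and `BufferAccumulator` and the helper `time_accumulator` are hypothetical, and the timings will vary by interpreter and machine. The working assumption is that `+=` on a str stored in an instance attribute generally cannot extend the string in place, so each append may copy everything accumulated so far, while `StringIO.write()` appends to an internal buffer.

```python
import time
from io import StringIO


class StrAccumulator:
    """Accumulates into a str attribute, like the original _SkippedChars."""

    def __init__(self):
        self.skipped = ""

    def add(self, ch):
        self.skipped += ch

    def value(self):
        return self.skipped


class BufferAccumulator:
    """Accumulates into a StringIO, like the proposed change."""

    def __init__(self):
        self.skipped = StringIO()

    def add(self, ch):
        self.skipped.write(ch)

    def value(self):
        return self.skipped.getvalue()


def time_accumulator(factory, count):
    """Append `count` single characters; return (seconds, accumulated text)."""
    acc = factory()
    start = time.perf_counter()
    for _ in range(count):
        acc.add(" ")
    elapsed = time.perf_counter() - start
    return elapsed, acc.value()


if __name__ == "__main__":
    for factory in (StrAccumulator, BufferAccumulator):
        elapsed, text = time_accumulator(factory, 64 * 1024)
        print(f"{factory.__name__}: {len(text)} chars in {elapsed:.3f} s")
```

Both classes produce the same text; only the cost of each append differs.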
All of these tests were run using the python3 binary provided in the Docker container built from https://github.com/tianocore/containers/tree/main/Ubuntu-22/Dockerfile.
This seems like an easy change to make builds a tiny bit faster.
Thanks,
David
-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#120865): https://edk2.groups.io/g/devel/message/120865
Mute This Topic: https://groups.io/mt/109914552/7686176
Group Owner: devel+owner@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io]
-=-=-=-=-=-=-=-=-=-=-=-
[-- Attachment #2: Type: text/html, Size: 4883 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs
2024-11-28 7:55 [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs davidr via groups.io
@ 2024-12-04 10:22 ` Ard Biesheuvel via groups.io
2024-12-04 18:00 ` davidr via groups.io
0 siblings, 1 reply; 6+ messages in thread
From: Ard Biesheuvel via groups.io @ 2024-12-04 10:22 UTC (permalink / raw)
To: devel, davidr, Rebecca Cran
Thanks for the report. Mind sending a patch / PR?
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs
2024-12-04 10:22 ` Ard Biesheuvel via groups.io
@ 2024-12-04 18:00 ` davidr via groups.io
2024-12-05 0:50 ` Re: " gaoliming via groups.io
0 siblings, 1 reply; 6+ messages in thread
From: davidr via groups.io @ 2024-12-04 18:00 UTC (permalink / raw)
To: Ard Biesheuvel, devel
[-- Attachment #1: Type: text/plain, Size: 423 bytes --]
Sure, once I have time to read up on the patch process.
[-- Attachment #2: Type: text/html, Size: 846 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs
2024-12-04 18:00 ` davidr via groups.io
@ 2024-12-05 0:50 ` gaoliming via groups.io
2024-12-06 0:54 ` [edk2-devel] " davidr via groups.io
0 siblings, 1 reply; 6+ messages in thread
From: gaoliming via groups.io @ 2024-12-05 0:50 UTC (permalink / raw)
To: devel, davidr, 'Ard Biesheuvel'
[-- Attachment #1: Type: text/plain, Size: 819 bytes --]
Can you show your code change first? I would like to try your change.
Thanks
Liming
[-- Attachment #2: Type: text/html, Size: 4536 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [edk2-devel] Re: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs
2024-12-05 0:50 ` Re: " gaoliming via groups.io
@ 2024-12-06 0:54 ` davidr via groups.io
2024-12-06 1:41 ` Re: " gaoliming via groups.io
0 siblings, 1 reply; 6+ messages in thread
From: davidr via groups.io @ 2024-12-06 0:54 UTC (permalink / raw)
To: gaoliming, devel
[-- Attachment #1: Type: text/plain, Size: 4837 bytes --]
Here is the code I was testing with. You can probably reduce the changes down to just _SkipWhiteSpace(), but I haven't tried that yet.
diff --git a/BaseTools/Source/Python/GenFds/FdfParser.py b/BaseTools/Source/Python/GenFds/FdfParser.py
index feb4c72779..8c57720116 100644
--- a/BaseTools/Source/Python/GenFds/FdfParser.py
+++ b/BaseTools/Source/Python/GenFds/FdfParser.py
@@ -12,6 +12,7 @@
 #
 from __future__ import print_function
 from __future__ import absolute_import
+from io import StringIO
 from re import compile, DOTALL
 from string import hexdigits
 from uuid import UUID
@@ -253,7 +254,7 @@ class FdfParser:
         self.CurrentFdName = None
         self.CurrentFvName = None
         self._Token = ""
-        self._SkippedChars = ""
+        self._SkippedChars = StringIO()
         GlobalData.gFdfParser = self

         # Used to section info
@@ -276,7 +277,7 @@ class FdfParser:
     def _SkipWhiteSpace(self):
         while not self._EndOfFile():
             if self._CurrentChar() in {TAB_PRINTCHAR_NUL, T_CHAR_CR, TAB_LINE_BREAK, TAB_SPACE_SPLIT, T_CHAR_TAB}:
-                self._SkippedChars += str(self._CurrentChar())
+                self._SkippedChars.write(str(self._CurrentChar()))
                 self._GetOneChar()
             else:
                 return
@@ -696,7 +697,7 @@ class FdfParser:
                     Header = self._Token
                     if not self._Token.endswith(TAB_SECTION_END):
                         self._SkipToToken(TAB_SECTION_END)
-                        Header += self._SkippedChars
+                        Header += self._SkippedChars.getvalue()
                     if Header.find('$(') != -1:
                         raise Warning("macro cannot be used in section header", self.FileName, self.CurrentLineNumber)
                     self._SectionHeaderParser(Header)
@@ -1226,7 +1227,7 @@ class FdfParser:
             raise Warning(QuoteToUse, self.FileName, self.CurrentLineNumber)
         if currentLineNumber != self.CurrentLineNumber:
             raise Warning(QuoteToUse, self.FileName, self.CurrentLineNumber)
-        self._Token = self._SkippedChars.rstrip(QuoteToUse)
+        self._Token = self._SkippedChars.getvalue().rstrip(QuoteToUse)
         return True

     ## _SkipToToken() method
@@ -1243,7 +1244,7 @@ class FdfParser:
     def _SkipToToken(self, String, IgnoreCase = False):
         StartPos = self.GetFileBufferPos()

-        self._SkippedChars = ""
+        self._SkippedChars = StringIO()
         while not self._EndOfFile():
             index = -1
             if IgnoreCase:
@@ -1252,13 +1253,13 @@ class FdfParser:
                 index = self._CurrentLine()[self.CurrentOffsetWithinLine: ].find(String)
             if index == 0:
                 self.CurrentOffsetWithinLine += len(String)
-                self._SkippedChars += String
+                self._SkippedChars.write(String)
                 return True
-            self._SkippedChars += str(self._CurrentChar())
+            self._SkippedChars.write(str(self._CurrentChar()))
             self._GetOneChar()

         self.SetFileBufferPos(StartPos)
-        self._SkippedChars = ""
+        self._SkippedChars = StringIO()
         return False

     ## GetFileBufferPos() method
@@ -2890,7 +2891,7 @@ class FdfParser:
             if not self._SkipToToken(T_CHAR_BRACE_R):
                 raise Warning.Expected("Depex expression ending '}'", self.FileName, self.CurrentLineNumber)

-            DepexSectionObj.Expression = self._SkippedChars.rstrip(T_CHAR_BRACE_R)
+            DepexSectionObj.Expression = self._SkippedChars.getvalue().rstrip(T_CHAR_BRACE_R)
             Obj.SectionList.append(DepexSectionObj)

         elif self._IsKeyword("SUBTYPE_GUID"):
@@ -3525,7 +3526,7 @@ class FdfParser:
         if not self._SkipToToken(TAB_SPLIT):
             raise Warning.Expected("'.'", self.FileName, self.CurrentLineNumber)

-        Arch = self._SkippedChars.rstrip(TAB_SPLIT)
+        Arch = self._SkippedChars.getvalue().rstrip(TAB_SPLIT)

         ModuleType = self._GetModuleType()

Also, if you would like to profile the build process you can apply this patch.
diff --git a/BaseTools/Source/Python/build/build.py b/BaseTools/Source/Python/build/build.py
index 51fb1f433e..396729efd9 100755
--- a/BaseTools/Source/Python/build/build.py
+++ b/BaseTools/Source/Python/build/build.py
@@ -2778,12 +2778,18 @@ def Main():
     Log_Agent.join()
     return ReturnCode

+import cProfile
+
 if __name__ == '__main__':
     try:
         mp.set_start_method('spawn')
     except:
         pass
-    r = Main()
+
+    with cProfile.Profile() as pr:
+        r = Main()
+        pr.print_stats('tottime')
+
     ## 0-127 is a safe return range, and 1 is a standard default error
     if r < 0 or r > 127: r = 1
     sys.exit(r)
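
The same kind of measurement can be tried on any callable without touching build.py. This is a minimal sketch using only the standard library; `profile_call` and `workload` are hypothetical names, and the report is sorted by internal time (the tottime column) and limited to the top entries.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=10, **kwargs):
    """Run `func` under cProfile.

    Returns (result, report), where report is the pstats text output
    sorted by internal time and restricted to `limit` entries.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("tottime").print_stats(limit)
    return result, stream.getvalue()


def workload(n):
    """Stand-in for the code being measured."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    result, report = profile_call(workload, 100_000)
    print(result)
    print(report)
```

Capturing the report in a string makes it easy to save next to the build log for a before/after comparison.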
[-- Attachment #2: Type: text/html, Size: 8645 bytes --]
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [edk2-devel] Re: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs
2024-12-06 0:54 ` [edk2-devel] " davidr via groups.io
@ 2024-12-06 1:41 ` gaoliming via groups.io
0 siblings, 0 replies; 6+ messages in thread
From: gaoliming via groups.io @ 2024-12-06 1:41 UTC (permalink / raw)
To: davidr, devel
[-- Attachment #1: Type: text/plain, Size: 5652 bytes --]
Thanks. I will try it.
[-- Attachment #2: Type: text/html, Size: 11880 bytes --]
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-12-06 1:41 UTC | newest]
Thread overview: 6+ messages
2024-11-28 7:55 [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs davidr via groups.io
2024-12-04 10:22 ` Ard Biesheuvel via groups.io
2024-12-04 18:00 ` davidr via groups.io
2024-12-05 0:50 ` Re: " gaoliming via groups.io
2024-12-06 0:54 ` [edk2-devel] " davidr via groups.io
2024-12-06 1:41 ` Re: " gaoliming via groups.io