* [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs @ 2024-11-28 7:55 davidr via groups.io 2024-12-04 10:22 ` Ard Biesheuvel via groups.io 0 siblings, 1 reply; 6+ messages in thread From: davidr via groups.io @ 2024-11-28 7:55 UTC (permalink / raw) To: devel [-- Attachment #1: Type: text/plain, Size: 3150 bytes --] Hi, I was testing out dumping a raw OVMF_VARS.fd into an FDF data section and noticed that my rebuilds of OVMF with no code changes went from 15 seconds to over 1 minute. The only change was the data in the NV_VARIABLE_STORE data section in OvmfPkg/Include/Fdf/VarStore.fdf.inc which changed the size of the file from about 4 KiB to about 256 KiB. I was rather curious as to why my build times changed so much and profiled the build process with cProfile. Specifically https://github.com/tianocore/edk2/blob/master/BaseTools/Source/Python/GenFds/FdfParser.py#L279 takes the vast majority of the time. You can reproduce this by adding 256 KiB of "#" characters to the end of OvmfPkg/Include/Fdf/VarStore.fdf.inc and building OVMF, which produced this result for me: Build total time: 00:01:31 34883442 function calls (34368868 primitive calls) in 91.191 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 18769 73.501 0.004 75.942 0.004 FdfParser.py:276(_SkipWhiteSpace) 728 5.738 0.008 5.738 0.008 {method 'acquire' of '_thread.lock' objects} 12 1.569 0.131 1.569 0.131 {method 'poll' of 'select.poll' objects} 2849255 1.248 0.000 1.439 0.000 FdfParser.py:354(_GetOneChar) 2305959 0.873 0.000 1.131 0.000 FdfParser.py:293(_EndOfFile) 5374641 0.831 0.000 0.831 0.000 FdfParser.py:368(_CurrentChar) 2 0.526 0.263 1.278 0.639 FdfParser.py:497(PreprocessFile) Changing _SkippedChars from a string to StringIO reduced my build time to 19 seconds: Build total time: 00:00:19 36552029 function calls (36037563 primitive calls) in 18.618 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 728 5.391 0.007 5.391 0.007 {method 'acquire' of '_thread.lock' objects} 18769 1.593 0.000 3.452 0.000 FdfParser.py:277(_SkipWhiteSpace) 12 1.551 0.129 1.551 0.129 {method 'poll' of 'select.poll' objects} 2849255 0.850 0.000 0.979 0.000 FdfParser.py:355(_GetOneChar) 5374641 0.742 0.000 0.742 0.000 FdfParser.py:369(_CurrentChar) 2305959 0.741 0.000 0.951 0.000 FdfParser.py:294(_EndOfFile) 2 0.511 0.256 1.237 0.618 FdfParser.py:498(PreprocessFile) All of these tests were run using the python3 binary provided in the docker container created by https://github.com/tianocore/containers/tree/main/Ubuntu-22/Dockerfile ( https://github.com/tianocore/containers/blob/main/Ubuntu-22/Dockerfile ). This seems like an easy change to make builds a tiny bit faster. Thanks, David -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#120865): https://edk2.groups.io/g/devel/message/120865 Mute This Topic: https://groups.io/mt/109914552/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- [-- Attachment #2: Type: text/html, Size: 4883 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs 2024-11-28 7:55 [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs davidr via groups.io @ 2024-12-04 10:22 ` Ard Biesheuvel via groups.io 2024-12-04 18:00 ` davidr via groups.io 0 siblings, 1 reply; 6+ messages in thread From: Ard Biesheuvel via groups.io @ 2024-12-04 10:22 UTC (permalink / raw) To: devel, davidr, Rebecca Cran On Wed, 4 Dec 2024 at 05:20, davidr via groups.io <davidr=ghs.com@groups.io> wrote: > > Hi, > > I was testing out dumping a raw OVMF_VARS.fd into an FDF data section and noticed that my rebuilds of OVMF with no code changes went from 15 seconds to over 1 minute. The only change was the data in the NV_VARIABLE_STORE data section in OvmfPkg/Include/Fdf/VarStore.fdf.inc which changed the size of the file from about 4 KiB to about 256 KiB. I was rather curious as to why my build times changed so much and profiled the build process with cProfile. Specifically https://github.com/tianocore/edk2/blob/master/BaseTools/Source/Python/GenFds/FdfParser.py#L279 takes the vast majority of the time. > > You can reproduce this by adding 256 KiB of "#" characters to the end of OvmfPkg/Include/Fdf/VarStore.fdf.inc and building OVMF, which produced this result for me: > Build total time: 00:01:31 > 34883442 function calls (34368868 primitive calls) in 91.191 seconds > Ordered by: internal time > ncalls tottime percall cumtime percall filename:lineno(function) > 18769 73.501 0.004 75.942 0.004 FdfParser.py:276(_SkipWhiteSpace) > 728 5.738 0.008 5.738 0.008 {method 'acquire' of '_thread.lock' objects} > 12 1.569 0.131 1.569 0.131 {method 'poll' of 'select.poll' objects} > 2849255 1.248 0.000 1.439 0.000 FdfParser.py:354(_GetOneChar) > 2305959 0.873 0.000 1.131 0.000 FdfParser.py:293(_EndOfFile) > 5374641 0.831 0.000 0.831 0.000 FdfParser.py:368(_CurrentChar) > 2 0.526 0.263 1.278 0.639 FdfParser.py:497(PreprocessFile) > > Changing _SkippedChars from a string to StringIO reduced my build time to 19 seconds: > Build total time: 00:00:19 > 36552029 function calls (36037563 primitive calls) in 18.618 seconds > Ordered by: internal time > ncalls tottime percall cumtime percall filename:lineno(function) > 728 5.391 0.007 5.391 0.007 {method 'acquire' of '_thread.lock' objects} > 18769 1.593 0.000 3.452 0.000 FdfParser.py:277(_SkipWhiteSpace) > 12 1.551 0.129 1.551 0.129 {method 'poll' of 'select.poll' objects} > 2849255 0.850 0.000 0.979 0.000 FdfParser.py:355(_GetOneChar) > 5374641 0.742 0.000 0.742 0.000 FdfParser.py:369(_CurrentChar) > 2305959 0.741 0.000 0.951 0.000 FdfParser.py:294(_EndOfFile) > 2 0.511 0.256 1.237 0.618 FdfParser.py:498(PreprocessFile) > > All of these tests were run using the python3 binary provided in the docker container created by https://github.com/tianocore/containers/tree/main/Ubuntu-22/Dockerfile. > > This seems like an easy change to make builds a tiny bit faster. > Thanks for the report. Mind sending a patch / PR ? -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#120866): https://edk2.groups.io/g/devel/message/120866 Mute This Topic: https://groups.io/mt/109914552/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs 2024-12-04 10:22 ` Ard Biesheuvel via groups.io @ 2024-12-04 18:00 ` davidr via groups.io 2024-12-05 0:50 ` 回复: " gaoliming via groups.io 0 siblings, 1 reply; 6+ messages in thread From: davidr via groups.io @ 2024-12-04 18:00 UTC (permalink / raw) To: Ard Biesheuvel, devel [-- Attachment #1: Type: text/plain, Size: 423 bytes --] Sure, once I have time to read up on the patch process. -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#120873): https://edk2.groups.io/g/devel/message/120873 Mute This Topic: https://groups.io/mt/109914552/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- [-- Attachment #2: Type: text/html, Size: 846 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* 回复: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs 2024-12-04 18:00 ` davidr via groups.io @ 2024-12-05 0:50 ` gaoliming via groups.io 2024-12-06 0:54 ` [edk2-devel] " davidr via groups.io 0 siblings, 1 reply; 6+ messages in thread From: gaoliming via groups.io @ 2024-12-05 0:50 UTC (permalink / raw) To: devel, davidr, 'Ard Biesheuvel' [-- Attachment #1: Type: text/plain, Size: 819 bytes --] Can you show your code change first? I would like to try your change. Thanks Liming 发件人: devel@edk2.groups.io <devel@edk2.groups.io> 代表 davidr via groups.io 发送时间: 2024年12月5日 2:00 收件人: Ard Biesheuvel <ardb@kernel.org>; devel@edk2.groups.io 主题: Re: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs Sure, once I have time to read up on the patch process. -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#120876): https://edk2.groups.io/g/devel/message/120876 Mute This Topic: https://groups.io/mt/109931827/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- [-- Attachment #2: Type: text/html, Size: 4536 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [edk2-devel] 回复: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs 2024-12-05 0:50 ` 回复: " gaoliming via groups.io @ 2024-12-06 0:54 ` davidr via groups.io 2024-12-06 1:41 ` 回复: " gaoliming via groups.io 0 siblings, 1 reply; 6+ messages in thread From: davidr via groups.io @ 2024-12-06 0:54 UTC (permalink / raw) To: gaoliming, devel [-- Attachment #1: Type: text/plain, Size: 4837 bytes --] Here is the code I was testing with. You can probably reduce the changes down to just _SkipWhiteSpace(), but I haven't tried that yet. diff --git a/BaseTools/Source/Python/GenFds/FdfParser.py b/BaseTools/Source/Python/GenFds/FdfParser.py index feb4c72779..8c57720116 100644 --- a/BaseTools/Source/Python/GenFds/FdfParser.py +++ b/BaseTools/Source/Python/GenFds/FdfParser.py @@ -12,6 +12,7 @@ # from __future__ import print_function from __future__ import absolute_import +from io import StringIO from re import compile, DOTALL from string import hexdigits from uuid import UUID @@ -253,7 +254,7 @@ class FdfParser: self.CurrentFdName = None self.CurrentFvName = None self._Token = "" - self._SkippedChars = "" + self._SkippedChars = StringIO() GlobalData.gFdfParser = self # Used to section info @@ -276,7 +277,7 @@ class FdfParser: def _SkipWhiteSpace(self): while not self._EndOfFile(): if self._CurrentChar() in {TAB_PRINTCHAR_NUL, T_CHAR_CR, TAB_LINE_BREAK, TAB_SPACE_SPLIT, T_CHAR_TAB}: - self._SkippedChars += str(self._CurrentChar()) + self._SkippedChars.write(str(self._CurrentChar())) self._GetOneChar() else: return @@ -696,7 +697,7 @@ class FdfParser: Header = self._Token if not self._Token.endswith(TAB_SECTION_END): self._SkipToToken(TAB_SECTION_END) - Header += self._SkippedChars + Header += self._SkippedChars.getvalue() if Header.find('$(') != -1: raise Warning("macro cannot be used in section header", self.FileName, self.CurrentLineNumber) self._SectionHeaderParser(Header) @@ -1226,7 +1227,7 @@ class FdfParser: raise Warning(QuoteToUse, self.FileName, self.CurrentLineNumber) if currentLineNumber != self.CurrentLineNumber: raise Warning(QuoteToUse, self.FileName, self.CurrentLineNumber) - self._Token = self._SkippedChars.rstrip(QuoteToUse) + self._Token = self._SkippedChars.getvalue().rstrip(QuoteToUse) return True ## _SkipToToken() method @@ -1243,7 +1244,7 @@ class FdfParser: def _SkipToToken(self, String, IgnoreCase = False): StartPos = self.GetFileBufferPos() - self._SkippedChars = "" + self._SkippedChars = StringIO() while not self._EndOfFile(): index = -1 if IgnoreCase: @@ -1252,13 +1253,13 @@ class FdfParser: index = self._CurrentLine()[self.CurrentOffsetWithinLine: ].find(String) if index == 0: self.CurrentOffsetWithinLine += len(String) - self._SkippedChars += String + self._SkippedChars.write(String) return True - self._SkippedChars += str(self._CurrentChar()) + self._SkippedChars.write(str(self._CurrentChar())) self._GetOneChar() self.SetFileBufferPos(StartPos) - self._SkippedChars = "" + self._SkippedChars = StringIO() return False ## GetFileBufferPos() method @@ -2890,7 +2891,7 @@ class FdfParser: if not self._SkipToToken(T_CHAR_BRACE_R): raise Warning.Expected("Depex expression ending '}'", self.FileName, self.CurrentLineNumber) - DepexSectionObj.Expression = self._SkippedChars.rstrip(T_CHAR_BRACE_R) + DepexSectionObj.Expression = self._SkippedChars.getvalue().rstrip(T_CHAR_BRACE_R) Obj.SectionList.append(DepexSectionObj) elif self._IsKeyword("SUBTYPE_GUID"): @@ -3525,7 +3526,7 @@ class FdfParser: if not self._SkipToToken(TAB_SPLIT): raise Warning.Expected("'.'", self.FileName, self.CurrentLineNumber) - Arch = self._SkippedChars.rstrip(TAB_SPLIT) + Arch = self._SkippedChars.getvalue().rstrip(TAB_SPLIT) ModuleType = self._GetModuleType() Also, if you would like to profile the build process you can apply this patch. diff --git a/BaseTools/Source/Python/build/build.py b/BaseTools/Source/Python/build/build.py index 51fb1f433e..396729efd9 100755 --- a/BaseTools/Source/Python/build/build.py +++ b/BaseTools/Source/Python/build/build.py @@ -2778,12 +2778,18 @@ def Main(): Log_Agent.join() return ReturnCode +import cProfile + if __name__ == '__main__': try: mp.set_start_method('spawn') except: pass - r = Main() + + with cProfile.Profile() as pr: + r = Main() + pr.print_stats('tottime') + ## 0-127 is a safe return range, and 1 is a standard default error if r < 0 or r > 127: r = 1 sys.exit(r) -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#120884): https://edk2.groups.io/g/devel/message/120884 Mute This Topic: https://groups.io/mt/109931827/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- [-- Attachment #2: Type: text/html, Size: 8645 bytes --] ^ permalink raw reply related [flat|nested] 6+ messages in thread
* 回复: [edk2-devel] 回复: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs 2024-12-06 0:54 ` [edk2-devel] " davidr via groups.io @ 2024-12-06 1:41 ` gaoliming via groups.io 0 siblings, 0 replies; 6+ messages in thread From: gaoliming via groups.io @ 2024-12-06 1:41 UTC (permalink / raw) To: davidr, devel [-- Attachment #1: Type: text/plain, Size: 5652 bytes --] Thanks. I will try it. 发件人: via groups.io <davidr=ghs.com@groups.io> 发送时间: 2024年12月6日 8:55 收件人: gaoliming <gaoliming@byosoft.com.cn>; devel@edk2.groups.io 主题: Re: [edk2-devel] 回复: [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs Here is the code I was testing with. You can probably reduce the changes down to just _SkipWhiteSpace(), but I haven't tried that yet. diff --git a/BaseTools/Source/Python/GenFds/FdfParser.py b/BaseTools/Source/Python/GenFds/FdfParser.py index feb4c72779..8c57720116 100644 --- a/BaseTools/Source/Python/GenFds/FdfParser.py +++ b/BaseTools/Source/Python/GenFds/FdfParser.py @@ -12,6 +12,7 @@ # from __future__ import print_function from __future__ import absolute_import +from io import StringIO from re import compile, DOTALL from string import hexdigits from uuid import UUID @@ -253,7 +254,7 @@ class FdfParser: self.CurrentFdName = None self.CurrentFvName = None self._Token = "" - self._SkippedChars = "" + self._SkippedChars = StringIO() GlobalData.gFdfParser = self # Used to section info @@ -276,7 +277,7 @@ class FdfParser: def _SkipWhiteSpace(self): while not self._EndOfFile(): if self._CurrentChar() in {TAB_PRINTCHAR_NUL, T_CHAR_CR, TAB_LINE_BREAK, TAB_SPACE_SPLIT, T_CHAR_TAB}: - self._SkippedChars += str(self._CurrentChar()) + self._SkippedChars.write(str(self._CurrentChar())) self._GetOneChar() else: return @@ -696,7 +697,7 @@ class FdfParser: Header = self._Token if not self._Token.endswith(TAB_SECTION_END): self._SkipToToken(TAB_SECTION_END) - Header += self._SkippedChars + Header += self._SkippedChars.getvalue() if Header.find('$(') != -1: raise Warning("macro cannot be used in section header", self.FileName, self.CurrentLineNumber) self._SectionHeaderParser(Header) @@ -1226,7 +1227,7 @@ class FdfParser: raise Warning(QuoteToUse, self.FileName, self.CurrentLineNumber) if currentLineNumber != self.CurrentLineNumber: raise Warning(QuoteToUse, self.FileName, self.CurrentLineNumber) - self._Token = self._SkippedChars.rstrip(QuoteToUse) + self._Token = self._SkippedChars.getvalue().rstrip(QuoteToUse) return True ## _SkipToToken() method @@ -1243,7 +1244,7 @@ class FdfParser: def _SkipToToken(self, String, IgnoreCase = False): StartPos = self.GetFileBufferPos() - self._SkippedChars = "" + self._SkippedChars = StringIO() while not self._EndOfFile(): index = -1 if IgnoreCase: @@ -1252,13 +1253,13 @@ class FdfParser: index = self._CurrentLine()[self.CurrentOffsetWithinLine: ].find(String) if index == 0: self.CurrentOffsetWithinLine += len(String) - self._SkippedChars += String + self._SkippedChars.write(String) return True - self._SkippedChars += str(self._CurrentChar()) + self._SkippedChars.write(str(self._CurrentChar())) self._GetOneChar() self.SetFileBufferPos(StartPos) - self._SkippedChars = "" + self._SkippedChars = StringIO() return False ## GetFileBufferPos() method @@ -2890,7 +2891,7 @@ class FdfParser: if not self._SkipToToken(T_CHAR_BRACE_R): raise Warning.Expected("Depex expression ending '}'", self.FileName, self.CurrentLineNumber) - DepexSectionObj.Expression = self._SkippedChars.rstrip(T_CHAR_BRACE_R) + DepexSectionObj.Expression = self._SkippedChars.getvalue().rstrip(T_CHAR_BRACE_R) Obj.SectionList.append(DepexSectionObj) elif self._IsKeyword("SUBTYPE_GUID"): @@ -3525,7 +3526,7 @@ class FdfParser: if not self._SkipToToken(TAB_SPLIT): raise Warning.Expected("'.'", self.FileName, self.CurrentLineNumber) - Arch = self._SkippedChars.rstrip(TAB_SPLIT) + Arch = self._SkippedChars.getvalue().rstrip(TAB_SPLIT) ModuleType = self._GetModuleType() Also, if you would like to profile the build process you can apply this patch. diff --git a/BaseTools/Source/Python/build/build.py b/BaseTools/Source/Python/build/build.py index 51fb1f433e..396729efd9 100755 --- a/BaseTools/Source/Python/build/build.py +++ b/BaseTools/Source/Python/build/build.py @@ -2778,12 +2778,18 @@ def Main(): Log_Agent.join() return ReturnCode +import cProfile + if __name__ == '__main__': try: mp.set_start_method('spawn') except: pass - r = Main() + + with cProfile.Profile() as pr: + r = Main() + pr.print_stats('tottime') + ## 0-127 is a safe return range, and 1 is a standard default error if r < 0 or r > 127: r = 1 sys.exit(r) -=-=-=-=-=-=-=-=-=-=-=- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#120885): https://edk2.groups.io/g/devel/message/120885 Mute This Topic: https://groups.io/mt/109951333/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=-=-=-=-=-=-=-=-=-=-=- [-- Attachment #2: Type: text/html, Size: 11880 bytes --] ^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-12-06 1:41 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-11-28 7:55 [edk2-devel] FDF parser performance degrades rapidly on non-trivially sized inputs davidr via groups.io 2024-12-04 10:22 ` Ard Biesheuvel via groups.io 2024-12-04 18:00 ` davidr via groups.io 2024-12-05 0:50 ` 回复: " gaoliming via groups.io 2024-12-06 0:54 ` [edk2-devel] " davidr via groups.io 2024-12-06 1:41 ` 回复: " gaoliming via groups.io
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox