public inbox for devel@edk2.groups.io
* Multithreaded compression with LZMA2
@ 2020-12-02  2:59 Daniel Schaefer
  2020-12-02  3:36 ` [edk2-devel] " Andrew Fish
  2020-12-03 10:24 ` Laszlo Ersek
  0 siblings, 2 replies; 12+ messages in thread
From: Daniel Schaefer @ 2020-12-02  2:59 UTC (permalink / raw)
  To: devel@edk2.groups.io; +Cc: derek.lin2

Hi everyone,

I'm looking into how to speed up the build process and noticed that our build
uses LZMA to compress the main firmware volume. Since it's quite big,
compression takes a while but only uses one CPU thread.

LZMA2 is a version of LZMA which can be multi-threaded and achieve much faster
compression times. I did a quick benchmark using the `xz` command-line tool,
which uses a modified version of the LZMA SDK that EDK2 uses. The results are:

Uncompressed size: 64M

| Algo  | Comp Time | Decomp Time | Size | Threads |
| ----- | --------- | ----------- | ---- | ------- |
| LZMA  |    19.67s |        0.9s | 9.1M |       1 |
| LZMA2 |    20.11s |        1.2s | 9.2M |       1 |
| LZMA2 |     8.31s |        1.0s | 9.4M |       4 |

Using those commands:

time xz --format=lzma testfile
time unlzma testfile.lzma

time xz --lzma2 testfile
time unxz testfile.xz

time xz -T4 --lzma2 testfile
time unxz testfile.xz

This is a significant improvement in build time: with 4 threads, compression
takes about 42% of the single-threaded LZMA time (8.31s vs. 19.67s), while
decompression time and size increase only slightly. If that's a concern, then
LZMA2 could be used for development only.
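
For anyone without the xz tool at hand, the LZMA vs. LZMA2 size comparison can
be roughly reproduced with Python's standard `lzma` module. This is only a
sketch under assumptions: `FORMAT_ALONE` stands in for `xz --format=lzma`,
`FORMAT_XZ` for the LZMA2-based .xz container, and `compress_both` plus the
sample data are made up for illustration. The module is single-threaded, so it
says nothing about the `-T4` case.

```python
import lzma
import time


def compress_both(data: bytes, preset: int = 6) -> dict:
    """Compress `data` as legacy .lzma (LZMA1) and as .xz (LZMA2).

    Returns {name: (compressed_size_in_bytes, seconds)} for both formats.
    """
    results = {}
    for name, fmt in (("LZMA", lzma.FORMAT_ALONE), ("LZMA2", lzma.FORMAT_XZ)):
        start = time.perf_counter()
        blob = lzma.compress(data, format=fmt, preset=preset)
        elapsed = time.perf_counter() - start
        # Round-trip check; FORMAT_AUTO (the default) detects both containers.
        assert lzma.decompress(blob) == data
        results[name] = (len(blob), elapsed)
    return results


if __name__ == "__main__":
    # Placeholder input; a real test would read the FV image from disk.
    sample = b"firmware volume placeholder " * 40_000
    for name, (size, secs) in compress_both(sample).items():
        print(f"{name}: {size} bytes in {secs:.3f}s")
```

The absolute numbers will differ from the xz results, so this is only useful
for comparing the two formats against each other on the same input.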

I haven't investigated the details of how to support this in the code but it
appears to be a simple change, since the LZMA SDK that we use already supports
LZMA2.

What do you think?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [edk2-devel] Multithreaded compression with LZMA2
  2020-12-02  2:59 Multithreaded compression with LZMA2 Daniel Schaefer
@ 2020-12-02  3:36 ` Andrew Fish
  2020-12-02  5:21   ` 回复: " gaoliming
  2020-12-03 10:24 ` Laszlo Ersek
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Fish @ 2020-12-02  3:36 UTC (permalink / raw)
  To: devel, daniel.schaefer; +Cc: derek.lin2

Interesting idea. What OS did you use? I tried this on macOS on some larger FVs and did not see much difference. I tried a 17.5 MiB FV and it was around 3 seconds both ways.

Maybe it would be worthwhile to see how it works on various systems? I guess it might be data set related?

Thanks,

Andrew Fish


^ permalink raw reply	[flat|nested] 12+ messages in thread

* 回复: [edk2-devel] Multithreaded compression with LZMA2
  2020-12-02  3:36 ` [edk2-devel] " Andrew Fish
@ 2020-12-02  5:21   ` gaoliming
  2020-12-02  8:24     ` Daniel Schaefer
  0 siblings, 1 reply; 12+ messages in thread
From: gaoliming @ 2020-12-02  5:21 UTC (permalink / raw)
  To: devel, afish, daniel.schaefer; +Cc: derek.lin2


Daniel:

Can you provide the compressed image size? And which image is being
compressed? Is it the generated FV image?

Thanks
Liming


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 回复: [edk2-devel] Multithreaded compression with LZMA2
  2020-12-02  5:21   ` 回复: " gaoliming
@ 2020-12-02  8:24     ` Daniel Schaefer
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Schaefer @ 2020-12-02  8:24 UTC (permalink / raw)
  To: devel, gaoliming, afish; +Cc: derek.lin2

On 12/2/20 1:21 PM, gaoliming wrote:
> Daniel:
> 
>   Can you provide the compressed image size? And, what image is used to be compressed? Is it the generated FV image?
> 
> Thanks
> 
> Liming
> 
> Interesting idea. What OS did you use? I tried this on macOS on some larger FVs and I did not see much difference? I tried a 17.5 MiB FV and it was around 3 seconds both ways.
> 
> Maybe it would be worth while seeing how it works on various systems? I guess it might be data set related?

Hi Andrew and Liming,

The FV file is our main FV with the majority of DXEs. It's 64MB uncompressed
and 9MB compressed, as mentioned before. Unfortunately I cannot share that
particular file with you, but I am also surprised that it compresses so well,
to just 14% of its original size.

I'm running my tests on X64 Linux. I ran the same tests again on a more
powerful machine with the hyperfine command. It runs the test case 3 times to
warm up (e.g. caches) and then runs it 10 times that count towards the average.
The result is the same as before: with 4 threads, compression takes just 40%
of the single-threaded time. I don't observe any further speedup by using all
16 threads of the CPU.

# Simple LZMA
$ hyperfine --warmup 3 'xz -k --format=lzma testfile && rm testfile.lzma'
Benchmark #1: xz -k --format=lzma testfile && rm testfile.lzma
   Time (mean ± σ):     12.755 s ±  0.151 s    [User: 12.691 s, System: 0.064 s]
   Range (min … max):   12.568 s … 12.991 s    10 runs

# LZMA2 with single thread
$ hyperfine --warmup 3 'xz -k -T1 --lzma2 testfile && rm testfile.xz'
Benchmark #1: xz -k -T1 --lzma2 testfile && rm testfile.xz
   Time (mean ± σ):     12.838 s ±  0.149 s    [User: 12.783 s, System: 0.055 s]
   Range (min … max):   12.546 s … 13.053 s    10 runs

# LZMA2 with 4 threads
$ hyperfine --warmup 3 'xz -k -T4 --lzma2 testfile && rm testfile.xz'
Benchmark #1: xz -k -T4 --lzma2 testfile && rm testfile.xz
   Time (mean ± σ):      5.241 s ±  0.025 s    [User: 13.537 s, System: 0.177 s]
   Range (min … max):    5.227 s …  5.302 s    10 runs

Using xz from mingw64 on Windows 10 X64, the 4-threaded compression takes 47% of the single-threaded time.
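
As a quick sanity check on the percentages, the ratios can be recomputed from
the mean wall-clock times in the hyperfine output above. A minimal sketch;
`speedup_report` is a helper name invented here and not part of hyperfine:

```python
def speedup_report(single_s: float, multi_s: float, threads: int) -> dict:
    """Derive the usual scaling figures from two mean wall-clock times."""
    speedup = single_s / multi_s
    return {
        "time_fraction": multi_s / single_s,  # multi-threaded time / single-threaded time
        "speedup": speedup,                   # how many times faster
        "efficiency": speedup / threads,      # speedup per thread, 1.0 would be ideal
    }


# Mean times copied from the Linux runs: LZMA2 with -T1 and with -T4.
report = speedup_report(single_s=12.838, multi_s=5.241, threads=4)

if __name__ == "__main__":
    for key, value in report.items():
        print(f"{key}: {value:.3f}")
```

The time fraction comes out at roughly 0.41, which matches the "just 40%"
figure, and the efficiency is well below 1.0, so the four threads are not all
kept busy for the whole run.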

---

I also wanted to try it on another file and ran the same benchmarks on a 16MB
file, only to find that multithreading does not make compression any faster
there. The file shrinks from 16MB to 9MB. After my tests I discovered that
this is the combined FV, which includes a few other, already compressed FVs.
Here the multithreaded command even produces output that is 0.2MB smaller.

$ hyperfine --warmup 3 'xz -k -T1 --lzma2 testfile && rm testfile.xz'
Benchmark #1: xz -k -T1 --lzma2 testfile && rm testfile.xz
   Time (mean ± σ):      2.874 s ±  0.134 s    [User: 2.825 s, System: 0.049 s]
   Range (min … max):    2.751 s …  3.088 s    10 runs
$ ls -lh
-rw-r--r-- 1 zoid users  16M Dec  2 15:29 testfile
-rw-r--r-- 1 zoid users 9.4M Dec  2 15:29 testfile.lzma

$ hyperfine --warmup 3 'xz -k -T4 --lzma2 testfile && rm testfile.xz'
Benchmark #1: xz -k -T4 --lzma2 testfile && rm testfile.xz
   Time (mean ± σ):      2.874 s ±  0.108 s    [User: 2.818 s, System: 0.070 s]
   Range (min … max):    2.775 s …  3.081 s    10 runs
$ ls -lh
-rw-r--r-- 1 zoid users  16M Dec  2 15:29 testfile
-rw-r--r-- 1 zoid users 9.2M Dec  2 15:29 testfile.xz

So even with a file that doesn't compress as well, the performance doesn't get any worse.
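
One possible explanation for the missing speedup, stated as an assumption
rather than something measured in this thread: the xz man page describes a
default multi-threaded block size of three times the LZMA2 dictionary size (or
1 MiB, whichever is more), and the default preset uses an 8 MiB dictionary. If
that applies here, a 16MB input fits into a single block and only one thread
has work to do. The helper `xz_block_count` below is hypothetical and only does
the arithmetic:

```python
import math

MIB = 1024 * 1024


def xz_block_count(input_bytes: int, dict_bytes: int = 8 * MIB) -> int:
    """Estimate how many blocks a multi-threaded xz run would create.

    Assumes the documented default block size of 3x the dictionary size
    (at least 1 MiB) and the 8 MiB dictionary of the default preset.
    """
    block_bytes = max(3 * dict_bytes, MIB)
    return max(1, math.ceil(input_bytes / block_bytes))


if __name__ == "__main__":
    for size_mib in (16, 64):
        blocks = xz_block_count(size_mib * MIB)
        print(f"{size_mib} MiB input: {blocks} block(s), so at most {blocks} busy thread(s)")
```

Under these assumptions the 64M file splits into three blocks, which would
also fit the observation that more than 4 threads bring no further gain.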

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [edk2-devel] Multithreaded compression with LZMA2
  2020-12-02  2:59 Multithreaded compression with LZMA2 Daniel Schaefer
  2020-12-02  3:36 ` [edk2-devel] " Andrew Fish
@ 2020-12-03 10:24 ` Laszlo Ersek
  2020-12-03 12:11   ` Daniel Schaefer
  1 sibling, 1 reply; 12+ messages in thread
From: Laszlo Ersek @ 2020-12-03 10:24 UTC (permalink / raw)
  To: devel, daniel.schaefer; +Cc: derek.lin2

"xz -T" works by splitting the input into blocks, and it generates a
multi-block compressed output. I'm unsure if the current LZMA
decompressor that runs inside the firmware (= guided section extractor)
copes with multi-block input.
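
To illustrate the concern, here is a rough Python sketch. It is not what
"xz -T" does internally: it compresses fixed-size blocks in a thread pool and
concatenates them as independent .xz streams, whereas xz writes several blocks
into one stream. The function names and the block size are made up. It does
show the general problem, though: a decoder that stops after the first
compressed unit recovers only part of the input.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def compress_in_blocks(data: bytes, block_size: int, workers: int = 4) -> bytes:
    """Compress fixed-size blocks in parallel and concatenate the .xz streams."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the input order, so the streams line up with the blocks.
        streams = list(pool.map(lzma.compress, blocks))
    return b"".join(streams)


def decode_first_stream_only(blob: bytes) -> Tuple[bytes, int]:
    """Mimic a decoder that handles exactly one compressed unit.

    Returns the decoded bytes and the number of compressed bytes left unread.
    """
    decoder = lzma.LZMADecompressor()
    first = decoder.decompress(blob)
    return first, len(decoder.unused_data)


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096           # 1 MiB of sample data
    blob = compress_in_blocks(payload, block_size=256 * 1024)
    assert lzma.decompress(blob) == payload      # lzma.decompress walks all streams
    first, leftover = decode_first_stream_only(blob)
    print(len(first), "bytes decoded,", leftover, "compressed bytes ignored")
```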

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [edk2-devel] Multithreaded compression with LZMA2
  2020-12-03 10:24 ` Laszlo Ersek
@ 2020-12-03 12:11   ` Daniel Schaefer
  2020-12-03 15:57     ` Bret Barkelew
  2020-12-03 23:35     ` Laszlo Ersek
  0 siblings, 2 replies; 12+ messages in thread
From: Daniel Schaefer @ 2020-12-03 12:11 UTC (permalink / raw)
  To: Laszlo Ersek, devel@edk2.groups.io; +Cc: Lin, Derek (HPS SW)


From: Laszlo Ersek <lersek@redhat.com>
Sent: Thursday, December 3, 2020 18:24
To: devel@edk2.groups.io <devel@edk2.groups.io>; Schaefer, Daniel <daniel.schaefer@hpe.com>
Cc: Lin, Derek (HPS SW) <derek.lin2@hpe.com>
Subject: Re: [edk2-devel] Multithreaded compression with LZMA2


> "xz -T" works by splitting the input into blocks, and it generates a
> multi-block compressed output.

Yes, that's correct.

> I'm unsure if the current LZMA
> decompressor that runs inside the firmware (= guided section extractor)
> copes with multi-block input.

I think you're right that it doesn't. But we can make the guided section extractor use that same algorithm (LZMA2) and assign it a different GUID, right?
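
A rough sketch of that idea, in Python rather than EDK2 C. Everything here is
illustrative: the two GUID values are made up (they are not the GUIDs EDK2
uses), `extract_section` is a hypothetical name, and Python's `FORMAT_ALONE`
and `FORMAT_XZ` containers merely stand in for the real section payloads.

```python
import lzma
import uuid
from typing import Callable, Dict

# Made-up GUIDs for illustration only; NOT the values EDK2 assigns.
LZMA_SECTION_GUID = uuid.UUID("11111111-1111-1111-1111-111111111111")
LZMA2_SECTION_GUID = uuid.UUID("22222222-2222-2222-2222-222222222222")


def _decode_lzma(payload: bytes) -> bytes:
    # Existing path: single LZMA1 stream.
    return lzma.decompress(payload, format=lzma.FORMAT_ALONE)


def _decode_lzma2(payload: bytes) -> bytes:
    # New path: LZMA2-based container.
    return lzma.decompress(payload, format=lzma.FORMAT_XZ)


# One handler per GUID, so old images keep working with the old decoder.
EXTRACTORS: Dict[uuid.UUID, Callable[[bytes], bytes]] = {
    LZMA_SECTION_GUID: _decode_lzma,
    LZMA2_SECTION_GUID: _decode_lzma2,
}


def extract_section(section_guid: uuid.UUID, payload: bytes) -> bytes:
    """Pick the decoder by the section's GUID and return the decoded data."""
    handler = EXTRACTORS.get(section_guid)
    if handler is None:
        raise ValueError(f"no extractor registered for {section_guid}")
    return handler(payload)
```

Keeping the existing GUID untouched means images built with plain LZMA would
still decode, and the new GUID could be opted into per platform.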



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [edk2-devel] Multithreaded compression with LZMA2
  2020-12-03 12:11   ` Daniel Schaefer
@ 2020-12-03 15:57     ` Bret Barkelew
  2020-12-04  8:19       ` Daniel Schaefer
  2020-12-03 23:35     ` Laszlo Ersek
  1 sibling, 1 reply; 12+ messages in thread
From: Bret Barkelew @ 2020-12-03 15:57 UTC (permalink / raw)
  To: devel@edk2.groups.io, daniel.schaefer@hpe.com, Laszlo Ersek
  Cc: Lin, Derek (HPS SW)


Wasn’t there another push (somewhere in the last 8 months, my brain is foggy) to adopt LZMA2? Or was it a different algorithm?

- Bret


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [edk2-devel] Multithreaded compression with LZMA2
  2020-12-03 12:11   ` Daniel Schaefer
  2020-12-03 15:57     ` Bret Barkelew
@ 2020-12-03 23:35     ` Laszlo Ersek
  2020-12-04  2:28       ` 回复: " gaoliming
  1 sibling, 1 reply; 12+ messages in thread
From: Laszlo Ersek @ 2020-12-03 23:35 UTC (permalink / raw)
  To: Schaefer, Daniel, devel@edk2.groups.io; +Cc: Lin, Derek (HPS SW)

On 12/03/20 13:11, Schaefer, Daniel wrote:
> 
> From: Laszlo Ersek <lersek@redhat.com>
> Sent: Thursday, December 3, 2020 18:24
> To: devel@edk2.groups.io <devel@edk2.groups.io>; Schaefer, Daniel <daniel.schaefer@hpe.com>
> Cc: Lin, Derek (HPS SW) <derek.lin2@hpe.com>
> Subject: Re: [edk2-devel] Multithreaded compression with LZMA2
> 

>> "xz -T" works by splitting the input into blocks, and it generates a
>> multi-block compressed output.
>
> Yes, that's correct.
>
>> I'm unsure if the current LZMA
>> decompressor that runs inside the firmware (= guided section extractor)
>> copes with multi-block input.
>
> I think you're right that it doesn't. But we can make the guided section extractor use that same algorithm (LZMA2) and assign it a different GUID, right?

I guess so...

Thanks
Laszlo


^ permalink raw reply	[flat|nested] 12+ messages in thread

* 回复: [edk2-devel] Multithreaded compression with LZMA2
  2020-12-03 23:35     ` Laszlo Ersek
@ 2020-12-04  2:28       ` gaoliming
  2020-12-04  9:02         ` Daniel Schaefer
  0 siblings, 1 reply; 12+ messages in thread
From: gaoliming @ 2020-12-04  2:28 UTC (permalink / raw)
  To: devel, lersek, 'Schaefer, Daniel'; +Cc: 'Lin, Derek (HPS SW)'

Daniel:
  Yes. A new guided section extractor goes with the new compression algorithm.
For the compression algorithm, its compression ratio, compression performance,
decompression performance, and decompression memory usage all need to be
considered.

Thanks
Liming

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [edk2-devel] Multithreaded compression with LZMA2
  2020-12-03 15:57     ` Bret Barkelew
@ 2020-12-04  8:19       ` Daniel Schaefer
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Schaefer @ 2020-12-04  8:19 UTC (permalink / raw)
  To: devel, bret.barkelew, Laszlo Ersek; +Cc: Lin, Derek (HPS SW)

On 12/3/20 11:57 PM, Bret Barkelew via groups.io wrote:
> Wasn’t there another push (somewhere in the last 8 months, my brain is foggy) to adopt LZMA2? Or was it a different algorithm?

I couldn't find anything: https://edk2.groups.io/g/devel/search?q=LZMA2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 回复: [edk2-devel] Multithreaded compression with LZMA2
  2020-12-04  2:28       ` 回复: " gaoliming
@ 2020-12-04  9:02         ` Daniel Schaefer
  2020-12-08  6:01           ` 回复: " gaoliming
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Schaefer @ 2020-12-04  9:02 UTC (permalink / raw)
  To: devel, gaoliming
  Cc: Laszlo Ersek, Lin, Derek (HPS SW), Bret.Barkelew@microsoft.com,
	afish

On 12/4/20 10:28 AM, gaoliming wrote:
> Daniel:
>    Yes. New guided section extractor matches new compression algorithm.

Good. I see that we use version 18.05 of the LZMA SDK, while there's already 19.00:
https://www.7-zip.org/sdk.html

Should we use this opportunity to update?

> For the compression algorithm, its compression ratio, compression performance,
> the decompression performance, the decompression taken memory are all
> required to be considered.

For compression ratio and performance, see my earlier emails. Summary:
Compression ratio is basically the same.
Performance is also the same, except that with 4 threads it compresses our
main image in just 40% of the time. Some images don't compress as well; for
those, the compression time stays the same.

For decompression memory usage I used the xz commands on Linux again:

# LZMA1
$ /usr/bin/time -v unxz testfile.lzma
Maximum resident set size (kbytes): 10228, 10492, 10460, 10200, 10244 => 10324.8

# LZMA2
$ /usr/bin/time -v unxz testfile.xz
Maximum resident set size (kbytes): 10456, 10460, 10224, 10212, 10548 => 10380.0

Result: Basically the same.
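
The averages above can be double-checked with a few lines of Python. The
sample values are copied from the two runs; the variable names are only for
this sketch:

```python
from statistics import mean

# Maximum resident set size in kbytes, five runs each, as listed above.
lzma1_rss_kb = [10228, 10492, 10460, 10200, 10244]
lzma2_rss_kb = [10456, 10460, 10224, 10212, 10548]

lzma1_mean = mean(lzma1_rss_kb)
lzma2_mean = mean(lzma2_rss_kb)

# Relative increase of the LZMA2 average over the LZMA1 average.
relative_increase = (lzma2_mean - lzma1_mean) / lzma1_mean

if __name__ == "__main__":
    print(f"LZMA1 mean: {lzma1_mean:.1f} kB")
    print(f"LZMA2 mean: {lzma2_mean:.1f} kB")
    print(f"relative increase: {relative_increase:.2%}")
```

The difference is about half a percent, which is smaller than the spread
between individual runs of the same decompressor.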

I don't know how I would measure this in EDK2. Any ideas?


From the xz manpage and the LZMA authors:

 > LZMA2 is an updated version of LZMA1 to fix some practical issues of LZMA1.
 > Compression speed and ratios of LZMA1 and LZMA2 are practically the same.
 > LZMA2 is better than LZMA, if you compress already compressed data.

Here's a benchmark which compares both of them, among others:
https://stephane.lesimple.fr/blog/lzop-vs-compress-vs-gzip-vs-bzip2-vs-lzma-vs-lzma2xz-benchmark-reloaded/
Same result: They're basically the same.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* 回复: 回复: [edk2-devel] Multithreaded compression with LZMA2
  2020-12-04  9:02         ` Daniel Schaefer
@ 2020-12-08  6:01           ` gaoliming
  0 siblings, 0 replies; 12+ messages in thread
From: gaoliming @ 2020-12-08  6:01 UTC (permalink / raw)
  To: devel, daniel.schaefer
  Cc: 'Laszlo Ersek', 'Lin, Derek (HPS SW)',
	Bret.Barkelew, afish

Daniel:

> -----Original Message-----
> From: bounce+27952+68335+4905953+8761045@groups.io
> <bounce+27952+68335+4905953+8761045@groups.io> On Behalf Of Daniel
> Schaefer
> Sent: December 4, 2020 17:03
> To: devel@edk2.groups.io; gaoliming@byosoft.com.cn
> Cc: Laszlo Ersek <lersek@redhat.com>; Lin, Derek (HPS SW)
> <derek.lin2@hpe.com>; Bret.Barkelew@microsoft.com; afish@apple.com
> Subject: Re: 回复: [edk2-devel] Multithreaded compression with LZMA2
>
> On 12/4/20 10:28 AM, gaoliming wrote:
> > Daniel:
> >    Yes. New guided section extractor matches new compression
> > algorithm.
>
> Good. I see that we use version 18.05 of the LZMA SDK, while there's
> already 19.00:
> https://www.7-zip.org/sdk.html
>
> Should we use this opportunity to update?
>
I suggest separating the two. The SDK update doesn't block this change.

> > For the compression algorithm, its compression ratio, compression
> > performance, the decompression performance, the decompression taken
> > memory are all required to be considered.
>
> For compression ratio and performance, see my earlier emails. Summary:
> Compression ratio is basically the same.
> Performance is also the same, except when using 4 threads it compresses
> our main image in just 40% of the time. Some images don't compress as
> well, they take the same time to compress.
> 
> For decompression memory usage I used the xz commands on Linux again:
> 
> # LZMA1
> $ /usr/bin/time -v unxz testfile.lzma
> Maximum resident set size (kbytes): 10228, 10492, 10460, 10200, 10244 =>
> 10324.8
> 
> # LZMA2
> $ /usr/bin/time -v unxz testfile.xz
> Maximum resident set size (kbytes): 10456, 10460, 10224, 10212, 10548 =>
> 10380.0
> 
> Result: Basically the same.
> 
> I don't know how I would measure this in EDK2. Any ideas?
> 
You need to enable it in edk2, along the lines of the current LzmaCompression
tool and LzmaDecompress library, apply both in the platform DSC/FDF, and then
measure the build and the decompression.

Thanks
Liming




^ permalink raw reply	[flat|nested] 12+ messages in thread
