Use the three rules below to speed up microcode (uCode) loading:
1. Let the BSP relocate the uCode from flash to memory for better performance.
2. The BSP caches its CPU ID and the address of its uCode, so an AP does not need to search for the uCode again if its CPU ID is the same as the BSP's.
3. When hyper-threading is enabled, apply the uCode on only one thread of each core.

V2 changes:
Fix a potential issue when memory allocation fails.

V3 changes:
Remove the incorrect ASSERT code.

Test:
On a sample platform with 1 socket, 4 cores and 8 threads, the time spent in the CpuMpPei driver dropped from 108.4 ms to 27.2 ms (roughly a 4x reduction).

Eric Dong (3):
  UefiCpuPkg/MpInitLib: Use BSP uCode for APs if possible.
  UefiCpuPkg/MpInitLib: Load uCode once for each core.
  UefiCpuPkg/MpInitLib: Relocate uCode to memory to save time.

 UefiCpuPkg/Library/MpInitLib/Microcode.c | 43 +++++++++++++++++++++++++++++---
 UefiCpuPkg/Library/MpInitLib/MpLib.c     | 38 +++++++++++++++++++++++++++---
 UefiCpuPkg/Library/MpInitLib/MpLib.h     | 11 ++++++--
 3 files changed, 84 insertions(+), 8 deletions(-)
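Rule 1 (relocating uCode from flash to memory) can be sketched as a one-time copy done by the BSP, after which every processor reads patches from RAM instead of slow flash. This is a minimal sketch, not the MpInitLib code: `MICROCODE_REGION`, `RelocateMicrocode`, `ALLOC_FN` and `FailingAlloc` are hypothetical names, and standard `malloc`/`memcpy` stand in for the PEI-phase services a real implementation would use (for example EDK II's `AllocatePages()` and `CopyMem()`). It also assumes that a failed allocation should fall back to reading from flash rather than stopping the boot, which is one plausible reading of the V2 change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor for the region the microcode loader reads from. */
typedef struct {
  const uint8_t *Base;   /* where patches are read from: RAM copy or flash */
  size_t         Size;   /* size of the microcode region in bytes */
  uint8_t       *Owned;  /* heap copy to free later, or NULL if Base is flash */
} MICROCODE_REGION;

/* Allocation hook so the caller can pass malloc or a failing stub. */
typedef void *(*ALLOC_FN)(size_t);

/* Stub allocator that always fails, used to exercise the fallback path. */
static void *FailingAlloc(size_t Size) {
  (void)Size;
  return NULL;
}

/* Copy the microcode region out of flash once (done by the BSP).
   If the allocation fails, keep reading from flash instead of failing. */
static MICROCODE_REGION RelocateMicrocode(const uint8_t *FlashBase, size_t Size,
                                          ALLOC_FN Alloc) {
  MICROCODE_REGION Region = { FlashBase, Size, NULL };
  uint8_t *Copy;

  assert(FlashBase != NULL);
  Copy = (Size != 0) ? (uint8_t *)Alloc(Size) : NULL;
  if (Copy != NULL) {
    memcpy(Copy, FlashBase, Size);
    Region.Base  = Copy;   /* all later scans and loads use the RAM copy */
    Region.Owned = Copy;
  }
  return Region;
}
```

Passing `malloc` yields a RAM copy whose contents match flash; passing `FailingAlloc` leaves `Base` pointing at flash, so the loader still works, only more slowly.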
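Rule 2 (APs reusing the BSP's uCode when the CPU ID matches) amounts to a cache check in front of the patch search. The sketch below is illustrative only: `PATCH_ENTRY`, `MICROCODE_CACHE`, `ScanForPatch` and `FindPatchForAp` are hypothetical names, the patch table is a simplified stand-in for walking real patch headers, and matching is reduced to the processor signature (CPUID leaf 1 EAX). Real Intel patch matching also checks platform ID bits and extended signature tables, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a microcode patch header: only the field
   needed for matching is modeled. */
typedef struct {
  uint32_t    Signature;  /* processor signature, i.e. CPUID(1).EAX */
  const void *Patch;      /* address of the patch data */
} PATCH_ENTRY;

/* Hypothetical cache filled in by the BSP after it found its own patch. */
typedef struct {
  uint32_t    ProcessorSignature;
  const void *MicrocodeAddress;  /* NULL if the BSP found no patch */
} MICROCODE_CACHE;

/* Slow path: linear scan of the patch table. ScanCount records how often
   the scan runs, so the benefit of the cache is observable. */
static const void *ScanForPatch(const PATCH_ENTRY *Table, size_t Count,
                                uint32_t Signature, unsigned *ScanCount) {
  (*ScanCount)++;
  for (size_t Index = 0; Index < Count; Index++) {
    if (Table[Index].Signature == Signature) {
      return Table[Index].Patch;
    }
  }
  return NULL;
}

/* AP path: reuse the BSP's result when the signatures match, else scan. */
static const void *FindPatchForAp(const MICROCODE_CACHE *Cache,
                                  const PATCH_ENTRY *Table, size_t Count,
                                  uint32_t ApSignature, unsigned *ScanCount) {
  assert(Cache != NULL);
  if (Cache->MicrocodeAddress != NULL &&
      Cache->ProcessorSignature == ApSignature) {
    return Cache->MicrocodeAddress;  /* fast path: no search needed */
  }
  return ScanForPatch(Table, Count, ApSignature, ScanCount);
}
```

On a platform where every AP has the BSP's signature, no AP ever reaches `ScanForPatch`; only an AP with a different signature (for example, mixed steppings) pays for the full search. The signature values in any test are arbitrary examples.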
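Rule 3 (loading uCode on only one thread per core) relies on sibling hyper-threads sharing one core's microcode, so only the thread with thread ID 0 needs to perform the update. The sketch below assumes the thread ID is the low `ThreadBits` bits of the APIC ID and that APIC IDs are dense; real code would derive the bit widths from CPUID topology leaves (EDK II's LocalApicLib provides `GetProcessorLocationByApicId()` for this kind of decomposition). `ThreadIdFromApicId`, `ShouldLoadMicrocode` and `CountLoaders` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ThreadBits: number of low APIC ID bits that select the SMT thread within
   a core (assumed 1 for 2-way hyper-threading, 0 when it is disabled). */
static uint32_t ThreadIdFromApicId(uint32_t ApicId, unsigned ThreadBits) {
  return ApicId & ((1u << ThreadBits) - 1u);
}

/* Only the first thread of each core applies the microcode update. */
static bool ShouldLoadMicrocode(uint32_t ApicId, unsigned ThreadBits) {
  return ThreadIdFromApicId(ApicId, ThreadBits) == 0;
}

/* Count how many of NumThreads logical processors (assumed dense APIC IDs
   0..NumThreads-1) would actually perform the update. */
static unsigned CountLoaders(uint32_t NumThreads, unsigned ThreadBits) {
  unsigned Loaders = 0;
  for (uint32_t ApicId = 0; ApicId < NumThreads; ApicId++) {
    if (ShouldLoadMicrocode(ApicId, ThreadBits)) {
      Loaders++;
    }
  }
  return Loaders;
}
```

Under these assumptions, the test platform from this series (4 cores, 8 threads) has `CountLoaders(8, 1)` logical processors loading microcode, which is 4 instead of 8; with hyper-threading disabled (`ThreadBits` of 0) every processor still loads it.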