From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-1.mimecast.com (us-smtp-delivery-1.mimecast.com [205.139.110.61]) by mx.groups.io with SMTP id smtpd.web10.6576.1574949449752413996 for ; Thu, 28 Nov 2019 05:57:29 -0800 Authentication-Results: mx.groups.io; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=aksDnJ9L; spf=pass (domain: redhat.com, ip: 205.139.110.61, mailfrom: lersek@redhat.com) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1574949448; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FbMABxzSbe8IGZnxFBZQBzzQtwp9+99HQ/5jFd4cjo8=; b=aksDnJ9LEKAqaQeziTDWKlEbgRBk6Wj/leVHrKzT+W0n4bQSOEbbwiI0/AvjyAP1uKPQI9 96GL8hNhjNGzSTkvgjHJP8Q0ldGJ9mKac2Z53I+YChlKH/g6vBJxMnboYTD5nh3/lzIz0s DUTr9qJ4GiKLyvloZajEj9aSgJtRALE= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-232-2zmcYDJYNzSfpRsaj9QKlw-1; Thu, 28 Nov 2019 08:57:22 -0500 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B9F97107ACE8; Thu, 28 Nov 2019 13:57:21 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-116-145.ams2.redhat.com [10.36.116.145]) by smtp.corp.redhat.com (Postfix) with ESMTP id 821B819C6A; Thu, 28 Nov 2019 13:57:20 +0000 (UTC) Subject: Re: [PATCH v2] UefiCpuPkg/PiSmmCpuDxeSmm: Avoid allocate Token every time. To: Eric Dong , devel@edk2.groups.io Cc: Ray Ni , Liming Gao References: <20191128061716.196-1-eric.dong@intel.com> From: "Laszlo Ersek" Message-ID: <7ac1f712-5da3-e6b2-f24e-936feb81daa3@redhat.com> Date: Thu, 28 Nov 2019 14:57:19 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <20191128061716.196-1-eric.dong@intel.com> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-MC-Unique: 2zmcYDJYNzSfpRsaj9QKlw-1 X-Mimecast-Spam-Score: 0 Content-Language: en-US Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit On 11/28/19 07:17, Eric Dong wrote: > v2 changes: > Minor update based on comments. > > v1 changes: > REF: https://bugzilla.tianocore.org/show_bug.cgi?id=2388 > > Current logic allocate Token every time when need to use it. (1) Can you please clarify, in the *commit message*, what service the token allocation is needed for? To me it seems to be related to the SMM MP protocol, which was added for . Is that correct? (For example, the CreateToken() function was introduced in commit 51dd408ae102, "UefiCpuPkg/PiSmmCpuDxeSmm: Enable MM MP Protocol", 2019-07-16.) (2) If the problem indeed depends on MM MP protocol usage, then please update Bugzilla 2388 too, with that information ("performance regression from BZ#1937"). If a platform does not use the MM MP protocol, then highlighting this information helps platforms determine that they are not affected. (I read Bugzilla 2388 soon after it was filed, and I was surprised, because I didn't remember any "human visible" SMI handling slowdown. Maybe it's not on the "human visible" scale; I'm not sure.) > The logic > caused SMI latency raised to very high. Update logic to allocate Token > buffer at driver's entry point. Later use the token from the allocated > token buffer. Only when all the buffer have been used, then need to > allocate new buffer. > > Signed-off-by: Eric Dong > Cc: Ray Ni > Cc: Laszlo Ersek > --- > UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c | 56 ++++++++++++++++++++-- > UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h | 16 +++++++ > 2 files changed, 68 insertions(+), 4 deletions(-) commenting on the header file changes first: > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h > index 8c29f1a558..36fd0dae92 100644 > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.h > @@ -198,6 +198,7 @@ typedef UINT32 SMM_CPU_ARRIVAL_EXCEPTIONS; > #define ARRIVAL_EXCEPTION_DELAYED 0x2 > #define ARRIVAL_EXCEPTION_SMI_DISABLED 0x4 > > +#define MAX_PRE_RESERVE_TOKEN_COUNT 0x512 (3) This is a very strange constant, 0x512: it's decimal 1298. Is that your intent? (4) Should we make this a global (SMRAM) variable instead, and initialize it from a PCD? I'm not sure how large tokens are, but if a platform has no use for the MM MP protocol, then pre-allocating tokens for the MM MP protocol just wastes SMRAM. If I grep edk2 (at bd85bf54c268), and edk2-platforms (at 04889ec1198b), for "gEfiMmMpProtocolGuid", the only reported module is "UefiCpuPkg/PiSmmCpuDxeSmm" itself. In other words, there is no open source consumer for the new protocol. I don't think we should waste SMRAM on platforms that don't consume this protocol. Furthermore, I've applied the following patch, and rebuilt OVMF: > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c > index 0685637c2b51..5874195ba96e 100644 > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c > @@ -1119,6 +1119,8 @@ CreateToken ( > SPIN_LOCK *CpuToken; > UINTN SpinLockSize; > > + ASSERT (FALSE); > + > SpinLockSize = GetSpinLockProperties (); > CpuToken = AllocatePool (SpinLockSize); > ASSERT (CpuToken != NULL); then I've re-run my usual tests. The ASSERT() has not been reached, and so I think that the new SMRAM allocation would be pure loss (= unused) for OVMF. I suggest that we introduce a new PCD for "token count per chunk", and set the default value to 1, in the DEC file. (And we should copy the PCD into a global variable, in InitializeDataForMmMp().) This way, if a platform suddenly starts consuming tokens, it will work (there will always be a single available token). If they want to improve performance (using a chunked allocation style), they can set the PCD as they see fit. Back to the patch: On 11/28/19 07:17, Eric Dong wrote: > // > // Wrapper used to convert EFI_AP_PROCEDURE2 and EFI_AP_PROCEDURE. > // > @@ -217,6 +218,17 @@ typedef struct { > > #define PROCEDURE_TOKEN_FROM_LINK(a) CR (a, PROCEDURE_TOKEN, Link, PROCEDURE_TOKEN_SIGNATURE) > > +#define TOKEN_BUFFER_SIGNATURE SIGNATURE_32 ('T', 'K', 'B', 'S') > + > +typedef struct { > + UINTN Signature; > + LIST_ENTRY Link; > + > + UINT8 *Buffer; > +} TOKEN_BUFFER; > + > +#define TOKEN_BUFFER_FROM_LINK(a) CR (a, TOKEN_BUFFER, Link, TOKEN_BUFFER_SIGNATURE) > + > // > // Private structure for the SMM CPU module that is stored in DXE Runtime memory > // Contains the SMM Configuration Protocols that is produced. > @@ -243,6 +255,10 @@ typedef struct { > PROCEDURE_WRAPPER *ApWrapperFunc; > LIST_ENTRY TokenList; > > + LIST_ENTRY OldTokenBufList; > + > + UINT8 *CurrentTokenBuf; > + UINTN UsedTokenNum; // Only record tokens used in CurrentTokenBuf. > } SMM_CPU_PRIVATE_DATA; > > extern SMM_CPU_PRIVATE_DATA *gSmmCpuPrivate; > (5) There are two new lines above that are overlong. Did "PatchCheck.py" not complain? > > diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c > index d8d2b6f444..4632e5b0c2 100644 > --- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c > +++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c > @@ -492,6 +492,23 @@ FreeTokens ( > { > LIST_ENTRY *Link; > PROCEDURE_TOKEN *ProcToken; > + TOKEN_BUFFER *TokenBuf; > + > + // > + // Not free the buffer, just clear the UsedTokenNum. In order to > + // avoid later trig allcate action again when need to use token. > + // > + gSmmCpuPrivate->UsedTokenNum = 0; (6) Here we do not zero out the current token buffer, but in CreateToken() and InitializeDataForMmMp(), we use AllocateZeroPool(). This is an inconsistency, we should call either ZeroMem() here (if zeroing matters), or AllocatePool() in the other two places (if zeroing does not matter). > + > + Link = GetFirstNode (&gSmmCpuPrivate->OldTokenBufList); > + while (!IsNull (&gSmmCpuPrivate->OldTokenBufList, Link)) { > + TokenBuf = TOKEN_BUFFER_FROM_LINK (Link); > + > + Link = RemoveEntryList (&TokenBuf->Link); > + > + FreePool (TokenBuf->Buffer); > + FreePool (TokenBuf); > + } > > while (!IsListEmpty (&gSmmCpuPrivate->TokenList)) { > Link = GetFirstNode (&gSmmCpuPrivate->TokenList); > @@ -499,7 +516,6 @@ FreeTokens ( > > RemoveEntryList (&ProcToken->Link); > > - FreePool ((VOID *)ProcToken->ProcedureToken); > FreePool (ProcToken); > } > } > @@ -1115,13 +1131,35 @@ CreateToken ( > VOID > ) > { > - PROCEDURE_TOKEN *ProcToken; > + PROCEDURE_TOKEN *ProcToken; > SPIN_LOCK *CpuToken; > UINTN SpinLockSize; > + TOKEN_BUFFER *TokenBuf; > > SpinLockSize = GetSpinLockProperties (); > - CpuToken = AllocatePool (SpinLockSize); > - ASSERT (CpuToken != NULL); > + > + if (gSmmCpuPrivate->UsedTokenNum == MAX_PRE_RESERVE_TOKEN_COUNT) { > + DEBUG((DEBUG_INFO, "CpuSmm: No free token buffer, allocate new buffer!\n")); (7) This is an expected case, and not too much a corner case at that. Furthermore, the DEBUG message is in a performance-sensitive path. Therefore, the message should be logged with the DEBUG_VERBOSE mask, and not with the DEBUG_INFO mask. (8) In addition, the DEBUG line is too long. (9) The DEBUG line also misses a space character. > + > + // > + // Record current token buffer for later free action usage. > + // Current used token buffer not in this list. > + // > + TokenBuf = AllocatePool (sizeof (TOKEN_BUFFER)); > + ASSERT (TokenBuf != NULL); > + TokenBuf->Signature = TOKEN_BUFFER_SIGNATURE; > + TokenBuf->Buffer = gSmmCpuPrivate->CurrentTokenBuf; > + > + InsertTailList (&gSmmCpuPrivate->OldTokenBufList, &TokenBuf->Link); > + > + gSmmCpuPrivate->CurrentTokenBuf = AllocateZeroPool (SpinLockSize * MAX_PRE_RESERVE_TOKEN_COUNT); > + ASSERT (gSmmCpuPrivate->CurrentTokenBuf != NULL); > + gSmmCpuPrivate->UsedTokenNum = 0; > + } > + > + CpuToken = (SPIN_LOCK *)(gSmmCpuPrivate->CurrentTokenBuf + SpinLockSize * gSmmCpuPrivate->UsedTokenNum); (10) Two lines are overlong. > + gSmmCpuPrivate->UsedTokenNum++; > + > InitializeSpinLock (CpuToken); > AcquireSpinLock (CpuToken); > > @@ -1737,10 +1775,20 @@ InitializeDataForMmMp ( > VOID > ) > { > + UINTN SpinLockSize; > + > + SpinLockSize = GetSpinLockProperties (); > + DEBUG((DEBUG_INFO, "CpuSmm: SpinLock Size = 0x%x\n", SpinLockSize)); (11) For consistency with (7), you might want to downgrade this to DEBUG_VERBOSE too. But, in this case, I won't insist. (12) Space character missing after "DEBUG". (13) Are you proposing this patch for edk2-stable201911? (CC Liming) Thanks Laszlo > + > + gSmmCpuPrivate->CurrentTokenBuf = AllocateZeroPool (SpinLockSize * MAX_PRE_RESERVE_TOKEN_COUNT); > + ASSERT (gSmmCpuPrivate->CurrentTokenBuf != NULL); > + gSmmCpuPrivate->UsedTokenNum = 0; > + > gSmmCpuPrivate->ApWrapperFunc = AllocatePool (sizeof (PROCEDURE_WRAPPER) * gSmmCpuPrivate->SmmCoreEntryContext.NumberOfCpus); > ASSERT (gSmmCpuPrivate->ApWrapperFunc != NULL); > > InitializeListHead (&gSmmCpuPrivate->TokenList); > + InitializeListHead (&gSmmCpuPrivate->OldTokenBufList); > } > > /**