From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail02.groups.io (mail02.groups.io [66.175.222.108]) by spool.mail.gandi.net (Postfix) with ESMTPS id 6A995941903 for ; Thu, 23 Nov 2023 11:44:06 +0000 (UTC) DKIM-Signature: a=rsa-sha256; bh=RjpyZXiFVNN5BO/kJkKGt2aewD51xzMHihfihH3mcNE=; c=relaxed/simple; d=groups.io; h=Message-ID:Date:MIME-Version:User-Agent:Subject:To:Cc:References:From:In-Reply-To:Precedence:List-Subscribe:List-Help:Sender:List-Id:Mailing-List:Delivered-To:Reply-To:List-Unsubscribe-Post:List-Unsubscribe:Content-Type; s=20140610; t=1700739845; v=1; b=OxLhkjkCqtMQcnUGlkwy5QU71uJP27wZYENf6PXcNqlmZ1mnZyMDqojq11ltIIY2JJauGI7L 8oF7GR111p7BdeZGlHyaRUw4HpP7G6NGPz0SKQIOYoTx3pmGxL77DsPa2g499wIk74OBNyoAwIw 3z02cAA9T970b4+mmtNXt5uU= X-Received: by 127.0.0.2 with SMTP id XmazYY7687511xqOY7QcRY1y; Thu, 23 Nov 2023 03:44:05 -0800 X-Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by mx.groups.io with SMTP id smtpd.web10.88899.1700739843498454283 for ; Thu, 23 Nov 2023 03:44:04 -0800 X-Received: from loongson.cn (unknown [10.40.24.149]) by gateway (Coremail) with SMTP id _____8Dxl+j7Ol9llkU8AA--.17933S3; Thu, 23 Nov 2023 19:43:55 +0800 (CST) X-Received: from [10.40.24.149] (unknown [10.40.24.149]) by localhost.localdomain (Coremail) with SMTP id AQAAf8BxrdzzOl9l2a9KAA--.33578S3; Thu, 23 Nov 2023 19:43:47 +0800 (CST) Message-ID: <1069d005-ef80-46c6-89fa-f3fed316e4ad@loongson.cn> Date: Thu, 23 Nov 2023 19:43:47 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [edk2-devel] [PATCH v3 11/39] UefiCpuPkg: Add LoongArch64 CPU Timer library To: devel@edk2.groups.io, lersek@redhat.com Cc: Eric Dong , Ray Ni , Rahul Kumar , Gerd Hoffmann References: <20231117095742.3605778-1-lichao@loongs> <20231117100007.3609080-1-lichao@loongson.cn> <2e42db7c-a927-f47b-7d80-632895b11e1b@redhat.com> From: "Chao Li" In-Reply-To: <2e42db7c-a927-f47b-7d80-632895b11e1b@redhat.com> X-CM-TRANSID: AQAAf8BxrdzzOl9l2a9KAA--.33578S3 X-CM-SenderInfo: xolfxt3r6o00pqjv00gofq/1tbiAQAECGVetqoIugABsZ X-Coremail-Antispam: 1Uk129KBj9fXoWfAw1rJrWUXw47Zr47Jr4fJFc_yoW8KF4Duo WIqw1ftr4UKry7WFnYyF1DJ3y3XFs7KF13Xr47Xa15JF1qqr10kFWkCw1rJ3sxCFWv9FZ7 Ga48A397ZFWfKrn3l-sFpf9Il3svdjkaLaAFLSUrUUUUUb8apTn2vfkv8UJUUUU8wcxFpf 9Il3svdxBIdaVrnUUv73VFW2AGmfu7bjvjm3AaLaJ3UjIYCTnIWjp_UUU5E7kC6x804xWl 14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI8IcIk0rVWrJVCq3wAFIxvE14AKwV WUGVWUXwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK021l84ACjcxK6xIIjxv20xvE 14v26r1j6r1xM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26r1j6r4UM28EF7xvwVC2z280aV AFwI0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_Gr1j6F4UJwAS0I0E0xvYzxvE 52x082IY62kv0487Mc804VCY07AIYIkI8VC2zVCFFI0UMcIj6xIIjxv20xvE14v26r1j6r 18McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IY64vIr41l 7480Y4vEI4kI2Ix0rVAqx4xJMxAIw28IcxkI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r 4UMI8I3I0E5I8CrVAFwI0_JrI_JrWlx2IqxVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF 67AKxVWUAVWUtwCIc40Y0x0EwIxGrwCI42IY6xIIjxv20xvE14v26r1j6r1xMIIF0xvE2I x0cI8IcVCY1x0267AKxVWUJVW8JwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2 z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVWUJVW8JbIYCTnIWIevJa73Uj IFyTuYvjxUz4SrUUUUU Precedence: Bulk List-Subscribe: List-Help: Sender: devel@edk2.groups.io List-Id: Mailing-List: list devel@edk2.groups.io; contact devel+owner@edk2.groups.io Reply-To: devel@edk2.groups.io,lichao@loongson.cn List-Unsubscribe-Post: List-Unsubscribe=One-Click List-Unsubscribe: X-Gm-Message-State: zIDaYP4Kl2ovm2oaBQqFoolKx7686176AA= Content-Type: multipart/alternative; boundary="------------Shjr9PWLf6iNpIg2Nwo8fa1e" X-GND-Status: LEGIT Authentication-Results: spool.mail.gandi.net; dkim=pass header.d=groups.io header.s=20140610 header.b=OxLhkjkC; dmarc=none; spf=pass (spool.mail.gandi.net: domain of bounce@groups.io designates 66.175.222.108 as permitted sender) smtp.mailfrom=bounce@groups.io --------------Shjr9PWLf6iNpIg2Nwo8fa1e Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: quoted-printable Hi Laszlo, Thanks so carefully to review this patch, I'm guessing it took your a=20 long time. :) Thanks, Chao On 2023/11/23 00:12, Laszlo Ersek wrote: > On 11/17/23 11:00, Chao Li wrote: >> Add the LoongArch64 CPU Timer library, using CPUCFG 0x4 and 0x5 for >> Stable Counter frequency. >> >> BZ:https://bugzilla.tianocore.org/show_bug.cgi?id=3D4584 >> >> Cc: Eric Dong >> Cc: Ray Ni >> Cc: Rahul Kumar >> Cc: Gerd Hoffmann >> Signed-off-by: Chao Li >> --- >> .../BaseLoongArch64CpuTimerLib.inf | 30 +++ >> .../BaseLoongArch64CpuTimerLib.uni | 15 ++ >> .../BaseLoongArch64CpuTimerLib/CpuTimerLib.c | 226 ++++++++++++++++++ >> UefiCpuPkg/UefiCpuPkg.dsc | 3 + >> 4 files changed, 274 insertions(+) >> create mode 100644 UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseL= oongArch64CpuTimerLib.inf >> create mode 100644 UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseL= oongArch64CpuTimerLib.uni >> create mode 100644 UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/CpuTi= merLib.c > (1) sorry about the annoying comment, but the library should be called > (preferably) BaseTimerLibLoongArch64Cpu. > > We're frequently not consistent with the following library instance > naming scheme, but in theory anyway, library instances should be named > as follows: > > > > Thus, in this case, Base + TimerLib + LoongArch64Cpu. > > update UNI file name, INF file name, directory name, BASE_NAME and > MODULE_UNI_FILE accordingly (if you agree) There has a SPEC for naming style: https://github.com/tianocore-docs/edk2-CCodingStandardsSpecification/blob/m= aster/4_naming_conventions/42_directory_names.md Please check 4.2.3 EDKII Library directory, and most directory naming=20 follows this SPEC. > >> diff --git a/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoongArch= 64CpuTimerLib.inf b/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoong= Arch64CpuTimerLib.inf >> new file mode 100644 >> index 0000000000..c00c215aec >> --- /dev/null >> +++ b/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoongArch64CpuTi= merLib.inf >> @@ -0,0 +1,30 @@ >> +## @file >> +# Base CPU Timer Library >> +# >> +# Provides base timer support using CPUCFG 0x4 and 0x5 stable counter = frequency. >> +# >> +# Copyright (c) 2023, Loongson Technology Corporation Limited. All rig= hts reserved.
>> +# SPDX-License-Identifier: BSD-2-Clause-Patent >> +# >> +## >> + >> +[Defines] >> + INF_VERSION =3D 0x00010005 > (2) Can you use the most recent INF file version? (Generally applies to > all new INF files in the series.) > > The latest version is 1.29 > , and you > can just write "1.29" here. OK, I will update this version, include all new INF files for this series. > >> + BASE_NAME =3D BaseLoongArch64CpuTimerLib >> + FILE_GUID =3D 740389C7-CC44-4A2F-88DC-89D97D3= 12E7C >> + MODULE_TYPE =3D BASE >> + VERSION_STRING =3D 1.0 >> + LIBRARY_CLASS =3D TimerLib >> + MODULE_UNI_FILE =3D BaseLoongArch64CpuTimerLib.uni >> + >> +[Sources.common] >> + CpuTimerLib.c > (3) ".common" is not an error, but superfluous here. Yes, I will remove it in V4. > >> + >> +[Packages] >> + MdePkg/MdePkg.dec >> + UefiCpuPkg/UefiCpuPkg.dec > (4) Do you actually consume any artifact from "UefiCpuPkg.dec"? > > Unless you do, there's no need to depend on "UefiCpuPkg.dec" here. Yes, the UefiCpuPkg.dec is indeed not necessary, so I will remove it in=20 this file for V4. > > (5) Relatedly: the TimerLib class is declared in "MdePkg.dec". Thus, if > you indeed don't need anything from "UefiCpuPkg.dec", I'd suggest moving > this library instance under MdePkg. It is inappropriate to place this library in MdePkg, because the MdePkg=20 doesn't have any CpuTimerLib instance, and this class library are HW=20 implementation-related, so it more appropriate to place it in UefiCpuPkg. > >> + >> +[LibraryClasses] >> + BaseLib >> + PcdLib >> + DebugLib > (6) If it's not too much of a burden, it's best to keep library classes > alphabetically sorted. OK, I will sort them for V4. > >> diff --git a/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoongArch= 64CpuTimerLib.uni b/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoong= Arch64CpuTimerLib.uni >> new file mode 100644 >> index 0000000000..72d38ec679 >> --- /dev/null >> +++ b/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoongArch64CpuTi= merLib.uni >> @@ -0,0 +1,15 @@ >> +// /** @file >> +// Base CPU Timer Library >> +// >> +// Provides base timer support using CPUCFG 0x4 and 0x5 stable counter = frequency. >> +// >> +// Copyright (c) 2023, Loongson Technology Corporation Limited. All rig= hts reserved.
>> +// >> +// SPDX-License-Identifier: BSD-2-Clause-Patent >> +// >> +// **/ >> + >> + >> +#string STR_MODULE_ABSTRACT #language en-US "LOONGARCH CPU = Timer Library" >> + >> +#string STR_MODULE_DESCRIPTION #language en-US "Provides basic= timer support using CPUCFG 0x4 and 0x5 stable counter frequency." >> diff --git a/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/CpuTimerLib.c= b/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/CpuTimerLib.c >> new file mode 100644 >> index 0000000000..349b881cbc >> --- /dev/null >> +++ b/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/CpuTimerLib.c >> @@ -0,0 +1,226 @@ >> +/** @file >> + CPUCFG 0x4 and 0x5 for Stable Counter frequency instance of Timer Lib= rary. >> + >> + Copyright (c) 2023, Loongson Technology Corporation Limited. All righ= ts reserved.
>> + >> + SPDX-License-Identifier: BSD-2-Clause-Patent >> +**/ >> + >> +#include >> +#include >> +#include >> +#include >> +#include > (7) It's best to keep the Library includes in sync with the > [LibraryClasses] section from the INF file. > > (And to keep the #include directives alphabetically sorted as well.) > > You are including here because the file is going to > define those functions. OK. This is the sole justified difference > between the Library #includes here and the [LibraryClasses] section in > the INF file. > > Regarding dependencies, you are including and > , but [LibraryClasses] in addition lists PcdLib. > Thus, is missing here. (You are likely getting it > included recursively, but that's not clean enough.) OK, I see, I will sort them and keep them in sync with corresponding=20 libraries. > >> + >> +/** >> + Calculate clock frequency using CPUCFG 0x4 and 0x5 registers. >> + >> + @param VOID. > (8) This @param is useless and should be avoided, AFAICT. I think the > EccCheck plugin will not complain. The EccCheck already passed... Any way, I will remove it in V4. > >> + >> + @return The frequency in Hz. >> + >> +**/ >> +UINT32 >> +EFIAPI >> +CalcConstFreq ( > (9) Should be STATIC, and should *not* be EFIAPI. It is not a public > library or protocol interface. I will fix it in V4. > >> + VOID >> + ) >> +{ >> + UINT32 BaseFreq; >> + UINT32 ClockMultiplier; >> + UINT32 ClockDivide; >> + CPUCFG_REG4_INFO_DATA CCFreq; > (10) Per edk2 coding style, when acronyms and abbreviations are grouped > together, each individual acronym itself is supposed to be CamelCase. > For example, when we use "PCI" in identifiers, it becomes "Pci". So I > think this variable should be called "CcFreq", where "CC" stands for > "CPU Config". If I'm wrong about "CC", then please ignore this point. The two letter 'C' means: first 'C' means Constant, and the second means=20 Crystal. In LoongArch SPEC, this area is named CC_FREQ, which means:=20 Constant frequency timer and the crystal frequency corresponding to the=20 clock used by the timer. So I think CCFreq is more suitable. > >> + CPUCFG_REG5_INFO_DATA CpucfgReg5Data; >> + UINT32 StableTimerFreq; >> + >> + // >> + // Get the the crystal frequency corresponding to the constant >> + // frequency timer and the clock used by the timer. >> + // >> + AsmCpucfg (CPUCFG_REG4_INFO, &CCFreq.Uint32); >> + >> + // >> + // Get the multiplication factor and frequency division factor >> + // corresponding to the constant frequency timer and the clock >> + // used by the timer. >> + // >> + AsmCpucfg (CPUCFG_REG5_INFO, &CpucfgReg5Data.Uint32); >> + >> + BaseFreq =3D CCFreq.Bits.CC_FREQ; > (11) My comment here probably affects the header file where you > introduced the CC_FREQ field name. It should be CcFreq. > > Same for MUL and DIV below. The same to above. > >> + ClockMultiplier =3D CpucfgReg5Data.Bits.CC_MUL & 0xFFFFULL; >> + ClockDivide =3D CpucfgReg5Data.Bits.CC_DIV & 0xFFFFULL; > (12) Seeing how the Uint32 members of the unions (?) are used for > populating the bitfields, the ULL suffix on the masks seems superfluous. > The results are stored to UINT32 local variables, too. 0xFFFF should > suffice. Yes, the ULL does seem redundant, the UINT32 means unsigned INT32 and=20 0xFFFF shoud be unsigned, I thought when I wrote this code, ULL was=20 added to ensure that the value was unsigned. > >> + >> + if (!BaseFreq || !ClockMultiplier || !ClockDivide) { > (13) The edk2 coding style does not like scalar types *other* than > BOOLEAN to be used in logical context. Please rewrite as: > > (BaseFreq =3D=3D 0) || (ClockMultiplier =3D=3D 0) || (ClockDivide =3D= =3D 0) > >> + DEBUG ((DEBUG_ERROR, "LoongArch Stable Timer is not available in th= e CPU, hence this library cannot be used.\n")); > (13) I think this line is a bit too long (117 chars). I think the > currently accepted limit (?) is 100 characters (but I'm unsure). > > Recommend to break this up to multiple lines. > >> + StableTimerFreq =3D 0; > (14) Needless assignment AFAICT. > >> + ASSERT (0); > (15) Should be ASSERT (FALSE). > > (16) I recommend that a CpuDeadLoop() call be added here too, because > ASSERT()s are compiled out of RELEASE builds. > >> + } else { > (17) an "else" branch seems awkward here; code after ASSERT (FALSE) / > CpuDeadLoop() is supposed to be unreachable, so whatever you do here can > just be unnested. For double (13), (14), (15), (16) and (17), I agree, will fix them in=20 V4. But for 15, did the CpuDeadLoop() be needed? I think if the DEBUG=20 mode doesn't work, the RELEASE mode can not be created. > >> + StableTimerFreq =3D (BaseFreq * ClockMultiplier / ClockDivide); > (18) Unsafe multiplication. > > ClockMultiplier is at most 0xFFFF, but BaseFreq (from CC_FREQ) may > theoretically be as high as MAX_UINT32. > > - If there is a LoongArch64 specification that places a 0xFFFF limit on > CC_FREQ, then please add such an ASSERT() here. > > - If there is no such specification, then cast either BaseFreq or > ClockMultiplier to UINT64. A UINT64 multiplication will not overflow > here (because MAX_UINT32 * 0xFFFF fits in 32+16=3D48 bits). Yes, you are right, there is a risk of overflow, I think changing the=20 ClockMultiplier to UINT64 would do the trick, or force transferring the=20 result to ensure it doesn't overflow? Something like: ((UINT64)(BaseFreq=20 * ClockMultiplier) / ClockDivide). > > > (19) In turn, the result of the division needs to be checked against two > boundaries: > > - it must not be zero (could occur if the divisor is too large, relative > to the product). Returning zero from this function would case problems > elsewhere. I will ASSERT here if the result is zero. > > - it must fit in a UINT32. Better yet, change the return type of the > function to UINT64. Agree. > > >> + } >> + >> + return StableTimerFreq; >> +} >> + >> +/** >> + Stalls the CPU for at least the given number of microseconds. >> + >> + Stalls the CPU for the number of microseconds specified by MicroSecon= ds. >> + >> + @param MicroSeconds The minimum number of microseconds to delay. >> + >> + @return MicroSeconds >> + >> +**/ >> +UINTN >> +EFIAPI >> +MicroSecondDelay ( >> + IN UINTN MicroSeconds >> + ) >> +{ >> + UINTN Count, Ticks, Start, End; >> + >> + Count =3D (CalcConstFreq () * MicroSeconds) / 1000000; > (20) Unchecked multiplication. CalcConstFreq() may be as large as > > (MAX_UINT32 * 0xFFFF / 1) > > and then dependent on MicroSeconds we could overflow even a UINT64 here. > > This function does not permit returning errors, so I'm not fully sure > what to do. Normally I'd recommend a safe UINT64 multiplication from > SafeIntLib (and then storing the quotient, from the division, in a UINT64= ). OK, I will use the SafeIntLib here. > >> + Start =3D AsmReadStableCounter (); >> + End =3D Start + Count; >> + >> + do { >> + Ticks =3D AsmReadStableCounter (); >> + } while (Ticks < End); > The rest of this patch indicates that > > (a) the performance counter is just the stable counter > (GetPerformanceCounter() simply calls AsmReadStableCounter()), and that > > (b) the range is [BIT2, BIT48-1] (both sides inclusive). Sorry, The PerformanceCounter function series are updated in our=20 internal, I will update them in V4. I would say, the performance counter=20 is not same to stable counter, and the stable count in LoongArch64 are=20 64 bits wide, so no warparound occurs. PLS wait for V4. > > (21) The problem is that this simple loop does not consider wrap-around. > > We should do something like this: > > RETURN_STATUS Status; > UINT64 Remaining; > UINT64 Start; > > Status =3D SafeUint64Mult (CalcConstFreq (), MicroSeconds, &Remaining)= ; > ASSERT_RETURN_ERROR (Status); > if (RETURN_ERROR (Status)) { > CpuDeadLoop (); > } > > // > // for the given number of microseconds, the needed tick count should > // be rounded upwards > // > Status =3D SafeUint64Add (Remaining, 1000000 - 1, &Remaining); > ASSERT_RETURN_ERROR (Status); > if (RETURN_ERROR (Status)) { > CpuDeadLoop (); > } > > Remaining =3D DivU64x32 (Remaining, 1000000); > Start =3D AsmReadStableCounter (); > > while (Remaining > 0) { > UINT64 Stop; > UINT64 Diff; > > Stop =3D AsmReadStableCounter (); > > if (Start <=3D Stop) { > // > // no wrap-around > // > Diff =3D Stop - Start; > } else { > // > // wrap-around > // > Diff =3D (BIT48 - Start) + (Stop - BIT2); > } > > Start =3D Stop; > > Remaining -=3D MIN (Remaining, Diff); > } Refer to above. > >> + >> + return MicroSeconds; >> +} >> + >> +/** >> + Stalls the CPU for at least the given number of nanoseconds. >> + >> + Stalls the CPU for the number of nanoseconds specified by NanoSeconds= . >> + >> + @param NanoSeconds The minimum number of nanoseconds to delay. >> + >> + @return NanoSeconds >> + >> +**/ >> +UINTN >> +EFIAPI >> +NanoSecondDelay ( >> + IN UINTN NanoSeconds >> + ) >> +{ >> + UINT32 MicroSeconds; >> + >> + if (NanoSeconds % 1000 =3D=3D 0) { >> + MicroSeconds =3D NanoSeconds/1000; >> + } else { >> + MicroSeconds =3D NanoSeconds/1000 + 1; >> + } > (22) MicroSeconds should have type UINTN. Agree. > > (23) Using DivU64x32Remainder() would be more idiomatic. > > ( > > I agree that, if we find that the remainder is nonzero, adding 1 to the > quotient is safe, for the original UINTN range too. > > Initial condition: > > ns <=3D MAX_UINTN > > Divide by 1000 (mathematically): > > ns/1000 <=3D MAX_UINTN/1000 > > Because floor(x) <=3D x, we can expand the left hand side: > > floor(ns/1000) <=3D ns/1000 <=3D MAX_UINTN/1000 > > Drop the middle: > > floor(ns/1000) <=3D MAX_UINTN/1000 > > Add 1: > > floor(ns/1000) + 1 <=3D MAX_UINTN/1000 + 1 > > At the left hand side, we now have the C language result (the truncated, > then incremented, quotinent). > > Now, a small detour: > > 1/1000 < 1 > > Multiply by x (assuming x>0): > > x / 1000 < x > > Substitute MAX_UINTN for x (MAX_UINTN is positive): > > MAX_UINTN/1000 < MAX_UINTN > > Then use this to expand the right hand side: > > floor(ns/1000) + 1 <=3D MAX_UINTN/1000 + 1 < MAX_UINTN + 1 > > Drop the middle: > > floor(ns/1000) + 1 < MAX_UINTN + 1 > > Given that both sides are integers, we get > > floor(ns/1000) + 1 <=3D MAX_UINTN > > which is what we needed. > > ) OK, I see. In V4, I will use the DivU64X32Remainder() to change this=20 operation. > >> + >> + MicroSecondDelay (MicroSeconds); >> + >> + return NanoSeconds; >> +} >> + >> +/** >> + Retrieves the current value of a 64-bit free running performance coun= ter. >> + >> + Retrieves the current value of a 64-bit free running performance coun= ter. The >> + counter can either count up by 1 or count down by 1. If the physical >> + performance counter counts by a larger increment, then the counter va= lues >> + must be translated. The properties of the counter can be retrieved fr= om >> + GetPerformanceCounterProperties(). >> + >> + @return The current value of the free running performance counter. >> + >> +**/ >> +UINT64 >> +EFIAPI >> +GetPerformanceCounter ( >> + VOID >> + ) >> +{ >> + return AsmReadStableCounter (); >> +} >> + >> +/** >> + Retrieves the 64-bit frequency in Hz and the range of performance cou= nter >> + values. >> + >> + If StartValue is not NULL, then the value that the performance counte= r starts >> + with immediately after is it rolls over is returned in StartValue. If >> + EndValue is not NULL, then the value that the performance counter end= with >> + immediately before it rolls over is returned in EndValue. The 64-bit >> + frequency of the performance counter in Hz is always returned. If Sta= rtValue >> + is less than EndValue, then the performance counter counts up. If Sta= rtValue >> + is greater than EndValue, then the performance counter counts down. F= or >> + example, a 64-bit free running counter that counts up would have a St= artValue >> + of 0 and an EndValue of 0xFFFFFFFFFFFFFFFF. A 24-bit free running cou= nter >> + that counts down would have a StartValue of 0xFFFFFF and an EndValue = of 0. >> + >> + @param StartValue The value the performance counter starts with whe= n it >> + rolls over. >> + @param EndValue The value that the performance counter ends with = before >> + it rolls over. >> + >> + @return The frequency in Hz. >> + >> +**/ >> +UINT64 >> +EFIAPI >> +GetPerformanceCounterProperties ( >> + OUT UINT64 *StartValue OPTIONAL, >> + OUT UINT64 *EndValue OPTIONAL >> + ) >> +{ >> + if (StartValue !=3D NULL) { >> + *StartValue =3D BIT2; >> + } >> + >> + if (EndValue !=3D NULL) { >> + *EndValue =3D BIT48 - 1; >> + } >> + >> + return CalcConstFreq (); >> +} >> + >> +/** >> + Converts elapsed ticks of performance counter to time in nanoseconds. >> + >> + This function converts the elapsed ticks of running performance count= er to >> + time value in unit of nanoseconds. >> + >> + @param Ticks The number of elapsed ticks of running performance = counter. >> + >> + @return The elapsed time in nanoseconds. >> + >> +**/ >> +UINT64 >> +EFIAPI >> +GetTimeInNanoSecond ( >> + IN UINT64 Ticks >> + ) >> +{ >> + UINT64 Frequency; >> + UINT64 NanoSeconds; >> + UINT64 Remainder; >> + INTN Shift; >> + >> + Frequency =3D GetPerformanceCounterProperties (NULL, NULL); >> + >> + // >> + // Ticks >> + // Time =3D --------- x 1,000,000,000 >> + // Frequency >> + // >> + NanoSeconds =3D MultU64x32 (DivU64x64Remainder (Ticks, Frequency, &Re= mainder), 1000000000u); > (24) I understand this is being copied from elsewhere, but the > multiplication by 10^9 should be checked against overflow, using > SafeIntLib. If we have e.g. a 100 MHz clock, then passing in > Ticks=3DMAX_UINT64 will lead to an overflow. :( OK, in V4, I will use the SafeIntLib. > >> + >> + // >> + // Ensure (Remainder * 1,000,000,000) will not overflow 64-bit. >> + // Since 2^29 < 1,000,000,000 =3D 0x3B9ACA00 < 2^30, Remainder should= < 2^(64-30) =3D 2^34, >> + // i.e. highest bit set in Remainder should <=3D 33. >> + // > What we want here is (Remainder * 10^9 / Frequency), and in that we want > the product not to overflow 2^64 - 1. Thus, we need > > Remainder * 10^9 <=3D 2^64 - 1 > > Remainder <=3D (2^64 - 1) / 10^9 > > The floor -- i.e., integral part -- of the RHS is 18,446,744,073 > decimal, or 100_01001011_10000010_11111010_00001001 binary. The most > significant bit that is set is bit 34. But that's not a sufficient > condition, if we only consider the MSB in Remainder, because other bits > that should be clear (such as bit 33) could still be set. The sufficient > condition is thus that the MSB be <=3D 33. Therefore the comment is OK. > >> + Shift =3D MAX (0, HighBitSet64 (Remainder) - 33); > Quirky, but correct. > > If Remainder is 0, then HighBitSet64 returns -1, and we compare -34 vs. > 0 -- we get zero. OK, no shift needed. > > Otherwise, if HighBitSet64 returns a value in the range [0..33], then we > compare [-33..0] vs 0 -- we get zero from MAX. OK, no shift needed. > > Otherwise, HighBitSet64 returns a value from the range [34..63], then we > compare [1..30] against zero, and the former prevails (gets stored to > Shift). OK. > >> + Remainder =3D RShiftU64 (Remainder, (UINTN)Shift); >> + Frequency =3D RShiftU64 (Frequency, (UINTN)Shift); > I *think* this approach will ensure that Frequency is not zero. > Remainder is strictly smaller than Frequency, and we shift out only so > many LSBs from Remainder (and Frequency) that (Remainder * 10^9) below > *just* not overflow. That means Remainder should not end up being zeroed > out, and so Frequency shouldn't either (the MSB of the original > Frequency is larger than or equal to the MSB of the original Remainder). = OK. > >> + NanoSeconds +=3D DivU64x64Remainder (MultU64x32 (Remainder, 100000000= 0u), Frequency, NULL); > So the RHS here may give us a proper NanoSeconds increment, but how do > we know that the addition will not overflow NanoSeconds? > > Here's an example: > > Ticks =3D 18,446,744,055,553,255,925 (0xFFFF_FFFB_C5CC_E9F5) > Frequency =3D 999,999,999 (0x3B9A_C9FF) > > ( > > The idea with this example is that dividing Ticks by Frequency and then > multiplying the result with 1,000,000,000 causes a *very slight* blow-up: > > Ticks / 999,999,999 * 1,000,000,000 =3D > Ticks * (1,000,000,000 / 999,999,999) > > in such a tricky way that this blow-up: > > (a) *just* doesn't overflow the first part of the function (where we use > the integral multiples of Frequency), but > > (b) does overflow the last part (where we use the remaining, fractional, > multiple of Frequency). > > The way to achieve that, after setting Frequency to just below 1GHz for > the above blow-up, is to start with Ticks =3D MAX_UINT64, perform the > division, then maximize Remainder in comprison to the divisor (i.e., > Frequency) -- increase Remainder to (Frequency-1) --, then recalculate > Ticks from that *backwards* (increase Ticks the same as we increased > Remainder). > > ) > > And then this happens: > > - the initial DivU64x64Remainder (Ticks, Frequency, &Remainder) call > returns 18,446,744,073 (0x4_4B82_FA09) as quotient, and sets Remainder > to 999,999,998 (0x3B9A_C9FE, binary 111011_10011010_11001001_11111110). > > - the first MultU64x32() call produces 18,446,744,073,000,000,000 in > NanoSeconds (0xFFFF_FFFF_D5B5_1A00). > > - Remainder's MSB is bit 29, thus Shift gets assigned 0. > > - Multiplying Remainder by 1,000,000,000 produces > 999,999,998,000,000,000 (0xDE0_B6B3_302E_6C00). No overflow, as intended. > > - Dividing that product by Frequency produces 999,999,998 (0x3B9A_C9FE) > as the *increment* for NanoSeconds. > > - Adding this increment (0x3B9A_C9FE) to NanoSeconds from earlier > (0xFFFF_FFFF_D5B5_1A00) *does* overflow: the result would be > 0x1_0000_0000_114F_E3FE! > > (25) This proves that, generally speaking, the final "+=3D" operation > (increment) is unsafe -- in every TimerLib instance in edk2 that uses > this calculation --, and should be replaced with an addition from > SafeIntLib. Do you recommend changing the final "+=3D" to the SafeIntLib way?=20 Something like: NanoSeconds =3D SafeUint64Add (NanoSeconds,=20 DivU64x64Remainder (MultU64x32 (Remainder, 1000000000u), Frequency,=20 NULL), &NanoSeconds). > > I'm not asking for a tree-wide fix, but using SafeIntLib here would be ni= ce. > > >> + >> + return NanoSeconds; >> +} >> diff --git a/UefiCpuPkg/UefiCpuPkg.dsc b/UefiCpuPkg/UefiCpuPkg.dsc >> index 074fd77461..8e34a9cd6b 100644 >> --- a/UefiCpuPkg/UefiCpuPkg.dsc >> +++ b/UefiCpuPkg/UefiCpuPkg.dsc >> @@ -205,5 +205,8 @@ >> UefiCpuPkg/CpuTimerDxeRiscV64/CpuTimerDxeRiscV64.inf >> UefiCpuPkg/CpuDxeRiscV64/CpuDxeRiscV64.inf >> =20 >> +[Components.LOONGARCH64] >> + UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoongArch64CpuTimer= Lib.inf >> + >> [BuildOptions] >> *_*_*_CC_FLAGS =3D -D DISABLE_NEW_DEPRECATED_INTERFACES > (26) And then this should go into MdePkg.dec, according to points (4) > and (5) above. Same to above. > > Thanks > Laszlo > > > >=20 > -=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D- Groups.io Links: You receive all messages sent to this group. View/Reply Online (#111667): https://edk2.groups.io/g/devel/message/111667 Mute This Topic: https://groups.io/mt/102644766/7686176 Group Owner: devel+owner@edk2.groups.io Unsubscribe: https://edk2.groups.io/g/devel/unsub [rebecca@openfw.io] -=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D- --------------Shjr9PWLf6iNpIg2Nwo8fa1e Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 8bit

Hi Laszlo,

Thanks so carefully to review this patch, I'm guessing it took your a long time. :)


Thanks,
Chao
On 2023/11/23 00:12, Laszlo Ersek wrote:
On 11/17/23 11:00, Chao Li wrote:
Add the LoongArch64 CPU Timer library, using CPUCFG 0x4 and 0x5 for
Stable Counter frequency.

BZ: https://bugzilla.tianocore.org/show_bug.cgi?id=4584

Cc: Eric Dong <eric.dong@intel.com>
Cc: Ray Ni <ray.ni@intel.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Chao Li <lichao@loongson.cn>
---
 .../BaseLoongArch64CpuTimerLib.inf            |  30 +++
 .../BaseLoongArch64CpuTimerLib.uni            |  15 ++
 .../BaseLoongArch64CpuTimerLib/CpuTimerLib.c  | 226 ++++++++++++++++++
 UefiCpuPkg/UefiCpuPkg.dsc                     |   3 +
 4 files changed, 274 insertions(+)
 create mode 100644 UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoongArch64CpuTimerLib.inf
 create mode 100644 UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoongArch64CpuTimerLib.uni
 create mode 100644 UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/CpuTimerLib.c
(1) sorry about the annoying comment, but the library should be called
(preferably) BaseTimerLibLoongArch64Cpu.

We're frequently not consistent with the following library instance
naming scheme, but in theory anyway, library instances should be named
as follows:

  <firmware module type><lib class name><library instance identifier>

Thus, in this case, Base + TimerLib + LoongArch64Cpu.

update UNI file name, INF file name, directory name, BASE_NAME and
MODULE_UNI_FILE accordingly (if you agree)

There has a SPEC for naming style: 

https://github.com/tianocore-docs/edk2-CCodingStandardsSpecification/blob/master/4_naming_conventions/42_directory_names.md

Please check 4.2.3 EDKII Library directory, and most directory naming follows this SPEC.


diff --git a/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoongArch64CpuTimerLib.inf b/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoongArch64CpuTimerLib.inf
new file mode 100644
index 0000000000..c00c215aec
--- /dev/null
+++ b/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoongArch64CpuTimerLib.inf
@@ -0,0 +1,30 @@
+## @file
+#  Base CPU Timer Library
+#
+#  Provides base timer support using CPUCFG 0x4 and 0x5 stable counter frequency.
+#
+#  Copyright (c) 2023, Loongson Technology Corporation Limited. All rights reserved.<BR>
+#  SPDX-License-Identifier: BSD-2-Clause-Patent
+#
+##
+
+[Defines]
+  INF_VERSION                       = 0x00010005
(2) Can you use the most recent INF file version? (Generally applies to
all new INF files in the series.)

The latest version is 1.29
<https://tianocore-docs.github.io/edk2-InfSpecification/draft/>, and you
can just write "1.29" here.
OK, I will update this version, include all new INF files for this series.

+  BASE_NAME                         = BaseLoongArch64CpuTimerLib
+  FILE_GUID                         = 740389C7-CC44-4A2F-88DC-89D97D312E7C
+  MODULE_TYPE                       = BASE
+  VERSION_STRING                    = 1.0
+  LIBRARY_CLASS                     = TimerLib
+  MODULE_UNI_FILE                   = BaseLoongArch64CpuTimerLib.uni
+
+[Sources.common]
+  CpuTimerLib.c
(3) ".common" is not an error, but superfluous here.
Yes, I will remove it in V4.

+
+[Packages]
+  MdePkg/MdePkg.dec
+  UefiCpuPkg/UefiCpuPkg.dec
(4) Do you actually consume any artifact from "UefiCpuPkg.dec"?

Unless you do, there's no need to depend on "UefiCpuPkg.dec" here.
Yes, the UefiCpuPkg.dec is indeed not necessary, so I will remove it in this file for V4.

(5) Relatedly: the TimerLib class is declared in "MdePkg.dec". Thus, if
you indeed don't need anything from "UefiCpuPkg.dec", I'd suggest moving
this library instance under MdePkg.
It is inappropriate to place this library in MdePkg, because the MdePkg doesn't have any CpuTimerLib instance, and this class library are HW implementation-related, so it more appropriate to place it in UefiCpuPkg.

+
+[LibraryClasses]
+  BaseLib
+  PcdLib
+  DebugLib
(6) If it's not too much of a burden, it's best to keep library classes
alphabetically sorted.
OK, I will sort them for V4.

diff --git a/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoongArch64CpuTimerLib.uni b/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoongArch64CpuTimerLib.uni
new file mode 100644
index 0000000000..72d38ec679
--- /dev/null
+++ b/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoongArch64CpuTimerLib.uni
@@ -0,0 +1,15 @@
+// /** @file
+// Base CPU Timer Library
+//
+// Provides base timer support using CPUCFG 0x4 and 0x5 stable counter frequency.
+//
+// Copyright (c) 2023, Loongson Technology Corporation Limited. All rights reserved.<BR>
+//
+// SPDX-License-Identifier: BSD-2-Clause-Patent
+//
+// **/
+
+
+#string STR_MODULE_ABSTRACT             #language en-US "LOONGARCH CPU Timer Library"
+
+#string STR_MODULE_DESCRIPTION          #language en-US "Provides basic timer support using CPUCFG 0x4 and 0x5 stable counter frequency."
diff --git a/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/CpuTimerLib.c b/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/CpuTimerLib.c
new file mode 100644
index 0000000000..349b881cbc
--- /dev/null
+++ b/UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/CpuTimerLib.c
@@ -0,0 +1,226 @@
+/** @file
+  CPUCFG 0x4 and 0x5 for Stable Counter frequency instance of Timer Library.
+
+  Copyright (c) 2023, Loongson Technology Corporation Limited. All rights reserved.<BR>
+
+  SPDX-License-Identifier: BSD-2-Clause-Patent
+**/
+
+#include <Base.h>
+#include <Library/TimerLib.h>
+#include <Library/BaseLib.h>
+#include <Library/DebugLib.h>
+#include <Register/LoongArch64/Cpucfg.h>
(7) It's best to keep the Library includes in sync with the
[LibraryClasses] section from the INF file.

(And to keep the #include directives alphabetically sorted as well.)

You are including <Library/TimerLib.h> here because the file is going to
define those functions. OK. This is the sole justified difference
between the Library #includes here and the [LibraryClasses] section in
the INF file.

Regarding dependencies, you are including <Library/BaseLib.h> and
<Library/DebugLib.h>, but [LibraryClasses] in addition lists PcdLib.
Thus, <Library/PcdLib.h> is missing here. (You are likely getting it
included recursively, but that's not clean enough.)
OK, I see, I will sort them and keep them in sync with corresponding libraries.

+
+/**
+  Calculate clock frequency using CPUCFG 0x4 and 0x5 registers.
+
+  @param  VOID.
(8) This @param is useless and should be avoided, AFAICT. I think the
EccCheck plugin will not complain.
The EccCheck already passed... Any way, I will remove it in V4.

+
+  @return The frequency in Hz.
+
+**/
+UINT32
+EFIAPI
+CalcConstFreq (
(9) Should be STATIC, and should *not* be EFIAPI. It is not a public
library or protocol interface.
I will fix it in V4.

+  VOID
+  )
+{
+  UINT32                 BaseFreq;
+  UINT32                 ClockMultiplier;
+  UINT32                 ClockDivide;
+  CPUCFG_REG4_INFO_DATA  CCFreq;
(10) Per edk2 coding style, when acronyms and abbreviations are grouped
together, each individual acronym itself is supposed to be CamelCase.
For example, when we use "PCI" in identifiers, it becomes "Pci". So I
think this variable should be called "CcFreq", where "CC" stands for
"CPU Config". If I'm wrong about "CC", then please ignore this point.
The two letter 'C' means: first 'C' means Constant, and the second means Crystal. In LoongArch SPEC, this area is named CC_FREQ, which means: Constant frequency timer and the crystal frequency corresponding to the clock used by the timer. So I think CCFreq is more suitable.

+  CPUCFG_REG5_INFO_DATA  CpucfgReg5Data;
+  UINT32                 StableTimerFreq;
+
+  //
+  // Get the the crystal frequency corresponding to the constant
+  // frequency timer and the clock used by the timer.
+  //
+  AsmCpucfg (CPUCFG_REG4_INFO, &CCFreq.Uint32);
+
+  //
+  // Get the multiplication factor and frequency division factor
+  // corresponding to the constant frequency timer and the clock
+  // used by the timer.
+  //
+  AsmCpucfg (CPUCFG_REG5_INFO, &CpucfgReg5Data.Uint32);
+
+  BaseFreq        = CCFreq.Bits.CC_FREQ;
(11) My comment here probably affects the header file where you
introduced the CC_FREQ field name. It should be CcFreq.

Same for MUL and DIV below.
The same to above.

+  ClockMultiplier = CpucfgReg5Data.Bits.CC_MUL & 0xFFFFULL;
+  ClockDivide     = CpucfgReg5Data.Bits.CC_DIV & 0xFFFFULL;
(12) Seeing how the Uint32 members of the unions (?) are used for
populating the bitfields, the ULL suffix on the masks seems superfluous.
The results are stored to UINT32 local variables, too. 0xFFFF should
suffice.
Yes, the ULL does seem redundant, the UINT32 means unsigned INT32 and 0xFFFF shoud be unsigned, I thought when I wrote this code, ULL was added to ensure that the value was unsigned.

+
+  if (!BaseFreq || !ClockMultiplier || !ClockDivide) {
(13) The edk2 coding style does not like scalar types *other* than
BOOLEAN to be used in logical context. Please rewrite as:

  (BaseFreq == 0) || (ClockMultiplier == 0) || (ClockDivide == 0)

+    DEBUG ((DEBUG_ERROR, "LoongArch Stable Timer is not available in the CPU, hence this library cannot be used.\n"));
(13) I think this line is a bit too long (117 chars). I think the
currently accepted limit (?) is 100 characters (but I'm unsure).

Recommend to break this up to multiple lines.

+    StableTimerFreq = 0;
(14) Needless assignment AFAICT.

+    ASSERT (0);
(15) Should be ASSERT (FALSE).

(16) I recommend that a CpuDeadLoop() call be added here too, because
ASSERT()s are compiled out of RELEASE builds.

+  } else {
(17) an "else" branch seems awkward here; code after ASSERT (FALSE) /
CpuDeadLoop() is supposed to be unreachable, so whatever you do here can
just be unnested.
For double (13), (14), (15), (16) and (17), I agree, will fix them in V4. But for 15, did the CpuDeadLoop() be needed? I think if the DEBUG mode doesn't work, the RELEASE mode can not be created.

+    StableTimerFreq = (BaseFreq * ClockMultiplier / ClockDivide);
(18) Unsafe multiplication.

ClockMultiplier is at most 0xFFFF, but BaseFreq (from CC_FREQ) may
theoretically be as high as MAX_UINT32.

- If there is a LoongArch64 specification that places a 0xFFFF limit on
CC_FREQ, then please add such an ASSERT() here.

- If there is no such specification, then cast either BaseFreq or
ClockMultiplier to UINT64. A UINT64 multiplication will not overflow
here (because MAX_UINT32 * 0xFFFF fits in 32+16=48 bits).
Yes, you are right, there is a risk of overflow, I think changing the ClockMultiplier to UINT64 would do the trick, or force transferring the result to ensure it doesn't overflow? Something like: ((UINT64)(BaseFreq * ClockMultiplier) / ClockDivide).


(19) In turn, the result of the division needs to be checked against two
boundaries:

- it must not be zero (could occur if the divisor is too large, relative
to the product). Returning zero from this function would case problems
elsewhere.
I will ASSERT here if the result is zero.

- it must fit in a UINT32. Better yet, change the return type of the
function to UINT64.
Agree.


+  }
+
+  return StableTimerFreq;
+}
+
+/**
+  Stalls the CPU for at least the given number of microseconds.
+
+  Stalls the CPU for the number of microseconds specified by MicroSeconds.
+
+  @param  MicroSeconds  The minimum number of microseconds to delay.
+
+  @return MicroSeconds
+
+**/
+UINTN
+EFIAPI
+MicroSecondDelay (
+  IN UINTN  MicroSeconds
+  )
+{
+  UINTN  Count, Ticks, Start, End;
+
+  Count = (CalcConstFreq () * MicroSeconds) / 1000000;
(20) Unchecked multiplication. CalcConstFreq() may be as large as

  (MAX_UINT32 * 0xFFFF / 1)

and then dependent on MicroSeconds we could overflow even a UINT64 here.

This function does not permit returning errors, so I'm not fully sure
what to do. Normally I'd recommend a safe UINT64 multiplication from
SafeIntLib (and then storing the quotient, from the division, in a UINT64).
OK, I will use the SafeIntLib here.

+  Start = AsmReadStableCounter ();
+  End   = Start + Count;
+
+  do {
+    Ticks = AsmReadStableCounter ();
+  } while (Ticks < End);
The rest of this patch indicates that

(a) the performance counter is just the stable counter
(GetPerformanceCounter() simply calls AsmReadStableCounter()), and that

(b) the range is [BIT2, BIT48-1] (both sides inclusive).
Sorry, The PerformanceCounter function series are updated in our internal, I will update them in V4. I would say, the performance counter is not same to stable counter, and the stable count in LoongArch64 are 64 bits wide, so no warparound occurs. PLS wait for V4.

(21) The problem is that this simple loop does not consider wrap-around.

We should do something like this:

  RETURN_STATUS  Status;
  UINT64         Remaining;
  UINT64         Start;

  Status = SafeUint64Mult (CalcConstFreq (), MicroSeconds, &Remaining);
  ASSERT_RETURN_ERROR (Status);
  if (RETURN_ERROR (Status)) {
    CpuDeadLoop ();
  }

  //
  // for the given number of microseconds, the needed tick count should
  // be rounded upwards
  //
  Status = SafeUint64Add (Remaining, 1000000 - 1, &Remaining);
  ASSERT_RETURN_ERROR (Status);
  if (RETURN_ERROR (Status)) {
    CpuDeadLoop ();
  }

  Remaining = DivU64x32 (Remaining, 1000000);
  Start = AsmReadStableCounter ();

  while (Remaining > 0) {
    UINT64  Stop;
    UINT64  Diff;

    Stop = AsmReadStableCounter ();

    if (Start <= Stop) {
      //
      // no wrap-around
      //
      Diff = Stop - Start;
    } else {
      //
      // wrap-around
      //
      Diff = (BIT48 - Start) + (Stop - BIT2);
    }

    Start = Stop;

    Remaining -= MIN (Remaining, Diff);
  }
Refer to above.

+
+  return MicroSeconds;
+}
+
+/**
+  Stalls the CPU for at least the given number of nanoseconds.
+
+  Stalls the CPU for the number of nanoseconds specified by NanoSeconds.
+
+  @param  NanoSeconds The minimum number of nanoseconds to delay.
+
+  @return NanoSeconds
+
+**/
+UINTN
+EFIAPI
+NanoSecondDelay (
+  IN UINTN  NanoSeconds
+  )
+{
+  UINT32  MicroSeconds;
+
+  if (NanoSeconds % 1000 == 0) {
+    MicroSeconds = NanoSeconds/1000;
+  } else {
+    MicroSeconds = NanoSeconds/1000 + 1;
+  }
(22) MicroSeconds should have type UINTN.
Agree.

(23) Using DivU64x32Remainder() would be more idiomatic.

(

I agree that, if we find that the remainder is nonzero, adding 1 to the
quotient is safe, for the original UINTN range too.

Initial condition:

  ns <= MAX_UINTN

Divide by 1000 (mathematically):

  ns/1000 <= MAX_UINTN/1000

Because floor(x) <= x, we can expand the left hand side:

  floor(ns/1000) <= ns/1000 <= MAX_UINTN/1000

Drop the middle:

  floor(ns/1000) <= MAX_UINTN/1000

Add 1:

  floor(ns/1000) + 1 <= MAX_UINTN/1000 + 1

At the left hand side, we now have the C language result (the truncated,
then incremented, quotinent).

Now, a small detour:

  1/1000 < 1

Multiply by x (assuming x>0):

  x / 1000 < x

Substitute MAX_UINTN for x (MAX_UINTN is positive):

  MAX_UINTN/1000 < MAX_UINTN

Then use this to expand the right hand side:

  floor(ns/1000) + 1 <= MAX_UINTN/1000 + 1 < MAX_UINTN + 1

Drop the middle:

  floor(ns/1000) + 1 < MAX_UINTN + 1

Given that both sides are integers, we get

  floor(ns/1000) + 1 <= MAX_UINTN

which is what we needed.

)
OK, I see. In V4, I will use the DivU64X32Remainder() to change this operation.

+
+  MicroSecondDelay (MicroSeconds);
+
+  return NanoSeconds;
+}
+
+/**
+  Retrieves the current value of a 64-bit free running performance counter.
+
+  Retrieves the current value of a 64-bit free running performance counter. The
+  counter can either count up by 1 or count down by 1. If the physical
+  performance counter counts by a larger increment, then the counter values
+  must be translated. The properties of the counter can be retrieved from
+  GetPerformanceCounterProperties().
+
+  @return The current value of the free running performance counter.
+
+**/
+UINT64
+EFIAPI
+GetPerformanceCounter (
+  VOID
+  )
+{
+  return AsmReadStableCounter ();
+}
+
+/**
+  Retrieves the 64-bit frequency in Hz and the range of performance counter
+  values.
+
+  If StartValue is not NULL, then the value that the performance counter starts
+  with immediately after is it rolls over is returned in StartValue. If
+  EndValue is not NULL, then the value that the performance counter end with
+  immediately before it rolls over is returned in EndValue. The 64-bit
+  frequency of the performance counter in Hz is always returned. If StartValue
+  is less than EndValue, then the performance counter counts up. If StartValue
+  is greater than EndValue, then the performance counter counts down. For
+  example, a 64-bit free running counter that counts up would have a StartValue
+  of 0 and an EndValue of 0xFFFFFFFFFFFFFFFF. A 24-bit free running counter
+  that counts down would have a StartValue of 0xFFFFFF and an EndValue of 0.
+
+  @param  StartValue  The value the performance counter starts with when it
+                      rolls over.
+  @param  EndValue    The value that the performance counter ends with before
+                      it rolls over.
+
+  @return The frequency in Hz.
+
+**/
+UINT64
+EFIAPI
+GetPerformanceCounterProperties (
+  OUT UINT64  *StartValue   OPTIONAL,
+  OUT UINT64  *EndValue     OPTIONAL
+  )
+{
+  if (StartValue != NULL) {
+    *StartValue = BIT2;
+  }
+
+  if (EndValue != NULL) {
+    *EndValue = BIT48 - 1;
+  }
+
+  return CalcConstFreq ();
+}
+
+/**
+  Converts elapsed ticks of performance counter to time in nanoseconds.
+
+  This function converts the elapsed ticks of running performance counter to
+  time value in unit of nanoseconds.
+
+  @param  Ticks     The number of elapsed ticks of running performance counter.
+
+  @return The elapsed time in nanoseconds.
+
+**/
+UINT64
+EFIAPI
+GetTimeInNanoSecond (
+  IN UINT64  Ticks
+  )
+{
+  UINT64  Frequency;
+  UINT64  NanoSeconds;
+  UINT64  Remainder;
+  INTN    Shift;
+
+  Frequency = GetPerformanceCounterProperties (NULL, NULL);
+
+  //
+  //          Ticks
+  // Time = --------- x 1,000,000,000
+  //        Frequency
+  //
+  NanoSeconds = MultU64x32 (DivU64x64Remainder (Ticks, Frequency, &Remainder), 1000000000u);
(24) I understand this is being copied from elsewhere, but the
multiplication by 10^9 should be checked against overflow, using
SafeIntLib. If we have e.g. a 100 MHz clock, then passing in
Ticks=MAX_UINT64 will lead to an overflow. :(
OK, in V4, I will use the SafeIntLib.

+
+  //
+  // Ensure (Remainder * 1,000,000,000) will not overflow 64-bit.
+  // Since 2^29 < 1,000,000,000 = 0x3B9ACA00 < 2^30, Remainder should < 2^(64-30) = 2^34,
+  // i.e. highest bit set in Remainder should <= 33.
+  //
What we want here is (Remainder * 10^9 / Frequency), and in that we want
the product not to overflow 2^64 - 1. Thus, we need

  Remainder * 10^9 <= 2^64 - 1

  Remainder <= (2^64 - 1) / 10^9

The floor -- i.e., integral part -- of the RHS is 18,446,744,073
decimal, or 100_01001011_10000010_11111010_00001001 binary. The most
significant bit that is set is bit 34. But that's not a sufficient
condition, if we only consider the MSB in Remainder, because other bits
that should be clear (such as bit 33) could still be set. The sufficient
condition is thus that the MSB be <= 33. Therefore the comment is OK.

+  Shift        = MAX (0, HighBitSet64 (Remainder) - 33);
Quirky, but correct.

If Remainder is 0, then HighBitSet64 returns -1, and we compare -34 vs.
0 -- we get zero. OK, no shift needed.

Otherwise, if HighBitSet64 returns a value in the range [0..33], then we
compare [-33..0] vs 0 -- we get zero from MAX. OK, no shift needed.

Otherwise, HighBitSet64 returns a value from the range [34..63], then we
compare [1..30] against zero, and the former prevails (gets stored to
Shift). OK.

+  Remainder    = RShiftU64 (Remainder, (UINTN)Shift);
+  Frequency    = RShiftU64 (Frequency, (UINTN)Shift);
I *think* this approach will ensure that Frequency is not zero.
Remainder is strictly smaller than Frequency, and we shift out only so
many LSBs from Remainder (and Frequency) that (Remainder * 10^9) below
*just* not overflow. That means Remainder should not end up being zeroed
out, and so Frequency shouldn't either (the MSB of the original
Frequency is larger than or equal to the MSB of the original Remainder). OK.

+  NanoSeconds += DivU64x64Remainder (MultU64x32 (Remainder, 1000000000u), Frequency, NULL);
So the RHS here may give us a proper NanoSeconds increment, but how do
we know that the addition will not overflow NanoSeconds?

Here's an example:

  Ticks = 18,446,744,055,553,255,925 (0xFFFF_FFFB_C5CC_E9F5)
  Frequency = 999,999,999 (0x3B9A_C9FF)

(

The idea with this example is that dividing Ticks by Frequency and then
multiplying the result with 1,000,000,000 causes a *very slight* blow-up:

  Ticks / 999,999,999 * 1,000,000,000 =
  Ticks * (1,000,000,000 / 999,999,999)

in such a tricky way that this blow-up:

(a) *just* doesn't overflow the first part of the function (where we use
the integral multiples of Frequency), but

(b) does overflow the last part (where we use the remaining, fractional,
multiple of Frequency).

The way to achieve that, after setting Frequency to just below 1GHz for
the above blow-up, is to start with Ticks = MAX_UINT64, perform the
division, then maximize Remainder in comprison to the divisor (i.e.,
Frequency) -- increase Remainder to (Frequency-1) --, then recalculate
Ticks from that *backwards* (increase Ticks the same as we increased
Remainder).

)

And then this happens:

- the initial DivU64x64Remainder (Ticks, Frequency, &Remainder) call
returns 18,446,744,073 (0x4_4B82_FA09) as quotient, and sets Remainder
to 999,999,998 (0x3B9A_C9FE, binary 111011_10011010_11001001_11111110).

- the first MultU64x32() call produces 18,446,744,073,000,000,000 in
NanoSeconds (0xFFFF_FFFF_D5B5_1A00).

- Remainder's MSB is bit 29, thus Shift gets assigned 0.

- Multiplying Remainder by 1,000,000,000 produces
999,999,998,000,000,000 (0xDE0_B6B3_302E_6C00). No overflow, as intended.

- Dividing that product by Frequency produces 999,999,998 (0x3B9A_C9FE)
as the *increment* for NanoSeconds.

- Adding this increment (0x3B9A_C9FE) to NanoSeconds from earlier
(0xFFFF_FFFF_D5B5_1A00) *does* overflow: the result would be
0x1_0000_0000_114F_E3FE!

(25) This proves that, generally speaking, the final "+=" operation
(increment) is unsafe -- in every TimerLib instance in edk2 that uses
this calculation --, and should be replaced with an addition from
SafeIntLib.
Do you recommend changing the final "+=" to the SafeIntLib way? Something like: NanoSeconds = SafeUint64Add (NanoSeconds, DivU64x64Remainder (MultU64x32 (Remainder, 1000000000u), Frequency, NULL), &NanoSeconds).

I'm not asking for a tree-wide fix, but using SafeIntLib here would be nice.


+
+  return NanoSeconds;
+}
diff --git a/UefiCpuPkg/UefiCpuPkg.dsc b/UefiCpuPkg/UefiCpuPkg.dsc
index 074fd77461..8e34a9cd6b 100644
--- a/UefiCpuPkg/UefiCpuPkg.dsc
+++ b/UefiCpuPkg/UefiCpuPkg.dsc
@@ -205,5 +205,8 @@
   UefiCpuPkg/CpuTimerDxeRiscV64/CpuTimerDxeRiscV64.inf
   UefiCpuPkg/CpuDxeRiscV64/CpuDxeRiscV64.inf
 
+[Components.LOONGARCH64]
+  UefiCpuPkg/Library/BaseLoongArch64CpuTimerLib/BaseLoongArch64CpuTimerLib.inf
+
 [BuildOptions]
   *_*_*_CC_FLAGS = -D DISABLE_NEW_DEPRECATED_INTERFACES
(26) And then this should go into MdePkg.dec, according to points (4)
and (5) above.
Same to above.

Thanks
Laszlo





_._,_._,_

Groups.io Links:

You receive all messages sent to this group.

View/Reply Online (#111667) | | Mute This Topic | New Topic
Your Subscription | Contact Group Owner | Unsubscribe [rebecca@openfw.io]

_._,_._,_
--------------Shjr9PWLf6iNpIg2Nwo8fa1e--