From: Laszlo Ersek
To: Ard Biesheuvel
Cc: Leif Lindholm, "edk2-devel@lists.01.org", "Gao, Liming"
Date: Fri, 10 Nov 2017 10:29:19 +0100
Subject: Re: [PATCH v2] ArmPlatformPkg/PrePeiCore: seed temporary stack before entering PEI core
Message-ID: <74fd1f40-70d9-ffd9-01c3-d628efb4dd44@redhat.com>
References: <20171103113352.8604-1-ard.biesheuvel@linaro.org> <20171105055245.xbicmlagfeu7xt2o@bivouac.eciton.net>
List-Id: EDK II Development
On 11/09/17 22:11, Ard Biesheuvel wrote:
> On 7 November 2017 at 18:13, Ard Biesheuvel wrote:
>> On 7 November 2017 at 18:09, Laszlo Ersek wrote:
>>> On 11/05/17 17:29, Ard Biesheuvel wrote:
>>>> On 5 November 2017 at 16:27, Ard Biesheuvel wrote:
>>>>> On 5 November 2017 at 05:52, Leif Lindholm wrote:
>>>>>> On Fri, Nov 03, 2017 at 11:33:52AM +0000, Ard Biesheuvel wrote:
>>>>>>> DEBUG builds of PEI code will print a diagnostic message regarding
>>>>>>> the utilization of temporary RAM before switching to permanent RAM.
>>>>>>> For example,
>>>>>>>
>>>>>>> Total temporary memory: 16352 bytes.
>>>>>>> temporary memory stack ever used: 4820 bytes.
>>>>>>> temporary memory heap used for HobList: 4720 bytes.
>>>>>>>
>>>>>>> Tracking stack utilization like this requires the stack to be seeded
>>>>>>> with a known magic value, and this needs to occur before entering C
>>>>>>> code, given that it uses the stack. Currently, only Nt32Pkg appears
>>>>>>> to implement this feature, but it is useful nonetheless, so let's
>>>>>>> wire it up for PrePeiCore as well.
>>>>>>>
>>>>>>> Ref: https://bugzilla.tianocore.org/show_bug.cgi?id=748
>>>>>>> Contributed-under: TianoCore Contribution Agreement 1.1
>>>>>>> Signed-off-by: Ard Biesheuvel
>>>>>>
>>>>>> OK, this may sound completely unreasonable, but seeing those
>>>>>> implementations overwrite callee-saved registers without saving them
>>>>>> makes my brain unhappy. (Yes, I know.)
>>>>>>
>>>>>> Could they either:
>>>>>> - Have a comment prepended establishing the implicit ABI of which
>>>>>>   registers the caller cannot rely on reusing after return.
>>>>>>   Preferably somewhat echoed at the call site.
>>>>>> - Be rewritten to use only scratch registers?
>>>>>>
>>>>>
>>>>> I think it is implied that the startup code does not adhere to the
>>>>> AAPCS. That code already uses r5 and r6 without stacking them, simply
>>>>> because we're in the middle of preparing the stack and other execution
>>>>> context, precisely so the C code we call into can rely on AAPCS
>>>>> guarantees.
>>>>
>>>>
>>>> Ehm, hold on, what do you mean by 'call site'? This code just runs and
>>>> jumps back to a local label. There are no function calls here until
>>>> the point where we call into C (with the exception of the lovely
>>>> ArmPlatformPeiBootAction() we added so Juno can find out how much DRAM
>>>> it can use)
>>>
>>> Please continue the discussion with Leif on this; from my side, I'm
>>> happy with the patch (I've sort of deduced what the assembly code does,
>>> also relying on your v1 notes).
>>>
>>> The only eyebrow-raising part was:
>>>
>>> + MOV64 (x9, FixedPcdGet32 (PcdInitValueInTempStack) |\
>>> + FixedPcdGet32 (PcdInitValueInTempStack) << 32)
>>>
>>> where we left-shift a constant that is "in theory" UINT32 by 32 binary
>>> places, using the << operator. In C that would be undefined behavior,
>>> but this is assembly, so what do I know? ¯\_(ツ)_/¯
>>>
>>> Acked-by: Laszlo Ersek
>>>
>>
>> Thanks. And you're right, this is not C so no need to worry about that.
>>
>>> (
>>>
>>> By the way, just to see if I remember correctly, isn't STP:
>>>
>>> +0:stp x9, x9, [x8], #16
>>>
>>> the kind of instruction that modifies multiple operands at once, and so
>>> if it faults, it cannot be virtualized well? (Because the syndrome
>>> register or whatever does not tell the VMM the whole picture about the
>>> fault?)
>>>
>>> Totally irrelevant here, I'm just curious.
>>>
>>
>> STP == STore Pair, and so it stores the values in the registers to
>> memory. The only register that gets modified here is x8, due to the
>> post-increment.
>>
>
> ... which actually doesn't mean it is not affected by the same issue.
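To make the "which registers get modified" point concrete, here is a toy C model of the two post-indexed pair instructions. It is only an illustration, not KVM or emulator code: the `Cpu` struct, the `written` mask and the helper names are hypothetical, and the model ignores SP/XZR encodings, alignment, faults, and what the syndrome register can or cannot describe. It only counts register writes: post-indexed STP writes just its base register, while post-indexed LDP writes both destination registers plus the base.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy machine state: X0..X30 plus a small byte-addressed memory.
   Bit r of "written" is set whenever an instruction writes register Xr. */
typedef struct {
  uint64_t x[31];
  uint8_t  mem[256];
  uint32_t written;
} Cpu;

static void set_reg(Cpu *c, int r, uint64_t v) {
  c->x[r] = v;
  c->written |= 1u << r;
}

/* STP Xt1, Xt2, [Xn], #imm (post-index):
   store Xt1 at [Xn] and Xt2 at [Xn + 8], then Xn += imm.
   The only register written is the base register Xn. */
static void stp_post(Cpu *c, int t1, int t2, int n, int64_t imm) {
  uint64_t addr = c->x[n];
  assert(addr + 16 <= sizeof c->mem);
  memcpy(&c->mem[addr], &c->x[t1], 8);
  memcpy(&c->mem[addr + 8], &c->x[t2], 8);
  set_reg(c, n, addr + (uint64_t)imm);
}

/* LDP Xt1, Xt2, [Xn], #imm (post-index):
   load Xt1 from [Xn] and Xt2 from [Xn + 8], then Xn += imm.
   Three registers are written (the model assumes t1, t2 and n are distinct). */
static void ldp_post(Cpu *c, int t1, int t2, int n, int64_t imm) {
  uint64_t addr = c->x[n];
  uint64_t v1, v2;
  assert(addr + 16 <= sizeof c->mem);
  memcpy(&v1, &c->mem[addr], 8);
  memcpy(&v2, &c->mem[addr + 8], 8);
  set_reg(c, t1, v1);
  set_reg(c, t2, v2);
  set_reg(c, n, addr + (uint64_t)imm);
}

/* Number of distinct registers written since "written" was last cleared. */
static int regs_written(const Cpu *c) {
  int count = 0;
  for (uint32_t m = c->written; m != 0; m &= m - 1)
    count++;
  return count;
}
```

With x8 pointing into the toy memory, `stp_post(&c, 9, 9, 8, 16)` leaves a single bit (x8) set in `written`, whereas `ldp_post(&c, 0, 1, 8, 16)` sets three (x0, x1 and x8).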
>
> The reason such instructions are more difficult to virtualize is that
> it requires KVM to decode the instruction, rather than read the
> syndrome registers that can tell it which register we intended to
> read/write from. So it is in fact perfectly feasible to virtualize it,
> but the KVM authors just haven't bothered yet.

Hm, I'm slightly curious if and how this differs from x86 KVM :) In x86
KVM there are huge instruction tables for emulation etc.

Anyway I'm happy this patch is now committed! Thanks!
Laszlo

>
>> But its converse
>>
>> LDP , , [], #
>>
>> is indeed such an instruction, given that it modifies three registers
>> at once, and so the registers that encode the exception run out of
>> space. Note that this only affects virtualized MMIO.
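The technique from the patch description can be sketched in C: seed the temporary stack with a known value before any C code runs, then count how much of the pattern has been overwritten. This is a minimal sketch, not the edk2 code. The constant below is a placeholder standing in for PcdInitValueInTempStack, the helper names are hypothetical, and the scan assumes a descending stack whose lowest addresses are the last to be touched. It also shows the C-safe way to build the 64-bit pattern discussed around the MOV64 line: widen to 64 bits before shifting, because shifting a 32-bit value by 32 places is undefined behavior in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder seed (assumption); in edk2 the value comes from PcdInitValueInTempStack. */
#define INIT_VALUE_IN_TEMP_STACK 0x5AA55AA5u

/* Replicate a 32-bit value into both halves of a 64-bit word.
   The cast happens BEFORE the shift, so the shift count (32) is smaller
   than the operand width (64) and the behavior is well defined. */
static uint64_t make_seed64(uint32_t v) {
  return (uint64_t)v | ((uint64_t)v << 32);
}

/* Fill [base, base + words) with the seed, two 64-bit words per iteration,
   loosely mirroring a loop around "stp x9, x9, [x8], #16" (loop shape assumed). */
static void seed_stack(uint64_t *base, size_t words, uint64_t seed) {
  assert(words % 2 == 0);
  for (uint64_t *p = base; p < base + words; p += 2) {
    p[0] = seed;
    p[1] = seed;
  }
}

/* With a descending stack, usage spreads downward from the top. Scan upward
   from the bottom while the seed is still intact; every word from the first
   overwritten one up to the top counts as "ever used". */
static size_t stack_bytes_ever_used(const uint64_t *base, size_t words, uint64_t seed) {
  size_t untouched = 0;
  while (untouched < words && base[untouched] == seed)
    untouched++;
  return (words - untouched) * sizeof(uint64_t);
}
```

For a 64-word stack whose top 10 words have been overwritten, `stack_bytes_ever_used` reports 80 bytes. A used slot that happens to hold the seed value at the boundary would be undercounted, which is an inherent limitation of pattern-based high-water-mark measurement.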