From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received-SPF: Pass (sender SPF authorized) identity=mailfrom; client-ip=17.171.2.60; helo=ma1-aaemail-dr-lapp01.apple.com; envelope-from=afish@apple.com; receiver=edk2-devel@lists.01.org Received: from ma1-aaemail-dr-lapp01.apple.com (ma1-aaemail-dr-lapp01.apple.com [17.171.2.60]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id B08A32114B12E for ; Fri, 21 Sep 2018 16:57:42 -0700 (PDT) Received: from pps.filterd (ma1-aaemail-dr-lapp01.apple.com [127.0.0.1]) by ma1-aaemail-dr-lapp01.apple.com (8.16.0.22/8.16.0.22) with SMTP id w8LNqHD1056114; Fri, 21 Sep 2018 16:57:41 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=apple.com; h=mime-version : content-transfer-encoding : content-type : sender : subject : from : in-reply-to : date : cc : message-id : references : to; s=20180706; bh=OTSwAc5unrM+1DTyWh/elm4nYisKgqNIXLxCjCY+cq4=; b=rte3LDXk8PjxAmDhdlHfCvpSxAlZsjPPk3P8JMNIo6wSm+DBr/aMmrQdE6D/Hxv22hzh iLBgVrvKFAf5Ir+oIRTo6doh9IwkG/VfUnBie+2dhG7L4IsEFDFDtuvK/H9HFqlKlSLA DqKmfGueGn3vUMrD1S2PjO81UBo7xbGb9c5KuA5tQ5uOG0eSYXDNXubAXxsStn01FMcC alxNtVDbkSYpp9XzcgOsS1M2hgMjTAQcTxz/NkurN9DBO8sNSaeHLeadY9AgjFjYcioj whKHxae2IxNv0vBcu7c6PlosdDz+dRkQnIzorXX/acxoxj5NBdoQjZWPAMAqFphORrT2 jg== Received: from mr2-mtap-s02.rno.apple.com (mr2-mtap-s02.rno.apple.com [17.179.226.134]) by ma1-aaemail-dr-lapp01.apple.com with ESMTP id 2mmkme7eve-2 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Fri, 21 Sep 2018 16:57:40 -0700 MIME-version: 1.0 Received: from ma1-mmpp-sz07.apple.com (ma1-mmpp-sz07.apple.com [17.171.128.149]) by mr2-mtap-s02.rno.apple.com (Oracle Communications Messaging Server 8.0.2.3.20180614 64bit (built Jun 14 2018)) with ESMTPS id <0PFF0002DJW35QB0@mr2-mtap-s02.rno.apple.com>; Fri, 21 Sep 2018 16:57:40 -0700 (PDT) Received: from process_viserion-daemon.ma1-mmpp-sz07.apple.com by ma1-mmpp-sz07.apple.com (Oracle Communications Messaging Server 8.0.2.3.20180614 64bit (built Jun 14 2018)) id <0PFF00000J3CUL00@ma1-mmpp-sz07.apple.com>; Fri, 21 Sep 2018 16:57:39 -0700 (PDT) X-Va-A: X-Va-T-CD: 61f6266009e1b0968824c29767d0458c X-Va-E-CD: 0788143b0d1c00296b451999d6e22fea X-Va-R-CD: 909d0c227f797428a7567b2a7d4338de X-Va-CD: 0 X-Va-ID: 0a4b5fa3-4dc5-4950-91f1-36c153a36500 X-V-A: X-V-T-CD: 13268bd1ea7570585d6e6a5a5bf3a652 X-V-E-CD: 0788143b0d1c00296b451999d6e22fea X-V-R-CD: 909d0c227f797428a7567b2a7d4338de X-V-CD: 0 X-V-ID: b3626e7d-4ff6-49d9-8398-51fa4188c7ad Received: from process_milters-daemon.ma1-mmpp-sz07.apple.com by ma1-mmpp-sz07.apple.com (Oracle Communications Messaging Server 8.0.2.3.20180614 64bit (built Jun 14 2018)) id <0PFF00400JI8J700@ma1-mmpp-sz07.apple.com>; Fri, 21 Sep 2018 16:57:39 -0700 (PDT) X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2018-09-21_10:,, signatures=0 X-Proofpoint-Scanner-Instance: ma-grpmailp-qapp22.corp.apple.com-10000_instance1 Received: from [17.234.198.76] (unknown [17.234.198.76]) by ma1-mmpp-sz07.apple.com (Oracle Communications Messaging Server 8.0.2.3.20180614 64bit (built Jun 14 2018)) with ESMTPSA id <0PFF00D27JW17X60@ma1-mmpp-sz07.apple.com>; Fri, 21 Sep 2018 16:57:39 -0700 (PDT) Sender: afish@apple.com From: Andrew Fish In-reply-to: Date: Fri, 21 Sep 2018 16:57:21 -0700 Cc: Bill Paul , "edk2-devel@lists.01.org" Message-id: <4F31CED5-DCB4-4834-A0F2-52D410C2F5BF@apple.com> References: To: Vladimir Olovyannikov X-Mailer: Apple Mail (2.3445.6.18) X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-09-21_10:, , signatures=0 Subject: Re: Stack issue after warm UEFI reset and MMU enabling on an Armv8 platform X-BeenThere: edk2-devel@lists.01.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: EDK II Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 21 Sep 2018 23:57:43 -0000 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII > On Sep 21, 2018, at 4:24 PM, Vladimir Olovyannikov wrote: > > On Thu, Sep 20, 2018 at 2:52 PM Vladimir Olovyannikov > wrote: >> >> On Wed, Sep 19, 2018 at 5:21 PM Bill Paul wrote: >>> >>> Of all the gin joints in all the towns in all the world, Vladimir >>> Olovyannikov >>> had to walk into mine at 16:58 on Wednesday 19 September 2018 and say: >>> >>>>> From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org] >>>>> Sent: Wednesday, September 19, 2018 4:38 PM >>>>> To: Vladimir Olovyannikov >>>>> Cc: edk2-devel@lists.01.org >>>>> Subject: Re: Stack issue after warm UEFI reset and MMU enabling on an >>>>> Armv8 platform >>>>> >>>>> >>>>> On 19 September 2018 at 15:55, Vladimir Olovyannikov >>>>> >>>>> wrote: >>>>>> Hi All, >>>>>> >>>>>> I need UEFI experts help on the problem with Armv8 board on warm UEFI >>>>>> reset. >>>>>> Cold reset works fine. >>>>>> >>>>>> Here is how I set up a warm reset: >>>>>> >>>>>> STATIC >>>>>> EFI_STATUS >>>>>> ShutdownUefiBootServices ( >>>>>> >>>>>> VOID >>>>>> ) >>>>>> >>>>>> { >>>>>> >>>>>> EFI_STATUS Status; >>>>>> UINTN MemoryMapSize; >>>>>> EFI_MEMORY_DESCRIPTOR *MemoryMap; >>>>>> UINTN MapKey; >>>>>> UINTN DescriptorSize; >>>>>> UINT32 DescriptorVersion; >>>>>> UINTN Pages; >>>>>> >>>>>> MemoryMap = NULL; >>>>>> MemoryMapSize = 0; >>>>>> Pages = 0; >>>>>> >>>>>> do { >>>>>> >>>>>> Status = gBS->GetMemoryMap ( >>>>>> >>>>>> &MemoryMapSize, >>>>>> MemoryMap, >>>>>> &MapKey, >>>>>> &DescriptorSize, >>>>>> &DescriptorVersion >>>>>> ); >>>>>> >>>>>> if (Status == EFI_BUFFER_TOO_SMALL) { >>>>>> >>>>>> Pages = EFI_SIZE_TO_PAGES (MemoryMapSize) + 1; >>>>>> MemoryMap = AllocatePages (Pages); >>>>>> >>>>>> // >>>>>> // Get System MemoryMap >>>>>> // >>>>>> Status = gBS->GetMemoryMap ( >>>>>> >>>>>> &MemoryMapSize, >>>>>> MemoryMap, >>>>>> &MapKey, >>>>>> &DescriptorSize, >>>>>> &DescriptorVersion >>>>>> ); >>>>>> >>>>>> } >>>>>> >>>>>> // Don't do anything between the GetMemoryMap() and >>>>>> ExitBootServices() if (!EFI_ERROR(Status)) { >>>>>> >>>>>> Status = gBS->ExitBootServices (gImageHandle, MapKey); >>>>>> if (EFI_ERROR(Status)) { >>>>>> >>>>>> FreePages (MemoryMap, Pages); >>>>>> MemoryMap = NULL; >>>>>> MemoryMapSize = 0; >>>>>> >>>>>> } >>>>>> >>>>>> } >>>>>> >>>>>> } while (EFI_ERROR(Status)); >>>>>> >>>>>> return Status; >>>>>> >>>>>> } >>>>>> >>>>>> Then perform >>>>>> ArmCleanDataCache (); >>>>>> ArmInvalidateDataCache (); >>>>>> ArmDisableInstructionCache (); >>>>>> ArmInvalidateInstructionCache (); >>>>> >>>>> These don't do anything useful on ARM. You can only reliably perform >>>>> cache >>>>> maintenance by virtual address. >>>> >>>> So, should I just remove them altogether? >>>> >>>>>> ArmDisableMmu (); >>>>> >>>>> ... so after this call returns, all bets are off with regards to >>>>> whether >>>>> what is popped from the stack is actually what we pushed when we >>>>> entered >>>>> the function. >>>> >>>> OK, thank you for explanation. >>>> But this call returns back into ResetLib implementation as it should, >>>> and >>>> then there is a direct jump to the start of FV. >>>> Am I doing anything wrong here? >>>> Then, up to the point of enabling of MMU the stack is OK. But right >>>> after >>>> enabling MMU it points at _ModuleEntryPoint end of function in >>>> DxeCoreEntryPoint.c >>>> Am I missing anything? Maybe some stack cleanup before jumping to the >>>> start >>>> of FV? >>> >>> When the MMU is enabled, does the mapping for the stack pages change? That >>> is, >>> could the stack now be mapped to different physical page now? >> Thanks for ideas Bill, >> No, the mapping stays the same. >> The issue is only with warm reset, and only on an A72 board. >> There is another platform on A53 sharing the same code, which has no issues >> with warm reset. >> I cannot explain why. >>> >>> Instead of showing a stack trace, can you dump the stack pages and compare >>> the >>> before and after contents? >> I can clearly see that before and after contents are different. >>> >>> Assuming the same physical memory pages are still being used, then there >>> could >>> be a cache flushing problem. What could happen is: >>> >>> - some stack memory has been touched recently and is now in the data cache >>> - changes are made, which are written to the cache, but not yet flushed >>> out to >>> RAM >>> - enabling the MMU causes a full invalidate of the cache >>> >>> Now when you look at the stack, you see the earlier contents that were in >>> RAM >>> -- the changes previously only written to the cache have been lost. >>> >>> Enabling/disabling caches and MMU is always tricky. I can't say for sure, >>> but >>> I wouldn't be surprised if there's some subtle bug that causes a flush >>> operation to be missed and things may just work by coincidence in the cold >>> start case. >> I might be missing something preparing for warm reset. >> Disabling interrupts does not help though. >> Ard, I switched off all DMA-capable devices, so am just booting into UEFI >> with no disks or network, >> disabled interrupts. The issue is here. Any ideas on how to debug it and >> fix? > Update: when I add this as an experiment: > UINTN StackBottom; > > __asm__ volatile ("mov %0,SP" : "=r"(StackBottom)); > WriteBackInvalidateDataCacheRange((VOID *)StackBottom, > PcdGet64(PcdSystemMemorySize) + > PcdGet64(PcdSystemMemoryBase) - StackBottom); > then the stack data were not corrupted anymore on the next ArmEnableMmu() call, > and warm reset worked (though unreliably, can throw exception on > memcpy down the road on UEFI boot; probably because I invalidated only > the stack area in the experiment). > Considering that A53 board does not have issues, does this means that > ArmInvalidateDataCache() implementation is useless for A72? > Based on this, should the approach be "find all data regions and > invalidate them using InvalidateDataCacheRange()"? Vladimir, We hit an issue like this a while back on x86 and it turned out our sequence was dependent on the C compiler code generation not touching a specific register. It might be worth while to disassemble this code and take a look at the assembler. I'd also point out the __asm__ volatile is a serializing event to the C memory model so it could change how the C code was optimized. You could try __asm__ volatile with a no-op instruction to see if it really is the SP read at this point that fixes the issue. But it is likely the bug will be more obvious if you look at the assembly. Thanks, Andrew Fish >>> >>> -Bill >>> >>>>>> Then jump to start of FV: >>>>>> >>>>>> typedef >>>>>> VOID >>>>>> >>>>>> (EFIAPI *START_FV)( >>>>>> >>>>>> VOID >>>>>> >>>>>> ); >>>>>> StartOfFv = (START_FV)(UINTN)PcdGet64(PcdFvBaseAddress); >>>>>> StartOfFv (); >>>>>> >>>>>> Now this is what happens on warm reset: >>>>>> reset -c warm >>>>>> 1. Until ArmEnableMmu() gets called, everything works as expected. >>>>>> >>>>>> Here is the stack right before ArmEnableMmu() is called: >>>>>> ArmConfigureMmu+0x4f8 >>>>>> InitMmu+0x24 >>>>>> MemoryPeim+0x440 >>>>>> PrePiMain+0x114 >>>>>> PrimaryMain+0x68 >>>>>> CEntryPoint+0xC4 >>>>>> EL2:0x00000000800008BC >>>>>> ----- End of stack info ----- >>>>>> >>>>>> 2. Here is the stack as soon as Mmu is enabled with ArmEnableMmu() : >>>>>> ArmConfigureMmu+0x4fc <-- This one is correct, at line 745 in >>>>>> >>>>>> ArmConfigureMmu() in ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibCore.c >>>>>> (return EFI_SUCCESS) >>>>>> >>>>>> _ModuleEntryPoint+0x24 <-- Wrong. This points directly to >>>>>> >>>>>> ASSERT(FALSE); and to CpuDeadLoop() in DxeCoreEntryPoint.c, lines >>>>>> 59-60. >>>>>> >>>>>> El2:0x000000008E5E8300 <-- Absolutely bogus >>>>>> >>>>>> --- End of stack info --- >>>>>> >>>>>> So, as soon as ArmEnableMmu() exits, execution jumps directly to >>>>>> CpuDeadLoop() in DxeCoreEntryPoint of _ModuleEntryPoint(). >>>>>> >>>>>> Would be grateful for any advice. >>>>>> >>>>>> Thank you, >>>>>> Vladimir >>>> >>>> _______________________________________________ >>>> edk2-devel mailing list >>>> edk2-devel@lists.01.org >>>> https://lists.01.org/mailman/listinfo/edk2-devel >>> -- >>> ============================================================================= >>> -Bill Paul (510) 749-2329 | Senior Member of Technical Staff, >>> wpaul@windriver.com | Master of Unix-Fu - Wind River >>> Systems >>> ============================================================================= >>> "I put a dollar in a change machine. Nothing changed." - George Carlin >>> ============================================================================= > _______________________________________________ > edk2-devel mailing list > edk2-devel@lists.01.org > https://lists.01.org/mailman/listinfo/edk2-devel