Date: Tue, 31 May 2022 17:32:06 +0200
From: "Gerd Hoffmann"
To: "Ni, Ray"
Cc: "devel@edk2.groups.io", "Liu, Zhiguang", "Dong, Guo", "You, Benjamin", "Rhodes, Sean"
Subject: Re: [edk2-devel] [PATCH] UefiPayloadPkg: Always split page table entry to 4K if it covers stack.
Message-ID: <20220531153206.fkq442gz4divb6xa@sirius.home.kraxel.org>
References: <20220531053937.19696-1-zhiguang.liu@intel.com> <20220531074513.fciegyxkrgiwwqem@sirius.home.kraxel.org> <20220531112147.pvy4d6vetsgsqduu@sirius.home.kraxel.org>

  Hi,

> yes:) Actually there is no split at all. The 4K page table is created in the very beginning(before setting to cr3).
> So, no TLB cache issue at all.
> > I think doing a linux-style page split will be the more robust solution.
>
> Thanks for explaining the linux behavior.
>
> Intel's SDM also contain below wordings:
> * As noted in Section 4.10.2, the TLBs may subsequently contain multiple translations for the address range if
> * software modifies the paging structures so that the page size used for a 4-KByte range of linear addresses
> * changes. A reference to a linear address in the address range may use any of these translations.

This is probably the section the "only safe if [ ... ] the two entries
[ ... ] identical" part refers to.

> * Software wishing to prevent this uncertainty should not write to a paging-structure entry in a way that would
> * change, for any linear address, both the page size and either the page frame, access rights, or other attributes.
> * It can instead use the following algorithm: first clear the P flag in the relevant paging-structure entry (e.g.,
> * PDE); then invalidate any translations for the affected linear addresses (see above); and then modify the
> * relevant paging-structure entry to set the P flag and establish modified translation(s) for the new page size.
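
For illustration, the three steps of the SDM algorithm quoted above could
be sketched roughly like this. This is not edk2 code: ChangePageSize,
TlbFlushFn and RecordingFlush are made-up names, and the flush hook is
only a stand-in for whatever the real code would do (INVLPG or a CR3
reload). Only the P (bit 0) and PS (bit 7) flag positions are taken from
the architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_P   (1ULL << 0)  /* Present flag */
#define PDE_PS  (1ULL << 7)  /* Page-size flag: the PDE maps a 2M page */

/* Hypothetical TLB hook; real code would use INVLPG or reload CR3. */
typedef void (*TlbFlushFn)(const volatile uint64_t *Entry);

/* Demo stub: records whether the entry was not-present when flushed. */
static int gFlushCount;
static int gEntryWasNotPresentAtFlush;

static void RecordingFlush(const volatile uint64_t *Entry)
{
    gFlushCount++;
    gEntryWasNotPresentAtFlush = ((*Entry & PTE_P) == 0);
}

/* Change the page size of one paging-structure entry in three steps. */
static void ChangePageSize(volatile uint64_t *Entry, uint64_t NewEntry,
                           TlbFlushFn Flush)
{
    assert(Entry != NULL && Flush != NULL);

    *Entry &= ~PTE_P;           /* 1. clear the P flag                   */
    Flush(Entry);               /* 2. invalidate the old translations    */
    *Entry = NewEntry | PTE_P;  /* 3. install the new entry with P set   */
}
```

Between step 1 and step 3 any access through the entry faults, which is
exactly the window the questions below are about.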
So Linux basically implements this recommendation.

> But I still have some doubts about using linux-style page split.
> Because it's marked as not present:
> 1. Active code should not access data in the 2M region (stack is in the 2M region in our case)
> 2. Active code should not in the 2M region (how to guarantee that?)
>
> How does Linux guarantee the above two points?

Easy.  It's kernel code changing mappings for userspace, so there is no
need to worry about code removing its own mappings in the first place.

It's a different story for edk2 though ...

Can this be covered by the page fault handler?  Update the entry like
the current code does, but with the present bit cleared, then flush the
TLB, then set the present bit.  In case we take a page fault, the only
thing the handler must do is set the present bit, which might even be
possible without additional state tracking.

Linux most likely has something similar in its page fault handler.
Linux needs it for a different reason though: it must handle SMP races.
While the present bit is temporarily cleared, Linux might get a page
fault on *another* cpu which runs userspace code touching the page
being updated.

take care,
  Gerd