public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: "Ni, Ray" <ray.ni@intel.com>
To: Gerd Hoffmann <kraxel@redhat.com>,
	"devel@edk2.groups.io" <devel@edk2.groups.io>
Cc: "Liu, Zhiguang" <zhiguang.liu@intel.com>,
	"Dong, Guo" <guo.dong@intel.com>,
	"You, Benjamin" <benjamin.you@intel.com>,
	"Rhodes, Sean" <sean@starlabs.systems>
Subject: Re: [edk2-devel] [PATCH] UefiPayloadPkg: Always split page table entry to 4K if it covers stack.
Date: Tue, 31 May 2022 13:24:56 +0000	[thread overview]
Message-ID: <MWHPR11MB1631167EFDC501F018DF0B018CDC9@MWHPR11MB1631.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20220531112147.pvy4d6vetsgsqduu@sirius.home.kraxel.org>

> > I am not quite sure how Linux handles such case?
> 
> Oh, lovely.  CPU bugs lurking indeed.  linux has this longish comment
> (see mm/huge_memory.c, in the middle of the __split_huge_pmd_locked()
> function):
> 
>         /*
>          * Up to this point the pmd is present and huge and userland has the
>          * whole access to the hugepage during the split (which happens in
>          * place). If we overwrite the pmd with the not-huge version pointing
>          * to the pte here (which of course we could if all CPUs were bug
>          * free), userland could trigger a small page size TLB miss on the
>          * small sized TLB while the hugepage TLB entry is still established in
>          * the huge TLB. Some CPU doesn't like that.
>          * See http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum
>          * 383 on page 105. Intel should be safe but is also warns that it's
>          * only safe if the permission and cache attributes of the two entries
>          * loaded in the two TLB is identical (which should be the case here).
>          * But it is generally safer to never allow small and huge TLB entries
>          * for the same virtual address to be loaded simultaneously. So instead
>          * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
>          * current pmd notpresent (atomically because here the pmd_trans_huge
>          * must remain set at all times on the pmd until the split is complete
>          * for this pmd), then we flush the SMP TLB and finally we write the
>          * non-huge version of the pmd entry with pmd_populate.
>          */
> 
> So linux goes 2M -> not present -> 4K instead of direct 2M -> 4K (and
> does the tlb flush in the not present state), which apparently is needed
> on some CPUs to avoid confusing the tlb cache.
> 
> > Before that's fully understood, we think the page table split for
> > stack does no harm to the functionality and code complexity. That's
> > why we choose this fix first.
> 
> So this basically splits the page right from the start instead of doing
> it later when page attributes are changed.  Which probably avoids the
> huge page landing in the tlb cache, which in turn avoids triggering the
> issues outlined above.

yes:) Actually there is no split at all. The 4K page table is created in the very beginning(before setting to cr3).
So, no TLB cache issue at all.

> 
> I think doing a linux-style page split will be the more robust solution.

Thanks for explaining the linux behavior.

Intel's SDM also contain below wordings:
* As noted in Section 4.10.2, the TLBs may subsequently contain multiple translations for the address range if
* software modifies the paging structures so that the page size used for a 4-KByte range of linear addresses
* changes. A reference to a linear address in the address range may use any of these translations.
* Software wishing to prevent this uncertainty should not write to a paging-structure entry in a way that would
* change, for any linear address, both the page size and either the page frame, access rights, or other attributes.
* It can instead use the following algorithm: first clear the P flag in the relevant paging-structure entry (e.g.,
* PDE); then invalidate any translations for the affected linear addresses (see above); and then modify the
* relevant paging-structure entry to set the P flag and establish modified translation(s) for the new page size.

But I still have some doubts about using linux-style page split.
Because it's marked as not present:
1. Active code should not access data in the 2M region (stack is in the 2M region in our case)
2. Active code should not in the 2M region (how to guarantee that?)

How does Linux guarantee the above two points?

Thanks,
Ray

  reply	other threads:[~2022-05-31 13:25 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-31  5:39 [PATCH] UefiPayloadPkg: Always split page table entry to 4K if it covers stack Zhiguang Liu
2022-05-31  7:45 ` [edk2-devel] " Gerd Hoffmann
2022-05-31  8:03   ` Ni, Ray
2022-05-31 11:21     ` Gerd Hoffmann
2022-05-31 13:24       ` Ni, Ray [this message]
2022-05-31 15:32         ` Gerd Hoffmann
2022-06-08  0:49           ` Ni, Ray
2022-06-08  6:52             ` Gerd Hoffmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MWHPR11MB1631167EFDC501F018DF0B018CDC9@MWHPR11MB1631.namprd11.prod.outlook.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox