From: "Ni, Ray"
To: Gerd Hoffmann, "devel@edk2.groups.io"
CC: "Liu, Zhiguang", "Dong, Guo", "You, Benjamin", "Rhodes, Sean"
Subject: Re: [edk2-devel] [PATCH] UefiPayloadPkg: Always split page table entry to 4K if it covers stack.
Date: Tue, 31 May 2022 13:24:56 +0000
References: <20220531053937.19696-1-zhiguang.liu@intel.com> <20220531074513.fciegyxkrgiwwqem@sirius.home.kraxel.org> <20220531112147.pvy4d6vetsgsqduu@sirius.home.kraxel.org>
In-Reply-To: <20220531112147.pvy4d6vetsgsqduu@sirius.home.kraxel.org>

> > I am not quite sure how Linux handles such case?
>
> Oh, lovely.  CPU bugs lurking indeed.  linux has this longish comment
> (see mm/huge_memory.c, in the middle of the __split_huge_pmd_locked()
> function):
>
>         /*
>          * Up to this point the pmd is present and huge and userland has the
>          * whole access to the hugepage during the split (which happens in
>          * place). If we overwrite the pmd with the not-huge version pointing
>          * to the pte here (which of course we could if all CPUs were bug
>          * free), userland could trigger a small page size TLB miss on the
>          * small sized TLB while the hugepage TLB entry is still established in
>          * the huge TLB. Some CPU doesn't like that.
>          * See http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum
>          * 383 on page 105. Intel should be safe but is also warns that it's
>          * only safe if the permission and cache attributes of the two entries
>          * loaded in the two TLB is identical (which should be the case here).
>          * But it is generally safer to never allow small and huge TLB entries
>          * for the same virtual address to be loaded simultaneously. So instead
>          * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
>          * current pmd notpresent (atomically because here the pmd_trans_huge
>          * must remain set at all times on the pmd until the split is complete
>          * for this pmd), then we flush the SMP TLB and finally we write the
>          * non-huge version of the pmd entry with pmd_populate.
>          */
>
> So linux goes 2M -> not present -> 4K instead of direct 2M -> 4K (and
> does the tlb flush in the not present state), which apparently is needed
> on some CPUs to avoid confusing the tlb cache.
>
> > Before that's fully understood, we think the page table split for
> > stack does no harm to the functionality and code complexity. That's
> > why we choose this fix first.
>
> So this basically splits the page right from the start instead of doing
> it later when page attributes are changed.  Which probably avoids the
> huge page landing in the tlb cache, which in turn avoids triggering the
> issues outlined above.

Yes :) Actually there is no split at all. The 4K page table is created at the
very beginning (before it is loaded into CR3), so there is no TLB issue at all.

>
> I think doing a linux-style page split will be the more robust solution.

Thanks for explaining the Linux behavior.

Intel's SDM also contains the wording below:
* As noted in Section 4.10.2, the TLBs may subsequently contain multiple translations for the address range if
* software modifies the paging structures so that the page size used for a 4-KByte range of linear addresses
* changes. A reference to a linear address in the address range may use any of these translations.
* Software wishing to prevent this uncertainty should not write to a paging-structure entry in a way that would
* change, for any linear address, both the page size and either the page frame, access rights, or other attributes.
* It can instead use the following algorithm: first clear the P flag in the relevant paging-structure entry (e.g.,
* PDE); then invalidate any translations for the affected linear addresses (see above); and then modify the
* relevant paging-structure entry to set the P flag and establish modified translation(s) for the new page size.

But I still have some doubts about using a Linux-style page split.
While the entry is marked as not present:
1. Active code must not access data in the 2M region (the stack is in the 2M region in our case).
2. Active code must not be located in the 2M region (how to guarantee that?).

How does Linux guarantee the above two points?

Thanks,
Ray
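
For reference, the sequence discussed in this thread (clear the P flag, invalidate, then write the entry for the new page size) can be sketched in C roughly as below. This is only a user-space simulation under stated assumptions, not EDK2 or Linux code: the names split_2m_to_4k(), flush_tlb_range() and flush_count are hypothetical, the mapping is assumed to be identity-mapped, only the P and R/W attribute bits are carried over, and the TLB flush is a counter standing in for INVLPG or a CR3 reload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_P        (1ull << 0)            /* Present */
#define PTE_RW       (1ull << 1)            /* Read/write */
#define PTE_PS       (1ull << 7)            /* Set in a PDE that maps a 2M page */
#define ADDR_MASK_2M 0x000FFFFFFFE00000ull  /* 2M page frame bits in a PDE */
#define ADDR_MASK_4K 0x000FFFFFFFFFF000ull  /* 4K frame / page table address bits */

/* Hypothetical stand-in for INVLPG or a CR3 reload: it only counts calls. */
unsigned flush_count;

void flush_tlb_range(uint64_t base, uint64_t len)
{
    (void)base;
    (void)len;
    flush_count++;
}

/*
 * Replace the 2M mapping in *pde with 512 4K mappings stored in pt[].
 * pt_phys is the (assumed 4K-aligned) physical address of pt[].
 * Order of operations: 2M -> not present -> flush -> 4K.
 */
void split_2m_to_4k(volatile uint64_t *pde, uint64_t *pt, uint64_t pt_phys)
{
    uint64_t old = *pde;

    /* Only a present 2M leaf entry can be split. */
    assert((old & PTE_P) && (old & PTE_PS));

    uint64_t base  = old & ADDR_MASK_2M;
    uint64_t attrs = old & (PTE_P | PTE_RW);  /* simplified attribute set */

    /* Build the 4K table first; it is not reachable by the CPU yet. */
    for (size_t i = 0; i < 512; i++) {
        pt[i] = (base + (uint64_t)i * 0x1000ull) | attrs;
    }

    /* Step 1: clear P so no new translation can be loaded. */
    *pde = old & ~PTE_P;

    /* Step 2: drop any cached 2M translation for the range. */
    flush_tlb_range(base, 0x200000ull);

    /* Step 3: install the non-leaf PDE (PS clear) pointing at the 4K table. */
    *pde = (pt_phys & ADDR_MASK_4K) | attrs;
}
```

In the simulation nothing can fault, so the sketch does not answer the two questions above: on real hardware, between step 1 and step 3 the code running the split (and its stack) would have to live outside the 2M region being split.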