Date: Fri, 30 Aug 2019 16:48:02 +0200
From: Igor Mammedov
To: Laszlo Ersek
Cc: "Yao, Jiewen", "Kinney, Michael D", Paolo Bonzini, "rfc@edk2.groups.io",
 Alex Williamson, "devel@edk2.groups.io", qemu devel list, "Chen, Yingwen",
 "Nakajima, Jun", Boris Ostrovsky, Joao Marcal Lemos Martins, Phillip Goerl
Subject: Re: [edk2-rfc] [edk2-devel] CPU hotplug using SMM with QEMU+OVMF
Message-ID: <20190830164802.1b17ff26@redhat.com>
In-Reply-To: <033ced1a-1399-968e-cce6-6b15a20b0baf@redhat.com>
References: <8091f6e8-b1ec-f017-1430-00b0255729f4@redhat.com>
 <2b4ba607-f0e3-efee-6712-6dcef129b310@redhat.com>
 <7f2d2f1e-2dd8-6914-c55e-61067e06b142@redhat.com>
 <3661c0c5-3da4-1453-a66a-3e4d4022e876@redhat.com>
 <74D8A39837DF1E4DA445A8C0B3885C503F76FDAF@shsmsx102.ccr.corp.intel.com>
 <74D8A39837DF1E4DA445A8C0B3885C503F7728AB@shsmsx102.ccr.corp.intel.com>
 <20190827203102.56d0d048@redhat.com>
 <033ced1a-1399-968e-cce6-6b15a20b0baf@redhat.com>

On Thu, 29 Aug 2019 19:01:35 +0200
Laszlo Ersek wrote:

> On 08/27/19 20:31, Igor Mammedov wrote:
> > On Sat, 24 Aug 2019 01:48:09 +0000
> > "Yao, Jiewen" wrote:
> > >> (05) Host CPU: (OS) Port 0xB2 write, all CPUs enter SMM (NOTE: New CPU
> > >> will not enter SMM because SMI is disabled)
> > I think only the CPU that does the write will enter SMM
>
> That used to be the case (and it is still the default QEMU behavior, if
> broadcast SMI is not negotiated). However, OVMF does negotiate broadcast
> SMI whenever QEMU offers the feature. Broadcast SMI is important for the
> stability of the edk2 SMM infrastructure on QEMU/KVM, we've found.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1412313
> https://bugzilla.redhat.com/show_bug.cgi?id=1412327
>
> > and we might not need to pull in all already initialized CPUs into SMM.
>
> That, on the other hand, could be a valid idea. But then the CPU should
> use a different method for raising a synchronous SMI for itself (not a
> write to IO port 0xB2). Is a "directed SMI for self" possible?

Theoretically, depending on the argument in 0xB3, it should be possible to
raise a directed SMI even if broadcast SMIs are negotiated.

> > [...]
>
> I've tried to read through the procedure with your suggested changes,
> but I'm failing at composing a coherent mental image, in this email
> response format.
>
> If you have the time, can you write up the suggested list of steps in a
> "flat" format? (I believe you are suggesting to eliminate some steps
> completely.)
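To make the two delivery modes concrete, here is a toy C model (not QEMU code) of what a guest write to IO port 0xB2 does. By default only the writing CPU gets the SMI. Once the firmware has negotiated broadcast SMI, every present CPU gets it. The names `struct platform`, `struct vcpu`, `apm_cnt_write()` and `NUM_CPUS` are hypothetical. Igor's idea of selecting a directed SMI through the argument in 0xB3 is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not QEMU code: NUM_CPUS, struct vcpu, struct platform and
 * apm_cnt_write() are hypothetical names used only for illustration. */
#define NUM_CPUS 4

struct vcpu {
    bool present;     /* CPU is plugged in */
    bool smi_pending; /* an SMI has been raised for this CPU */
};

struct platform {
    struct vcpu cpu[NUM_CPUS];
    bool broadcast_smi; /* true once the firmware negotiated broadcast SMI */
};

/* Guest CPU 'writer' writes to IO port 0xB2. */
static void apm_cnt_write(struct platform *p, size_t writer)
{
    assert(writer < NUM_CPUS && p->cpu[writer].present);

    if (!p->broadcast_smi) {
        /* Default behavior: only the writing CPU gets the SMI. */
        p->cpu[writer].smi_pending = true;
        return;
    }
    /* Broadcast SMI negotiated: every present CPU gets the SMI. */
    for (size_t i = 0; i < NUM_CPUS; i++) {
        if (p->cpu[i].present) {
            p->cpu[i].smi_pending = true;
        }
    }
}

/* Number of CPUs with an SMI pending, i.e. that will enter SMM. */
static size_t count_pending(const struct platform *p)
{
    size_t n = 0;
    for (size_t i = 0; i < NUM_CPUS; i++) {
        if (p->cpu[i].smi_pending) {
            n++;
        }
    }
    return n;
}
```

In the unicast case a write from CPU 1 leaves exactly one SMI pending. In the broadcast case a write from CPU 0 with three present CPUs leaves three pending.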
To sum it up:

(01) On boot, the firmware maps and initializes the SMI handler at the default
     SMBASE (0x30000). (Using dedicated SMRAM at 0x30000 would allow us to
     avoid the save/restore steps and make the SMM handler pointer not
     vulnerable to DMA attacks.)
(02) QEMU hotplugs a new CPU in reset state and sends an SCI.
(03) On receiving the SCI, the host CPU calls the GPE CPU hotplug handler,
     which writes to IO port 0xB2 (broadcast SMI).
(04) The firmware waits for all existing CPUs to rendezvous in SMM. The new
     CPU(s) have an SMI pending but do nothing yet.
(05) The host CPU wakes up one new CPU (INIT-INIT-SIPI). The SIPI vector
     points to a HLT loop in read-only flash.
     (How will the host CPU know which new CPUs to relocate? Possibly reuse
     the QEMU CPU hotplug MMIO interface?)
(06) The new CPU does the relocation.
     (If an attacker sends SIPIs to several new CPUs, it is an open question
     how to detect a collision of several CPUs at the same default SMBASE.)
(07) Once the new CPU is relocated, the host CPU completes initialization,
     returns from the IO port write and executes the rest of the GPE
     handler, telling the OS to online the new CPU.

> ... jumping to another point:
>
> > >> 2) Let trusted software (SMM and init code) guarantee SMREBASE one by
> > >> one (including any code that runs before SMREBASE)
> > that would mean pulling all present CPUs into SMM mode so that no attack
> > code could be executing before doing hotplug. With a lot of present CPUs
> > it could be quite expensive and, unlike physical hardware, the guest's
> > CPUs could be preempted for arbitrarily long, causing long delays.
>
> I agree with your analysis, but I slightly disagree about the impact:
>
> - CPU hotplug is not a frequent administrative action, so the CPU load
>   should be temporary (it should be a spike). I don't worry that it would
>   trip up OS kernel code. (SMI handling is known to take long on physical
>   platforms too.) In practice, all "normal" SMIs are broadcast already (for
>   example when calling the runtime UEFI variable services from the OS kernel).
>
> - The fact that QEMU/KVM introduces some jitter into the execution of
>   multi-core code (including SMM code) has proved useful in the past, for
>   catching edk2 regressions.
>
> Again, this is not a strong disagreement from my side. I'm open to
> better ways for syncing CPUs during multi-CPU hotplug.
>
> (Digression:
>
> I expect someone could be curious why (a) I find it acceptable (even
> beneficial) that "some jitter" injected by the QEMU/KVM scheduling
> exposes multi-core regressions in edk2, but at the same time (b) I found
> it really important to add broadcast SMI to QEMU and OVMF. After all,
> both "jitter" and "unicast SMIs" are QEMU/KVM platform specifics, so why
> the different treatment?
>
> The reason is that the "jitter" does not interfere with normal
> operation, and it has been good for catching *regressions*. IOW, there
> is a working edk2 state, someone posts a patch that works on physical
> hardware but breaks on QEMU/KVM --> then we can still reject or rework
> or revert the patch. And we're back to a working state again (in the
> best case, with a fixed feature patch).
>
> With the unicast SMIs however, it was impossible to enable the SMM stack
> reliably in the first place. There was no functional state to return to.

I don't really get the last statement, but then I know nothing about OVMF.
I don't insist on unicast SMI being used; these are just some ideas about
what we could do. It could be done later; broadcast SMI (might not be the
best) is sufficient to implement CPU hotplug.

> Digression ends.)
>
> > let's first see if we can ignore the race
>
> Makes me uncomfortable, but if this is the consensus, I'll go along.

Same here. As mentioned in another reply, it's only possible in the attack
case (multiple SMIs + multiple SIPIs), so it could be fine to just explode
if it happens (the point being that the firmware is not leaking anything
from SMRAM and the OS did something illegal).

> > and if it's not then
> > we probably end up with implementing some form of #1
>
> OK.
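One possible way to implement the "just explode" policy is a shared flag in SMRAM that the default-SMBASE handler test-and-sets on entry and clears just before RSM. A second CPU that finds the flag already set knows that two CPUs are at the default SMBASE at the same time. This detection mechanism is not proposed in the thread. It is a hypothetical C11 sketch, not edk2 code, and `relocation_busy`, `default_smbase_enter()`, `default_smbase_leave()` and `must_halt()` are illustrative names. The flag only detects the overlap. It cannot undo two CPUs writing their save state to the same area, which is why the policy is to stop rather than recover.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sketch: one flag shared by every CPU that runs the SMI
 * handler at the default SMBASE (0x30000). Names are illustrative. */
static atomic_flag relocation_busy = ATOMIC_FLAG_INIT;

enum entry_result { ENTRY_OK, ENTRY_COLLISION };

/* First thing the default-SMBASE SMI handler does. */
static enum entry_result default_smbase_enter(void)
{
    /* test_and_set returns the previous value: true means another CPU is
     * still between enter and leave, i.e. two CPUs share the default
     * SMBASE at once. */
    if (atomic_flag_test_and_set(&relocation_busy)) {
        return ENTRY_COLLISION;
    }
    return ENTRY_OK;
}

/* Called after the CPU has programmed its new SMBASE, just before RSM. */
static void default_smbase_leave(void)
{
    atomic_flag_clear(&relocation_busy);
}

/* "Explode" policy: a real handler would halt (not modeled here); the
 * model only reports whether it must stop. */
static bool must_halt(enum entry_result r)
{
    return r == ENTRY_COLLISION;
}
```

A single relocation passes enter and leave cleanly. A second enter before the first leave reports a collision.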
>
> Thanks!
> Laszlo
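Igor's steps (01) to (07) can be modeled as a small per-CPU state machine. The C sketch below is hypothetical and is neither QEMU nor edk2 code. It makes the following assumptions, all marked in the comments:

- The rendezvous and INIT/SIPI are collapsed into state changes.
- Scanning for CPUs in reset state stands in for the open question of how the host CPU learns which CPUs are new (possibly the QEMU CPU hotplug MMIO interface).
- `SMBASE_REGION` and `SMBASE_STRIDE` are made-up values for where relocated SMBASEs go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of steps (01)-(07); not QEMU or edk2 code. */
#define MAX_CPUS       8
#define DEFAULT_SMBASE 0x30000u
#define SMBASE_REGION  0x7f000000u /* made-up base for relocated SMBASEs */
#define SMBASE_STRIDE  0x2000u     /* made-up per-CPU spacing */

enum cpu_state {
    CPU_ABSENT,    /* socket empty */
    CPU_RUNNING,   /* CPU running the OS */
    CPU_IN_SMM,    /* existing CPU parked in the SMM rendezvous */
    CPU_RESET,     /* (02) hot-plugged, still at the default SMBASE */
    CPU_RELOCATED  /* (06) new CPU now has its own SMBASE */
};

struct cpu {
    enum cpu_state state;
    bool smi_pending;
    uint32_t smbase;
};

/* (02) QEMU plugs in a CPU in reset state; the SCI itself is not modeled. */
static void qemu_hotplug_cpu(struct cpu cpus[], size_t idx)
{
    cpus[idx].state = CPU_RESET;
    cpus[idx].smi_pending = false;
    cpus[idx].smbase = DEFAULT_SMBASE; /* (01) handler already set up here */
}

/* (03)-(07) for one SCI, driven by the host CPU running the GPE handler.
 * Returns how many new CPUs were relocated. */
static size_t handle_cpu_hotplug_sci(struct cpu cpus[])
{
    size_t relocated = 0;

    /* (03) write to IO port 0xB2 with broadcast SMI negotiated. */
    for (size_t i = 0; i < MAX_CPUS; i++) {
        if (cpus[i].state != CPU_ABSENT)
            cpus[i].smi_pending = true;
    }

    /* (04) existing CPUs rendezvous in SMM; new CPUs keep the SMI pending. */
    for (size_t i = 0; i < MAX_CPUS; i++) {
        if (cpus[i].state == CPU_RUNNING) {
            cpus[i].state = CPU_IN_SMM;
            cpus[i].smi_pending = false;
        }
    }

    /* (05)+(06) one new CPU at a time: INIT/SIPI sends it to the HLT loop,
     * it takes the pending SMI at the default SMBASE and relocates.
     * Scanning for CPU_RESET is an assumption standing in for the open
     * "which CPUs are new?" question. */
    for (size_t i = 0; i < MAX_CPUS; i++) {
        if (cpus[i].state != CPU_RESET)
            continue;
        assert(cpus[i].smbase == DEFAULT_SMBASE);
        cpus[i].smi_pending = false;
        cpus[i].smbase = SMBASE_REGION + (uint32_t)i * SMBASE_STRIDE;
        cpus[i].state = CPU_RELOCATED;
        relocated++;
    }

    /* (07) existing CPUs leave SMM; the rest of the GPE handler tells the
     * OS to online the new CPUs, modeled here as CPU_RUNNING. */
    for (size_t i = 0; i < MAX_CPUS; i++) {
        if (cpus[i].state == CPU_IN_SMM || cpus[i].state == CPU_RELOCATED)
            cpus[i].state = CPU_RUNNING;
    }
    return relocated;
}
```

Relocating one CPU at a time in step (05)/(06) is what keeps only a single CPU at the default SMBASE, which is the property the collision question in (06) is about. After a second SCI with no new CPUs, the model relocates nothing and leaves every CPU running.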