From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124])
 by mx.groups.io with SMTP id smtpd.web11.8271.1617971086953266165
 for <devel@edk2.groups.io>;
 Fri, 09 Apr 2021 05:24:47 -0700
Authentication-Results: mx.groups.io;
 dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=XAktubha;
 spf=pass (domain: redhat.com, ip: 216.205.24.124, mailfrom: lersek@redhat.com)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
	s=mimecast20190719; t=1617971086;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=LK7BRYSlBRmx361n0Ge1S6vbVu5YoFe6hRnhvLmkkys=;
	b=XAktubhaT2SsuV8MjSvovYRC4RzXosbIwjrJ3Bewiz8R3/ys1aLOQKPyeDFmWYCW7MGaKL
	mcfM67aVO43W8EY4Q6l8BNfeDe0NtRo9wLa2Se7lrdPVv1zGzow9KZ5A2Mzt6luGw66732
	Tbtkw6PTgCdnsQCFARhxDBzr2lekNaM=
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
 [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
 us-mta-39-eykt_iObN6mfYZBfglPgYA-1; Fri, 09 Apr 2021 08:24:42 -0400
X-MC-Unique: eykt_iObN6mfYZBfglPgYA-1
Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 88FA718B9F03;
	Fri,  9 Apr 2021 12:24:40 +0000 (UTC)
Received: from lacos-laptop-7.usersys.redhat.com (ovpn-115-137.ams2.redhat.com [10.36.115.137])
	by smtp.corp.redhat.com (Postfix) with ESMTP id 9E00D5C1A1;
	Fri,  9 Apr 2021 12:24:38 +0000 (UTC)
Subject: Re: [edk2-devel] [RFC PATCH 00/19] Add AMD Secure Nested Paging (SEV-SNP) support
To: devel@edk2.groups.io, brijesh.singh@amd.com
Cc: James Bottomley <jejb@linux.ibm.com>, Min Xu <min.m.xu@intel.com>,
 Jiewen Yao <jiewen.yao@intel.com>, Tom Lendacky <thomas.lendacky@amd.com>,
 Jordan Justen <jordan.l.justen@intel.com>,
 Ard Biesheuvel <ardb+tianocore@kernel.org>
References: <20210324153215.17971-1-brijesh.singh@amd.com>
 <34a50603-32b9-1476-04a5-8476dd810fe4@redhat.com>
 <213ce382-0b04-16f4-dd77-f9bb2cc32698@amd.com>
From: "Laszlo Ersek" <lersek@redhat.com>
Message-ID: <b4b2a840-9a3c-c45e-884e-cd0eeae4e529@redhat.com>
Date: Fri, 9 Apr 2021 14:24:37 +0200
MIME-Version: 1.0
In-Reply-To: <213ce382-0b04-16f4-dd77-f9bb2cc32698@amd.com>
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16
Authentication-Results: relay.mimecast.com;
	auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=lersek@redhat.com
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit

On 04/08/21 13:59, Brijesh Singh wrote:
> On 4/8/21 4:58 AM, Laszlo Ersek wrote:
>> On 03/24/21 16:31, Brijesh Singh wrote:

>>> At this time we only support the pre-validation. OVMF detects all the available
>>> system RAM in the PEI phase. When SEV-SNP is enabled, the memory is validated
>>> before it is made available to the EDK2 core.
>> Can you describe this in a bit more detail, before I look at the
>> individual patches? Specifically, what existing logic in the PEI phase
>> was taken, and extended, and how?
> 
> One of the key requirement is that the guest private pages much be
> validated before the access. If guest tries to access the pages before
> the validation then it will result in #VC (page-not-validated)
> exception. To avoid the #VC, we propose the validating the memory before
> the access. We will incrementally add the support to lazy validate (i.e
> validate on access).

What is the primary threat (in simple terms please) that validation is
supposed to prevent?

And, against that particular threat, does pre-validation offer some
protection, or will that only come with lazy validation?

> 
> Let me try explaining a bit, the page validation process consist of two
> steps:
> 
> 1. Add the pages in the RMP table -- must be done by the hypervisor
> using the RMPUPDATE instruction. The guest can use VMGEXIT NAEs to ask
> hypervisor to add or remove pages from the RMP table.
> 
> 2. Guest issue the PVALIDATE instruction -- this sets the validate bit
> in the RMP table.
> 
> Similar to SEV, the OVMF_CODE.fd is encrypted through the SNP firmware
> before the launch. The SNP firmware also validates the memory page after
> encrypting. This allows us to boot the initial entry code without guest
> going through the validation process.
> 
> The OVMF reset vector uses few data pages (e.g page table, early Sec
> stack). Access to these data pages will result in #VC. There are two
> approaches we can take to validate these data pages:
> 
> 1. Ask SNP firmware to pre-validate it -- SNP firmware provides an
> special command that can be used to pre-validate the pages without
> affecting the measurement.

This means the two pflash chips, right?

> 
> 2. Enhance the reset vector code to validate the pages.
> 
> For now I choose #1.
> 
> The pre-validation performed by the SNP firmware is sufficient to boot
> through the SEC phase. The SEC phase later decompress the Fv to a new
> memory location. Now we need the OVMF to take over the validation
> procedure.  The series extends the MemEncryptSevLib to add a new helper
> MemEncryptSevSnpValidateRam(). The helper is used to validate the system
> RAM. See patch #12. SEC phase calls the MemEncryptSevSnpValidateRam() to
> validate the output buffer used for the decompression. This was
> sufficient to boot into the PEI phase, see patch #13.

Two questions here:

- Is ACPI S3 in scope for now?

- We need more areas made accessible in SEC than just the decompression
buffer; for example the temporary SEC/PEI heap and stack, and (IIRC)
various special SEV-ES areas laid out via MEMFD. Where are all of those
handled / enumerated?

> The PEI detects
> all the available system RAM. After the memory detection is completed
> the PlatformPei calls the AmdSevSnpInitialize(). The initialization
> routine iterate through the HOB and calls the
> MemEncryptSevSnpValidateRam() to validate all the system RAM. Is it
> possible the more system ram can be detected after the PlatformPei is
> completed ?

That would cause problems even without SEV-SNP (i.e., with plain SEV),
so I'm not worried about it.

> 
> One of the important thing is we should *never* validate the pages
> twice.

What are the symptoms / consequences of:

- the guest accessing an unvalidated page (I understand it causes a #VC,
but what is the direct result of that, when this series is applied?),

- doubly-validating a page?

The first question is relevant because we should crash as soon as we
touch a page we forgot to validate (we shouldn't let any corruption or
similar spread out until we finally realize there's a problem).

The second question is relevant for security I guess. What attacks
become possible, and/or what symptoms are produced, if we
doubly-validate a page?

Furthermore, IIRC, we have separate #VC handlers for the different
firmware phases; do they behave consistently / identicall when a
#VC(page-not-validated) occurs, when this patch set is applied?

My first question is basically asking whether we can *exclusively* rely
on #VC(page-not-validated) faults to tell us that we missed validating a
particular page. If we can do that, then the job is a bit easier,
because from the GPA, we can more or less also derive *when and where*
we should pre-validate the page (at least until validation is done
completely lazily).

> The MemEncryptSevSnpValidateRam() uses a interval search tree to
> keep the record of what has been validated. Before validating the range,
> it lookup in its tree and if it finds that range is already validated
> then do nothing. If it detects an overlap then it will validate only non
> overlapping regions -- see patch #14.

What data structure is used for implementing the interval tree?

I'm not necessarily looking for a data structure with "nice" asymptotic
time complexity. With pre-validation especially, I think simplicity
(ease of review) is more important for the data structure than
performance. If it's not an actual "tree", we shouldn't call it a
"tree". (An "interval tree" is usually an extension of a Red-Black Tree,
and that's not the simplest data structure in existence; although edk2
does offer an rbtree library.)

Furthermore, what you describe above is called idempotency. No matter
how many times we attempt to validate a range, it may (or may not even)
cause an actual change in the first action only. Is this property
(=idempotency) an inherent requirement of the technology, or is it a
simplification of the implementation? Put differently: if you called
CpuDeadLoop() in the validation function any time an overlapping
validation request were received, would that hugely complicate the call
sites?

I'm kind of "obsessing" about idempotency because you say we must
*never* doubly-validate a page, so the *difference* between:
- explicitly crashing on such an attempt,
- and silently ignoring such an attempt,
may be meaningful.

It's kind of the difference between "oops this is a call site *bug*, but
we patched it up", vs. "this is expected of call sites, we should just
handle it internally".

> The patch #18 extend the MemEncrypt{Set,Clear}PageEncMask() to call the
> SNP page state change during the C-bit toggle.

- When exactly do we invalidate?

- Does the time and place of invalidation depend on whether we perform
pre-validation, or lazy validation?

- Is page invalidation idempotent too?

- What is the consequence (security impact) if we forget invalidation?

- There are four page states I can imagine:
  - encrypted for host access, valid for guest access
  - encrypted for host access, invalid for guest access
  - decrypted for host access, valid for guest access
  - decrypted for host access, invalid for guest access

Does a state exist, from these four, that's never used? (Potentially
caught by the hardware?)

Do the patches highlight / explain the validity transitions, in
comments? My understanding is that the C-bit toggle and the guest access
valid/invalid toggle are separate actions, so "middle" states do exist,
but perhaps only temporarily.

I'm curious how it works, for example, with variousvirtio transfers (bus
master read, bus master write, bus master common buffer). In the
IoMmuDxe driver minimally, we access memory both through PTEs with the
C-bit set and through PTEs with the C-bit clear, meaning that
"encrypted, valid", and "decrypted, valid" are both required states. But
that seems to conflict with the notion that "C-bit toggle" be directly
coupled with a "validity toggle".

Put differently, I'm happy that modifying IoMmuDxe appears unnecessary,
but then that tells me I'm missing something about the state transitions
between the above *four* states.

> 
> Please let me know if you have any questions. We can hash out the design
> here before you taking a closure look at the code.

Sorry that I've been (and am being) slow to start reviewing this series.

Thanks,
Laszlo