public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
From: Laszlo Ersek <lersek@redhat.com>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Brijesh Singh <brijesh.singh@amd.com>,
	"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>,
	Jordan Justen <jordan.l.justen@intel.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Jiewen Yao <jiewen.yao@intel.com>,
	Star Zeng <star.zeng@intel.com>
Subject: Re: [PATCH 1/5] OvmfPkg/VirtioNetDxe: map VRING using VirtioRingMap()
Date: Tue, 5 Sep 2017 23:59:00 +0200	[thread overview]
Message-ID: <f5d121d9-d15b-b2d1-8f6c-405e899e8d5f@redhat.com> (raw)
In-Reply-To: <CAKv+Gu-cs24Tj-+DMBvfdSg6zroxHqDYUsVss-fAKtvshgRQMQ@mail.gmail.com>

On 09/05/17 23:11, Ard Biesheuvel wrote:
> On 5 September 2017 at 21:17, Laszlo Ersek <lersek@redhat.com> wrote:
> [...]
>> ... Based on the discussion in the other thread ("[edk2] [PATCH 0/4]
>> MdeModulePkg: some PCI HC drivers: unmap common buffers at
>> ExitBootServices()"), I'm worried that we might need another
>> restructuring for the IOMMU driver, and for the ExitBootServices()
>> handlers for the virtio drivers (removing the Unmap() calls).
>>
>> If that's the case, then it wouldn't be good if you wasted work on
>> VirtioNetExitBoot() in v2 of this series.
>>
>> Also under patch v1 #4, I requested that you not use
>> VirtioOperationBusMasterRead for DMA that remains pending after the
>> protocol member function returns, because we cannot Unmap()
>> BusMasterRead in VirtioNetExitBoot() without freeing memory. Dependent
>> of the outcome of said other thread, this might not be good advice after
>> all.
>>
>> I'd be pretty disappointed if we had to go back to the drawing board at
>> this stage. In my opinion, the UEFI-2.7 spec doesn't offer any
>> facilities to us that would let us reliably and safely Unmap() bus
>> master operations (= re-encrypt memory) at ExitBootServices(), for such
>> PCI device drivers that we cannot modify. Whatever we do at this point
>> looks like a hack:
>>
>>
>> * Option #1: modify Virtio and other PCI drivers to use only
>>   CommonBuffer operations for asynch DMA, and manually Unmap() those
>>   operations in the ExitBootServices() handlers of the drivers. In
>>   addition, guarantee that Unmap() for CommonBuffer will not release
>>   memory.
>>
>>   This is the approach I've been supporting. We could implement it for
>>   OVMF, because the community controls most of the platform (QEMU,
>>   KVM, OVMF), OVMF is 100% open source, and we can propose patches for
>>   other (e.g. MdeModulePkg) PCI drivers in edk2, if necessary.
>>
>>   Problem: PCI drivers are not required by the spec, or the Driver
>>   Writer's Guide, to Unmap() on ExitBootServices() (and then to Unmap()
>>   only CommonBuffer). Also, PciIo implementations are not required by
>>   the spec to behave like this (= not free memory when Unmap()ing
>>   CommonBuffer). We can satisfy those assumptions in OvmfPkg, but
>>   perhaps not in MdeModulePkg drivers.
>>
> 
> I think you will have much bigger problems with out of tree PCI
> drivers that never use Map/Unmap in the first place, because stuff
> just works on PCs if you omit them.

I'm currently not worried about out-of-tree PCI drivers in the least :)
I'm worried about finding common grounds with MdeModulePkg drivers,
because they do use Map/Unmap, they seek to conform to the UEFI spec
(mostly), and they are an example / reference implementation for other
dirvers / driver writers.

If out-of-tree drivers break, let them break (as first step); exactly
for the reason you state.

> This is actually the reason I am quite pleased with this SEV support:
> it means x86 drivers will start breaking in the same way as they would
> have on ARM systems with non-coherent DMA or non-1:1 mapped PCI, and
> vendors will have to clean up their code anyway, not just for ARM
> compatibility.

I certainly agree this is a benefit!

>From the virtio conversion thus far (even from the mostly-reviewer
side), I can say it is a lot of work.

> The only requirement imposed on DMA capable devices is that they cease
> to perform DMA after ExitBootServices. Anything beyond that should not
> be the responsibility of the device driver, simply because that
> requirement did not exist before, and so we cannot go back in time and
> add it.

Fine by me (as a general guideline, not necessarily as a practical one);
what can we use instead, for closing down the IOMMU holes late enough,
originally opened by the PciIo.Map() calls?

> 
>> * Option #2: add an ExitBootServices() handler to the IOMMU driver, and
>>   clean up mappings (= re-encrypt memory) after the PCI / Virtio drivers
>>   are finalized in their ExitBootServices() handlers. Ignore mappings in
>>   the drivers' ExitBootServices() handlers.
>>
>>   Problem: the keyword is "after". Under this approach, we *must* clean
>>   up the mappings in the IOMMU driver, but we *must not* do that before
>>   the device drivers are finished. And the UEFI spec does not allow us
>>   to express a dependency order between ExitBootServices() handlers.
>>
>>   We could hack that around by deferring the actual cleanup to *another*
>>   event handler, signaled from the IOMMU's "normal" ExitBootServices()
>>   handler.
>>
>>   Problem: the UEFI spec does not promise that signaling events from
>>   ExitBootServices() handlers is safe.
>>
>>   Problem: if some PCI driver does the same trick for whatever reason in
>>   its ExitBootServices() handler, then we're back to square one.
>>
>>
> 
> I think this is the only feasible option. Fortunately, we don't have
> to solve this at the spec level, but only have to deal with the
> implementation.
> 
> What we need is a method in the IOMMU protocol that is invoked when
> the ExitBootServices implementation is about to return EFI_SUCCESS

Yes, after "everything else" is done.

(Of course, this is the age-old problem with UEFI, when components that
were originally meant to be independent now try to order themselves in
some fashion. For example, "let me just install, or patch, this ACPI
table or configuration table quickly at ReadyToBoot, *after* everyone
else is done, and I get a good look at system state". Now combine two
such DXE drivers into a firmware, and hilarity ensues as they fight for
the last word.)

No facility exists to my knowledge (on the spec level) that would enable
such fine delaying or dependency ordering between EBS handlers. The only
ordering is between notification functions enqueued at TPL_NOTIFY vs.
TPL_CALLBACK, and there is a guarantee (I think?) that if you get
signalled demonstrably later (i.e., not via an event group), then you
get queued for later, within the same TPL.

The problem (for this discussion) is that any random PCI driver can
register its EBS event notification function at either TPL_NOTIFY or at
TPL_CALLBACK, plus that all EBS events are signaled together as an event
group. Consequently, if the IOMMU driver registers its EBS notifier at
TPL_NOTIFY, it is guaranteed to run earlier than all notifiers with
TPL_CALLBACK (which is exactly what we *don't* want). If the IOMMU
registers its EBS handler at TPL_CALLBACK, then (due to being signaled
through a large event group) the notifier will still be invoked
somewhere in the middle of a bunch of other TPL_CALLBACK-level notifiers
-- that is, not necessarily in the last position.

Hence my floating of a hack to re-queue the notification (to another
TPL_CALLBACK handler), so that we get to the "end of the low-prio
queue". But calling SignalEvent() from an EBS handler is not explicitly
permitted in the spec. (Below you even suggest that an EBS handler
should not call any boot service.)


Of course, if CoreExitBootServices() can be updated specifically for
this use case, I won't protest.

>
> (which means it could be the second time it was called).

Side remark: the CoreExitBootServices() implementation does not notice
memory map changes incurred by the ExitBootServices() handlers, in my
interpretation.

CoreExitBootServices() has a MapKey (= "memmap generation") check early
on, in CoreTerminateMemoryMap(). This check catches memmap changes
between the last GetMemoryMap(), and the call to ExitBootServices().

After this check succeeds, the notify functions are kicked, and on this
path, CoreExitBootServices() simply cannot return any other value than
EFI_SUCCESS. So whatever mess the individual callbacks make, it doesn't
notice or report.

Perhaps this is better for your argument, actually. I'm not fully sure.

> This severely
> limits what the method is actually capable of doing, and I think it
> should not be allowed to call any boot or DXE services at all, but it
> should still be sufficient for some linked list traversals and
> CopyMem() operations, and whatever else is needed to re-encrypt the
> memory.

Yes, the necessary actions are sufficiently low-level for this. The
problem is the ordering.

> 
> And actually, this is not only useful for SEV, other IOMMU drivers
> will have to deal with the same issue: in most cases, you'll want to
> disable it before handing over to the OS, but this can never be done
> safely in a EBS event handler for precisely the reasons you pointed
> out. Most PCI devices probably deal with this gracefully, but tearing
> down mappings should simply be avoided if a device could still be
> accessing it.
> 

Fully agreed. Once Jiewen adds the policy option to lock down VT-d at
EBS (I hope I understood that plan right), dependent on OS support for
the IOMMU, the IOMMU faults can show up just the same at EBS, until the
rest of the PCI driver EBS handlers finish up.

Thanks!
Laszlo


  reply	other threads:[~2017-09-05 21:56 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-01 11:24 [PATCH 0/5] OvmfPkg/VirtioNetDxe: map host address to device address Brijesh Singh
2017-09-01 11:24 ` [PATCH 1/5] OvmfPkg/VirtioNetDxe: map VRING using VirtioRingMap() Brijesh Singh
2017-09-05 11:47   ` Laszlo Ersek
2017-09-05 18:57     ` Brijesh Singh
2017-09-05 20:17       ` Laszlo Ersek
2017-09-05 21:11         ` Ard Biesheuvel
2017-09-05 21:59           ` Laszlo Ersek [this message]
2017-09-05 22:18             ` Ard Biesheuvel
2017-09-05 22:37               ` Laszlo Ersek
2017-09-05 23:03                 ` Ard Biesheuvel
2017-09-01 11:24 ` [PATCH 2/5] OvmfPkg/VirtioNetDxe: alloc RxBuf using AllocateSharedPages() Brijesh Singh
2017-09-05 15:06   ` Laszlo Ersek
2017-09-01 11:24 ` [PATCH 3/5] OvmfPkg/VirtioNetDxe: dynamically alloc transmit header Brijesh Singh
2017-09-06  9:11   ` Laszlo Ersek
2017-09-01 11:24 ` [PATCH 4/5] OvmfPkg/VirtioNetDxe: map virtio-net transmit request buffer Brijesh Singh
2017-09-05 12:41   ` Laszlo Ersek
2017-09-05 12:44     ` Laszlo Ersek
2017-09-06  8:11     ` Laszlo Ersek
2017-09-01 11:24 ` [PATCH 5/5] OvmfPkg/VirtioNetDxe: negotiate VIRTIO_F_IOMMU_PLATFORM Brijesh Singh
2017-09-06  7:33   ` Laszlo Ersek
2017-09-07 22:55 ` [PATCH 0/5] OvmfPkg/VirtioNetDxe: map host address to device address Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f5d121d9-d15b-b2d1-8f6c-405e899e8d5f@redhat.com \
    --to=devel@edk2.groups.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox