public inbox for devel@edk2.groups.io
From: Laszlo Ersek <lersek@redhat.com>
To: "Zhoujian (jay)" <jianjay.zhou@huawei.com>
Cc: "Yao, Jiewen" <jiewen.yao@intel.com>,
	"edk2-devel@lists.01.org" <edk2-devel@lists.01.org>,
	"Huangweidong (C)" <weidong.huang@huawei.com>,
	"liujunjie (A)" <liujunjie23@huawei.com>,
	"wangxin (U)" <wangxinxin.wang@huawei.com>,
	"wujing (O)" <wujing42@huawei.com>,
	"dengkai (A)" <dengkai1@huawei.com>
Subject: Re: Question about hotplugging NIC devices to an empty pci-bridge
Date: Tue, 25 Dec 2018 11:18:26 +0100
Message-ID: <a476cac1-178d-c707-bdf9-ee11cab343ae@redhat.com>
In-Reply-To: <B2D15215269B544CADD246097EACE7473B8E9FC9@DGGEMM528-MBX.china.huawei.com>

Brief answer while I'm on PTO.

(It's difficult to reply to this thread in any sensible manner, because
of the brain-damaged top-posting that outlook and gmail perpetuate. I'll
try my best anyway, but you might have to reverse the order of my
answers for getting a good logical explanation. Again, the damage is
self-inflicted here; use a better MUA please.)

On 12/21/18 14:50, Zhoujian (jay) wrote:
>> -----Original Message-----
>> From: Yao, Jiewen [mailto:jiewen.yao@intel.com]
>> Sent: Friday, December 21, 2018 1:28 PM
>> To: Zhoujian (jay) <jianjay.zhou@huawei.com>;
>> edk2-devel@lists.01.org; lersek@redhat.com
>> Cc: Huangweidong (C) <weidong.huang@huawei.com>; liujunjie (A)
>> <liujunjie23@huawei.com>; wangxin (U) <wangxinxin.wang@huawei.com>;
>> wujing (O) <wujing42@huawei.com>; dengkai (A) <dengkai1@huawei.com>
>> Subject: RE: Question about hotplugging NIC devices to an empty
>> pci-bridge

When you hotplug a traditional PCI, or PCI Express, device, at OS
runtime, the OS can generally only satisfy the resource requirements of
the device from reserved (pre-allocated) resources. This means that
hotplug plans have to be considered in advance, when the initial PCI
enumeration and resource assignment occur in the firmware. The
reservations should be considered / propagated upstream (to the root
complex(es)) from the leaf bridge(s) where the hotplug actions are
expected. PciBusDxe covers the propagation, but the "leaves" have to
expose the reservations ("paddings").

The default reservation sizes may be both wasteful and insufficient. One
example of waste is when you have many traditional PCI bridges, each
requiring 4KB IO space, but the platform doesn't have much IO space in
total (the theoretical maximum is 64KB anyway), and so you run out of IO
space during enumeration.
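
As a rough illustration of how leaf reservations add up on their way
upstream, here is a minimal sketch. It is a simplified model, not the
algorithm PciBusDxe actually implements; the names Bridge, io_padding,
device_io and io_demand are hypothetical and exist only for this example.
The only facts it relies on are the 4KB granularity of bridge IO windows
and the eight empty bridges from the original report.

```python
from dataclasses import dataclass, field

IO_WINDOW_ALIGN = 0x1000  # bridge IO windows are 4KB-granular


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


@dataclass
class Bridge:
    # All field names are hypothetical, chosen for this sketch only.
    name: str
    io_padding: int = 0   # reservation exposed by a hotplug-capable leaf
    device_io: int = 0    # IO BAR bytes of devices already behind the bridge
    children: list = field(default_factory=list)


def io_demand(bridge):
    """IO bytes the bridge's window must cover, including everything below it."""
    below = bridge.device_io + sum(io_demand(c) for c in bridge.children)
    needed = max(below, bridge.io_padding)
    return align_up(needed, IO_WINDOW_ALIGN) if needed else 0


# Eight empty leaf bridges on the root bus, each padded with one 4KB window.
root = Bridge("root", children=[Bridge(f"br{i}", io_padding=0x1000)
                                for i in range(8)])
print(hex(io_demand(root)))
```

With eight padded leaves, the demand at the root already exceeds a 16KB
aperture, which is the kind of exhaustion described above.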

More below:

>>
>> You need have a PciHotPlug driver to produce the
>> EFI_PCI_HOT_PLUG_INIT_PROTOCOL
>>
>> One example:
>> https://github.com/tianocore/edk2/tree/master/OvmfPkg/PciHotPlugInitDxe
>> Laszlo added it. He may provide comment on how to use it.
>>
>> Another example:
>> https://github.com/tianocore/edk2-platforms/tree/devel-MinPlatform/Platform/Intel/KabylakeOpenBoardPkg/Features/PciHotPlug
>> This is to add Thunderbolt support in Kabylake platform.
>
> I've checked the dsc, and confirmed that the OVMF.fd already had the
> PciHotPlug driver.
> Then I found the resource info through the debug log like below:
>
> InitRootBridge: populated root bus 0, with room for 255 subordinate bus(es)
> RootBridge: PciRoot(0x0)
>   Support/Attr: 70069 / 70069
>     DmaAbove4G: No
> NoExtConfSpace: Yes
>      AllocAttr: 3 (CombineMemPMem Mem64Decode)
>            Bus: 0 - FF Translation=0
>             Io: C000 - FFFF Translation=0
>            Mem: C0000000 - FBFFFFFF Translation=0
>     MemAbove4G: 41800000000 - 41FFFFFFFFF Translation=0
>           PMem: FFFFFFFFFFFFFFFF - 0 Translation=0
>    PMemAbove4G: FFFFFFFFFFFFFFFF - 0 Translation=0
>
> In the OvmfPkg/PlatformPei/Platform.c, the function
> MemMapInitialization sets the PciIoBase=0xC000 and PciIoSize=0x4000(On
> Q35, the PciIoBase=0x6000 and PciIoSize=0xA000).
>
> So my questions are:
> 1) Why is the default value of PciIoBase 0xC000? Each pci-bridge needs a
> 0x0fff IO window, which means only 4 pci-bridges can be reserved?

The IO space aperture sizes that you see on i440fx and Q35 in
OvmfPkg/PlatformPei emerge like that simply because those are the
largest contiguous IO space ranges that fit between IO ports that belong
to platform devices.

If you run

  git blame -- OvmfPkg/PlatformPei/Platform.c

you soon end up with a pointer to commit bba734ab4c7c
("OvmfPkg/PlatformPei: provide 10 * 4KB of PCI IO Port space on Q35",
2016-05-17). The commit message on that commit should help, and it also
mentions

  https://bugzilla.redhat.com/show_bug.cgi?id=1333238

which is where I had investigated the IO space sizes that were
*practically* available on i440fx and Q35.
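
To make the arithmetic concrete, here is a minimal sketch that counts how
many 4KB bridge IO windows fit into the two apertures quoted above. The
base and size values come from this thread; the helper name bridge_windows
is made up for the example, and the count assumes that nothing else (such
as IO BARs of devices on the root bus) consumes the aperture.

```python
BRIDGE_IO_WINDOW = 0x1000  # a traditional PCI bridge IO window is 4KB-granular

# (PciIoBase, PciIoSize) as quoted from OvmfPkg/PlatformPei in this thread.
APERTURES = {
    "i440fx": (0xC000, 0x4000),
    "q35": (0x6000, 0xA000),
}


def bridge_windows(size, window=BRIDGE_IO_WINDOW):
    """Number of bridge IO windows that fit in an aperture of size bytes."""
    return size // window


for board, (base, size) in APERTURES.items():
    last = base + size - 1
    print(f"{board}: {base:#06x}-{last:#06x} -> {bridge_windows(size)} windows")
```

This gives 4 windows on i440fx and 10 on Q35, which matches the
"10 * 4KB" in the commit subject.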

> 2) If I set PciIoBase=0x1000 and PciIoSize=0xA000, and start a VM with
> 8 empty pci-bridges, then hotplugging a virtual NIC into a pci-bridge
> works; the problem disappears.
>   But will this cause any side effects?

Yes, it could; if you override PciIoBase like this, then PciBusDxe may
easily allocate IO BARs of devices such that they overlap IO ports of
other (built-in, platform) devices.
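
The kind of conflict meant here can be sketched as a simple range overlap
check. The port ranges in PLATFORM_PORTS are placeholders invented for this
example, not the real port assignments of any QEMU board; the function
names are hypothetical as well. Only the two aperture values are taken
from the thread.

```python
def overlaps(base_a, size_a, base_b, size_b):
    """True if the half-open ranges [base, base + size) intersect."""
    return base_a < base_b + size_b and base_b < base_a + size_a


# Placeholder fixed IO port ranges of built-in devices: (name, base, size).
# These are assumptions for illustration; consult the board's sources for
# the real values.
PLATFORM_PORTS = [
    ("hypothetical-device-a", 0x0510, 0x10),
    ("hypothetical-device-b", 0xAE00, 0x20),
]


def conflicts(pci_io_base, pci_io_size, platform_ports=PLATFORM_PORTS):
    """Names of platform devices whose ports fall inside the PCI IO aperture."""
    return [name for name, base, size in platform_ports
            if overlaps(pci_io_base, pci_io_size, base, size)]


print(conflicts(0xC000, 0x4000))  # stock i440fx aperture from the thread
print(conflicts(0x1000, 0xA000))  # the proposed override
```

With these placeholder ranges, the stock aperture reports no conflict,
while the widened one covers a port range that PciBusDxe could then hand
out to a PCI device.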

The solution to the IO space shortage is to use Q35 with a PCI Express
(that is, not traditional PCI) hierarchy. PCI Express devices are
required to function without IO BARs, and you can use PCI Express Root
Ports, and Switches (each consisting of an Upstream Port and a number of
Downstream Ports) without consuming IO space at all.

This is documented in great detail in the following two documents in the
QEMU source tree:

[1] docs/pcie.txt
[2] docs/pcie_pci_bridge.txt

Now, if you switch to Q35 / PCIE, then you likely won't run out of IO
space; however, the other issue may still arise, where not enough MMIO
is reserved for hot-plugging devices with large MMIO demands.

For that, OvmfPkg/PciHotPlugInitDxe implements the firmware side for
QEMU's "PCI resource reservation capability". This is a vendor-specific
PCI capability (in traditional config space) that can be added to the
generic PCI Express Root Port device model of QEMU, using the
appropriate command line switches (see again [1] and [2]). When you do
that, PciHotPlugInitDxe instructs PciBusDxe to reserve the given sizes
from the given resource types on the given root port, and then you can
hotplug a large device at OS runtime into that root port.
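
For sizing such a reservation, a minimal sketch follows. It assumes a naive
largest-first packing of naturally aligned BARs and rounds the result up to
the 1MB granularity of bridge memory windows; window_size_for is a
hypothetical helper, and the BAR sizes are the ones from the dmesg output
quoted further down in this thread. How the resulting size is passed to
QEMU is not shown here; see [1] and [2] for that.

```python
MIB = 1 << 20  # PCI-to-PCI bridge memory windows are 1MB-granular


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def window_size_for(bar_sizes):
    """Smallest 1MB-granular window that holds all BARs.

    BARs are packed largest-first, each aligned to its own (power-of-two)
    size. This is a simplification, not what any firmware or OS does
    verbatim.
    """
    offset = 0
    for size in sorted(bar_sizes, reverse=True):
        offset = align_up(offset, size) + size
    return align_up(offset, MIB)


# 64-bit prefetchable BAR sizes of the hotplugged NIC, from the dmesg log.
nic_bars = [0x10000, 0x4000]
print(hex(window_size_for(nic_bars)))
```

For the NIC in the report, a single 1MB prefetchable window would be
enough under these assumptions.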

For more details (beyond the two documents above), please refer to

[3] git log -- OvmfPkg/PciHotPlugInitDxe
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1434740#c5
[5] https://lists.01.org/pipermail/edk2-devel/2017-September/015296.html

More below:

>>> -----Original Message-----
>>> From: Zhoujian (jay) [mailto:jianjay.zhou@huawei.com]
>>> Sent: Friday, December 21, 2018 11:04 AM
>>> To: Yao, Jiewen <jiewen.yao@intel.com>; edk2-devel@lists.01.org;
>>> lersek@redhat.com
>>> Cc: Huangweidong (C) <weidong.huang@huawei.com>; liujunjie (A)
>>> <liujunjie23@huawei.com>; wangxin (U) <wangxinxin.wang@huawei.com>;
>>> wujing (O) <wujing42@huawei.com>; dengkai (A) <dengkai1@huawei.com>
>>> Subject: RE: Question about hotplugging NIC devices to an empty
>>> pci-bridge
>>>
>>> I've tried to set PcdPciBusHotplugDeviceSupport to be true in
>>> MdeModulePkg.dec like below:
>>> gEfiMdeModulePkgTokenSpaceGuid.PcdPciBusHotplugDeviceSupport|TRUE|BOOLEAN|0x0001003d
>>> But the problem still exists. Is there any steps I missed? Or some
>>> infos need to populate to OVMF by Qemu?
>>>
>>> Could you give me more infos?
>>>
>>> Thanks,
>>> Jay Zhou
>>>
>>>> -----Original Message-----
>>>> From: Yao, Jiewen [mailto:jiewen.yao@intel.com]
>>>> Sent: Thursday, December 20, 2018 8:09 PM
>>>> To: Zhoujian (jay) <jianjay.zhou@huawei.com>;
>>>> edk2-devel@lists.01.org
>>>> Cc: Huangweidong (C) <weidong.huang@huawei.com>; liujunjie (A)
>>>> <liujunjie23@huawei.com>; wangxin (U) <wangxinxin.wang@huawei.com>;
>>>> wujing (O) <wujing42@huawei.com>; dengkai (A) <dengkai1@huawei.com>
>>>> Subject: RE: Question about hotplugging NIC devices to an empty
>>>> pci-bridge
>>>>
>>>> Maybe you can use EFI_PCI_HOT_PLUG_INIT_PROTOCOL to reserve some
>>>> resource.
>>>>
>>>> See MdePkg\Include\Protocol\PciHotPlugInit.h
>>>>
>>>> Thank you
>>>> Yao Jiewen
>>>>
>>>>> -----Original Message-----
>>>>> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On
>>>>> Behalf Of Zhoujian (jay)
>>>>> Sent: Thursday, December 20, 2018 7:34 PM
>>>>> To: edk2-devel@lists.01.org
>>>>> Cc: Huangweidong (C) <weidong.huang@huawei.com>; liujunjie (A)
>>>>> <liujunjie23@huawei.com>; wangxin (U) <wangxinxin.wang@huawei.com>;
>>>>> wujing (O) <wujing42@huawei.com>; dengkai (A) <dengkai1@huawei.com>
>>>>> Subject: [edk2] Question about hotplugging NIC devices to an empty
>>>>> pci-bridge
>>>>>
>>>>> Hi all,
>>>>>
>>>>> The issue occurs when I started a virtual machine in UEFI way by
>>>>> libvirt on qemu-kvm platform, the vm is configured with 8
>>>>> pci-bridges on root bus0. I hotplug a device like virtual nic to
>>>>> an empty pci-bridge which has no device connected. Login the vm, I
>>>>> can see the device by "lspci", but it didn't show by "ifconfig
>>>>> -a". Dmesg shows like below:
>>>>> pci 0000:04:01.0: BAR 0: no space for [mem size 0x00010000 64bit pref]
>>>>> pci 0000:04:01.0: BAR 0: failed to assign [mem size 0x00010000 64bit pref]
>>>>> pci 0000:04:01.0: BAR 3: no space for [mem size 0x00004000 64bit pref]
>>>>> pci 0000:04:01.0: BAR 3: failed to assign [mem size 0x00004000 64bit pref]
>>>>>
>>>>> Reboot the vm, everything turns back to normal and I can see the
>>>>> new hotplugged nic by "ifconfig -a".
>>>>>
>>>>> With OVMF built from the latest edk2 source code, the same problem
>>>>> arises.
>>>>>
>>>>> So, my questions are:
>>>>> 1) the generic PCI bus driver in edk2 does not allocate IO and/or
>>>>> MMIO for a bridge if there is no device behind the bridge that
>>>>> consumes that kind of resource?
>>>>> 2) What's the purpose of this strategy?
>>>>> 3) Why don't allocate resource to all bridges like seabios?
>>>>> 4) Is there any switch for me to turn off this constraint so that
>>>>> every pci-bridge including empty ones can be assigned IO and
>>>>> memory window?
>>>>> Otherwise, each time I hotplug a device to empty pci-bridge, a
>>>>> reboot operation should be implemented to use the device?
>>>>>
>>>>> Any help will be appreciated, Thanks!

Currently, the resource reservation capability is implemented on the
Generic PCI Express Root Port device model, which is only usable on Q35.
If you really want to hotplug a traditional PCI device, *while* sizing
the reservation appropriately, I believe you'll have to:
- size the reservation on a Root Port as needed,
- cold-plug a PCIE-PCI bridge first into the Root Port,
- hotplug the traditional PCI device into the PCIE-PCI bridge.

(You can also *hot*plug the PCIE-PCI bridge itself, because
<https://bugzilla.tianocore.org/show_bug.cgi?id=656> has been fixed, but
then remember to reserve bus numbers as well, at the Root Port level.)

We worked out this exact scenario with another developer earlier, on the
SeaBIOS mailing list. Please read through the thread below:

  [SeaBIOS] hotplug failure issue on pci-bridge
  http://mid.mail-archive.com/da8e8d1c-ab1e-c790-0c34-ef094a438a77@linux.intel.com
  https://mail.coreboot.org/hyperkitty/list/seabios@seabios.org/thread/WKHZ6LVPOAXRPPT4M5HZKUPON2Z7EZWB/

Hope this helps,
Laszlo

