From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by mx.groups.io with SMTP id smtpd.web08.384.1610734929045732082 for ; Fri, 15 Jan 2021 10:22:09 -0800 Authentication-Results: mx.groups.io; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=g9dElNwD; spf=pass (domain: redhat.com, ip: 216.205.24.124, mailfrom: lersek@redhat.com) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1610734928; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pQo5nnp/oCZbBumBEvP51UIERXikpCxof8FJB8IxhMw=; b=g9dElNwDx63QQyha/6i58uVsHRkTsCc7kPRcxUS4ALHmnIY36Qu8obf55kg9VLcFqR1xk4 ijSJeMT/t5V/XlWdTxawAUdZ6xAhtif8I1yoESoPvPqH0ptoN6fLhuITadkN/O1UNKznA3 QO04lIY+MlBNSLc/omFAkR0Z69pGvFc= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-251-TFcrfa2ON_iCrTrXUSmcOg-1; Fri, 15 Jan 2021 13:22:04 -0500 X-MC-Unique: TFcrfa2ON_iCrTrXUSmcOg-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 6DA3E9CDA4; Fri, 15 Jan 2021 18:22:03 +0000 (UTC) Received: from lacos-laptop-7.usersys.redhat.com (ovpn-112-152.ams2.redhat.com [10.36.112.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2F42F5D9C6; Fri, 15 Jan 2021 18:22:01 +0000 (UTC) Subject: Re: [PATCH v2 02/10] OvmfPkg: raise PcdShellFileOperationSize to 128KB To: =?UTF-8?Q?Philippe_Mathieu-Daud=c3=a9?= , devel@edk2.groups.io Cc: Ard Biesheuvel , Jordan Justen References: <20210113085453.10168-1-lersek@redhat.com> <20210113085453.10168-3-lersek@redhat.com> <6ca294e1-d19a-8d80-94c4-372c06b0df6b@redhat.com> <9ecb1a31-001f-f57e-c97c-afd527383fa9@redhat.com> From: "Laszlo Ersek" Message-ID: Date: Fri, 15 Jan 2021 19:22:00 +0100 MIME-Version: 1.0 In-Reply-To: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=lersek@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Hi Phil, On 01/15/21 16:58, Philippe Mathieu-Daudé wrote: > On 1/15/21 11:09 AM, Laszlo Ersek wrote: >> On 01/15/21 10:34, Philippe Mathieu-Daudé wrote: >>> On 1/13/21 9:54 AM, Laszlo Ersek wrote: >>>> >>>> The virtio-fs daemon of QEMU advertizes a 128KB maximum buffer size by >>>> default, for the FUSE_WRITE operation. >>> >>> I delayed this patch review because I couldn't find where this value >>> is advertized in QEMU (virtiofsd is very new to me). Can you enlighten >>> me please? >> >> Yes, absolutely. It's really difficult to find. I actually started to >> capture all that information in what would be commit 6f7bc7196ff2 >> ("OvmfPkg/VirtioFsDxe: implement the wrapper function for FUSE_WRITE", >> 2020-12-21), or perhaps in this patch even -- but in the end, it was >> such a divergent set of commits, originating actually in the "libfuse" >> project, from which virtiofsd was forked, that I decided to drop it. >> >> So, in virtiofsd, you actually have to look for the *page count* that >> corresponds to 128KB -- FUSE_DEFAULT_MAX_PAGES_PER_REQ (32). >> >> In the fuse_session_new() function, we have >> >> se->conn.max_write = UINT_MAX; >> se->bufsize = FUSE_MAX_MAX_PAGES * getpagesize() + FUSE_BUFFER_HEADER_SIZE; >> >> In the do_init() function, which handles the FUSE_INIT request, we have >> >> size_t bufsize = se->bufsize; >> >> then >> >> if (!(arg->flags & FUSE_MAX_PAGES)) { >> size_t max_bufsize = FUSE_DEFAULT_MAX_PAGES_PER_REQ * getpagesize() + >> FUSE_BUFFER_HEADER_SIZE; >> if (bufsize > max_bufsize) { >> bufsize = max_bufsize; >> } >> } >> >> and later >> >> if (se->conn.max_write > bufsize - FUSE_BUFFER_HEADER_SIZE) { >> se->conn.max_write = bufsize - FUSE_BUFFER_HEADER_SIZE; >> } >> >> and then >> >> outarg.max_write = se->conn.max_write; >> >> It's super convoluted, because a bunch of flags are handled, and they >> all control the buffer size one way or another, and so the various >> limits are converted and enforced back and forth. In practice, the >> virtio-fs driver in OVMF does not set the FUSE_MAX_PAGES flag in the >> FUSE_INIT request, and so in virtiofsd, we first reduce "bufsize" from >> >> FUSE_MAX_MAX_PAGES * getpagesize() + FUSE_BUFFER_HEADER_SIZE; >> >> to >> >> FUSE_DEFAULT_MAX_PAGES_PER_REQ * getpagesize() + FUSE_BUFFER_HEADER_SIZE; >> >> and then calculate "max_write" by subtracting FUSE_BUFFER_HEADER_SIZE. >> The result is >> >> FUSE_DEFAULT_MAX_PAGES_PER_REQ * getpagesize() >> >> which is 128 KB. >> >> >> In the libfuse project , one >> related commit is: >> >> https://github.com/libfuse/libfuse/commit/027d0d17c8a4 >> >> and before that, >> >> https://github.com/libfuse/libfuse/commit/97f4a9cb4fc69 >> >> If you try to peek even before that, you end up with historical commits >> like >> >> https://github.com/libfuse/libfuse/commit/065f222cd5850 >> >> So basically the *origin* of max_write=128KB is untraceable :) But, at >> least the present behavior can be found; it comes from libfuse commit >> 027d0d17c8a4. >> >> >> Approaching the same topic from the FUSE client in the Linux kernel, >> commit 5da784cce430 ("fuse: add max_pages to init_out", 2018-10-01) is >> relevant; it also points to the following discussion on LKML: >> . >> >> Again it's convoluted; the direct answer to your question is, >> "max_write=128KB comes from FUSE_DEFAULT_MAX_PAGES_PER_REQ, because the >> FUSE_MAX_PAGES flag is not requested in the FUSE_INT operation". > > Thanks a lot for the full explanation! > I'm glad I asked this early enough so this was fresh in your > memory :) At least we got it archived on the list. > > Shouldn't QEMU announce this via a better mechanism? I.e. a > fw_cfg entry? This is so convoluted (as you said) than I'm > worried libfuse change a parameter and we end having a debug > nightmare figuring the root cause... Wait, I think you don't have the full picture. (1) In the FUSE_INIT *response*, virtiofsd advertizes "max_write". This value is critically important to the virtio-fs driver in OVMF (OvmfPkg/VirtioFsDxe), and indeed said driver adheres to this limit closely. When the driver serves an EFI_FILE_PROTOCOL.Write() request, it breaks the write into chunks of "max_write" bytes, for individual FUSE_WRITE operations to be sent to virtiofsd. Please refer to edk2 commits - 6f7bc7196ff2 ("OvmfPkg/VirtioFsDxe: implement the wrapper function for FUSE_WRITE", 2020-12-21), and - 44f43f94cebe ("OvmfPkg/VirtioFsDxe: implement EFI_FILE_PROTOCOL.Write()", 2020-12-21). In fact, max_write=128KB is obvious; the reason for that is that the virtiofsd log contains this value, when the FUSE_INIT request is processed and answered. This is one half of the picture -- the important half. (2) The much less important half is this patch. This patch is a "best effort" optimization, based on the *empirical value* of 128KB that's advertized for max_write. If virtiofsd were modified and it changed the max_write default to 64KB or 256KB, then this PCD setting would no longer match. But, that would not be a tragedy, exactly how it is not a tragedy *right now* that the current PCD setting is 4KB. The patch is a small, static optimization based on empirical data. It is not foolproof, and it doesn't *have* to be. So, the complexity is entirely hidden in how virtiofsd calculates the 128 KB value for max_write. That's in fact a virtiofsd implementation detail. What matters is that the OVMF driver for virtio-fs sees max_write=128KB *clearly announced* in the FUSE_INIT response, and that the driver rigorously honors that limit. Conversely, the present patch is absolutely secondary; it is a "low hanging fruit", ad-hoc optimization, at a higher level. The PCD change affects e.g. the CP command in the UEFI shell, even if you only use a FAT filesystem. In that case, the PCD change is entirely pointless -- but it does not hurt, and the change is still very simple. It's unreasonable to expect that the CP command flexibly adhere to the optimal / maximal write size for the particular filesystem it is writing to. That's not the job of the CP command -- it is the job of the filesystem driver under CP. But, by increasing the buffer size of CP (and of some other UEFI shell commands that work with files), based on some empirical data, we *permit* the virtio-fs driver to work better under CP -- without harming CP on filesystems different from virtio-fs. As the commit message says, "And when a Virtio Filesystem is not used, a 128KB chunk size is still not particularly wasteful". > > Anyhow this is not blocking this patch, so: > Reviewed-by: Philippe Mathieu-Daude Thanks! Laszlo