From: "Laszlo Ersek" <lersek@redhat.com>
To: Liran Alon <liran.alon@oracle.com>, devel@edk2.groups.io
Cc: nikita.leshchenko@oracle.com, aaron.young@oracle.com,
jordan.l.justen@intel.com, ard.biesheuvel@linaro.org
Subject: Re: [edk2-devel] [PATCH v2 15/17] OvmfPkg/PvScsiDxe: Support sending SCSI request and receive response
Date: Fri, 27 Mar 2020 22:05:17 +0100
Message-ID: <fd33f341-22de-1d78-6ee6-4da006f2b3a8@redhat.com>
In-Reply-To: <6f32838a-d136-050e-7a05-f817da7954a8@oracle.com>
On 03/27/20 14:04, Liran Alon wrote:
>
> On 27/03/2020 14:26, Laszlo Ersek wrote:
>> On 03/25/20 17:10, Liran Alon wrote:
>>> +/**
>>> + Returns if PVSCSI request ring is full
>>> +**/
>>> +STATIC
>>> +BOOLEAN
>>> +PvScsiIsReqRingFull (
>>> + IN CONST PVSCSI_DEV *Dev
>>> + )
>>> +{
>>> + PVSCSI_RINGS_STATE *RingsState;
>>> + UINT32 ReqNumEntries;
>>> +
>>> + RingsState = Dev->RingDesc.RingState;
>>> + ReqNumEntries = 1U << RingsState->ReqNumEntriesLog2;
>>> + return (RingsState->ReqProdIdx - RingsState->CmpConsIdx) >=
>>> ReqNumEntries;
>>> +}
>> (Just some thoughts, not a request for changing the code.)
>>
>> Normally I prefer accessing buffers shared with the device through
>> volatile-qualified pointers.
>>
>> Meaning, in this case, that every "PCI host" pointer (i.e., each pointer
>> that is associated with a PVSCSI_DMA_DESC) would have to be
>> volatile-qualified. In particular:
>>
>> - in patch#13, PVSCSI_RING_DESC would have to be updated like this:
>>
>>> typedef struct {
>>> volatile PVSCSI_RINGS_STATE *RingState;
>>> PVSCSI_DMA_DESC RingStateDmaDesc;
>>>
>>> volatile PVSCSI_RING_REQ_DESC *RingReqs;
>>> PVSCSI_DMA_DESC RingReqsDmaDesc;
>>>
>>> volatile PVSCSI_RING_CMP_DESC *RingCmps;
>>> PVSCSI_DMA_DESC RingCmpsDmaDesc;
>>> } PVSCSI_RING_DESC;
>> - in patch#14, PVSCSI_DEV would change as follows:
>>
>>> typedef struct {
>>> UINT32 Signature;
>>> EFI_PCI_IO_PROTOCOL *PciIo;
>>> EFI_EVENT ExitBoot;
>>> UINT64 OriginalPciAttributes;
>>> PVSCSI_RING_DESC RingDesc;
>>> volatile PVSCSI_DMA_BUFFER *DmaBuf;
>>> PVSCSI_DMA_DESC DmaBufDmaDesc;
>>> UINT8 MaxTarget;
>>> UINT8 MaxLun;
>>> UINTN WaitForCmpStallInUsecs;
>>> EFI_EXT_SCSI_PASS_THRU_PROTOCOL PassThru;
>>> EFI_EXT_SCSI_PASS_THRU_MODE PassThruMode;
>>> } PVSCSI_DEV;
>> After these changes, the compiler would (justifiably) flag a bunch of
>> code locations casting away the volatile qualification -- for example,
>> in the above function, in the assignment to the "RingsState" local
>> variable.
>>
>> Clearly, most of these compilation errors would have to be fixed (not
>> suppressed), because they would be valid. Meaning:
>>
>> - you'd have to volatile-qualify the "RingsState" local variable in all
>> of PvScsiIsReqRingFull(), PvScsiGetCurrentRequest(),
>> PvScsiWaitForRequestCompletion();
>>
>> - you'd also have to volatile-qualify the return types of
>> PvScsiGetCurrentRequest() and PvScsiWaitForRequestCompletion();
>>
>> - you'd have to update PopulateRequest() and HandleResponse() too; and
>> the most annoying part of that would be that you could no longer use
>> CopyMem() and ZeroMem() -- because those functions take
>> pointer-to-void parameters, rather than pointer-to-volatile-void ones.
>>
>> (FWIW, we wouldn't have to change the PvScsiFreeSharedPages() prototype
>> -- it would be OK to cast away volatile in those calls, as we wouldn't
>> dereference the pointers in that case.)
>>
>> So... the reason I'm not actually requesting these
>> volatile-qualifications is that (a) your use of MemoryFence() seems
>> mostly OK, and (b) the UEFI Driver Writer's guide recommends *either*
>> volatile *or* MemoryFence(). Of course using both techniques at the same
>> time is not a problem -- and in code I write I actually like to use both
>> at the same time --, but just one suffices too. (See section 4.2.6
>> "Memory ordering" in the DWG.)
>>
>> The reason I'm writing this up here is because I want the "record" (the
>> mailing list archive) to show that we have considered this topic
>> explicitly.
> I prefer to remain with only memory fences if that's OK by you.
Yes, that's fine.
> As the code is written now.
> As it allows for potential compiler optimizations and leads to more
> readable code in my opinion.
The UEFI Driver Writer's Guide makes the same argument -- it favors
explicit MemoryFence()s over volatile. So your suggestion is entirely
valid and I agree with it.
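
For the record, a minimal sketch of what the volatile route would have cost in PopulateRequest() and HandleResponse(): since CopyMem() and ZeroMem() take pointer-to-void parameters, the driver would need its own byte-wise helpers. The names CopyFromVolatile() and ZeroVolatile() are hypothetical, and the sketch uses standard C types rather than the edk2 ones so that it compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy Len bytes out of a device-shared buffer.
 * Each source byte is read through a volatile lvalue, so the compiler
 * may not elide, merge, or cache the reads.
 */
static void
CopyFromVolatile (void *Dst, const volatile void *Src, size_t Len)
{
  uint8_t                *D = Dst;
  const volatile uint8_t *S = Src;
  size_t                 Index;

  assert (Len == 0 || (Dst != NULL && Src != NULL));
  for (Index = 0; Index < Len; Index++) {
    D[Index] = S[Index];
  }
}

/*
 * Hypothetical helper: zero Len bytes of a device-shared buffer,
 * one volatile store per byte.
 */
static void
ZeroVolatile (volatile void *Dst, size_t Len)
{
  volatile uint8_t *D = Dst;
  size_t           Index;

  assert (Len == 0 || Dst != NULL);
  for (Index = 0; Index < Len; Index++) {
    D[Index] = 0;
  }
}
```

Byte-wise loops like these are also slower than an optimized CopyMem(), which is part of why staying with explicit MemoryFence() calls is the simpler choice here.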
>> Back to your patch:
>>
>> On 03/25/20 17:10, Liran Alon wrote:
>>> + //
>>> + // This cast is safe as MaxLun is defined as UINT8
>>> + //
>>> + Request->Lun[1] = (UINT8)Lun;
>>> + Request->SenseLen = Packet->SenseDataLength;
>> Ah, *now* I understand why you chose MAX_UINT8 as the size of
>> "PVSCSI_DMA_BUFFER.SenseData". Because, "Packet->SenseDataLength" has
>> type UINT8, and this way you guarantee that the SCSI client's
>> "Packet->SenseDataLength" will always fit in the DMA buffer.
>>
>> Good solution, but it *absolutely* needs to be documented in patch#14
>> ("OvmfPkg/PvScsiDxe: Introduce DMA communication buffer") -- in fact,
>> see my question (4) under patch#14.
> Please read the response I have written you to your patch#14 review.
> Where I suggest we define a constant in IndustryStandard/Scsi.h for the
> limit of the total length of SenseData that is defined to be 252
> according to SCSI specification.
An MdePkg macro is good, but it should be decoupled from this series.
>>
>> (2) Also, please add a comment here that a "Dev->DmaBuf->SenseData"
>> overflow is not possible due to "Packet->SenseDataLength" having type
>> UINT8.
>>
>> This would be a comment in the same vein as the "MaxLun" reference just
>> above -- I find *that* comment very helpful, too.
> OK.
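
To illustrate the argument for the requested comment, here is a small sketch of why the sense data cannot overflow: a length of type UINT8 can never exceed a buffer of MAX_UINT8 bytes. The names DmaBufferSketch and CopySense() are hypothetical, the 16-bit width of the device-reported length is an assumption, and standard C types stand in for the edk2 ones.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed size of the sense area, mirroring MAX_UINT8 in the patch. */
#define SENSE_BUF_SIZE UINT8_MAX

/* Hypothetical stand-in for the sense part of PVSCSI_DMA_BUFFER. */
typedef struct {
  uint8_t SenseData[SENSE_BUF_SIZE];
} DmaBufferSketch;

/* Any uint8_t length fits, so no runtime bounds check is needed. */
_Static_assert (
  sizeof (((DmaBufferSketch *)0)->SenseData) >= UINT8_MAX,
  "sense buffer must hold the largest UINT8 length"
  );

/*
 * Copy the sense data the device returned, clamped to what the caller
 * asked for. Returns the number of bytes copied.
 */
static uint8_t
CopySense (
  void                  *Dst,
  const DmaBufferSketch *Buf,
  uint8_t               PacketSenseLen,
  uint16_t              DeviceSenseLen
  )
{
  uint8_t Len = PacketSenseLen;

  if (DeviceSenseLen < Len) {
    Len = (uint8_t)DeviceSenseLen; /* safe: value is below PacketSenseLen */
  }
  memcpy (Dst, Buf->SenseData, Len);
  return Len;
}
```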
>>> +
>>> + return EFI_SUCCESS;
>>> +}
>>> +
>>> +/**
>>> + Handle the PVSCSI device response:
>>> + - Copy returned data from DMA communication buffer.
>>> + - Update fields in Extended SCSI Pass Thru Protocol packet as
>>> required.
>>> + - Translate response code to EFI status code and host adapter status.
>>> +**/
>>> +STATIC
>>> +EFI_STATUS
>>> +HandleResponse (
>>> + IN PVSCSI_DEV *Dev,
>>> + IN OUT EFI_EXT_SCSI_PASS_THRU_SCSI_REQUEST_PACKET *Packet,
>>> + IN CONST PVSCSI_RING_CMP_DESC *Response
>>> + )
>>> +{
>>> + //
>>> + // Check if device returned sense data
>>> + //
>>> + if (Response->ScsiStatus ==
>>> EFI_EXT_SCSI_STATUS_TARGET_CHECK_CONDITION) {
>>> + //
>>> + // Fix SenseDataLength to amount of data returned
>>> + //
>>> + if (Packet->SenseDataLength > Response->SenseLen) {
>>> + Packet->SenseDataLength = (UINT8)Response->SenseLen;
>>> + }
>>> + //
>>> + // Copy sense data from DMA communication buffer
>>> + //
>>> + CopyMem (
>>> + Packet->SenseData,
>>> + Dev->DmaBuf->SenseData,
>>> + Packet->SenseDataLength
>>> + );
>>> + } else {
>>> + //
>>> + // Signal no sense data returned
>>> + //
>>> + Packet->SenseDataLength = 0;
>>> + }
>>> +
>>> + //
>>> + // Copy device output from DMA communication buffer
>>> + //
>>> + if (Packet->DataDirection == EFI_EXT_SCSI_DATA_DIRECTION_READ) {
>>> + CopyMem (Packet->InDataBuffer, Dev->DmaBuf->Data,
>>> Packet->InTransferLength);
>>> + }
>> I'm unfamiliar with the PVSCSI device model, but I think this is not
>> general enough. The "PVSCSI_RING_CMP_DESC.DataLen" field suggests that
>> short reads are possible at least in theory.
>>
>> (5) If a short read occurs (Response->DataLen <
>> Packet->InTransferLength), then we should adjust
>> "Packet->InTransferLength", and also copy that many bytes only.
>>
>> (6) I think it would be prudent to update "Packet->OutTransferLength"
>> too, for short writes.
> As you can see below, this is done in case the device returns
> Response->HostStatus as either PvScsiBtStatDatarun or
> PvScsiBtStatDataUnderrun.
>>
>>> +
>>> + //
>>> + // Report target status
>>> + //
>>> + Packet->TargetStatus = Response->ScsiStatus;
>>> +
>>> + //
>>> + // Host adapter status and function return value depend on
>>> + // device response's host status
>>> + //
>>> + switch (Response->HostStatus) {
>>> + case PvScsiBtStatSuccess:
>>> + case PvScsiBtStatLinkedCommandCompleted:
>>> + case PvScsiBtStatLinkedCommandCompletedWithFlag:
>>> + Packet->HostAdapterStatus = EFI_EXT_SCSI_STATUS_HOST_ADAPTER_OK;
>>> + return EFI_SUCCESS;
>>> +
>>> + case PvScsiBtStatSelTimeout:
>>> + Packet->HostAdapterStatus =
>>> + EFI_EXT_SCSI_STATUS_HOST_ADAPTER_SELECTION_TIMEOUT;
>>> + return EFI_TIMEOUT;
>>> +
>>> + case PvScsiBtStatDatarun:
>>> + case PvScsiBtStatDataUnderrun:
>>> + //
>>> + // Report residual data in overrun/underrun
>>> + //
>>> + if (Packet->DataDirection == EFI_EXT_SCSI_DATA_DIRECTION_READ) {
>>> + Packet->InTransferLength = Response->DataLen;
>>> + } else {
>>> + Packet->OutTransferLength = Response->DataLen;
>>> + }
>> OK, if we are sure that (a) the device will always report short
>> reads/writes like this, and that (b) the above assignments will never
>> cause InTransferLength / OutTransferLength to *grow*, then the
>> InTransferLength / OutTransferLength adjustments are sufficiently
>> covered.
> I believe both of these are indeed true.
> Even though the current QEMU VMware PVSCSI device emulation code has a
> bug: it never sets this in pvscsi_command_complete() when it does
> set BTSTAT_DATARUN...
>> Still:
>>
>> (8) The CopyMem() call above should not copy garbage (at the tail).
> I don't think it matters. We don't guarantee anything on the content in
> Packet->InDataBuffer beyond Packet->InTransferLength.
> I think the code is simpler how it is currently written.
I'm not convinced, but this is not a question I feel very strongly
about. I'm OK to go with your preference.
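
Just so the alternative from point (8) is on record, this is roughly what a clamped copy would look like. It is only a sketch: CopyReadData() is a hypothetical name, the 64-bit width of the device-reported length is an assumption, and the approach is only sound if the device reports that length reliably, which this thread does not establish.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy at most InTransferLength bytes, and fewer
 * if the device reports a short read. The returned length never exceeds
 * InTransferLength, so the caller's transfer length can only shrink.
 */
static uint32_t
CopyReadData (
  void       *InDataBuffer,
  uint32_t   InTransferLength,
  const void *DmaData,
  uint64_t   DeviceDataLen
  )
{
  uint32_t CopyLen = InTransferLength;

  if (DeviceDataLen < CopyLen) {
    CopyLen = (uint32_t)DeviceDataLen; /* short read: no garbage tail */
  }
  memcpy (InDataBuffer, DmaData, CopyLen);
  return CopyLen;
}
```

The caller would assign the return value to "Packet->InTransferLength". Bytes of the caller's buffer past that length are left untouched.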
>>
>> Honestly, *if* the PVSCSI device model always sets "Response->DataLen",
> I don't think this is the case.
>> then I would prefer if:
>>
>> - we always updated InTransferLength / OutTransferLength (regardless of
>> "Response->HostStatus"),
>>
>> - and we only used these case labels (PvScsiBtStatDatarun /
>> PvScsiBtStatDataUnderrun) for setting "Packet->HostAdapterStatus".
>>
>>> + Packet->HostAdapterStatus =
>>> + EFI_EXT_SCSI_STATUS_HOST_ADAPTER_DATA_OVERRUN_UNDERRUN;
>>> + return EFI_BAD_BUFFER_SIZE;
>> I think EFI_BAD_BUFFER_SIZE is invalid here. According to the UEFI spec,
>> EFI_BAD_BUFFER_SIZE means "The SCSI Request Packet was not executed".
>> But that's not the case here -- we do have a partially completed
>> transfer.
>
> Hmm... According to the documentation above EFI_SCSI_PASS_THRU_PASSTHRU
> in MdePkg/Include/Protocol/ScsiPassThru.h:
>
> @retval EFI_BAD_BUFFER_SIZE The SCSI Request Packet was
> executed, but the
> entire DataBuffer could not be
> transferred.
> The actual number of bytes
> transferred is returned
> in TransferLength. See
> HostAdapterStatus,
> TargetStatus, SenseDataLength, and
> SenseData in
> that order for additional status
> information.
>
> So I don't know who to believe... It does seem to me that this
> documentation in the code makes more sense
> and then my current code is correct. What do you think?
You are looking at the wrong protocol header file. The top of this
header file bears the comment
SCSI Pass Through protocol as defined in EFI 1.1.
and the UEFI-2.8 spec does not define EFI_SCSI_PASS_THRU_PROTOCOL; it
only refers to Mantis ticket 845
<https://mantis.uefi.org/mantis/view.php?id=845> with subject
"EFI_SCSI_PASS_THRU_PROTOCOL replacement".
Instead, please consult EFI_EXT_SCSI_PASS_THRU_PASSTHRU in
"MdePkg/Include/Protocol/ScsiPassThruExt.h". There, the
EFI_BAD_BUFFER_SIZE return value conforms to the UEFI 2.8 spec ("The
SCSI Request Packet was not executed").
>
>>
>> (9) Thus I feel we should use a "break" here.
>>
>>> +
>>> + case PvScsiBtStatBusFree:
>>> + Packet->HostAdapterStatus =
>>> EFI_EXT_SCSI_STATUS_HOST_ADAPTER_BUS_FREE;
>>> + break;
>>> +
>>> + case PvScsiBtStatInvPhase:
>>> + Packet->HostAdapterStatus =
>>> EFI_EXT_SCSI_STATUS_HOST_ADAPTER_PHASE_ERROR;
>>> + break;
>>> +
>>> + case PvScsiBtStatSensFailed:
>>> + Packet->HostAdapterStatus =
>>> + EFI_EXT_SCSI_STATUS_HOST_ADAPTER_REQUEST_SENSE_FAILED;
>>> + break;
>>> +
>>> + case PvScsiBtStatTagReject:
>>> + case PvScsiBtStatBadMsg:
>>> + Packet->HostAdapterStatus =
>>> + EFI_EXT_SCSI_STATUS_HOST_ADAPTER_MESSAGE_REJECT;
>>> + break;
>>> +
>>> + case PvScsiBtStatBusReset:
>>> + Packet->HostAdapterStatus =
>>> EFI_EXT_SCSI_STATUS_HOST_ADAPTER_BUS_RESET;
>>> + break;
>>> +
>>> + case PvScsiBtStatHaTimeout:
>>> + Packet->HostAdapterStatus =
>>> EFI_EXT_SCSI_STATUS_HOST_ADAPTER_TIMEOUT;
>>> + return EFI_TIMEOUT;
>>> +
>>> + case PvScsiBtStatScsiParity:
>>> + Packet->HostAdapterStatus =
>>> EFI_EXT_SCSI_STATUS_HOST_ADAPTER_PARITY_ERROR;
>>> + break;
>>> +
>>> + default:
>>> + Packet->HostAdapterStatus =
>>> EFI_EXT_SCSI_STATUS_HOST_ADAPTER_OTHER;
>>> + break;
>>> + }
>>> +
>>> + return EFI_DEVICE_ERROR;
>>> +}
>>> +
>>>
>>> //
>>> // Ext SCSI Pass Thru
>>> //
>>> @@ -144,7 +528,62 @@ PvScsiPassThru (
>>> IN EFI_EVENT Event OPTIONAL
>>> )
>>> {
>>> - return EFI_UNSUPPORTED;
>>> + PVSCSI_DEV *Dev;
>>> + EFI_STATUS Status;
>>> + PVSCSI_RING_REQ_DESC *Request;
>>> + PVSCSI_RING_CMP_DESC *Response;
>>> +
>>> + Dev = PVSCSI_FROM_PASS_THRU (This);
>>> +
>>> + if (PvScsiIsReqRingFull (Dev)) {
>>> + return EFI_NOT_READY;
>>> + }
>>> +
>>> + Request = PvScsiGetCurrentRequest (Dev);
>>> +
>>> + Status = PopulateRequest (Dev, Target, Lun, Packet, Request);
>>> + if (EFI_ERROR (Status)) {
>>> + return Status;
>>> + }
>>> +
>>> + //
>>> + // Writes to Request must be globally visible before making request
>>> + // available to device
>>> + //
>>> + MemoryFence ();
>>> + Dev->RingDesc.RingState->ReqProdIdx++;
>>> +
>> (10) Please insert another MemoryFence () here.
>
> That would be unnecessary and wrong.
>
> The MemoryFence() here is used to make sure the request is globally
> visible before the update to the producer-index.
I agree.
> As in any
> circular-buffer implementation.
> There is no need for an additional MemoryFence() here.
>
> Note that the MMIO access below is guaranteed to be globally visible
> only after the write to the producer-index.
Yes, that was the goal of my suggestion. What guarantees it?
> If the EDK2 MMIO accessors didn't guarantee this, you would have a
> very broken code base...
> This is similar to why Linux MMIO accessors (e.g. writel()) guarantee it.
>
> For example, see how MdePkg/Library/BaseIoLibIntrinsic/IoLib.c
> MmioWrite32() internally calls MemoryFence() before and after MMIO
> access itself.
So basically you are saying that I proposed the right thing, except
there is no need to spell it out here, because the MMIO accessor
primitives already cover that internally :)
I admit that I have not been aware of the internal fences!
(And given that there is a specific commit in the git history to push
the fences into the source file you mention, namely 9de780dcd6208, I do
think my suggestion was not "wrong", only unnecessary.)
I do agree that the MemoryFence() need not be added in this spot. Thanks
for making me aware of the internal fences!
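
As an illustration of the ordering we agree on (fill in the request, fence, then advance the producer index), here is a sketch in portable C11. It is an analogy only: atomic_thread_fence() stands in for MemoryFence(), the C11 memory model describes threads rather than a DMA-capable device, and RingSketch and PublishRequest() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_ENTRIES 4u

/* Hypothetical, heavily simplified request ring. */
typedef struct {
  uint32_t         Slots[RING_ENTRIES];
  _Atomic uint32_t ProdIdx; /* free-running producer index */
} RingSketch;

static void
PublishRequest (RingSketch *Ring, uint32_t Payload)
{
  uint32_t Idx;

  Idx = atomic_load_explicit (&Ring->ProdIdx, memory_order_relaxed);

  /* Step 1: write the request into its slot. */
  Ring->Slots[Idx % RING_ENTRIES] = Payload;

  /* Step 2: the slot contents must be visible before the index moves. */
  atomic_thread_fence (memory_order_release);

  /* Step 3: make the request available by advancing the index. */
  atomic_store_explicit (&Ring->ProdIdx, Idx + 1, memory_order_relaxed);
}
```

In the driver, the doorbell write via PvScsiMmioWrite32() follows step 3, and the fences inside the MMIO accessor order that write after the index update.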
>
>>
>>> + Status = PvScsiMmioWrite32 (Dev, PvScsiRegOffsetKickRwIo, 0);
>>> + if (EFI_ERROR (Status)) {
>>> + //
>>> + // If kicking the host fails, we must fake a host adapter error.
>>> + // EFI_NOT_READY would save us the effort, but it would also
>>> suggest that
>>> + // the caller retry.
>>> + //
>>> + return ReportHostAdapterError (Packet);
>>> + }
>>> +
>>> + Status = PvScsiWaitForRequestCompletion (Dev);
>>> + if (EFI_ERROR (Status)) {
>>> + //
>>> + // If waiting for request completion fails, we must fake a host
>>> adapter
>>> + // error. EFI_NOT_READY would save us the effort, but it would
>>> also suggest
>>> + // that the caller retry.
>>> + //
>>> + return ReportHostAdapterError (Packet);
>>> + }
>>> +
>> (11) Please insert a MemoryFence() here.
>
> Why is a MemoryFence() needed here? I don't think that's true.
>
> PvScsiWaitForRequestCompletion() ends with an MMIO write which is
> guaranteed to be a memory fence.
Yes, I see that now. My point was that a fence needed to *occur* here. I
didn't realize it was already covered, internally.
> Thus, there is no need for a MemoryFence() here (to serve as a rmb()) to
> make sure the completion-descriptor is globally visible.
Agreed.
Thanks,
Laszlo