From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from userp2130.oracle.com (userp2130.oracle.com [156.151.31.86]) by mx.groups.io with SMTP id smtpd.web12.10892.1585314302182082092 for ; Fri, 27 Mar 2020 06:05:02 -0700 Authentication-Results: mx.groups.io; dkim=fail reason="body hash did not verify" header.i=@oracle.com header.s=corp-2020-01-29 header.b=oXnhfbGF; spf=pass (domain: oracle.com, ip: 156.151.31.86, mailfrom: liran.alon@oracle.com) Received: from pps.filterd (userp2130.oracle.com [127.0.0.1]) by userp2130.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 02RD3HGA079273; Fri, 27 Mar 2020 13:05:01 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=subject : to : cc : references : from : message-id : date : mime-version : in-reply-to : content-type : content-transfer-encoding; s=corp-2020-01-29; bh=lsZunHAXbiWRSMv84Ro/rnHuRJ5XENk9Q0ekdnFBvSM=; b=oXnhfbGF8hLbvjbXfC+hw1QNAW9XX7K6p5twC+eYewmN8fbRT8Sy5SXiGamd5uoWu0QC bR95+AVevf1BUjP8f3liPwb2ACx2quEcwZ3qNqA+En/L/S7+HuM2lTPDy0hhorYgo3Oy XdG3FFxmHYtKFW/hNjyLSJvGQ+ElQrk1uKE2NlCP32vA3wVCvqkzZO2Q+RnQsss9K9NK qQpQ6dTaM9PBgCUKUj2WTj2JvDhmkDcYvopvSzg6gCHHOJ2FtOyPtjPm5rigvo+lVJze w4dUcOecxbe82XoXFelyQU92tKG3d4OT1Gi/0QUDuwXVg1kNYHj5C8VVd8FTU2IXYURj /Q== Received: from userp3020.oracle.com (userp3020.oracle.com [156.151.31.79]) by userp2130.oracle.com with ESMTP id 301459aysk-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 27 Mar 2020 13:05:01 +0000 Received: from pps.filterd (userp3020.oracle.com [127.0.0.1]) by userp3020.oracle.com (8.16.0.42/8.16.0.42) with SMTP id 02RD1ldg142540; Fri, 27 Mar 2020 13:05:01 GMT Received: from userv0121.oracle.com (userv0121.oracle.com [156.151.31.72]) by userp3020.oracle.com with ESMTP id 3003gnx1aw-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 27 Mar 2020 13:05:00 +0000 Received: from abhmp0010.oracle.com (abhmp0010.oracle.com [141.146.116.16]) by userv0121.oracle.com (8.14.4/8.13.8) with ESMTP id 02RD4xAB015149; Fri, 27 Mar 2020 13:04:59 GMT Received: from [192.168.14.112] (/79.180.216.197) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Fri, 27 Mar 2020 06:04:59 -0700 Subject: Re: [edk2-devel] [PATCH v2 15/17] OvmfPkg/PvScsiDxe: Support sending SCSI request and receive response To: Laszlo Ersek , devel@edk2.groups.io Cc: nikita.leshchenko@oracle.com, aaron.young@oracle.com, jordan.l.justen@intel.com, ard.biesheuvel@linaro.org References: <20200325161005.16743-1-liran.alon@oracle.com> <20200325161005.16743-16-liran.alon@oracle.com> From: "Liran Alon" Message-ID: <6f32838a-d136-050e-7a05-f817da7954a8@oracle.com> Date: Fri, 27 Mar 2020 16:04:55 +0300 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:68.0) Gecko/20100101 Thunderbird/68.6.0 MIME-Version: 1.0 In-Reply-To: X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9572 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxscore=0 mlxlogscore=999 bulkscore=0 phishscore=0 adultscore=0 spamscore=0 malwarescore=0 suspectscore=2 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2003270120 X-Proofpoint-Virus-Version: vendor=nai engine=6000 definitions=9572 signatures=668685 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 malwarescore=0 phishscore=0 priorityscore=1501 bulkscore=0 lowpriorityscore=0 mlxlogscore=999 adultscore=0 suspectscore=2 impostorscore=0 mlxscore=0 spamscore=0 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2003020000 definitions=main-2003270120 X-MIME-Autoconverted: from 8bit to quoted-printable by userp2130.oracle.com id 02RD3HGA079273 Content-Type: text/plain; charset=windows-1252; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable On 27/03/2020 14:26, Laszlo Ersek wrote: > On 03/25/20 17:10, Liran Alon wrote: >> +/** >> + Returns if PVSCSI request ring is full >> +**/ >> +STATIC >> +BOOLEAN >> +PvScsiIsReqRingFull ( >> + IN CONST PVSCSI_DEV *Dev >> + ) >> +{ >> + PVSCSI_RINGS_STATE *RingsState; >> + UINT32 ReqNumEntries; >> + >> + RingsState =3D Dev->RingDesc.RingState; >> + ReqNumEntries =3D 1U << RingsState->ReqNumEntriesLog2; >> + return (RingsState->ReqProdIdx - RingsState->CmpConsIdx) >=3D ReqNu= mEntries; >> +} > (Just some thoughts, not a request for changing the code.) > > Normally I prefer accessing buffers shared with the device though > volatile-qualified pointers. > > Meaning, in this case, that every "PCI host" pointer (i.e., each pointe= r > that is associated with a PVSCSI_DMA_DESC) would have to be > volatile-qualified. In particular: > > - in patch#13, PVSCSI_RING_DESC would have to be updated like this: > >> typedef struct { >> volatile PVSCSI_RINGS_STATE *RingState; >> PVSCSI_DMA_DESC RingStateDmaDesc; >> >> volatile PVSCSI_RING_REQ_DESC *RingReqs; >> PVSCSI_DMA_DESC RingReqsDmaDesc; >> >> volatile PVSCSI_RING_CMP_DESC *RingCmps; >> PVSCSI_DMA_DESC RingCmpsDmaDesc; >> } PVSCSI_RING_DESC; > - in patch#14, PVSCSI_DEV would change as follows: > >> typedef struct { >> UINT32 Signature; >> EFI_PCI_IO_PROTOCOL *PciIo; >> EFI_EVENT ExitBoot; >> UINT64 OriginalPciAttributes; >> PVSCSI_RING_DESC RingDesc; >> volatile PVSCSI_DMA_BUFFER *DmaBuf; >> PVSCSI_DMA_DESC DmaBufDmaDesc; >> UINT8 MaxTarget; >> UINT8 MaxLun; >> UINTN WaitForCmpStallInUsecs; >> EFI_EXT_SCSI_PASS_THRU_PROTOCOL PassThru; >> EFI_EXT_SCSI_PASS_THRU_MODE PassThruMode; >> } PVSCSI_DEV; > After these changes, the compiler would (justifiedly) flag a bunch of > code locations casting away the volatile qualification -- for example, > in the above function, in the assignment to the "RingsState" local > variable. > > Clearly, most of these compilation errors would have to be fixed (not > suppressed), because they would be valid. Meaning: > > - you'd have to volatile-qualify the "RingsState" local variable in all > of PvScsiIsReqRingFull(), PvScsiGetCurrentRequest(), > PvScsiWaitForRequestCompletion(); > > - you'd also have to volatile-qualify the return types of > PvScsiGetCurrentRequest() and PvScsiWaitForRequestCompletion(); > > - you'd have to update PopulateRequest() and HandleResponse() too; and > the most annoying part of that would be that you could no longer use > CopyMem() and ZeroMem() -- because those functions take > pointer-to-void parameters, rather than pointer-to-volatile-void one= s. > > (FWIW, we wouldn't have to change the PvScsiFreeSharedPages() prototype > -- it would be OK to cast away volatile in those calls, as we wouldn't > dereference the pointers in that case.) > > So... the reason I'm not actually requesting these > volatile-qualifications is that (a) your use of MemoryFence() seems > mostly OK, and (b) the UEFI Driver Writer's guide recommends *either* > volatile *or* MemoryFence(). Of course using both techniques at the sam= e > time is not a problem -- and in code I write I actually like to use bot= h > at the same time --, but just one suffices too. (See section 4.2.6 > "Memory ordering" in the DWG.) > > The reason I'm writing this up here is because I want the "record" (the > mailing list archive) to show that we have considered this topic > explicitly. I prefer to remain with only memory fences if that's OK by you. As the=20 code is written now. As it's allows for potential compiler optimization and leads to more=20 readable code in my opinion. > Back to your patch: > > On 03/25/20 17:10, Liran Alon wrote: >> + // >> + // This cast is safe as MaxLun is defined as UINT8 >> + // >> + Request->Lun[1] =3D (UINT8)Lun; >> + Request->SenseLen =3D Packet->SenseDataLength; > Ah, *now* I understand why you chose MAX_UINT8 as the size of > "PVSCSI_DMA_BUFFER.SenseData". Because, "Packet->SenseDataLength" has > type UINT8, and this way you guarantee that the SCSI client's > "Packet->SenseDataLength" will always fit in the DMA buffer. > > Good solution, but it *absolutely* needs to be documented in patch#14 > ("OvmfPkg/PvScsiDxe: Introduce DMA communication buffer") -- in fact, > see my question (4) under patch#14. Please read the response I have written you to your patch#14 review. Where I suggest we define a constant in IndustryStandard/Scsi.h for the=20 limit of the total length of SenseData that is defined to be 252=20 according to SCSI specification. > > (2) Also, please add a comment here that a "Dev->DmaBuf->SenseData" > overflow is not possible due to "Packet->SenseDataLength" having type > UINT8. > > This would be a comment in the same vein as the "MaxLun" reference just > above -- I find *that* comment very helpful, too. OK. >> + >> + return EFI_SUCCESS; >> +} >> + >> +/** >> + Handle the PVSCSI device response: >> + - Copy returned data from DMA communication buffer. >> + - Update fields in Extended SCSI Pass Thru Protocol packet as requi= red. >> + - Translate response code to EFI status code and host adapter statu= s. >> +**/ >> +STATIC >> +EFI_STATUS >> +HandleResponse ( >> + IN PVSCSI_DEV *Dev, >> + IN OUT EFI_EXT_SCSI_PASS_THRU_SCSI_REQUEST_PACKET *Packet, >> + IN CONST PVSCSI_RING_CMP_DESC *Response >> + ) >> +{ >> + // >> + // Check if device returned sense data >> + // >> + if (Response->ScsiStatus =3D=3D EFI_EXT_SCSI_STATUS_TARGET_CHECK_CO= NDITION) { >> + // >> + // Fix SenseDataLength to amount of data returned >> + // >> + if (Packet->SenseDataLength > Response->SenseLen) { >> + Packet->SenseDataLength =3D (UINT8)Response->SenseLen; >> + } >> + // >> + // Copy sense data from DMA communication buffer >> + // >> + CopyMem ( >> + Packet->SenseData, >> + Dev->DmaBuf->SenseData, >> + Packet->SenseDataLength >> + ); >> + } else { >> + // >> + // Signal no sense data returned >> + // >> + Packet->SenseDataLength =3D 0; >> + } >> + >> + // >> + // Copy device output from DMA communication buffer >> + // >> + if (Packet->DataDirection =3D=3D EFI_EXT_SCSI_DATA_DIRECTION_READ) = { >> + CopyMem (Packet->InDataBuffer, Dev->DmaBuf->Data, Packet->InTrans= ferLength); >> + } > I'm unfamilar with the PVSCSI device model, but I think this is not > general enough. The "PVSCSI_RING_CMP_DESC.DataLen" field suggests that > short reads are possible at least in theory. > > (5) If a short read occurs (Response->DataLen < > Packet->InTransferLength), then we should adjust > "Packet->InTransferLength", and also copy that many bytes only. > > (6) I think it would be prudent to update "Packet->OutTransferLength" > too, for short writes. As you can see below, this is done in case device return=20 Response->HostStatus as either PvScsiBtStatDatarun or=20 PvScsiBtStatDataUnderrun. > >> + >> + // >> + // Report target status >> + // >> + Packet->TargetStatus =3D Response->ScsiStatus; >> + >> + // >> + // Host adapter status and function return value depend on >> + // device response's host status >> + // >> + switch (Response->HostStatus) { >> + case PvScsiBtStatSuccess: >> + case PvScsiBtStatLinkedCommandCompleted: >> + case PvScsiBtStatLinkedCommandCompletedWithFlag: >> + Packet->HostAdapterStatus =3D EFI_EXT_SCSI_STATUS_HOST_ADAPTER_= OK; >> + return EFI_SUCCESS; >> + >> + case PvScsiBtStatSelTimeout: >> + Packet->HostAdapterStatus =3D >> + EFI_EXT_SCSI_STATUS_HOST_ADAPTER_SELECTION_TIMEOUT; >> + return EFI_TIMEOUT; >> + >> + case PvScsiBtStatDatarun: >> + case : >> + // >> + // Report residual data in overrun/underrun >> + // >> + if (Packet->DataDirection =3D=3D EFI_EXT_SCSI_DATA_DIRECTION_RE= AD) { >> + Packet->InTransferLength =3D Response->DataLen; >> + } else { >> + Packet->OutTransferLength =3D Response->DataLen; >> + } > OK, if we are sure that (a) the device will always report short > reads/writes like this, and that (b) the above assignments will never > cause InTransferLength / OutTransferLength to *grow*, then the > InTransferLength / OutTransferLength adjustments are sufficiently > covered. I believe both of these are indeed true. Even though that current QEMU VMware PVSCSI device emulation code have a=20 bug that it never sets this in pvscsi_command_complete() when it does=20 set BTSTAT_DATARUN... > Still: > > (8) The CopyMem() call above should not copy garbage (at the tail). I don't think it matters. We don't guarantee anything on the content in=20 Packet->InDataBuffer beyond Packet->InTransferLength. I think the code is simpler how it is currently written. > > Honestly, *if* the PVSCSI device model always sets "Response->DataLen", I don't think this is the case. > then I would prefer if: > > - we always updated InTransferLength / OutTransferLength (regardless of > "Response->HostStatus"), > > - and we only used these case labels (PvScsiBtStatDatarun / > PvScsiBtStatDataUnderrun) for setting "Packet->HostAdapterStatus". > >> + Packet->HostAdapterStatus =3D >> + EFI_EXT_SCSI_STATUS_HOST_ADAPTER_DATA_OVERRUN_UNDERRU= N; >> + return EFI_BAD_BUFFER_SIZE; > I think EFI_BAD_BUFFER_SIZE is invalid here. According to the UEFI spec= , > EFI_BAD_BUFFER_SIZE means "The SCSI Request Packet was not executed". > But that's not the case here -- we do have a partially completed > transfer. Hmm... According to the documentation above EFI_SCSI_PASS_THRU_PASSTHRU=20 in MdePkg/Include/Protocol/ScsiPassThru.h: =A0 @retval EFI_BAD_BUFFER_SIZE=A0=A0=A0=A0=A0=A0 The SCSI Request Packe= t was=20 executed, but the =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 entire DataBuffer could not be=20 transferred. =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 The actual number of bytes=20 transferred is returned =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 in TransferLength. See=20 HostAdapterStatus, =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 TargetStatus, SenseDataLength, and=20 SenseData in =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 that order for additional status=20 information. So I don't know who to believe... It does seem to me that this=20 documentation in the code makes more sense and then my current code is correct. What do you think? > > (9) Thus I feel we should use a "break" here. > >> + >> + case PvScsiBtStatBusFree: >> + Packet->HostAdapterStatus =3D EFI_EXT_SCSI_STATUS_HOST_ADAPTER_= BUS_FREE; >> + break; >> + >> + case PvScsiBtStatInvPhase: >> + Packet->HostAdapterStatus =3D EFI_EXT_SCSI_STATUS_HOST_ADAPTER_= PHASE_ERROR; >> + break; >> + >> + case PvScsiBtStatSensFailed: >> + Packet->HostAdapterStatus =3D >> + EFI_EXT_SCSI_STATUS_HOST_ADAPTER_REQUEST_SENSE_FAILED= ; >> + break; >> + >> + case PvScsiBtStatTagReject: >> + case PvScsiBtStatBadMsg: >> + Packet->HostAdapterStatus =3D >> + EFI_EXT_SCSI_STATUS_HOST_ADAPTER_MESSAGE_REJECT; >> + break; >> + >> + case PvScsiBtStatBusReset: >> + Packet->HostAdapterStatus =3D EFI_EXT_SCSI_STATUS_HOST_ADAPTER_= BUS_RESET; >> + break; >> + >> + case PvScsiBtStatHaTimeout: >> + Packet->HostAdapterStatus =3D EFI_EXT_SCSI_STATUS_HOST_ADAPTER_= TIMEOUT; >> + return EFI_TIMEOUT; >> + >> + case PvScsiBtStatScsiParity: >> + Packet->HostAdapterStatus =3D EFI_EXT_SCSI_STATUS_HOST_ADAPTER_= PARITY_ERROR; >> + break; >> + >> + default: >> + Packet->HostAdapterStatus =3D EFI_EXT_SCSI_STATUS_HOST_ADAPTER_= OTHER; >> + break; >> + } >> + >> + return EFI_DEVICE_ERROR; >> +} >> + >> >> // >> // Ext SCSI Pass Thru >> // >> @@ -144,7 +528,62 @@ PvScsiPassThru ( >> IN EFI_EVENT Event OPTIONA= L >> ) >> { >> - return EFI_UNSUPPORTED; >> + PVSCSI_DEV *Dev; >> + EFI_STATUS Status; >> + PVSCSI_RING_REQ_DESC *Request; >> + PVSCSI_RING_CMP_DESC *Response; >> + >> + Dev =3D PVSCSI_FROM_PASS_THRU (This); >> + >> + if (PvScsiIsReqRingFull (Dev)) { >> + return EFI_NOT_READY; >> + } >> + >> + Request =3D PvScsiGetCurrentRequest (Dev); >> + >> + Status =3D PopulateRequest (Dev, Target, Lun, Packet, Request); >> + if (EFI_ERROR (Status)) { >> + return Status; >> + } >> + >> + // >> + // Writes to Request must be globally visible before making request >> + // available to device >> + // >> + MemoryFence (); >> + Dev->RingDesc.RingState->ReqProdIdx++; >> + > (10) Please insert another MemoryFence () here. That would be unnecessary and wrong. The MemoryFence() here is used to make sure the request is globally=20 visible before the update to the producer-index. As in any=20 circular-buffer implementation. There is no need for an additional MemoryFence() here. Note that the MMIO access below is guaranteed to be globally visible=20 only after the write to the producer-index. If EDK2 MMIO accessors wouldn't have guaranteed this, you would have a=20 very broken code base... Similar to why Linux MMIO accessors (e.g. writel()) macros guarantee thes= e. For example, see how MdePkg/Library/BaseIoLibIntrinsic/IoLib.c=20 MmioWrite32() internally calls MemoryFence() before and after MMIO=20 access itself. > >> + Status =3D PvScsiMmioWrite32 (Dev, PvScsiRegOffsetKickRwIo, 0); >> + if (EFI_ERROR (Status)) { >> + // >> + // If kicking the host fails, we must fake a host adapter error. >> + // EFI_NOT_READY would save us the effort, but it would also sugg= est that >> + // the caller retry. >> + // >> + return ReportHostAdapterError (Packet); >> + } >> + >> + Status =3D PvScsiWaitForRequestCompletion (Dev); >> + if (EFI_ERROR (Status)) { >> + // >> + // If waiting for request completion fails, we must fake a host a= dapter >> + // error. EFI_NOT_READY would save us the effort, but it would al= so suggest >> + // that the caller retry. >> + // >> + return ReportHostAdapterError (Packet); >> + } >> + > (11) Please insert a MemoryFence() here. Why is a MemoryFence() needed here? I don't think that's true. PvScsiWaitForRequestCompletion() ends with an MMIO write which is=20 guaranteed to be a memory fence. Thus, there is no need for a MemoryFence() here (to serve as a rmb()) to=20 make sure the completion-descriptor is globally visible. Thanks, -Liran