From: Ard Biesheuvel
Date: Fri, 9 Sep 2016 12:13:34 +0100
To: Leif Lindholm
Cc: edk2-devel-01
Subject: Re: [PATCH] ArmPlatformPkg/NorFlashDxe: use strictly aligned CopyMem()
In-Reply-To: <20160909110839.GL16080@bivouac.eciton.net>
References: <1473355925-26678-1-git-send-email-ard.biesheuvel@linaro.org>
 <20160909110839.GL16080@bivouac.eciton.net>

On 9 September 2016 at 12:08, Leif Lindholm wrote:
> On Thu, Sep 08, 2016 at 06:32:05PM +0100, Ard Biesheuvel wrote:
>> The UEFI spec stipulates that unaligned accesses should be enabled
>> on CPUs that support them, which means all of them, given that we
>> no longer support pre-v7 ARM cores, and the AARCH64 bindings mandate
>> support for unaligned accesses unconditionally.
>>
>> This means that one should not assume that CopyMem () is safe to call
>> on regions that may be mapped using device attributes, which is the
>> case for the NOR flash. Since we have no control over the mappings when
>> running under the OS, and given that write accesses require device
>> mappings, we should not call CopyMem () in the read path either, but
>> use our own implementation that is guaranteed to take alignment into
>> account.
>
> This is an important fix - but one comment (and a tiny bit of
> bikeshedding) below:
>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Ard Biesheuvel
>> ---
>>  ArmPlatformPkg/Drivers/NorFlashDxe/NorFlashDxe.c | 130 +++++++++++++++++++-
>>  1 file changed, 128 insertions(+), 2 deletions(-)
>>
>> diff --git a/ArmPlatformPkg/Drivers/NorFlashDxe/NorFlashDxe.c b/ArmPlatformPkg/Drivers/NorFlashDxe/NorFlashDxe.c
>> index 3abbe5cb32bc..45d1f0c20d52 100644
>> --- a/ArmPlatformPkg/Drivers/NorFlashDxe/NorFlashDxe.c
>> +++ b/ArmPlatformPkg/Drivers/NorFlashDxe/NorFlashDxe.c
>> @@ -744,6 +744,132 @@ NorFlashWriteBlocks (
>>    return Status;
>>  }
>>
>> +/**
>> +  Copy Length bytes from Source to Destination, using aligned accesses only
>
> Maybe a comment here clarifying that we're following CopyMem semantics
> (buffers may overlap) rather than memcpy semantics (buffers must not
> overlap)?
>
Well, now that you mention it, we don't require those semantics, and so
I could rip out the code that deals with that.

>> +
>> +  @param DestinationBuffer The target of the copy request.
>> +  @param SourceBuffer      The place to copy from.
>> +  @param Length            The number of bytes to copy.
>> +
>> +  @return Destination
>> +
>> +**/
>> +STATIC
>> +VOID *
>> +AlignedCopyMem (
>> +  OUT VOID        *DestinationBuffer,
>> +  IN  CONST VOID  *SourceBuffer,
>> +  IN  UINTN       Length
>> +  )
>> +{
>> +  UINT8        *Destination8;
>> +  CONST UINT8  *Source8;
>> +  UINT32       *Destination32;
>> +  CONST UINT32 *Source32;
>> +  UINT64       *Destination64;
>> +  CONST UINT64 *Source64;
>
> Can we be sure the compiler won't occasionally do something "clever"
> unless the above are all pointer-to-volatile?
>
Well, the original version uses 'volatile' to prevent the compiler from
using its intrinsic memcpy(), but in this case, that would be the best
outcome. In fact, I briefly considered just using memcpy() ...

>> +  UINTN Alignment;
>> +
>> +  if ((((UINTN)DestinationBuffer & 0x7) == 0) && (((UINTN)SourceBuffer & 0x7) == 0) && (Length >= 8)) {
>
> Could we have a macro in here to keep the line length down?
> Like
> BOTH_ALIGNED(x, y, z) (!(((UINTN)x & ((z >> 3) - 1)) | ((UINTN)y & ((z >> 3) - 1))))
> called as
> BOTH_ALIGNED(DestinationBuffer, SourceBuffer, 64)
> BOTH_ALIGNED(DestinationBuffer, SourceBuffer, 32)
>
Sure.

>> +    if (SourceBuffer > DestinationBuffer) {
>> +      Destination64 = (UINT64*)DestinationBuffer;
>> +      Source64 = (CONST UINT64*)SourceBuffer;
>> +      while (Length >= 8) {
>> +        *(Destination64++) = *(Source64++);
>> +        Length -= 8;
>> +      }
>> +
>> +      // Finish if there are still some bytes to copy
>> +      Destination8 = (UINT8*)Destination64;
>> +      Source8 = (CONST UINT8*)Source64;
>> +      while (Length-- != 0) {
>> +        *(Destination8++) = *(Source8++);
>> +      }
>> +    } else if (SourceBuffer < DestinationBuffer) {
>> +      Destination64 = (UINT64*)((UINTN)DestinationBuffer + Length);
>> +      Source64 = (CONST UINT64*)((UINTN)SourceBuffer + Length);
>> +
>> +      // Destination64 and Source64 were aligned on a 64-bit boundary
>> +      // but if length is not a multiple of 8 bytes then they won't be
>> +      // anymore.
>> +
>> +      Alignment = Length & 0x7;
>> +      if (Alignment != 0) {
>> +        Destination8 = (UINT8*)Destination64;
>> +        Source8 = (CONST UINT8*)Source64;
>> +
>> +        while (Alignment-- != 0) {
>> +          *(--Destination8) = *(--Source8);
>> +          --Length;
>> +        }
>> +        Destination64 = (UINT64*)Destination8;
>> +        Source64 = (CONST UINT64*)Source8;
>> +      }
>> +
>> +      while (Length > 0) {
>> +        *(--Destination64) = *(--Source64);
>> +        Length -= 8;
>> +      }
>> +    }
>> +  } else if ((((UINTN)DestinationBuffer & 0x3) == 0) && (((UINTN)SourceBuffer & 0x3) == 0) && (Length >= 4)) {
>> +    if (SourceBuffer > DestinationBuffer) {
>> +      Destination32 = (UINT32*)DestinationBuffer;
>> +      Source32 = (CONST UINT32*)SourceBuffer;
>> +      while (Length >= 4) {
>> +        *(Destination32++) = *(Source32++);
>> +        Length -= 4;
>> +      }
>> +
>> +      // Finish if there are still some bytes to copy
>> +      Destination8 = (UINT8*)Destination32;
>> +      Source8 = (CONST UINT8*)Source32;
>> +      while (Length-- != 0) {
>> +        *(Destination8++) = *(Source8++);
>> +      }
>> +    } else if (SourceBuffer < DestinationBuffer) {
>> +      Destination32 = (UINT32*)((UINTN)DestinationBuffer + Length);
>> +      Source32 = (CONST UINT32*)((UINTN)SourceBuffer + Length);
>> +
>> +      // Destination32 and Source32 were aligned on a 32-bit boundary
>> +      // but if length is not a multiple of 4 bytes then they won't be
>> +      // anymore.
>> +
>> +      Alignment = Length & 0x3;
>> +      if (Alignment != 0) {
>> +        Destination8 = (UINT8*)Destination32;
>> +        Source8 = (CONST UINT8*)Source32;
>> +
>> +        while (Alignment-- != 0) {
>> +          *(--Destination8) = *(--Source8);
>> +          --Length;
>> +        }
>> +        Destination32 = (UINT32*)Destination8;
>> +        Source32 = (CONST UINT32*)Source8;
>> +      }
>> +
>> +      while (Length > 0) {
>> +        *(--Destination32) = *(--Source32);
>> +        Length -= 4;
>> +      }
>> +    }
>> +  } else {
>> +    if (SourceBuffer > DestinationBuffer) {
>> +      Destination8 = (UINT8*)DestinationBuffer;
>> +      Source8 = (CONST UINT8*)SourceBuffer;
>> +      while (Length-- != 0) {
>> +        *(Destination8++) = *(Source8++);
>> +      }
>> +    } else if (SourceBuffer < DestinationBuffer) {
>> +      Destination8 = (UINT8*)DestinationBuffer + Length;
>> +      Source8 = (CONST UINT8*)SourceBuffer + Length;
>> +      while (Length-- != 0) {
>> +        *(--Destination8) = *(--Source8);
>> +      }
>> +    }
>> +  }
>> +  return DestinationBuffer;
>> +}
>> +
>>  EFI_STATUS
>>  NorFlashReadBlocks (
>>    IN NOR_FLASH_INSTANCE *Instance,
>> @@ -791,7 +917,7 @@ NorFlashReadBlocks (
>>    SEND_NOR_COMMAND (Instance->DeviceBaseAddress, 0, P30_CMD_READ_ARRAY);
>>
>>    // Readout the data
>> -  CopyMem(Buffer, (UINTN *)StartAddress, BufferSizeInBytes);
>> +  AlignedCopyMem (Buffer, (UINTN *)StartAddress, BufferSizeInBytes);
>
> Since you're changing these call sites anyway, can we make the casts
> to VOID * for clarity?
>
Yep, better.

>>
>>    return EFI_SUCCESS;
>>  }
>> @@ -832,7 +958,7 @@ NorFlashRead (
>>    SEND_NOR_COMMAND (Instance->DeviceBaseAddress, 0, P30_CMD_READ_ARRAY);
>>
>>    // Readout the data
>> -  CopyMem (Buffer, (UINTN *)(StartAddress + Offset), BufferSizeInBytes);
>> +  AlignedCopyMem (Buffer, (UINTN *)(StartAddress + Offset), BufferSizeInBytes);
>
> /
>     Leif
>
>>
>>    return EFI_SUCCESS;
>>  }
>> --
>> 2.7.4
>>
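
For readers following the BOTH_ALIGNED() idea discussed above, one way
the macro could be spelled with fully parenthesised arguments is
sketched below, together with how the two outer checks in
AlignedCopyMem () would then read. This is only a sketch of the
suggestion made in this thread: the macro name and the 64/32 bit-width
arguments are taken from Leif's proposal, UINTN is assumed to come from
MdePkg's Base.h as in the rest of the driver, and the version that
eventually landed in the tree may look different.

  //
  // TRUE if both pointers are aligned to a (Width / 8)-byte boundary.
  // Width is a power-of-two number of bits, here 32 or 64.
  //
  #define BOTH_ALIGNED(Dst, Src, Width)                                 \
    (((((UINTN)(Dst)) | ((UINTN)(Src))) & (((Width) >> 3) - 1)) == 0)

  //
  // The outer dispatch in AlignedCopyMem () would then shrink to:
  //
  if (BOTH_ALIGNED (DestinationBuffer, SourceBuffer, 64) && (Length >= 8)) {
    // 64-bit copy loops
  } else if (BOTH_ALIGNED (DestinationBuffer, SourceBuffer, 32) && (Length >= 4)) {
    // 32-bit copy loops
  } else {
    // byte-wise fallback
  }

ORing the two addresses before masking tests both pointers in one go,
so the macro stays equivalent to the separate '& 0x7' and '& 0x3'
checks in the patch while keeping each call site on a single short
line, which was the point of the bikeshedding above.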