From: Ard Biesheuvel
Date: Thu, 11 Aug 2016 23:45:34 +0200
To: "Cohen, Eugene"
Cc: "edk2-devel@lists.01.org", "leif.lindholm@linaro.org", "liming.gao@intel.com"
Subject: Re: [PATCH 1/3] ArmPkg/CompilerIntrinsicsLib: replace memcpy and memset with C code
References: <1470939632-32198-1-git-send-email-ard.biesheuvel@linaro.org>

On 11 August 2016 at 23:34, Cohen, Eugene wrote:
>> This replaces the various implementations of memset and memcpy,
>> including the ARM RTABI ones (__aeabi_mem[set|clr]_[|4|8]) with
>> a single C implementation for each. The ones we have are either not
>> very sophisticated (ARM), or they are too sophisticated (memcpy() on
>> AARCH64, which may perform unaligned accesses) or already coded in C
>> (memset on AArch64).
>
> Ard,
>
> I'm concerned about the performance impact of this change... there's a reason for all that complexity and it's to optimize performance.
>
> Why does memcpy performance matter? In addition to the overall memcpy stuff scattered around C code we have an instance that is particularly sensitive to memcpy performance. For DMA operations when invoking double-buffering or access to portions of a buffer that is common mapped (i.e. uncached on non-coherent DMA systems) the impact of a non-optimized memcpy is enormous compared to the optimized ones because the penalty is amplified by orders of magnitude due to uncached memory access latency.
>

That code would be using CopyMem(), no? This only serves the compiler generated calls, which are few since Tianocore does not allow initialized locals.

> So I would ask that before a change like this is brought in that we characterize the cached-cached and cached-uncached (and perhaps unaligned cached-cached) performance across the implementations.
> Based on my experience I'm expecting both cases will take a massive performance hit.
>
> From your commit message I'm inferring that the problem you're solving is to play nice in environments that can't tolerate unaligned access like when the MMU is off. I get that - and I think a variant of the library that plays nice in these limited cases makes sense. However, I don't think we should drag down the performance of the rest of the environment where we spend the vast majority of our time executing.
>
> Eugene
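
For illustration only: the patch itself is not quoted in this message, so the following is a minimal sketch of what a "single C implementation" of memcpy and memset could look like, assuming a plain byte-wise loop so that no load or store can be unaligned. The names sketch_memcpy, sketch_memset and sketch_selftest are hypothetical and do not come from the patch. A real implementation would probably also need to keep the compiler from turning these loops back into calls to memcpy/memset; that part is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: NOT the functions from the patch under discussion. */

/* Byte-wise copy. Every access is a single byte, so nothing here can be
   an unaligned access, whatever the alignment of dst and src. */
void *sketch_memcpy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len-- > 0) {
        *d++ = *s++;
    }
    return dst;
}

/* Byte-wise fill, same reasoning as above. */
void *sketch_memset(void *dst, int value, size_t len)
{
    unsigned char *d = dst;

    while (len-- > 0) {
        *d++ = (unsigned char)value;
    }
    return dst;
}

/* Small self-check. The initialized local array below is the kind of
   construct for which a compiler may emit an implicit memcpy/memset call. */
int sketch_selftest(void)
{
    unsigned char src[9] = {0, 1, 2, 3, 4, 5, 6, 7, 8};
    unsigned char dst[8];

    sketch_memset(dst, 0xAA, sizeof dst);
    sketch_memcpy(dst, src + 1, 7);   /* source deliberately offset by one */

    assert(dst[0] == 1 && dst[6] == 7);
    assert(dst[7] == 0xAA);           /* byte past the copy is untouched */
    return 0;
}
```

The trade-off is the one raised in the thread: a loop like this is safe when unaligned accesses are not tolerated, but it moves one byte per iteration, which is why the cached and uncached cases would need to be measured before drawing conclusions about performance.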