From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Yao, Jiewen"
To: Pete Batard, "edk2-devel@lists.01.org"
CC: "Kinney, Michael D", "Yao, Jiewen"
Date: Thu, 26 Jan 2017 02:38:59 +0000
Message-ID: <74D8A39837DF1E4DA445A8C0B3885C503A8E5B10@shsmsx102.ccr.corp.intel.com>
Subject: Re: [PATCH 0/5] MdeModulePkg/EbcDxe: add ARM support
List-Id: EDK II Development

Hi Pete,

Thanks for adding support for a new architecture to the EBC module.

I like the idea of keeping compatibility:
> a) EBC executables that were produced with the older version of the
> specs will run exactly as they did on EBC VMs that comply with the newer
> version of the specs

Thank you for letting us know that this entails a minor UEFI spec change. I
would like to double-check with Mike Kinney.
Do we have any concerns about adopting the EDKII patch before the UEFI spec
change?
BTW: Has the ERCR been submitted?
> Now, whereas this solution is self-contained, it does entail a minor
> change to the UEFI EBC specs, that mandates the insertion of call
> signatures at compilation time into the unused part of the 64-bit
> function pointer data object used by BREAK 5 (see 3.3).

Thank you
Yao Jiewen

> -----Original Message-----
> From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Pete
> Batard
> Sent: Tuesday, January 24, 2017 8:30 PM
> To: edk2-devel@lists.01.org
> Subject: [edk2] [PATCH 0/5] MdeModulePkg/EbcDxe: add ARM support
>
> (This e-mail is fairly lengthy, so an Executive Summary is provided, for
> those who don't want to go through a wall of text).
>
> 0. Executive Summary
> ====================
>
> 0.1 Preamble
> ------------
>
> One of the most vexing aspects of the EFI Byte Code (EBC) proposal from the
> UEFI specs is that its EDK2 implementation has somewhat fallen short of
> its implicit goal of universality, due to the non-availability of an EBC
> VM for all supported architectures.
>
> As a consequence, we feel that this has resulted in major backtracking
> on EBC, such as EBC not being made a mandatory part of UEFI firmware
> implementations (in the same way as FAT) or not having a default
> provision for EBC bootloaders (e.g. /efi/boot/bootebc.arm).
>
> Shortly after Ard Biesheuvel provided an EBC implementation for ARM64,
> last August, questions were raised with regard to being able to do the
> same for ARM due to issues with trying to work with the calling
> convention on that platform, and more specifically, with the 64-bit
> parameter marshalling that is required there. At the time, these
> problems were deemed difficult to tackle without the collaboration of
> third parties (such as external toolchain developers) or without having
> to restrict the scope of what EBC applications could do (such as
> limiting their access to only "known" EDK2 interfaces, for which
> parameter marshalling specifics would have been added).
>
> This series of patches attempts to remedy all that, by proposing an ARM
> EBC implementation that solves the issues mentioned above in a generic
> and entirely self-contained manner (i.e. within the EDK2). With this,
> the EDK2 should finally enable the execution of the same EBC binary
> across ALL supported UEFI architectures, and thus complete the implicit
> goal of EBC.
>
> 0.2 Solution Overview
> ---------------------
>
> The gist of our marshalling solution can be summarized as being able
> to access a 16-bit value, at runtime, for native <-> EBC layer
> transition calls, that indicates which of (up to) 16 function call
> parameters are 64-bit. In turn, this enables the ARM EBC VM to "realign"
> said parameters to a 64-bit boundary as needed.
>
> Hereafter, we will refer to these 16-bit values as "call signatures".
>
> Now, whereas this solution is self-contained, it does entail a minor
> change to the UEFI EBC specs, that mandates the insertion of call
> signatures at compilation time into the unused part of the 64-bit
> function pointer data object used by BREAK 5 (see 3.3).
>
> However, this non-breaking change is both backward and forward
> compatible.
> In particular, once the specs change is effected, and for *ALL*
> current EBC archs (IA32, IA64, X64, AARCH64):
>
> a) EBC executables that were produced with the older version of the
> specs will run exactly as they did on EBC VMs that comply with the newer
> version of the specs
>
> b) EBC executables that are produced with the newer version of the specs
> will run and perform the same, on EBC VMs that comply with either the
> older or newer version of the specs.
>
> Also, and specifically for ARM (since other platforms are unaffected by
> such concerns), this whole proposal makes it possible for:
>
> - Existing EBC binaries, that do not invoke BREAK 5 (i.e. no native to
> EBC calls) to run on the proposed ARM EBC VM without any changes.
> In particular, these applications can perform EBC to native calls, on
> ARM, with no adverse effects.
>
> - Existing EBC binaries, that do invoke BREAK 5, but that don't include
> signatures, to partially run on ARM. In this case, the ARM EBC VM will
> return an error status for calls issued from native to the EBC, allowing
> a native app to acknowledge incompatibility issues and potentially let
> the developer know that signatures need to be added.
>
> - Existing EBC binaries, that do invoke BREAK 5, to be (relatively
> easily) patched for ARM EBC compatibility. This, for instance, is
> demonstrated with the EDK2 FAT EBC binary in 4.3.
>
> - The updated EDK2 EBC toolchain to produce EBC applications that run on
> *ALL* EBC VMs, including the newly added ARM, as well as other VMs.
>
> 0.3 Patch Overview
> ------------------
>
> The patch series is broken down into 5 parts:
>
> - 1/5 relates to preliminary changes, within the common EBC code, that
> enable VmReadIndex##() functions to optionally return the decoded Const
> and Natural parts, as we need this data for our proposed Stack Tracker.
> It is done by adding an optional pointer to a {Const, Natural} struct,
> that is to be filled when the pointer is not NULL.
> With the introduction
> of this change, the patch sets all of the new optional pointers to NULL,
> so no actual behavioural change occurs.
>
> - 2/5 introduces the basic ARM EBC VM, as was proposed by Ard as a PoC
> port of the AARCH64 version. Note that this change alone is enough to
> get standalone EBC code to run on ARM, but may result in the parameter
> marshalling issues we mentioned above, for any call that transitions
> between ARM native and EBC, due to potential "misaligned" 64-bit parameters.
>
> - 3/5 fixes the issue of calling from EBC into native ARM, in a
> completely self-contained manner, through the addition of a "Stack
> Tracker". This Stack Tracker is enough to dynamically resolve, at call
> time, whether a specific parameter is 64-bit or not, and thus whether it
> needs to be realigned. This is done through a buffer that starts at
> 1/64th of the stack size and is grown dynamically if needed. We
> currently estimate that most EBC applications should not need to use
> more than this initially allocated space, as, in most cases, the stack
> tracker should require less than 1.5 bits to track every 32 bits of
> stack data.
>
> - 4/5 fixes the mirror issue of calling from native into EBC. This is
> the part that requires a specs change, as the thunking code for the ARM
> platform must have some knowledge about the parameter signature of the
> function being called into, to realign 64-bit parameters. This specific
> section adds handling and processing of call signatures at runtime. It
> should be noted that this patch also bumps the global version of the EBC
> VM from 1.0 to 1.1 as a means to indicate whether an EBC VM is compliant
> with the new specs (though this really only affects ARM, and none of the
> other archs).
>
> - 5/5 adds the insertion of call signatures into EBC binaries at EDK2
> compilation time, in compliance with the proposed specs change.
> This is
> done by introducing 2 new Python tools: one that works around the
> Intel EBC compiler (iec) to parse a C source and generate optional
> signature data for this source, and one that processes these signature
> files, at link time, to patch the final binary with the signature data
> that will be required at runtime. Eventually, we expect the generation
> of signature data to become part of the iec, thus rendering the first
> tool obsolete.
>
>
> 1. A refresher of the ARM calling convention issue
> ==================================================
>
> As a reminder, the major issue we face when trying to implement an EBC VM
> on ARM has to do with the marshalling of 64-bit parameters from EBC to
> native and from native to EBC.
>
> This is because, as per the AAPCS (Procedure Call Standard for the ARM
> Architecture), ARM has the requirement that any 64-bit function call
> parameters must be aligned to a 64-bit boundary (or an even register,
> for register parameters), whereas, by its nature, the EBC call stack on
> ARM is packed to 32-bit, and therefore 64-bit parameters are not
> guaranteed to be aligned.
>
> Without counter-measures, this results in code, such as one calling from
> EBC into a simple (UINT32, UINT64) native function, or calling from
> native into a (VOID*, INT64) EBC function, that ends up with garbage
> parameter data.
>
> The one solution we see to work around this is through the use of call
> signatures, that tell us, at call time, where 64-bit arguments are
> located so that we can realign them. However, as we demonstrate in this
> proposal, the provision of call signatures does not need to be intrusive
> with regard to the development flow.
>
>
> 2. EBC to native marshalling: The Stack Tracker
> ===============================================
>
> 2.1. Overview
> -------------
>
> On any architecture other than EBC, the idea of using a stack tracker to
> determine the size of a call parameter would sound like a simplistic
> approach. After all, how can one tell if a sequence of 32-bit values
> being pushed on the stack is meant to be used as two 32-bit parameters, or
> a single 64-bit parameter, with the low and high words being pushed
> separately?
>
> However, a careful reading of the EBC specs (UEFI 2.6, section 21.9.3)
> enables us to conclude the following: On an EBC platform, any non-64-bit
> call parameter will be enqueued as a natural.
>
> Furthermore, because EBC naturals can only be enqueued in an atomic
> manner (in other words, it is not possible to use a combination of
> shorter PUSHes or MOVs to add a natural on the stack), then, by tracking
> natural operations, which we can easily do in the VM, it is possible to
> determine where non-64-bit parameters have been enqueued, and therefore
> also deduce where 64-bit parameters are located.
>
> From there, we devise that we can add a "stack tracker" on ARM, to
> monitor the EBC executable's stack operations for which, in order to
> minimize the amount of data required for tracking, we will use sets of
> 2-bit sequences, using the following encoding:
> - 01b -> a natural has been enqueued on stack
> - 00b -> a contiguous set of (non-natural) 64-bit data is present on stack
> - 1xb -> start of dual 2-bit sequence (4 bits). x along with the
> next 2 bits indicates the number of contiguous bytes of data that have
> been enqueued (as non-natural data). For instance a 10b 01b sequence may
> be used to indicate that a PUSH8 equivalent operation has been effected.
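
The encoding listed above can be modelled in a few lines. This is a minimal Python sketch, not code from the patch series: the names `encode_non_natural` and `format_entries` are hypothetical, and it assumes that the 3-bit byte count is formed from x (as the most significant bit) followed by the next 2-bit entry, which is consistent with the 10b 01b example for 8 bits of data.

```python
# Tracker entry values from the encoding list (2 bits each).
NATURAL = 0b01  # a natural has been enqueued
QWORD = 0b00    # a full 64-bit stride of non-natural data


def encode_non_natural(num_bytes):
    """Return the 2-bit tracker entries for 1..8 contiguous non-natural bytes.

    Assumption: in a dual sequence, x is the top bit of a 3-bit byte count
    and the following 2-bit entry carries the two low bits.
    """
    if not 1 <= num_bytes <= 8:
        raise ValueError("expected between 1 and 8 bytes")
    if num_bytes == 8:
        # A complete 64-bit stride collapses into a single 00b entry.
        return [QWORD]
    return [0b10 | (num_bytes >> 2), num_bytes & 0b11]


def format_entries(entries):
    """Render entries in the notation used in this e-mail, e.g. '10b 01b'."""
    return " ".join(f"{entry:02b}b" for entry in entries)


if __name__ == "__main__":
    for size in (1, 2, 4, 6, 8):
        print(size, "byte(s):", format_entries(encode_non_natural(size)))
```

Under these assumptions, data accumulated 16 bits at a time goes through 10b 10b, 11b 00b, 11b 10b and finally the single 00b entry.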
>
> The dual 2-bit sequences are needed as an application may be enqueuing
> non-natural parameters with the aim of constructing a (potential) 64-bit
> parameter.
>
> For instance, if we have a sequence of 4 MOVIw @R0, ..., we want the
> stack tracker to be able to ultimately resolve the enqueued data as a
> bona fide 64-bit parameter if needed so that, as data is being enqueued,
> we see the stack tracker being updated as such:
>
>   10b 10b (16 bits of data enqueued)
>   11b 00b (32 bits of data)
>   11b 10b (48 bits of data)
>   00b     (64 bits of data -> the dual 2-bit sequence is now collapsed
>            into a single 2-bit sequence)
>
> The use of 2 bits for an actual 64-bit "stride" of data, vs. 4 bits for
> other lengths, is of course intended as a form of basic compression to
> reduce the amount of space required for stack tracking, since we expect
> the frequency of 64-bit and natural stack elements to be a lot higher
> than smaller sized elements.
>
> 2.2 Call time usage
> -------------------
>
> With this in effect, at call time, we can look into the stack tracker to
> determine whether each (potential) parameter is either natural or
> 64-bit, and then construct a 16-bit call signature. Note that, as per
> the ARM calling convention, both register and non-register 64-bit
> parameters must be aligned, which, for register parameters, means that
> r0 or r2 must be used for the first word of the 64-bit argument.
>
> Also, in case this needs to be clarified, please note that, even as we
> state that there are only 2 types of parameters that an EBC VM can use
> for an EBC to native call (Natural or 64-bit), this does not imply that
> an EBC application cannot call into a native function that, say, takes a
> BYTE as a parameter. Only that, if it does so, the EBC specs require
> that it must reserve space for a Natural on the stack.
>
> Now, one element we cannot determine is just how many parameters the
> target call takes.
> However, this is something that can be safely ignored,
> as there are no issues associated with passing a larger parameter call
> stack than what is actually needed. The extra (potential) parameters we
> enqueue will simply be ignored.
>
> In our implementation we therefore set the maximum number of parameters
> that a native function call may deal with to 16, which means that we
> always assume that the function call might take 16 arguments. This is
> based on what we've seen other VMs do, as well as what we consider safe
> for the official UEFI interface calls. Currently, we are not aware of
> any calls in the EDK2 taking more than 16 parameters, and we also don't
> expect user applications to pass more than 16 parameters.
>
> Finally, it should be pointed out that, because we don't know the actual
> number of parameters for a call, we may still attempt to process some
> dual 2-bit sequences as part of our 16-bit call signature creation.
> However, if we do, we know that they cannot apply to a formal parameter,
> so we can either choose to ignore them, or just let the implementation
> toggle call signature bits indiscriminately (as we do in our proposal).
>
> 2.3 Space considerations
> ------------------------
>
> The stack tracker is designed to grow dynamically and is currently
> allocated to 1/64th of the total stack space on startup (as a buffer
> that is separate from the stack. We initially considered reserving the
> stack tracker as part of the stack buffer, but dismissed that
> approach). Currently, each time a reallocation is needed, the stack
> tracker is set to double in space.
>
> For typical executables then, even those that tend to err towards high
> stack usage, we don't expect the tracker buffer to outgrow the
> 1/32nd (of total stack space) mark, especially as a large part of the
> stack will be reserved for what the stack tracker sees as contiguous
> swaths of 64-bit elements, which are stored in 1/32nd of their size
> (2 bits for every 64 bits).
>
> However, we still need to consider the theoretical worst case scenario,
> where someone creates an EBC application that consists only of:
>   PUSHn ...
>   MOV(I)b @R0, ....
>   PUSHn ...
> for as much stack space as is available (R0 being the EBC stack pointer).
>
> In this case, the stack tracker will allocate 2 bits for each PUSHn and
> 4 for each MOV (because they equate to pushing a byte), but none of the
> 4-bit sequences will ever be collapsible into a 2-bit one (once we have
> accumulated enough bytes to form a 64-bit) which means that 32 bits + 32
> bits of actual stack data (because while 8 bits are pushed, they are
> aligned to 32 on the next PUSHn) = 64 bits are encoded into 6 bits in
> the stack tracker, or ~1/10th of stack space. And since we double the
> stack tracker size each time it needs to be reallocated, this means that
> the very worst case scenario we can see in terms of space would be a
> stack tracker that needs to occupy 1/8th of the stack data at worst.
>
> Even if that was a realistic scenario, we don't consider that the
> drawback of having to (potentially) reserve an extra 1/8th outweighs the
> advantages of allowing ARM users to benefit from an EBC VM. Still, we
> will point out that this is not a scenario we ever expect to see in a
> practical application. Instead the worst scenario we expect for an
> exceedingly stack heavy EBC executable, that enqueues a lot of <= 32-bit
> elements, would be 1/16th of stack space (or 64KB, since the default
> stack buffer is ~1MB), which we think is very reasonable, even on ARM.
> Furthermore, we consider it a fair estimate that 99% of applications
> will never need more than the initial 1/64th (or 16KB) allocated for
> stack tracking.
>
> 2.4 Additional considerations
> -----------------------------
>
> 2.4.1 Local variables on the stack
>
> Typically, at the beginning of a subroutine, a MOV R0, R0(0,-n) will be
> used by the compiler to reserve space for local variables.
>
> For instance, a compiler may insert MOV R0, R0(0,-1024) to reserve space
> for 1024 bytes on the stack. And while developers and users can reasonably
> expect this to be a near-instantaneous operation, if our stack tracker
> is going to read that operation as a set of 128 x 64-bit longwords being
> allocated, and then perform 128 x {read byte; modify 2 bits; write byte}
> operations, one may have misgivings about the performance impact of
> tracking the stack.
>
> However, the stack tracker is designed to recognize operations
> pertaining to repeated sequences of data, and optimize them. In short,
> in the current implementation, the sequence above will typically result
> in the stack tracker simply zeroing the corresponding 128 2-bit entries
> (32 bytes of tracker data) in one go, instead of trying to repeatedly
> update individual 2-bit sequences.
>
> Note that this holds true even if there is a need to propagate a dual
> 2-bit sequence as part of tracking a large set of 64-bit longwords.
>
> To confirm this, and as part of our test suite, we also have a test
> where half the stack is reserved and then released for local data, 10
> times in a row, and we see that the execution of this test is near
> instantaneous in QEMU, confirming that there should be no performance
> bottleneck (see 4.1 -> Realloc).
>
> 2.4.2 Stack buffer switching
>
> First, we must point out that, per our testing, NONE of the current EBC
> VM implementations from the EDK2 currently allow the EBC stack buffer to
> be switched to a different buffer by an EBC application at runtime.
> In particular, the following EBC assembly code will freeze execution on
> MOVREL, on all current VMs:
>
> EfiMain:
>   MOV R6, R0
>   MOVREL R0, StackTop
>   MOV R0, R6
>   RET
>
> section '.data' data readable writeable
>   StackBuf: dq 255
>   StackTop: dq 1
>
> Nonetheless, in case this ever becomes a possibility, our stack tracking
> proposal does have provision for stack buffer switching.
> The only
> limitation we have (and this is really the only limitation we see for
> the whole proposal), is that it can only handle one level of switching.
> In other words, provided stack switching was possible (which currently
> isn't the case) the stack tracker wouldn't be able to properly track
> parameters if a second stack buffer switching occurs within code that
> executes against a stack buffer that was already switched. However, the
> proposal should still be fine if only a single level of stack switching
> occurs (i.e. we should be able to track switch/restore, no matter how
> many times such switching is repeated during the execution of an
> application).
>
> Thus, considering that:
> 1. It does not currently seem possible to switch stack buffer on any arch
> 2. The EBC compiler does not offer the ability to manipulate the stack
> pointer directly in the first place
> 3. Stack switching only becomes an issue if done recursively
>
> We consider that this one limitation of our implementation can be
> dismissed as too unrealistic and cannot be construed as a showstopper.
>
> 2.4.3 Delta stack pointer updates
>
> Outside of the obvious PUSH/POP operations, the stack tracker does track
> mathematical/logical modification of R0, by computing the delta from the
> previous R0 value. Most of the time this delta would only have a const
> component, which we then try to resolve to a complete or partial set of
> 64-bit consecutive values. However, there also exist instances, such as
> MOV R0, R0(+n,+c), where we will have both a constant and a natural part
> to track.
>
> While updating the stack tracker itself with such data is not
> technically an issue, one may wonder if the order in which natural and
> constants are processed might have an effect on our ability to determine
> where 64-bit parameters are located.
>
> However, when one looks more closely at the validity of such concerns,
> the conclusion will be drawn that such an operation can never be used to
> fill actual call parameters (which would have to be optional in the
> first place, since it is of course impossible, with the current EBC
> specs, to use such an operation to pass actual data), as neither the
> specs nor the VM make any promise as to the order in which constant and
> natural parameters are processed. Therefore a programmer cannot assume
> how their call parameter stack will be set up from invoking such an
> operation. Thus, as far as tracking data to determine if a parameter is
> natural or 64-bit, the order induced by the operation above is
> irrelevant, and we are therefore free to pick whichever order makes most
> sense for our implementation.
>
> 2.4.4 "Cloaked" stack operations
>
> "Cloaked" stack operations are stack operations that are not effected
> using R0 as the stack pointer. For instance, someone may copy R0 into
> R1, then alter the data pointed to by either R0 or R1, and then copy R1
> (which may or may not have been modified) back to R0. And whereas stack
> switching (i.e. trying to have R0 point to an address that isn't within
> the current stack buffer) currently breaks VM execution, moving R0 within
> the stack buffer, even in a cloaked manner, is something that the EBC VM
> can and does perform without issues.
>
> Ultimately, there are two types of cloaking that may come into effect,
> which we'll call positive and negative cloaking.
>
> Positive cloaking (dequeuing) is a non-issue. This occurs
> when the restored R0 is at a higher address than the original one. Since
> stacks grow downwards, this means that dequeuing of data has been
> issued, which we can handle with relative ease by going through our
> existing tracker data, and removing natural and const elements,
> according to their size, until we match the R0 address delta.
>
> Negative cloaking (enqueuing) could be seen as more problematic, as one
> may consider that, since we're not tracking anything but R0, someone may
> use something like an R1 index manipulation to enqueue natural and const
> call parameters, before eventually assigning R1 to R0, which would
> defeat our tracking ability.
>
> However, for all practical purposes, negative cloaked stack pointer
> updates should never be used to enqueue call arguments. In particular, by
> NOT explicitly declaring whether an argument is a natural or a 64-bit,
> through a direct stack operation, one would be deliberately
> misinterpreting the intent of the specs, which state (21.9.3) that
> "Parameters are pushed on the VM stack" and pretty much imply direct
> stack operation. At the very least, we do not expect C applications to
> use negative cloaking (as C does not have provisions for anything like
> this), and furthermore, should we believe that the specs may be
> misinterpreted, we could add a "directly" qualifier, so that all
> ambiguity is removed.
>
> 2.5 Code changes overview
> -------------------------
>
> The stack tracker is introduced with PATCH 3/5, and contains the
> following code changes:
>
> 2.5.1: In EbcVmTest.h we add a new pointer to an optional, opaque and
> arch-specific structure. This is the structure that will be used for our
> Stack Tracker on ARM. It is important to note that, in the common code,
> we use the presence or absence of this pointer (whether the pointer is
> NULL) to determine whether stack tracking is in operation. We preferred
> this approach to using #ifdef MDE_ARM in the code, as we believe it is
> cleaner.
>
> 2.5.2: In EbcExecute.c we add stack tracking for any instruction that
> may require it. This is done for anything that deals with accessing the
> content pointed to by R0 (the stack pointer) or manipulating R0 directly,
> including mathematical operations on R0, as well as (obviously) PUSH and
> POP operations.
> Depending on whether VmPtr->StackTracker is non-NULL (currently, only on
> ARM) and whether the instruction affects R0, we then invoke one of
> UpdateStackTracker() or UpdateStackTrackerFromDelta().
>
> 2.5.3: Since we introduce the 2 functions above, we also add a blank
> EbcStackTracker.c, at the top level, to be used on any arch that doesn't
> require stack tracking. Calls are defined as empty functions there.
>
> 2.5.4: For ARM alone, we add Arm/EbcStackTracker.c which contains the
> definition for the 2 functions above, as well as any other stack tracker
> support function, such as the ones that deal with stack tracker buffer
> allocation/release.
>
> 2.5.6: In Arm/EbcSupport.c, we add the necessary calls for stack tracker
> allocation/release as well as modify EbcLLCALLEX() to use a new
> EbcLLCALLEXNativeArm() which takes an extra argument for the current
> argument layout as returned by the stack tracker.
>
> 2.5.7: In Arm/EbcLowLevel.S, we define the EbcLLCALLEXNativeArm, which
> takes care of properly aligning up to 16 64-bit parameters, according to
> the argument layout.
>
>
> 3. Native to EBC marshalling: Internal call signatures
> ======================================================
>
> 3.1 Overview
> ------------
>
> Once again, we start from the principle that a call signature is needed,
> at layer transition time (native -> EBC invocation), so that we can
> realign 64-bit call parameters as needed.
>
> In this case, since the application performing the call request is
> native, we cannot use anything like a stack tracker (which wouldn't work
> for native code anyway) and instead, need to ensure that we can have the
> call signature at our disposal when we perform thunking.
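
Once a call signature is available, the realignment itself is mechanical. The following is a minimal Python sketch of the word shuffling only, not the thunking code from the patch (which is written in C and ARM assembly). The function names are hypothetical, and it assumes 32-bit naturals, that r0-r3 followed by the 8-byte aligned native stack can be treated as one contiguous array of 32-bit words, and that the number of parameters is known (the actual proposal always assumes up to 16).

```python
def native_to_ebc_words(native_words, signature, count):
    """Drop AAPCS padding so that arguments are packed to 32-bit for EBC."""
    packed, pos = [], 0
    for index in range(count):
        if (signature >> index) & 1:
            # A 64-bit argument starts on an even word (r0, r2, or an
            # 8-byte aligned stack slot): skip the padding word if any.
            pos += pos & 1
            packed.extend(native_words[pos:pos + 2])
            pos += 2
        else:
            packed.append(native_words[pos])
            pos += 1
    return packed


def ebc_to_native_words(packed_words, signature, count, padding=0):
    """Insert AAPCS padding in front of misaligned 64-bit arguments."""
    native, pos = [], 0
    for index in range(count):
        if (signature >> index) & 1:
            if len(native) & 1:
                native.append(padding)
            native.extend(packed_words[pos:pos + 2])
            pos += 2
        else:
            native.append(packed_words[pos])
            pos += 1
    return native


if __name__ == "__main__":
    # Foo(UINT32, UINT64): signature 0x0002, native caller leaves r1 unused.
    registers = [0x11, 0xDEAD, 0x22222222, 0x33333333]
    print([hex(w) for w in native_to_ebc_words(registers, 0x0002, 2)])
```

For a (UINT32, UINT64) call, the native side uses r0 for the first argument and r2/r3 for the second, whereas the EBC side expects three consecutive 32-bit words.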
>
> Two elements that work to our advantage for this part are that:
> - The code for which we need signature awareness is the EBC code itself,
> in other words, code that should be produced using the EDK2 EBC toolchain.
> - Every single function call for which we need a signature must also be
> a function call for which thunking will be set using BREAK 5.
>
> From there, it's easy to devise a solution that consists of modifying
> the EBC generation toolchain, so that it adds a 16-bit call signature
> into the 64-bit offsets used by BREAK 5 (which only ever use a 32-bit
> payload), to make signatures available when thunking is invoked.
>
> 3.2 Implementation
> ------------------
>
> After having confirmed that, as per specs, all of the current EBC VM
> implementations do ignore the high 32-bit part of the 64-bit used by
> BREAK 5 (which means that we can alter this existing data without
> incurring any drawbacks), we identify it as the best place to store the
> 16-bit signature, along with an extra 16-bit marker, which we'll then
> use to detect EBC binaries that were compiled without signatures.
> The other reasons that make us want to use this element are that:
> - This is space that is already available in an EBC binary (i.e. no need
> to add extra data/instructions)
> - This can enable the patching of existing EBC binaries.
> - It makes logical sense to have it there, since the call signatures are
> related to functions that require BREAK 5 invocation.
>
> When BREAK 5 is invoked, we should therefore be able to copy this
> signature (if available) into the EBC_INSTRUCTION_BUFFER structure, and
> subsequently use that data during call thunking, to align 64-bit
> parameters.
>
> Of course, while we will now require EBC binaries to be decorated with
> additional signature data, we don't want EBC developers to have to go
> through the process of inserting these signatures manually.
> Instead, we
> automate the signature insertion so that it will run at compilation
> time. To that effect, we introduce 2 Python scripts in BaseTools:
> - GenEbcSignature, invoked after each object generation, parses a
> preprocessed C source, along with the object file, to create the
> signature data, which is then stored into a corresponding .sig file.
> - PatchEbcSignature, invoked at final application link time, processes
> the .sig files as well as the .map data and .efi binary, and inserts the
> signatures at the relevant location in the final binary
>
> Currently, and because we expect the Intel EBC compiler to be updated to
> follow the new specs (since it is really the best place to perform such
> processing, as it has access to the C parser, lexer as well as the full
> preprocessed source, and can more easily determine the nature of
> function call arguments), we see GenEbcSignature as a stopgap solution
> until said compiler is updated.
>
> Therefore, GenEbcSignature was designed to be very basic with regard to
> the ability to properly detect 64-bit parameters. For instance, it
> expects the processed source to follow the EDK coding conventions and it
> also requires straight INT64 or UINT64 parameters to be used (i.e. no
> redefinitions of these basic types). Of course, since any aspect of the
> signature generation and insertion can be amended, we are ready to
> modify the proposal according to what Intel sees as the best course of
> action with regard to iec integration.
>
> On the other hand, we expect PatchEbcSignature to remain part of the EBC
> toolchain in one form or another, as signature insertion occurs
> post-linking, and the EBC linker is not something that was written
> specifically for EBC (regular Microsoft linker) and there isn't much
> performance/optimization to be gained from avoiding an extra step here.
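
To illustrate the kind of detection described for GenEbcSignature, here is a minimal Python sketch. It is not the script from PATCH 5/5: the function name `call_signature` is hypothetical, it only handles a single flat prototype string (no function pointer parameters), and it uses the bit layout from the proposed specs change in 3.3, where the first parameter maps to bit 0.

```python
import re

# Only straight INT64/UINT64 parameters are recognized, as stated above.
WIDE_TYPES = {"INT64", "UINT64"}
MAX_PARAMS = 16


def call_signature(prototype):
    """Return a 16-bit signature with bit n set if parameter n is 64-bit."""
    start = prototype.index("(")
    end = prototype.rindex(")")
    params = [p.strip() for p in prototype[start + 1:end].split(",")]
    if params in ([""], ["VOID"]):
        params = []
    if len(params) > MAX_PARAMS:
        raise ValueError("more than 16 parameters are not supported")
    signature = 0
    for index, param in enumerate(params):
        tokens = re.findall(r"\w+|\*", param)
        # A pointer to a 64-bit type is still passed as a natural.
        if "*" not in tokens and WIDE_TYPES.intersection(tokens):
            signature |= 1 << index
    return signature


if __name__ == "__main__":
    proto = "EFI_STATUS Foo(UINT32 Arg1, UINT64 Arg2);"
    print(f"0x{call_signature(proto):04X}")
```

A real implementation would need to work from the preprocessed source and the object file, as the description above indicates; the sketch only shows how a signature value follows from the parameter types.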
>
> Finally, both scripts currently rely on the Intel EBC compiler
> referencing externally callable functions with a "_plabel" suffix, which
> is what we empirically identified as Intel's marker for such calls. Of
> course, not knowing the internals of the iec, it is possible that this
> assumption does not hold, in which case, there again, we can work with
> Intel iec developers to refine it...
>
> 3.3 Proposed specs change
> -------------------------
>
> UEFI Specs Version 2.6 (January, 2016) are used for all the changes
> highlighted below.
>
> Proposed alterations/insertions are to be found within brackets [ ]
>
> * Section 21.8 -> BREAK -> BREAK 5:
>
> "Create thunk. This causes the interpreter to create a thunk for the EBC
> entry point whose 32-bit IP-relative offset is stored in the low part of
> a 64-bit data address in VM register R7[, and whose call signature is
> stored in the high part. For details on how the signature should be
> generated, see section 21.12.10.2]. The interpreter then
> replaces the contents of the memory location pointed to by R7 to point
> to the newly created thunk (...)"
>
> * Section 21.12.10.2: Thunking Native Code to EBC
>
> "Typical C code to install a generic protocol is shown below.
>   EFI_STATUS Foo(UINT32 Arg1, UINT[64] Arg2);
> (...)
>
> "To support thunking native code to EBC, the EBC compiler resolves (...)
> * Associated relocations[ and optional call parameter alignment] for
>   the above
>
> [In order to perform optional parameter alignment, the EBC toolchain is
> required to insert a 16-bit call signature, along with a 16-bit marker,
> in the high 32-bit word of the 64-bit function pointer data object.
>
> A bit of the call signature is set to 1 if a parameter is 64-bit or 0
> otherwise, with the first parameter at bit 0. If a function call uses
> fewer than 16 parameters, any unused bit should be set to 0. EBC function
> calls with more than 16 parameters are not supported.
>
> The 16-bit signature should then be written into bits 32 to 47 of the
> 64-bit function pointer data object, and bits 48 to 63 set to 0x2EBC.
>
> Thus, for the (UINT32, UINT64) function call above, the 64-bit function
> pointer data object that the EBC toolchain would need to store at
> Foo_pointer is:
>
>   (Foo - Foo_pointer - 4) + (0x0002 << 32) + (0x2EBC << 48)]
>
> 3.4 Code changes overview
> -------------------------
>
> o PATCH 4/5:
>
> 3.4.1: In EbcVmTest.h we bump the VM version minor from 0 to 1. This is
> because, while there isn't any actual incompatibility being introduced
> for existing VMs, we feel that EBC developers may still want the ability
> to detect if the VM they are running against is call-signature
> compatible (v1.1) or not (v1.0) and possibly take action as a result.
>
> 3.4.2: In EbcInt.h we define the EBC_CALL_SIGNATURE marker, and
> introduce a new flag for the ARM create-thunk function, to indicate
> whether a call signature needs to be processed.
>
> 3.4.3: In EbcExecute.c we modify ExecuteBREAK() to read the call
> signature (if present) and then pass that signature along with the
> FLAG_THUNK_SIGNATURE flag to the arch-specific EbcCreateThunks(). Note
> that, for any other arch than ARM, the flags were already ignored, so no
> changes are needed.
>
> 3.4.5: In Arm/EbcSupport.c, we modify EbcCreateThunks() so that it reads
> the new flag and signature, if provided, and stores it into the private
> thunk data (InstructionBuffer.EbcCallSignature)
>
> In EbcInterpret() we check whether the signature is present, and if so,
> align parameters as needed. If not, we return an
> EFI_INCOMPATIBLE_VERSION status.
>
> 3.4.6: Arm/EbcSupport.S is also modified to handle the new
> EbcCallSignature word of InstructionBuffer.
>
> o PATCH 5/5:
>
> 3.4.7: BaseTools\Source\Python\GenEbcSignature\GenEbcSignature.py is the
> new call signature generation script. It is meant to be called after iec
> has generated an object file, and is set to take the preprocessed source
> on stdin (so that we can parse function call declarations from headers)
> along with the object file, and generates signature data (which, in its
> current form, is the python serialized data from a dictionary).
> Basically, we parse the COFF object and locate the symbol table, where
> we identify all the symbols that have a _plabel suffix. We then try to
> locate each symbol in the preprocessed source, as possible function
> calls, and, if we find one, detect whether it has [U]INT64 parameters
> (from either a function declaration or definition). We then use python's
> "pickle" functionality to serialize our call signature dictionary.
>
> 3.4.8: BaseTools\Source\Python\PatchEbcSignature\PatchEbcSignature.py is
> the call signature insertion script. This time, we process the .map file
> for the produced .efi to identify the address of _plabel suffixed
> function calls. These will be the addresses we need to insert signatures
> into.
> Then, we take a list of either .sig or .lib files (which we convert to
> .sig path), and unserialize them to build a full dictionary of call
> signatures.
> Then, after a few sanity checks, we insert the signatures, along with
> the markers.
>
> 3.4.9: In BaseTools\Conf\build_rule.template, we add the steps that call
> on GenEbc/PatchEbcSignature when EBC binaries are being produced.
>
> Note that, as they are introduced in this proposal, these calls
> currently have the -v (verbose) flag set, so that additional information
> about the call signature generation and insertion is displayed during
> compilation. Eventually, we want to remove the -v flag.
>
> 3.4.10: In BaseTools\Conf\tools_def.template, we add 2 new variables for
> the new tools.
>
> The rest of the changes should be explicit.
>
>
> 4. Test Suite
> =============
>
> A comprehensive validation test suite is provided, in order to
> demonstrate that the proposal does work as advertised. As may be
> expected, these tests are based on using the QEMU_ARM firmware that is
> generated from an EDK2 source tree where these patches have been applied.
>
> For convenience, it will also be assumed that:
> - Windows x64 is being used as the test platform
> - QEMU 2.7.0 or later (64-bit version) is available and installed under
>   C:\Program Files\qemu\ (NB: 2.7.x or 2.8.x should work fine for ARM, but
>   I found that the 2.8.x precompiled Windows QEMU binary had issues with
>   AARCH64)
> - One has cloned the fasmg EBC Assembler [1] (which contains most of the
>   test suite) into C:\fasmg-ebc\
>
> 4.1 EBC -> native ARM test suite
> --------------------------------
>
> The test suite for the stack tracker can be found in the EBC assembler,
> which is an Open Source EBC assembler [1], based on fasmg, that was
> developed in parallel to this proposal (but that isn't directly related
> to it). The use of an EBC assembler makes it convenient to both compile
> and validate/debug test applications (through the EBC Debugger). Also,
> some aspects of what we are testing (such as cloaked stack
> manipulations) would be difficult to test outside of assembly.
>
> Besides the EBC test applications, we do require a native UEFI driver,
> that will install a set of native protocols, which we can call into for
> testing. This driver [2], which is also provided as part of the EBC
> assembler, is written in C (and compiled as a gnu-efi based VS2015
> solution, for convenience reasons). Both pre-compiled ARM and IA32
> driver binaries are provided if needed.
>
> To run this part of the suite, whose prime purpose is to validate the
> stack tracker, one should navigate to the stack_tracker\ subdirectory
> of the EBC Assembler and run something like:
>
>   C:\fasmg-ebc\stack_tracker> make qemu
>
> This will download the required files as needed (such as the latest
> fasmg assembler, or the QEMU ARM firmware), assemble the EBC test
> programs, and then run all the tests in an ARM QEMU environment.
>
> The suite consists of:
>
> - Matrix test [3], that tests every one of the 16 possible
>   combinations for a 4-parameter native call. In other words, this
>   validates that every single parameter we pass, from (UINTN, UINTN,
>   UINTN, UINTN) to (UINT64, UINT64, UINT64, UINT64), is received, without
>   mangling, by the ARM native driver.
>
> - Max test [4], that confirms that 16 parameters can be successfully
>   received. We perform 3 sets of tests here: 16 native parameters, 16
>   64-bit parameters and 16 intermixed.
>
> - Cloaked test [5], that performs a set of stack operations using R1
>   instead of R0 as the stack pointer, while interspersing the queuing of
>   actual parameters for a native function call.
>
> - Realloc test [6], that forces the stack tracker to grow (realloc) its
>   buffer, by reserving half the stack as local space, and that also tests
>   the speed at which the stack tracker is able to process half the stack
>   being reserved/restored as a local space, by repeating the operation 10
>   times in a row
>
> Note that a Switch test was also written, that attempts to switch the
> stack buffer, but since this is an operation that does not work on ANY
> of the EBC VMs, it is left out of the suite.
>
> Of course, if you don't want to use the pre-built QEMU_EFI_ARM.fd
> firmware, which will be downloaded from my servers, you can build and
> copy your own in the stack_tracker\ directory.
>
> Needless to say, if the patch series has been properly applied, all of
> the tests above will report a "PASS" status, confirming that the stack
> tracker works.
>
> 4.2 Native ARM -> EBC test suite
> --------------------------------
>
> This time, we want to test the reverse operation of marshalling from ARM
> to EBC, so we need to create an EBC protocol driver (driver.asm - [7]),
> similar to the native C driver we created for the previous test, along
> with a native application (native.c in the native/ subdirectory [8])
> that will call into the protocols installed by the EBC driver.
>
> The native test application includes:
>
> - A complete matrix test, similar to the one used to validate the stack
>   tracker (i.e. 16 protocol calls taking 4 arguments that are all possible
>   combinations of UINTN or UINT64)
>
> - An additional set of protocol calls, that take 16 parameters in all.
>
> It should also be noted that, since the fasmg-ebc assembler already has
> provision for the insertion of call signatures into the BREAK 5 data
> (through its 'EXPORT' macro), there is no need to patch the EBC binary.
>
> Then, to run the test suite, one can simply run (in the fasmg-ebc root
> directory)
>
>   C:\fasmg-ebc> make driver qemu arm
>
> This compiles and installs the EBC driver in qemu, and invokes the native
> test application.
>
> You can also invoke the EBC debugger if you replace 'qemu' with 'debug'
> (a relevant debugger binary will be downloaded automatically). This test
> suite can also be run for architectures other than ARM by replacing
> 'arm' with one of 'x64', 'ia32' or 'aa64' (again, the relevant firmware
> will be downloaded automatically if not already provided).
>
> Of particular note, if you try to run this test suite for IA32, you will
> see that the 'MaxParam64' test (which validates the ability for calls to
> take 16 64-bit parameters) does fail, as the IA32 EBC VM does not
> currently seem to have been designed to handle that many arguments.
>
> 4.3 Patching the EDK2's FAT EBC binary
> --------------------------------------
>
> Finally, we conclude this introductory note with a real-life example of
> how one can take an existing EBC binary and patch it, so that it will
> run in all VMs, including ARM. This also enables us to further validate
> this proposal, by demonstrating that a fairly complex existing EBC
> application can and does indeed run without issues on ARM.
>
> One thing we need to be clear about from the outset, is that this step is
> NOT something that we expect any EBC developer to have to go through.
> Instead, they should just be able to recompile their code, with the
> patched version of the EBC toolchain, and when they do so, they will
> find that the required signatures have been automatically inserted in
> the resulting EBC binary.
>
> This exercise is only to demonstrate that, if one really needs to, this
> proposal also makes it possible to insert signatures into existing EBC
> binaries, to allow them to run on the new ARM VM.
>
> For this example, we will use the FAT EBC binary driver currently
> included in the EDK2 (under FatBinPkg/EnhancedFatDxe/Ebc/Fat.efi).
>
> Because the proposed ARM VM already takes care of EBC -> native ARM
> handling (through the stack tracker), the only part we need to concern
> ourselves with is the call signatures for native ARM -> EBC invocations.
>
> What we first need to identify then, are the 64-bit locations where the
> 32-bit offsets that are used in conjunction with BREAK 5 are stored.
> Obviously, these elements should be located in the data section, and
> furthermore, we can infer that they should be easily recognisable as
> 32-bit negative data offsets (most likely in 0xFFFF.... or 0xFFFE....
> since the executable isn't that large, and the code sections can be
> expected to be set before data sections), followed by 4 zeroed bytes.
>
> We can also leverage some knowledge of the Microsoft linker with regards
> to how it generates DLL entrypoint addresses (which is what is really
> being used behind the scenes to generate the 32-bit BREAK 5 offsets) as
> it seems to always place these offsets padded to a 16-byte alignment.
> Therefore, we can easily identify that there exist 23 BREAK 5 data
> locations in the data section, with the first one being at address
> 0x000109e0, and the last at 0x00010eb0. These mark the addresses at
> which we will need to add call signatures.
>
> However, while it can easily help us find the locations we are after,
> the binary alone is not enough to help us determine the call signature
> data. In this specific case, we will consider that one also has access
> to a .map file that is generated, as part of the EDK2 EBC toolchain,
> during final linking (for us, that would be something like
> "edk2\Build\Fat\RELEASE_VS2015\EBC\FatPkg\EnhancedFatDxe\Fat\OUTPUT\Fat.map").
> If a map file is not available, one will of course need to use other
> means to "guess" what each of the BREAK 5 data locations is for. Also, it
> doesn't matter if that .map file isn't the exact one that was generated
> with the binary (and as a matter of fact, even as the FAT EBC binary was
> updated very recently, most of this procedure was conducted against the
> 2015.08 version of the binary, using a map file that was more than one
> year more recent), as we just use it to get a list of calls, along with
> their expected order. This is because, if you look at the .map file you
> can find that all the EBC calls that may be invoked from native will be
> suffixed with a "_plabel".
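The search for BREAK 5 data locations described above can be sketched as
follows. This is only an illustration of the heuristic from the text (a
16-byte aligned entry whose low 32 bits look like a negative offset in
0xFFFE.... or 0xFFFF...., followed by 4 zeroed bytes). The function name
find_break5_slots(), its constants and its base_address parameter are
hypothetical, and the sketch works on a raw byte buffer rather than
parsing PE sections, so false positives are possible.

```python
import struct

# Assumptions taken from the description above, not from any EDK2 tool.
SLOT_ALIGNMENT = 16               # entries appear padded to 16 bytes
MIN_NEGATIVE_OFFSET = 0xFFFE0000  # offsets look like 0xFFFE.... or 0xFFFF....


def find_break5_slots(data, base_address=0):
    """Return addresses of 8-byte entries that look like unpatched BREAK 5 data."""
    slots = []
    for offset in range(0, len(data) - 7, SLOT_ALIGNMENT):
        # Little-endian: low 32 bits (IP-relative offset) come first.
        low, high = struct.unpack_from("<II", data, offset)
        if low >= MIN_NEGATIVE_OFFSET and high == 0:
            slots.append(base_address + offset)
    return slots
```

In practice one would pass only the bytes of the data section, with
base_address set to where that section starts, and then review the
candidates by hand against the .map file.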
>
> From there, we can deduce that the 23 addresses we have found, and in
> the order we found them, are respectively for:
>
>   00109e0 _DriverUnloadHandler()
>   0010c80 FatDriverBindingStop()
>   0010c90 FatDriverBindingStart()
>   0010ca0 FatDriverBindingSupported()
>   0010d20 FatComponentNameGetControllerName()
>   0010d30 FatComponentNameGetDriverName()
>   0010d50 FatOnAccessComplete()
>   0010d90 FatOpenVolume()
>   0010da0 FatFlushEx()
>   0010db0 FatWriteEx()
>   0010dc0 FatReadEx()
>   0010dd0 FatOpenEx()
>   0010de0 FatFlush()
>   0010df0 FatSetInfo()
>   0010e00 FatGetInfo()
>   0010e10 FatSetPosition()
>   0010e20 FatGetPosition()
>   0010e30 FatWrite()
>   0010e40 FatRead()
>   0010e50 FatDelete()
>   0010e60 FatClose()
>   0010e70 FatOpen()
>   0010eb0 InternalEmptyFunction()
>
> Because the EBC FAT driver was updated recently, we may also find that
> the addresses we identified also match the ones from the .map file
> (minus a 0x1000000 offset), but, as we tried to point out, this is not
> an absolute requirement and one does not necessarily have to use the
> exact same .map file as the one generated for the binary they are trying
> to patch.
>
> Now, looking at the source/headers (which can also be deduced from the
> .map), we find that, out of these 23, only 3 functions need to have a
> call signature that is non-zero (i.e. 3 calls actually use 64-bit
> parameters). Those are:
>
>   0010dd0 FatOpenEx(EFI_FILE_PROTOCOL*, EFI_FILE_PROTOCOL**, CHAR16*,
>                     UINT64, UINT64, EFI_FILE_IO_TOKEN*)
>           -> 011000b
>   0010e10 FatSetPosition(EFI_FILE_PROTOCOL*, UINT64 Position)
>           -> 10b
>   0010e70 FatOpen(EFI_FILE_PROTOCOL*, EFI_FILE_PROTOCOL**, CHAR16*,
>                   UINT64, UINT64)
>           -> 11000b
>
> From there, we have everything required to insert the call signatures
> (along with their 0x2EBC marker) into the Fat.efi, a fully patched
> version of which can be found at [9]. If you diff this file with the one
> from the EDK2, you will be able to confirm that the code section is
> unchanged, and that the only minimal change that was applied is that
> signatures have been inserted in the data section.
>
> This patched binary can then be used to confirm that the updated EBC
> driver runs as expected on ARM, as well as existing platforms.
>
> This can be achieved using a QEMU firmware in which the native FAT
> driver had been replaced with an NTFS driver, and then booting from an
> NTFS partition containing the patched FAT EBC driver produced by this
> procedure. Through this, we were able to demonstrate that data could
> be repeatedly accessed from a FAT partition without any issue. The only
> thing worth mentioning is that (at least on QEMU) the driver may
> sometimes be very slow to load as it accesses the FAT partition, but
> this is behaviour which we observed for both ARM and AARCH64, which we
> suspect has to do with the emulation layer(s).
>
> If you want to run this test, under the same conditions as the ones we
> used (again, all the required files will be downloaded automatically),
> you can issue the following at the root of the EBC Assembler:
>
>   C:\fasmg-ebc> make hello qemu arm ntfs
>
> You can also run a similar test against AARCH64 by replacing 'arm' with
> 'aa64'.
>
> A similar test was of course performed with the recompiled EBC FAT
> driver, as produced through the updated EBC toolchain, and no issues
> were observed there either.
>
> The fact that one can patch the existing EBC FAT driver and run it
> without issues in the proposed ARM VM, or that the FAT driver produced
> from the EDK2 after this proposal has been applied can also be used in
> the ARM VM, will, we hope, be enough to convince you that the proposal
> is sound and can be integrated.
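For readers who want to reproduce the patching step of section 4.3 by
hand, the insertion itself can be sketched as below. This is not the
PatchEbcSignature.py code: insert_signature() and the constant name are
hypothetical, and the sketch assumes a little-endian image, the layout
proposed in section 3.3 (signature in bits 32 to 47, 0x2EBC in bits 48 to
63), and an offset that is a file offset into the image rather than a
virtual address.

```python
import struct

EBC_SIGNATURE_MARKER = 0x2EBC  # marker value proposed in section 3.3


def insert_signature(image, offset, signature):
    """Write a call signature and the marker into a BREAK 5 entry, in place.

    image must be a mutable buffer such as a bytearray.
    """
    if not 0 <= signature <= 0xFFFF:
        raise ValueError("signature must fit in 16 bits")
    low, high = struct.unpack_from("<II", image, offset)
    if high != 0:
        # Refuse to overwrite data that does not look like an unpatched entry.
        raise ValueError("high 32 bits already in use at offset 0x%x" % offset)
    # Bits 32-47 carry the signature, bits 48-63 carry the marker.
    high = (EBC_SIGNATURE_MARKER << 16) | signature
    struct.pack_into("<II", image, offset, low, high)


# Example on a stand-alone 8-byte entry, using the FatOpenEx signature.
entry = bytearray(struct.pack("<II", 0xFFFF1234, 0))
insert_signature(entry, 0, 0b011000)
```

The low 32 bits are left untouched, which matches the observation that
only the data section changes between the original and patched binaries.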
>
> Regards,
>
> /Pete
>
>
> [1] https://github.com/pbatard/fasmg-ebc
> [2] https://github.com/pbatard/fasmg-ebc/blob/master/stack_tracker/driver/driver.c
> [3] https://github.com/pbatard/fasmg-ebc/blob/master/stack_tracker/matrix.asm
> [4] https://github.com/pbatard/fasmg-ebc/blob/master/stack_tracker/max.asm
> [5] https://github.com/pbatard/fasmg-ebc/blob/master/stack_tracker/cloaked.asm
> [6] https://github.com/pbatard/fasmg-ebc/blob/master/stack_tracker/realloc.asm
> [7] https://github.com/pbatard/fasmg-ebc/blob/master/driver.asm
> [8] https://github.com/pbatard/fasmg-ebc/blob/master/native/native.c
> [9] http://efi.akeo.ie/EBC/FAT/Fat.efi
>
> -----------------------------------------------------------------------
> Ard Biesheuvel (1):
>   MdeModulePkg/EbcDxe: add ARM support
>
> Pete Batard (4):
>   MdeModulePkg/EbcDxe: allow VmReadIndex##() to return a decoded index
>   MdeModulePkg/EbcDxe: add a stack tracker for ARM EBC->native support
>   MdeModulePkg/EbcDxe: add call signatures for ARM native->EBC support
>   BaseTools: add scripts to generate EBC call signatures
>
>  ArmVirtPkg/ArmVirt.dsc.inc | 6 +-
>  ArmVirtPkg/ArmVirtQemuFvMain.fdf.inc | 10 +-
>  ArmVirtPkg/ArmVirtXen.fdf | 10 +-
>  .../BinWrappers/WindowsLike/GenEbcSignature.bat | 3 +
>  .../BinWrappers/WindowsLike/PatchEbcSignature.bat | 3 +
>  BaseTools/Conf/build_rule.template | 72 ++-
>  BaseTools/Conf/tools_def.template | 12 +
>  .../Python/GenEbcSignature/GenEbcSignature.py | 306 ++++++++++
>  .../Source/Python/GenEbcSignature/__init__.py | 15 +
>  .../Python/PatchEbcSignature/PatchEbcSignature.py | 226 ++++++++
>  .../Source/Python/PatchEbcSignature/__init__.py | 15 +
>  MdeModulePkg/Include/Protocol/EbcVmTest.h | 4 +-
>  MdeModulePkg/MdeModulePkg.dsc | 4 +-
>  MdeModulePkg/Universal/EbcDxe/Arm/EbcLowLevel.S | 184 ++++++
>  .../Universal/EbcDxe/Arm/EbcStackTracker.c | 634 +++++++++++++++++++++
>  MdeModulePkg/Universal/EbcDxe/Arm/EbcSupport.c | 599 +++++++++++++++++++
>  MdeModulePkg/Universal/EbcDxe/EbcDebugger.inf | 10 +-
>  .../EbcDxe/EbcDebugger/EdbDisasmSupport.h | 4 +-
>  .../Universal/EbcDxe/EbcDebuggerConfig.inf | 2 +-
>  MdeModulePkg/Universal/EbcDxe/EbcDxe.inf | 10 +-
>  MdeModulePkg/Universal/EbcDxe/EbcExecute.c | 292 ++++++++--
>  MdeModulePkg/Universal/EbcDxe/EbcExecute.h | 8 +
>  MdeModulePkg/Universal/EbcDxe/EbcInt.h | 7 +-
>  MdeModulePkg/Universal/EbcDxe/EbcStackTracker.c | 65 +++
>  24 files changed, 2425 insertions(+), 76 deletions(-)
>  create mode 100644 BaseTools/BinWrappers/WindowsLike/GenEbcSignature.bat
>  create mode 100644 BaseTools/BinWrappers/WindowsLike/PatchEbcSignature.bat
>  create mode 100644 BaseTools/Source/Python/GenEbcSignature/GenEbcSignature.py
>  create mode 100644 BaseTools/Source/Python/GenEbcSignature/__init__.py
>  create mode 100644 BaseTools/Source/Python/PatchEbcSignature/PatchEbcSignature.py
>  create mode 100644 BaseTools/Source/Python/PatchEbcSignature/__init__.py
>  create mode 100644 MdeModulePkg/Universal/EbcDxe/Arm/EbcLowLevel.S
>  create mode 100644 MdeModulePkg/Universal/EbcDxe/Arm/EbcStackTracker.c
>  create mode 100644 MdeModulePkg/Universal/EbcDxe/Arm/EbcSupport.c
>  create mode 100644 MdeModulePkg/Universal/EbcDxe/EbcStackTracker.c
>
> --
> 2.9.3.windows.2
>
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel