public inbox for devel@edk2.groups.io
From: Pete Batard <pete@akeo.ie>
To: "edk2-devel@lists.01.org" <edk2-devel@lists.01.org>
Subject: [PATCH 0/5] MdeModulePkg/EbcDxe: add ARM support
Date: Tue, 24 Jan 2017 12:30:11 +0000
Message-ID: <eae42e0e-9311-d411-2d32-0aa87b49a402@akeo.ie>

(This e-mail is fairly lengthy, so an Executive Summary is provided, for 
those who don't want to go through a wall of text).

0. Executive Summary
====================

0.1 Preamble
------------

One of the most vexing aspects of the EFI Byte Code (EBC) proposal from 
the UEFI specs is that its EDK2 implementation has somewhat fallen short 
of its implicit goal of universality, due to the non-availability of an 
EBC VM for all supported architectures.

As a consequence, we feel that this has resulted in major backtracking 
on EBC, such as EBC not being made a mandatory part of UEFI firmware 
implementations (in the same way as FAT) or not having a default 
provision for EBC bootloaders (e.g. /efi/boot/bootebc.arm).

Shortly after Ard Biesheuvel provided an EBC implementation for ARM64, 
last August, questions were raised with regards to being able to do the 
same for ARM due to issues with trying to work with the calling 
convention on that platform, and more specifically, with the 64-bit 
parameter marshalling that is required there. At the time, these 
problems were deemed difficult to tackle without the collaboration of 
third parties (such as external toolchain developers) or without having 
to restrict the scope of what EBC applications could do (such as 
limiting their access to only "known" EDK2 interfaces, for which 
parameter marshalling specifics would have been added).

This series of patches attempts to remedy all that, by proposing an ARM 
EBC implementation that solves the issues mentioned above in a generic 
and entirely self contained manner (i.e. within the EDK2). With this, 
the EDK2 should finally enable the execution of the same EBC binary 
across ALL supported UEFI architectures, and thus complete the implicit 
goal of EBC.

0.2 Solution Overview
---------------------

The gist of our marshalling solution can be summarized with being able 
to access a 16-bit value, at runtime, for native <-> EBC layer 
transition calls, that indicates which of (up to) 16 function call 
parameters is 64-bit. In turn, this enables the ARM EBC VM to "realign" 
said parameters to a 64-bit boundary as needed.

Hereafter, we will refer to these 16-bit values as "call signatures".

Now, whereas this solution is self-contained, it does entail a minor 
change to the UEFI EBC specs, that mandates the insertion of call 
signatures at compilation time into the unused part of the 64-bit 
function pointer data object used by BREAK 5 (see 3.3).

However, this non-breaking change is both backward and forward 
compatible. In particular, once the specs change is effected, and for 
*ALL* current EBC archs (IA32, IA64, X64, AARCH64):

a) EBC executables that were produced with the older version of the 
specs will run exactly as they did on EBC VMs that comply with the newer 
version of the specs

b) EBC executables that are produced with the newer version of the specs 
will run and perform the same, on EBC VMs that comply with either the 
older or newer version of the specs.

Also, and specifically for ARM (since other platforms are unaffected by 
such concerns), this whole proposal makes it possible for:

- Existing EBC binaries, that do not invoke BREAK 5 (i.e. no native to 
EBC calls) to run on the proposed ARM EBC VM without any changes. 
In particular, these applications can perform EBC to native calls, on 
ARM, with no adverse effects.

- Existing EBC binaries, that do invoke BREAK 5, but that don't include 
signatures, to partially run on ARM. In this case, the ARM EBC VM will 
return an error status for calls issued from native to the EBC, allowing 
a native app to acknowledge incompatibility issues and potentially let 
the developer know that signatures need to be added.

- Existing EBC binaries, that do invoke BREAK 5, to be (relatively 
easily) patched for ARM EBC compatibility. This, for instance, is 
demonstrated with the EDK2 FAT EBC binary in 4.3.

- The updated EDK2 EBC toolchain to produce EBC applications that run on 
*ALL* EBC VMs, including the newly added ARM one.

0.3 Patch Overview
------------------

The patch series is broken down into 5 parts:

- 1/5 relates to preliminary changes, within the common EBC code, that 
enable VmReadIndex##() functions to optionally return the decoded Const 
and Natural parts, as we need this data for our proposed Stack Tracker. 
It is done by adding an optional pointer to a {Const, Natural} struct, 
that is to be filled when the pointer is not NULL. With the introduction 
of this change, the patch sets all of the new optional pointers to NULL, 
so no actual behavioural change occurs.

- 2/5 introduces the basic ARM EBC VM, as was proposed by Ard as a PoC 
port of the AARCH64 version. Note that this change alone is enough to 
get standalone EBC code to run on ARM, but may result in the parameter 
marshalling issues we mentioned above, for any call that transitions 
between ARM native and EBC, due to potential "misaligned" 64-bit parameters.

- 3/5 fixes the issue of calling from EBC into native ARM, in a 
completely self contained manner, through the addition of a "Stack 
Tracker". This Stack Tracker is enough to dynamically resolve, at call 
time, whether a specific parameter is 64-bit or not, and thus whether it 
needs to be realigned. This is done through a buffer that starts at 
1/64th of the stack size and is grown dynamically if needed. We 
currently estimate that most EBC applications should not need to use 
more than this initially allocated space, as, in most cases, the stack 
tracker should require less than 1.5 bits to track every 32 bits of 
stack data.

- 4/5 fixes the mirror issue of calling from native into EBC. This is 
the part that requires a specs change, as the thunking code for the ARM 
platform must have some knowledge about the parameter signature of the 
function being called into, to re-align 64-bit parameters. This specific 
section adds handling and processing of call signatures at runtime. It 
should be noted that this patch also bumps the global version of the EBC 
VM from 1.0 to 1.1 as a means to indicate whether an EBC VM is compliant 
with the new specs (though this really only affects ARM, and none of the 
other archs).

- 5/5 adds the insertion of call signatures into EBC binaries at EDK2 
compilation time, in compliance with the proposed specs change. This is 
done by introducing 2 new Python tools: one that works around the 
Intel EBC compiler (iec) to parse a C source and generate optional 
signature data for this source, and one that processes these signature 
files, at link time, to patch the final binary with the signature data 
that will be required at runtime. Eventually, we expect the generation 
of signature data to become part of the iec, thus rendering the first 
tool obsolete.


1. A refresher of the ARM calling convention issue
==================================================

As a reminder, the major issue we face when trying to implement an EBC VM 
on ARM has to do with the marshalling of 64-bit parameters from EBC to 
native and from native to EBC.

This is because, as per the AAPCS (Procedure Call Standard for the ARM 
Architecture), ARM has the requirement that any 64-bit function call 
parameters must be aligned to a 64-bit boundary (or an even register, 
for register parameters), whereas, by its nature, the EBC call stack on 
ARM is packed to 32-bit, and therefore 64-bit parameters are not 
guaranteed to be aligned.

Without counter-measures, this results in code, such as one calling from 
EBC into a simple (UINT32, UINT64) native function, or calling from 
native into a (VOID*, INT64) EBC function, that ends up with garbage 
parameter data.
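
To make the mismatch concrete, here is a minimal sketch (not code from 
the patch series; the function names and the "<pad>" placeholder are 
purely illustrative) that lays out the (UINT32, UINT64) example as a 
list of 32-bit words, first packed the way the EBC stack holds it on 
ARM, then with the 64-bit alignment the AAPCS expects. Words 0 to 3 
would map to r0-r3 and the rest to the native stack.

```python
def packed_words(params):
    """EBC-style layout on a 32-bit target: no padding between parameters."""
    words = []
    for name, bits in params:
        if bits == 64:
            words += [name + ".lo", name + ".hi"]
        else:
            words.append(name)
    return words


def aapcs_words(params):
    """AAPCS-style layout: a 64-bit parameter must start on an even word."""
    words = []
    for name, bits in params:
        if bits == 64:
            if len(words) % 2:
                words.append("<pad>")  # skipped register or stack slot
            words += [name + ".lo", name + ".hi"]
        else:
            words.append(name)
    return words


params = [("Arg1", 32), ("Arg2", 64)]
print(packed_words(params))  # ['Arg1', 'Arg2.lo', 'Arg2.hi']
print(aapcs_words(params))   # ['Arg1', '<pad>', 'Arg2.lo', 'Arg2.hi']
```

A native callee that expects the second layout but receives the first 
reads Arg2 one word too far, hence the garbage data.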

The one solution we see to work around this is through the use of call 
signatures, that tell us, at call time, where 64-bit arguments are 
located so that we can realign them. However, as we demonstrate in this 
proposal, the provision of call signatures does not need to be intrusive 
with regards to the development flow.


2. EBC to native marshalling: The Stack Tracker
===============================================

2.1. Overview
-------------

On any architecture other than EBC, the idea of using a stack tracker to 
determine the size of a call parameter would sound like a simplistic 
approach. After all, how can one tell if a sequence of 32-bit values 
being pushed onstack is meant to be used as two 32-bit parameters, or a 
single 64-bit parameter, with the low and high words being pushed 
separately?

However, a careful reading of the EBC specs (UEFI 2.6, section 21.9.3) 
enables us to conclude the following: On an EBC platform, any non-64-bit 
call parameter will be enqueued as a natural.

Furthermore, because EBC naturals can only be enqueued in an atomic 
manner (in other words, it is not possible to use a combination of 
shorter PUSHes or MOVs to add a natural onstack), then, by tracking 
natural operations, which we can easily do in the VM, it is possible to 
determine where non-64-bit parameters have been enqueued, and therefore 
also deduce where 64-bit parameters are located.

From there, we devise that we can add a "stack tracker" on ARM, to 
monitor the EBC executable's stack operations for which, in order to 
minimize the amount of data required for tracking, we will use sets of 
2-bit sequences, using the following encoding:
- 01b -> a natural has been enqueued on stack
- 00b -> a contiguous set of (non-natural) 64-bit data is present on stack
- 1xb -> start of dual 2-bit sequence (4-bits). x along with the 
next 2 bits indicates the number of contiguous bytes of data that have 
been enqueued (as non natural data). For instance a 10b 01b sequence may 
be used to indicate that a PUSH8 equivalent operation has been effected.

The dual 2-bit sequences are needed as an application may be enqueuing 
non-natural parameters with the aim of constructing a (potential) 64-bit 
parameter.

For instance, if we have a sequence of 4 MOVIw @R0, ..., we want the 
stack tracker to be able to ultimately resolve the enqueued data as a 
bona-fide 64-bit parameter if needed so that, as data is being enqueued, 
we see the stack tracker being updated as such:

10b 10b (16 bits of data enqueued)
11b 00b (32 bits of data)
11b 10b (48 bits of data)
00b     (64 bits of data -> the dual 2-bit sequence is now collapsed 
into a single 2-bit sequence)

The use of 2 bits for an actual 64-bit "stride" of data, vs. 4 bits for 
other lengths, is of course intended as a form of basic compression to 
reduce the amount of space required for stack tracking, since we expect 
the frequency of 64-bit and natural stack elements to be a lot higher 
than smaller sized elements.
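
The encoding above can be sketched as follows. This is a simplified 
model rather than the implementation from PATCH 3/5: TrackerSketch, 
encode_partial() and the item tuples are hypothetical names, the tracker 
is kept as a Python list instead of a packed bit buffer, and ordering 
subtleties of a downward-growing stack are ignored.

```python
def encode_partial(nbytes):
    """Encode 1..7 contiguous non-natural bytes as a dual 2-bit sequence."""
    if not 1 <= nbytes <= 7:
        raise ValueError("a dual sequence tracks between 1 and 7 bytes")
    bits = format(nbytes, "03b")       # 3-bit byte count: x plus next 2 bits
    return ["1" + bits[0], bits[1:]]   # "1x" followed by the low 2 bits


class TrackerSketch:
    def __init__(self):
        # Each item is ("natural",), ("qword",) or ("partial", nbytes).
        self.items = []

    def push_natural(self):
        self.items.append(("natural",))

    def push_bytes(self, nbytes):
        """Track non-natural data, merging with a trailing partial run."""
        if self.items and self.items[-1][0] == "partial":
            nbytes += self.items.pop()[1]
        while nbytes >= 8:
            self.items.append(("qword",))  # collapses to a single 00b
            nbytes -= 8
        if nbytes:
            self.items.append(("partial", nbytes))

    def render(self):
        out = []
        for item in self.items:
            if item[0] == "natural":
                out.append("01b")
            elif item[0] == "qword":
                out.append("00b")
            else:
                out.extend(s + "b" for s in encode_partial(item[1]))
        return " ".join(out)


tracker = TrackerSketch()
for _ in range(4):          # four 16-bit stores, as in the MOVIw example
    tracker.push_bytes(2)
    print(tracker.render())
```

Under these assumptions the loop prints the same four states as the 
MOVIw example: 10b 10b, 11b 00b, 11b 10b, and finally 00b.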

2.2 Call time usage
-------------------

With this in effect, at call time, we can look into the stack tracker to 
determine whether each (potential) parameter is either natural or 
64-bit, and then construct a 16-bit call signature. Note that, as per 
the ARM calling convention, both register and non register 64-bit 
parameters must be aligned, which, for register parameters, means that 
r0 or r2 must be used as the first word of the 64-bit argument.

Also, in case this needs to be clarified, please note that, even as we 
state that there are only 2 types of parameters that an EBC VM can use 
for an EBC to native call (Natural or 64-bit), this does not imply that 
an EBC application cannot call into a native function that, say, takes a 
BYTE as a parameter. Only that, if it does so, the EBC specs require 
that it reserve space for a Natural onstack.

Now, one element we cannot determine is just how many parameters the 
target call takes. However this is something that can be safely ignored, 
as there are no issues associated with passing a larger parameter call 
stack than what is actually needed. The extra (potential) parameters we 
enqueue will simply be ignored.

In our implementation we therefore set the maximum number of parameters 
that a native function call may deal with to 16, which means that we 
always assume that the function call might take 16 arguments. This is 
based on what we've seen other VMs do, as well as what we consider safe 
for the official UEFI interface calls. Currently, we are not aware of 
any calls in the EDK2 taking more than 16 parameters, and we also don't 
expect user applications to pass more than 16 parameters.

Finally, it should be pointed out that, because we don't know the actual 
number of parameters for a call, we may still attempt to process some 
dual 2-bit sequences as part of our 16-bit call signature creation. 
However, if we do, we know that they cannot apply to a formal parameter, 
so we can either choose to ignore them, or just let the implementation 
toggle call signature bits indiscriminately (as we do in our proposal).
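
As a rough illustration of how a 16-bit call signature can drive the 
realignment described in this section, the sketch below expands a packed 
list of 32-bit words into an aligned one. It is not the assembly from 
Arm/EbcLowLevel.S: realign(), MAX_PARAMS and the use of None as a 
padding word are assumptions made for the example.

```python
MAX_PARAMS = 16  # maximum number of potential parameters considered


def realign(packed, signature):
    """Expand packed 32-bit words into an AAPCS-style aligned word list.

    Bit i of `signature` set to 1 means potential parameter i is 64-bit.
    Words 0..3 of the result would go to r0-r3, the rest to the stack.
    """
    aligned = []
    pos = 0
    for i in range(MAX_PARAMS):
        if pos >= len(packed):
            break
        if (signature >> i) & 1:
            if len(aligned) % 2:
                aligned.append(None)  # padding: skips r1/r3 or a stack slot
            aligned.extend(packed[pos:pos + 2])
            pos += 2
        else:
            aligned.append(packed[pos])
            pos += 1
    return aligned


# (UINT32, UINT64): parameter 1 is 64-bit, so the signature is 0b10
print(realign([0x11111111, 0xAAAAAAAA, 0xBBBBBBBB], 0b10))
```

With the (UINT32, UINT64) input, the 64-bit value moves from words 1-2 
to words 2-3 (r2/r3), with a padding word in place of r1.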

2.3 Space considerations
------------------------

The stack tracker is designed to grow dynamically and is currently 
allocated to 1/64th of the total stack space on startup, as a buffer 
that is separate from the stack (we initially considered reserving the 
stack tracker as part of the stack buffer, but dismissed that 
approach). Currently, each time a reallocation is needed, the stack 
tracker is set to double in space.

For typical executables then, even those that tend to err towards high 
stack usage, we don't expect the tracker to outgrow the 1/32nd (of 
total stack space) mark, especially as a large part of the stack will 
be reserved for what the stack tracker sees as contiguous swaths of 
64-bit elements, each of which is tracked with 2 bits (i.e. at a 1/32 
ratio).

However, we still need to consider the theoretical worst case scenario, 
where someone creates an EBC application that consists only of:
PUSHn ...
MOV(I)b @R0, ....
PUSHn ...
for as much stack space as is available (R0 being the EBC stack pointer).

In this case, the stack tracker will allocate 2 bits for each PUSHn and 
4 for each MOV (because they equate to pushing a byte), but none of the 
4-bit sequences will ever be collapsible into a 2-bit one (once we have 
accumulated enough bytes to form a 64-bit) which means that 32 bits + 32 
bits of actual stack data (because while 8 bits are pushed, they are 
aligned to 32 on the next PUSHn) = 64 bits are encoded into 6 bits in 
the stack tracker, or ~1/10th of stack space. And since we double the 
stack tracker size each time it needs to be reallocated, this means that 
the very worst case scenario we can see in terms of space would be a 
stack tracker that needs to occupy 1/8th of the stack data at worst.
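
The arithmetic of this worst case can be checked with a few lines. The 
function name is hypothetical, and the figures are simply the ones 
quoted above (32-bit naturals on ARM, a 1/64th initial allocation, and 
doubling on each reallocation).

```python
from fractions import Fraction


def worst_case_tracker_ratio():
    tracker_bits = 2 + 4    # one natural entry + one dual 2-bit sequence
    stack_bits = 32 + 32    # a natural + a byte slot re-aligned to 32 bits
    ratio = Fraction(tracker_bits, stack_bits)

    size = Fraction(1, 64)  # initial tracker size, relative to the stack
    while size < ratio:
        size *= 2           # the tracker doubles on each reallocation
    return ratio, size


ratio, size = worst_case_tracker_ratio()
print(ratio, size)  # 3/32 (just under 1/10) and 1/8
```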

Even if that was a realistic scenario, we don't consider that the 
drawback of having to (potentially) reserve an extra 1/8th outweighs the 
advantages of allowing ARM users to benefit from an EBC VM. Still, we 
will point out that this is not a scenario we ever expect to see in a 
practical application. Instead the worst scenario we expect for an 
exceedingly stack heavy EBC executable, that enqueues a lot of <= 32 bit 
elements, would be 1/16th of stack space (or 64KB, since the default 
stack buffer is ~1MB), which we think is very reasonable, even on ARM. 
Furthermore, we consider it a fair estimate that 99% of applications 
will never need more than the initial 1/64th (or 16KB) allocated for 
stack tracking.

2.4 Additional considerations
-----------------------------

2.4.1 Local variables onstack

Typically, at the beginning of a subroutine, a MOV R0, R0(0,-n) will be 
used by the compiler to reserve space for local variables.

For instance, a compiler may insert MOV R0, R0(0,-1024) to reserve space 
for 1024 bytes onstack. And while developers and users can reasonably 
expect this to be a near-instantaneous operation, if our stack tracker 
is going to read that operation as a set of 128 x 64-bit longwords being 
allocated, and then perform 128 x {read byte; modify 2 bits; write byte} 
operations, one may have misgivings about the performance impact of 
tracking the stack.

However, the stack tracker is designed to recognize operations 
pertaining to repeated sequences of data, and optimize them. In short, 
in the current implementation, the sequence above will typically result 
in the stack tracker simply zeroing a set of 128 bytes in one go, 
instead of trying to repeatedly update individual 2-bit sequences.

Note that this holds true even if there is a need to propagate a dual 
2-bit sequence as part of tracking a large set of 64-bit longwords.

To confirm this, and as part of our test suite, we also have a test 
where half the stack is reserved and then released for local data, 10 
times in a row, and we see that the execution of this test is near 
instantaneous in QEMU, confirming that there should be no performance 
bottleneck (see 4.1 -> Realloc).

2.4.2 Stack buffer switching

First, we must point out that, per our testing, NONE of the current EBC 
VM implementations from the EDK2 currently allow the EBC stack buffer to 
be switched to a different buffer by an EBC application at runtime. 
In particular, the following EBC assembly code will freeze execution on 
MOVREL, on all current VMs:

EfiMain:
   MOV       R6, R0
   MOVREL    R0, StackTop
   MOV       R0, R6
   RET

section '.data' data readable writeable
   StackBuf: dq 255
   StackTop: dq 1

Nonetheless, in case this ever becomes a possibility, our stack tracking 
proposal does have provision for stack buffer switching. The only 
limitation we have (and this is really the only limitation we see for 
the whole proposal), is that it can only handle one level of switching. 
In other words, provided stack switching was possible (which currently 
isn't the case) the stack tracker wouldn't be able to properly track 
parameters if a second stack buffer switching occurs within code that 
executes against a stack buffer that was already switched. However, the 
proposal should still be fine if only a single level of stack switching 
occurs (i.e. we should be able to track switch/restore, no matter how 
many times such switching is repeated during the execution of an 
application).

Thus, considering that:
1. It does not currently seem possible to switch stack buffer on any arch
2. The EBC compiler does not offer the ability to manipulate the stack 
pointer directly in the first place
3. Stack switching only becomes an issue if done recursively

We consider that this one limitation of our implementation can be 
dismissed as too unrealistic and cannot be construed as a showstopper.

2.4.3 Delta stack pointer updates

Outside of the obvious PUSH/POP operations, the stack tracker does track 
mathematical/logical modification of R0, by computing the delta from the 
previous R0 value. Most of the time this delta would only have a const 
component, which we then try to resolve to a complete or partial set of 
64-bit consecutive values. However, there also exist instances, such as 
MOV R0, R0(+n,+c), where we will have both a constant and a natural part 
to track.

While updating the stack tracker itself with such data is not 
technically an issue, one may wonder if the order in which natural and 
constants are processed might have an effect on our ability to determine 
where 64-bit parameters are located.

However, when one looks more closely at the validity of such concerns, 
the conclusion will be drawn that such an operation can never be used to 
fill actual call parameters (which would have to be optional in the 
first place, since it is of course impossible, with the current EBC 
specs, to use such an operation to pass actual data), as neither the 
specs nor the VM make any promise as to the order in which constant and 
natural parameters are processed. Therefore a programmer cannot assume 
how its call parameter stack will be set from invoking such an 
operation. Thus, as far as tracking data to determine if a parameter is 
natural or 64-bit, the order induced by the operation above is 
irrelevant, and we are therefore free to pick whichever order makes most 
sense for our implementation.

2.4.4 "Cloaked" stack operations

"Cloaked" stack operations are stack operations that are not effected 
using R0 as the stack pointer. For instance, someone may copy R0 into 
R1, then alter the data pointed to by either R0 or R1, and then copy R1 
(which may or may not have been modified) back to R0. And whereas stack 
switching (i.e. trying to have R0 point to an address that isn't within 
the current stack buffer) currently breaks VM execution, moving R0 within 
the stack buffer, even in a cloaked manner, is something that the EBC VM 
can and does perform without issues.

Ultimately, there are two types of cloaking that may come into effect, 
which we'll call positive and negative cloaking.

Positive cloaking (dequeuing) is a non issue. This basically occurs 
when the restored R0 is at a higher address than the original one. Since 
stacks grow downwards, this means that dequeueing of data has been 
issued, which we can handle with relative ease by going through our 
existing tracker data, and removing natural and const elements, 
according to their size, until we match the R0 address delta.
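
A minimal sketch of that dequeuing step is shown below. It is only a 
model of the idea: dequeue() and the (kind, size) tuples are 
hypothetical, and the handling of an element that is only partially 
released (keeping the remainder as partial data) is an assumption, not 
something the patch is known to do.

```python
def dequeue(items, delta):
    """Drop the most recent tracker items until `delta` bytes are released.

    `items` is a list of (kind, size_in_bytes) tuples, most recent last,
    e.g. ("natural", 4) on 32-bit ARM or ("qword", 8).
    """
    items = list(items)
    while delta > 0 and items:
        kind, size = items.pop()
        if size > delta:
            # Assumption: keep what is left of the element as partial data.
            items.append(("partial", size - delta))
            delta = 0
        else:
            delta -= size
    return items


tracked = [("natural", 4), ("qword", 8), ("natural", 4)]
print(dequeue(tracked, 12))  # [('natural', 4)]
```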

Negative cloaking (enqueuing) could be seen as more problematic, as one 
may consider that, since we're not tracking anything but R0, someone may 
use something like an R1 index manipulation to enqueue natural and const 
call parameters, before eventually assigning R1 to R0, which would 
defeat our tracking ability.

However, for all purposes, negative cloaked stack pointer updates should 
never be used to enqueue call arguments. In particular, by NOT explicitly 
declaring whether an argument is a natural or a 64 bit, through a direct 
stack operation, one would be deliberately misinterpreting the intent of 
the specs, which state (21.9.3) that "Parameters are pushed on the VM 
stack" and pretty much imply direct stack operation. At the very 
least, we do not expect C applications to use negative cloaking (as C 
does not have provisions for anything like this), and furthermore, 
should we believe that the specs may be misinterpreted, we could add a 
"directly" qualifier, so that all ambiguity is removed.

2.5 Code changes overview
-------------------------

The stack tracker is introduced with PATCH 3/5, and contains the 
following code changes:

2.5.1: In EbcVmTest.h we add a new pointer to an optional, opaque and 
arch-specific structure. This is the structure that will be used for our 
Stack Tracker on ARM. It is important to note that, in the common code, 
we use the presence or absence of this pointer (whether the pointer is 
NULL) to determine whether stack tracking is in operation. We preferred 
this approach to using #ifdef MDE_ARM in the code, as we believe it is 
cleaner.

2.5.2: In EbcExecute.c we add stack tracking for any instruction that 
may require it. This is done for anything that deals with accessing the 
content pointed to by R0 (the stack pointer) or manipulating R0 directly, 
including mathematical operations on R0, as well as (obviously) PUSH and 
POP operations.
Depending on whether VmPtr->StackTracker is non-NULL (currently, only on 
ARM) and whether the instruction affects R0, we then invoke one of 
UpdateStackTracker() or UpdateStackTrackerFromDelta().

2.5.3: Since we introduce the 2 functions above, we also add a blank 
EbcStackTracker.c, at the top level, to be used on any arch that doesn't 
require stack tracking. Calls are defined as empty functions there.

2.5.4: For ARM alone, we add Arm/EbcStackTracker.c which contains the 
definition for the 2 functions above, as well as any other stack tracker 
support function, such as the ones that deal with stack tracker buffer 
allocation/release.

2.5.6: In Arm/EbcSupport.c, we add the necessary calls for stack tracker 
allocation/release as well as modify EbcLLCALLEX() to use a new 
EbcLLCALLEXNativeArm() which takes an extra argument for the current 
argument layout as returned by the stack tracker.

2.5.7: In Arm/EbcLowLevel.S, we define EbcLLCALLEXNativeArm, which 
takes care of properly aligning up to 16 64-bit parameters, according to 
the argument layout.


3. Native to EBC marshalling: Internal call signatures
======================================================

3.1 Overview
------------

Once again, we start from the principle that a call signature is needed, 
at layer transition time (native -> EBC invocation), so that we can 
realign 64-bit call parameters as needed.

In this case, since the application performing the call request is 
native, we cannot use anything like a stack tracker (which wouldn't work 
for native code anyway) and instead, need to ensure that we can have the 
call signature at our disposal when we perform thunking.

Two elements that work to our advantage for this part are that:
- The code for which we need signature awareness is the EBC code itself, 
in other words, code that should be produced using the EDK2 EBC toolchain.
- Every single function call for which we need a signature must also be 
a function call for which thunking will be set using BREAK 5.

From there, it's easy to devise a solution that consists of modifying 
the EBC generation toolchain, so that it adds a 16-bit call signature 
into the 64-bit offsets used by BREAK 5 (which only ever use a 32-bit 
payload), to make signatures available when thunking is invoked.

3.2 Implementation
------------------

After having confirmed that, as per specs, all of the current EBC VM 
implementations do ignore the high 32-bit part of the 64-bit value used 
by BREAK 5 (which means that we can alter this existing data without 
incurring any drawbacks), we identify it as the best place to store the 
16-bit signature, along with an extra 16-bit marker, which we'll then 
use to detect EBC binaries that were compiled without signatures.
The other reasons that make us want to use this element are that:
- This is space that is already available in an EBC binary (i.e. no need 
to add extra data/instructions)
- This can enable the patching of existing EBC binaries.
- It makes logical sense to have it there, since the call signatures are 
related to functions that require BREAK 5 invocation.

When BREAK 5 is invoked, we should therefore be able to copy this 
signature (if available) into the EBC_INSTRUCTION_BUFFER structure, and 
subsequently use that data during call thunking, to align 64-bit 
parameters.

Of course, while we will now require EBC binaries to be decorated with 
additional signature data, we don't want EBC developers to have to go 
through the process of inserting these signatures manually. Instead, we 
automate the signature insertion so that it will run at compilation 
time. To that effect, we introduce 2 Python scripts in BaseTools:
- GenEbcSignature, invoked after each object generation, parses a 
preprocessed C source, along with the object file, to create the 
signature data, which is then stored into a corresponding .sig file.
- PatchEbcSignature, invoked at final application link time, processes 
the .sig files as well as the .map data and .efi binary, and inserts the 
signatures at the relevant location in the final binary

Currently, and because we expect the Intel EBC compiler to be updated to 
follow the new specs (since it is really the best place to perform such 
processing, as it has access to the C parser, lexer as well as the full 
preprocessed source, and can more easily determine the nature of 
function call arguments), we see GenEbcSignature as a stopgap solution 
until said compiler is updated.

Therefore, GenEbcSignature was designed to be very basic with regards to 
the ability to properly detect 64-bit parameters. For instance, it 
expects the processed source to follow the EDK coding conventions and it 
also requires straight INT64 or UINT64 parameters to be used (i.e. no 
redefinitions of these basic types). Of course, since any aspect of the 
signature generation and insertion can be amended, we are ready to 
modify the proposal according to what Intel sees as the best course of 
action with regards to iec integration.

On the other hand, we expect PatchEbcSignature to remain part of the EBC 
toolchain in one form or another, as signature insertion takes place 
post linking, the EBC linker is not something that was written 
specifically for EBC (it is the regular Microsoft linker), and there 
isn't much performance/optimization to be gained from removing the 
extra step here.

Finally, both scripts currently rely on the Intel EBC compiler 
referencing externally callable functions with a "_plabel" suffix, which 
is what we empirically identified as Intel's marker for such calls. Of 
course, not knowing the internals of the iec, it is possible that this 
assertion does not hold, in which case, there again, we can work with 
Intel iec developers to refine it...

3.3 Proposed specs change
-------------------------

UEFI Specs Version 2.6 (January, 2016) are used for all the changes 
highlighted below.

Proposed alterations/insertions are to be found within brackets [ ]

* Section 21.8 -> BREAK -> BREAK 5:

"Create thunk. This causes the interpreter to create a thunk for the EBC 
entry point whose 32-bit IP-relative offset is stored in the low part of 
a 64-bit data address in VM register R7[, and whose call signature is 
stored in the high part. For details on how the signature should be 
generated, see section 21.12.10.2]. The interpreter then
replaces the contents of the memory location pointed to by R7 to point 
to the newly created thunk (...)"

* Section 21.12.10.2: Thunking Native Code to EBC

"Typical C code to install a generic protocol is shown below.
   EFI_STATUS Foo(UINT32 Arg1, UINT[64] Arg2);
   (...)

"To support thunking native code to EBC, the EBC compiler resolves (...)
  • Associated relocations[ and optional call parameter alignment] for 
the above

[In order to perform optional parameter alignment, the EBC toolchain is 
required to insert a 16-bit call signature, along with a 16-bit marker, 
in the high 32-bit word of the 64-bit function pointer data object.

A bit of the call signature is set to 1 if a parameter is 64-bit or 0 
otherwise, with the first parameter at bit 0. If a function call uses 
fewer than 16 parameters, any unused bit should be set to 0. EBC 
function calls with more than 16 parameters are not supported.

The 16-bit signature should then be written into bits 32 to 47 of the 
64-bit function pointer data object, and bits 48 to 63 set to 0x2EBC.

Thus, for the (UINT32, UINT64) function call above, the 64-bit function 
pointer data object that the EBC toolchain would need to store at 
Foo_pointer is:

(Foo - Foo_pointer - 4) + (0x0002 << 32) + (0x2EBC << 48)]
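
For illustration, the layout proposed above can be computed as follows. 
The helper names are hypothetical and this is not the PatchEbcSignature 
code; the 32-bit masking of the relative offset is an assumption made so 
that a negative offset does not spill into the upper bits.

```python
EBC_SIGNATURE_MARKER = 0x2EBC  # marker value from the proposed specs text


def call_signature(param_is_64bit):
    """Build the 16-bit signature: bit i is 1 if parameter i is 64-bit."""
    if len(param_is_64bit) > 16:
        raise ValueError("more than 16 parameters are not supported")
    signature = 0
    for index, is_64bit in enumerate(param_is_64bit):
        if is_64bit:
            signature |= 1 << index
    return signature


def function_pointer_object(relative_offset, signature):
    """Pack offset (bits 0-31), signature (32-47) and marker (48-63)."""
    return ((relative_offset & 0xFFFFFFFF)
            | (signature << 32)
            | (EBC_SIGNATURE_MARKER << 48))


sig = call_signature([False, True])           # Foo(UINT32, UINT64)
print(hex(sig))                               # 0x2
print(hex(function_pointer_object(0x100, sig)))  # 0x2ebc000200000100
```

The 0x100 offset stands in for (Foo - Foo_pointer - 4) and is an 
arbitrary value chosen for the example.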

3.4 Code changes overview
-------------------------

o PATCH 4/5:

3.4.1: In EbcVmTest.h we bump the VM version minor from 0 to 1. This is 
because, while there isn't any actual incompatibility being introduced 
for existing VMs, we feel that EBC developers may still want the ability 
to detect if the VM they are running against is call-signature 
compatible (v1.1) or not (v1.0) and possibly take action as a result.

3.4.2: In EbcInt.h we define the EBC_CALL_SIGNATURE marker, and 
introduce a new flag for the ARM create-thunk function, to indicate 
whether a call signature needs to be processed.

3.4.3: In EbcExecute.c we modify ExecuteBREAK() to read the call 
signature (if present) and then pass that signature along with the 
FLAG_THUNK_SIGNATURE flag to the arch-specific EbcCreateThunks(). Note 
that, for any other arch than ARM, the flags were already ignored, so no 
changes are needed.

3.4.5: In Arm/EbcSupport.c, we modify EbcCreateThunks() so that it reads 
the new flag and signature, if provided, and stores the signature into 
the private thunk data (InstructionBuffer.EbcCallSignature).

In EbcInterpret() we check whether the signature is present, and if so, 
align parameters as needed. If not, we return an 
EFI_INCOMPATIBLE_VERSION status.
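
To illustrate why the signature is needed at this point: under the ARM 
procedure call standard (AAPCS), a 64-bit argument must start in an 
even-numbered register or at an 8-byte aligned stack address, so a 
native caller may leave padding words between arguments. The sketch 
below is not the code from the patch. It is a hypothetical Python model 
that, given a call signature, computes the 32-bit word index at which 
each argument starts on the native side (words 0 to 3 being r0 to r3, 
and the following words being on the stack). The param_count argument is 
an assumption made for the illustration, since the signature itself does 
not encode the number of parameters:

```python
def aapcs_word_offsets(signature, param_count):
    """Return the starting 32-bit word index of each argument.

    Bit N of 'signature' is 1 if parameter N is 64-bit (bit 0 is the
    first parameter). 64-bit arguments are aligned to an even word
    index, which may insert one padding word before them.
    """
    offsets = []
    word = 0
    for index in range(param_count):
        wide = (signature >> index) & 1
        if wide and word % 2:
            word += 1  # skip one padding word to reach an even index
        offsets.append(word)
        word += 2 if wide else 1
    return offsets


# Foo(UINT32, UINT64): Arg1 in word 0, word 1 is padding, Arg2 in 2-3.
foo_offsets = aapcs_word_offsets(0x0002, 2)
```

Here foo_offsets is [0, 2], whereas a signature of 0 for the same two 
parameters would yield [0, 1]. This difference is what the thunk has to 
account for when it hands the parameters over to EBC code.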

3.4.6: Arm/EbcLowLevel.S is also modified to handle the new 
EbcCallSignature word of InstructionBuffer.

o PATCH 5/5:

3.4.7: BaseTools\Source\Python\GenEbcSignature\GenEbcSignature.py is the 
new call signature generation script. It is meant to be called after iec 
has generated an object file, and takes the preprocessed source on stdin 
(so that we can parse function call declarations from headers) along 
with the object file, and generates signature data (which, in its 
current form, is the Python serialized data from a dictionary). 
Basically, we parse the COFF object and locate the symbol table, where 
we identify all the symbols that have a _plabel suffix. We then try to 
locate each symbol in the preprocessed source, as a possible function 
call, and, if we find one, detect whether it has [U]INT64 parameters 
(from either a function declaration or definition). Finally, we use 
Python's "pickle" functionality to serialize our call signature 
dictionary.
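
The parameter detection and serialization steps can be sketched as 
follows. This is a simplified, hypothetical illustration and not the 
actual GenEbcSignature.py: it assumes each declaration is available as a 
single string, it does not handle function pointer parameters (whose 
own parameter lists contain commas), and it only recognises the literal 
UINT64 and INT64 type names rather than typedefs that resolve to them:

```python
import pickle
import re

WIDE_TYPES = re.compile(r"\b(?:UINT64|INT64)\b")


def signature_from_declaration(declaration):
    """Return the 16-bit call signature for one C function declaration."""
    start = declaration.index("(") + 1
    end = declaration.rindex(")")
    signature = 0
    for index, param in enumerate(declaration[start:end].split(",")):
        # A pointer to a 64-bit value is still a native-width parameter.
        if WIDE_TYPES.search(param) and "*" not in param:
            signature |= 1 << index
    return signature


# Build a {function name: signature} dictionary and serialize it.
signatures = {
    "Foo": signature_from_declaration(
        "EFI_STATUS Foo(UINT32 Arg1, UINT64 Arg2);"
    ),
}
serialized = pickle.dumps(signatures)
restored = pickle.loads(serialized)
```

For the Foo() declaration used earlier, this yields a signature of 
0x0002, and restored is equal to the original dictionary.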

3.4.8: BaseTools\Source\Python\PatchEbcSignature\PatchEbcSignature.py is 
the call signature insertion script. This time, we process the .map file 
for the produced .efi to identify the addresses of the _plabel suffixed 
function calls. These are the addresses we need to insert signatures 
into.
Then, we take a list of either .sig or .lib files (which we convert to 
.sig paths), and deserialize them to build a full dictionary of call 
signatures.
Then, after a few sanity checks, we insert the signatures, along with 
the markers.

3.4.9: In BaseTools\Conf\build_rule.template, we add the steps that call 
GenEbcSignature and PatchEbcSignature when EBC binaries are being 
produced.

Note that, as they are introduced in this proposal, these calls 
currently have the -v (verbose) flag set, so that additional information 
about the call signature generation and insertion is displayed during 
compilation. Eventually, we want to remove the -v flag.

3.4.10: In BaseTools\Conf\tools_def.template, we add 2 new variables for 
the new tools.

The rest of the changes should be self-explanatory.


4. Test Suite
=============

A comprehensive validation test suite is provided, in order to 
demonstrate that the proposal does work as advertised. As may be 
expected, these tests are based on using the QEMU_ARM firmware that is 
generated from an EDK2 source tree where these patches have been applied.

For convenience, it will also be assumed that:
- Windows x64 is being used as the test platform
- QEMU 2.7.0 or later (64-bit version) is available and installed under 
C:\Program Files\qemu\ (NB: 2.7.x or 2.8.x should work fine for ARM, but 
I found that the 2.8.x precompiled Windows QEMU binary had issues with 
AARCH64)
- One has cloned the fasmg EBC Assembler [1] (which contains most of the 
test suite) into C:\fasmg-ebc\

4.1 EBC -> native ARM test suite
--------------------------------

The test suite for the stack tracker can be found in the EBC assembler, 
which is an Open Source EBC assembler [1], based on fasmg, that was 
developed in parallel to this proposal (but that isn't directly related 
to it). The use of an EBC assembler makes it convenient to both compile 
and validate/debug test applications (through the EBC Debugger). Also, 
some aspects of what we are testing (such as cloaked stack 
manipulations) would be difficult to test outside of assembly.

Besides the EBC test applications, we do require a native UEFI driver, 
that will install a set of native protocols, which we can call into for 
testing. This driver [2], which is also provided as part of the EBC 
assembler, is written in C (and compiled as a gnu-efi based VS2015 
solution, for convenience reasons). Both pre-compiled ARM and IA32 
driver binaries are provided if needed.

To run this part of the suite, whose prime purpose is to validate the 
stack tracker, one should navigate to the stack_tracker\ subdirectory 
of the EBC Assembler and run something like:

C:\fasmg-ebc\stack_tracker> make qemu

This will download the required files as needed (such as the latest 
fasmg assembler, or the QEMU ARM firmware), assemble the EBC test 
programs, and then run all the tests in an ARM QEMU environment.

The suite is comprised of:

- Matrix test [3], that tests every single one of the 16 possible 
combinations for a 4-parameter native call. In other words, this 
validates that every single parameter we pass, from (UINTN, UINTN, 
UINTN, UINTN) to (UINT64, UINT64, UINT64, UINT64), is received, without 
mangling, by the ARM native driver.

- Max test [4], that confirms that 16 parameters can be successfully 
received. We perform 3 sets of tests here: 16 native parameters, 16 
64-bit parameters and 16 intermixed.

- Cloaked test [5], that performs a set of stack operations using R1 
instead of R0 as the stack pointer, while interspersing the queuing of 
actual parameters for a native function call.

- Realloc test [6], that forces the stack tracker to grow (realloc) its 
buffer, by reserving half the stack as local space, and that also tests 
the speed at which the stack tracker is able to process half the stack 
being reserved/restored as local space, by repeating the operation 10 
times in a row.

Note that a Switch test was also written, that attempts to switch the 
stack buffer, but since this is an operation that does not work on ANY 
of the EBC VMs, it is left out of the suite.

Of course, if you don't want to use the pre-built QEMU_EFI_ARM.fd 
firmware, which will be downloaded from my servers, you can build and 
copy your own in the stack_tracker\ directory.

Needless to say, if the patch series has been properly applied, all of 
the tests above will report a "PASS" status, confirming that the stack 
tracker works.

4.2 Native ARM -> EBC test suite
--------------------------------

This time, we want to test the reverse operation of marshalling from ARM 
to EBC, so we need to create an EBC protocol driver (driver.asm - [7]), 
similar to the native C driver we created for the previous test, along 
with a native application (native.c in the native/ subdirectory [8]) 
that will call into the protocols installed by the EBC driver.

The native test application includes:

- A complete matrix test, similar to the one used to validate the stack 
tracker (i.e. 16 protocol calls taking 4 arguments that are all possible 
combinations of UINTN or UINT64)

- An additional set of protocol calls, that take 16 parameters in all.

It should also be noted that, since the fasmg-ebc assembler already has 
provision for the insertion of call signatures into the BREAK 5 data 
(through its 'EXPORT' macro), there is no need to patch the EBC binary.

Then, to run the test suite, one can simply run (in the fasmg-ebc root 
directory)

C:\fasmg-ebc> make driver qemu arm

This compiles and installs the EBC driver in QEMU, and invokes the 
native test application.

You can also invoke the EBC debugger if you replace 'qemu' with 'debug' 
(a relevant debugger binary will be downloaded automatically). This test 
suite can also be run for architectures other than ARM by replacing 
'arm' with one of 'x64', 'ia32' or 'aa64' (again, the relevant firmware 
will be downloaded automatically if not already provided).

Of particular note, if you try to run this test suite for IA32, you will 
see that the 'MaxParam64' test (which validates the ability for calls to 
take 16 64-bit parameters) fails, as the IA32 EBC VM does not currently 
appear to be designed to handle that many arguments.

4.3 Patching the EDK2's FAT EBC binary
--------------------------------------

Finally, we conclude this introductory note with a real-life example of 
how one can take an existing EBC binary and patch it, so that it will 
run in all VMs, including ARM. This also enables us to further validate 
this proposal, by demonstrating that a fairly complex existing EBC 
application can and does indeed run without issues on ARM.

One thing we need to be clear about from the outset, is that this step is 
NOT something that we expect any EBC developer to have to go through. 
Instead, they should just be able to recompile their code, with the 
patched version of the EBC toolchain, and when they do so, they will 
find that the required signatures have been automatically inserted in 
the resulting EBC binary.

This exercise is only to demonstrate that, if one really needs to, this 
proposal also makes it possible to insert signatures into existing EBC 
binaries, to allow them to run on the new ARM VM.

For this example, we will use the FAT EBC binary driver currently 
included in the EDK2 (under FatBinPkg/EnhancedFatDxe/Ebc/Fat.efi).

Because the proposed ARM VM already takes care of EBC -> native ARM 
handling (through the stack tracker), the only part we need to concern 
ourselves with are the call signatures for native ARM -> EBC invocations.

What we first need to identify then, are the 64-bit locations where the 
32-bit offsets that are used in conjunction with BREAK 5 are stored. 
Obviously, these elements should be located in the data section, and 
furthermore, we can infer that they should be easily recognisable as 
32-bit negative data offsets (most likely in 0xFFFF.... or 0xFFFE.... 
since the executable isn't that large, and the code sections can be 
expected to be set before data sections), followed by 4 zeroed bytes.

We can also leverage some knowledge of the Microsoft linker with regards 
to how it generates DLL entrypoint addresses (which is what is really 
being used behind the scenes to generate the 32-bit BREAK 5 offsets) as 
it seems to always place these offsets padded to a 16-byte alignment. 
Therefore, we can easily identify that there exist 23 BREAK 5 data 
locations in the data section, with the first one being at address 
0x000109e0, and the last at 0x00010eb0. These mark the addresses at 
which we will need to add call signatures.
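
The search heuristic described above can be sketched in Python as 
follows. This is a hypothetical helper, not part of the proposed 
patches. It assumes that the caller has already extracted the raw bytes 
of the data section, that the section starts on a 16-byte boundary, and 
that the values are stored little-endian:

```python
import struct


def find_break5_candidates(data, base_address=0):
    """Yield addresses of 16-byte aligned slots that look like BREAK 5
    pointer data: a negative 32-bit offset followed by 4 zeroed bytes.
    """
    for offset in range(0, len(data) - 7, 16):
        low, high = struct.unpack_from("<II", data, offset)
        # Heuristic from the text: offsets in 0xFFFF.... or 0xFFFE....
        if high == 0 and (low >> 16) in (0xFFFF, 0xFFFE):
            yield base_address + offset


# Tiny synthetic section: one plausible slot at offset 16.
section = bytearray(48)
struct.pack_into("<II", section, 16, 0xFFFF1234, 0)
candidates = list(find_break5_candidates(bytes(section), 0x10000))
```

Any address reported this way is only a candidate, and still needs to be 
cross-checked, for instance against a .map file.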

However, while it can easily help us find the locations we are after, 
the binary alone is not enough to help us determine the call signature 
data. In this specific case, we will consider that one also has access 
to a .map file that is generated, as part of the EDK2 EBC toolchain, 
during final linking (for us, that would be something like 
"edk2\Build\Fat\RELEASE_VS2015\EBC\FatPkg\EnhancedFatDxe\Fat\OUTPUT\Fat.map"). 
If a map file is not available, one will of course need to use other 
means to "guess" what each of the BREAK 5 data entries is for. Also, it 
doesn't matter if that .map file isn't the exact one that was generated 
with the binary (as a matter of fact, even though the FAT EBC binary was 
updated very recently, most of this procedure was conducted against the 
2015.08 version of the binary, using a map file that was more than one 
year more recent), as we just use it to get a list of calls, along with 
their expected order. This works because, if you look at the .map file, 
you will find that all the EBC calls that may be invoked from native 
code are suffixed with "_plabel".

From there, we can deduce that the 23 addresses we have found, and in 
the order we found them, are respectively for:

00109e0 _DriverUnloadHandler()
0010c80 FatDriverBindingStop()
0010c90 FatDriverBindingStart()
0010ca0 FatDriverBindingSupported()
0010d20 FatComponentNameGetControllerName()
0010d30 FatComponentNameGetDriverName()
0010d50 FatOnAccessComplete()
0010d90 FatOpenVolume()
0010da0 FatFlushEx()
0010db0 FatWriteEx()
0010dc0 FatReadEx()
0010dd0 FatOpenEx()
0010de0 FatFlush()
0010df0 FatSetInfo()
0010e00 FatGetInfo()
0010e10 FatSetPosition()
0010e20 FatGetPosition()
0010e30 FatWrite()
0010e40 FatRead()
0010e50 FatDelete()
0010e60 FatClose()
0010e70 FatOpen()
0010eb0 InternalEmptyFunction()

Because the EBC FAT driver was updated recently, we may find that the 
addresses we identified also match the ones from the .map file 
(minus an 0x1000000 offset), but, as we tried to point out, this is not 
an absolute requirement and one does not necessarily have to use the 
exact same .map file as the one generated for the binary they are trying 
to patch.

Now, looking at the source/headers (which can also be deduced from the 
.map), we find that, out of these 23, only 3 functions need to have a 
call signature that is non-zero (i.e. 3 calls actually use 64-bit 
parameters). Those are:

0010dd0 FatOpenEx(EFI_FILE_PROTOCOL*, EFI_FILE_PROTOCOL**, CHAR16*, 
UINT64, UINT64, EFI_FILE_IO_TOKEN*)
  -> 011000b
0010e10 FatSetPosition(EFI_FILE_PROTOCOL*, UINT64 Position)
  -> 10b
0010e70 FatOpen(EFI_FILE_PROTOCOL*, EFI_FILE_PROTOCOL**, CHAR16*, 
UINT64, UINT64)
  -> 11000b

From there, we have everything required to insert the call signatures 
(along with their 0x2EBC marker) into Fat.efi, a fully patched 
version of which can be found at [9]. If you diff this file with the one 
from the EDK2, you will be able to confirm that the code section is 
unchanged, and that the only minimal change that was applied is that 
signatures have been inserted in the data section.
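
For illustration, the insertion step for a single entry could look like 
the Python sketch below. This is a hypothetical helper rather than the 
PatchEbcSignature.py script. It assumes that the image is held in a 
bytearray, that values are stored little-endian, and that the caller 
has already translated the address of the entry into an offset within 
the file:

```python
import struct

EBC_CALL_SIGNATURE_MARKER = 0x2EBC  # marker value used by this proposal


def insert_signature(image, file_offset, signature):
    """Write signature and marker into the high 32-bit word of a
    64-bit function pointer data object, leaving the offset untouched.
    """
    if not 0 <= signature <= 0xFFFF:
        raise ValueError("signature must fit in 16 bits")
    low, high = struct.unpack_from("<II", image, file_offset)
    # Sanity check: an unpatched entry has its high word zeroed.
    if high != 0:
        raise ValueError("entry at 0x%x is not zero padded" % file_offset)
    # Bits 32-47 hold the signature, bits 48-63 hold the marker.
    struct.pack_into("<HH", image, file_offset + 4,
                     signature, EBC_CALL_SIGNATURE_MARKER)


# Synthetic example: an entry with a FatOpenEx()-style signature.
image = bytearray(16)
struct.pack_into("<II", image, 8, 0xFFFF0010, 0)
insert_signature(image, 8, 0b011000)
patched = struct.unpack_from("<Q", image, 8)[0]
```

In this synthetic example, patched is 0x2EBC0018FFFF0010: the low 32-bit 
offset is preserved, and only the previously zeroed high word changes.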

This patched binary can then be used to confirm that the updated EBC 
driver runs as expected on ARM, as well as existing platforms.

This can be achieved using a QEMU firmware in which the native FAT 
driver has been replaced with an NTFS driver, and then booting from an 
NTFS partition containing the patched FAT EBC driver produced by this 
procedure. Through this, we were able to demonstrate that data could 
be repeatedly accessed from a FAT partition without any issue. The only 
thing worth mentioning is that (at least on QEMU) the driver may 
sometimes be very slow to load as it accesses the FAT partition, but 
this is behaviour that we observed for both ARM and AARCH64, and which 
we suspect has to do with the emulation layer(s).

If you want to run this test, under the same conditions as the ones we 
used (again, all the required files will be downloaded automatically), 
you can issue the following at the root of the EBC Assembler:

C:\fasmg-ebc> make hello qemu arm ntfs

You can also run a similar test against AARCH64 by replacing 'arm' with 
'aa64'.

A similar test was of course performed with the recompiled EBC FAT 
driver, as produced through the updated EBC toolchain, and no issues 
were observed there either.

The fact that one can patch the existing EBC FAT driver and run it 
without issues in the proposed ARM VM, and that the FAT driver produced 
from the EDK2 after this proposal has been applied can also be used in 
the ARM VM, will, we hope, be enough to convince reviewers that the 
proposal is sound and can be integrated.

Regards,

/Pete


[1] https://github.com/pbatard/fasmg-ebc
[2] https://github.com/pbatard/fasmg-ebc/blob/master/stack_tracker/driver/driver.c
[3] https://github.com/pbatard/fasmg-ebc/blob/master/stack_tracker/matrix.asm
[4] https://github.com/pbatard/fasmg-ebc/blob/master/stack_tracker/max.asm
[5] https://github.com/pbatard/fasmg-ebc/blob/master/stack_tracker/cloaked.asm
[6] https://github.com/pbatard/fasmg-ebc/blob/master/stack_tracker/realloc.asm
[7] https://github.com/pbatard/fasmg-ebc/blob/master/driver.asm
[8] https://github.com/pbatard/fasmg-ebc/blob/master/native/native.c
[9] http://efi.akeo.ie/EBC/FAT/Fat.efi

-----------------------------------------------------------------------
Ard Biesheuvel (1):
   MdeModulePkg/EbcDxe: add ARM support

Pete Batard (4):
   MdeModulePkg/EbcDxe: allow VmReadIndex##() to return a decoded index
   MdeModulePkg/EbcDxe: add a stack tracker for ARM EBC->native support
   MdeModulePkg/EbcDxe: add call signatures for ARM native->EBC support
   BaseTools: add scripts to generate EBC call signatures

  ArmVirtPkg/ArmVirt.dsc.inc                         |   6 +-
  ArmVirtPkg/ArmVirtQemuFvMain.fdf.inc               |  10 +-
  ArmVirtPkg/ArmVirtXen.fdf                          |  10 +-
  .../BinWrappers/WindowsLike/GenEbcSignature.bat    |   3 +
  .../BinWrappers/WindowsLike/PatchEbcSignature.bat  |   3 +
  BaseTools/Conf/build_rule.template                 |  72 ++-
  BaseTools/Conf/tools_def.template                  |  12 +
  .../Python/GenEbcSignature/GenEbcSignature.py      | 306 ++++++++++
  .../Source/Python/GenEbcSignature/__init__.py      |  15 +
  .../Python/PatchEbcSignature/PatchEbcSignature.py  | 226 ++++++++
  .../Source/Python/PatchEbcSignature/__init__.py    |  15 +
  MdeModulePkg/Include/Protocol/EbcVmTest.h          |   4 +-
  MdeModulePkg/MdeModulePkg.dsc                      |   4 +-
  MdeModulePkg/Universal/EbcDxe/Arm/EbcLowLevel.S    | 184 ++++++
  .../Universal/EbcDxe/Arm/EbcStackTracker.c         | 634 +++++++++++++++++++++
  MdeModulePkg/Universal/EbcDxe/Arm/EbcSupport.c     | 599 +++++++++++++++++++
  MdeModulePkg/Universal/EbcDxe/EbcDebugger.inf      |  10 +-
  .../EbcDxe/EbcDebugger/EdbDisasmSupport.h          |   4 +-
  .../Universal/EbcDxe/EbcDebuggerConfig.inf         |   2 +-
  MdeModulePkg/Universal/EbcDxe/EbcDxe.inf           |  10 +-
  MdeModulePkg/Universal/EbcDxe/EbcExecute.c         | 292 ++++++++--
  MdeModulePkg/Universal/EbcDxe/EbcExecute.h         |   8 +
  MdeModulePkg/Universal/EbcDxe/EbcInt.h             |   7 +-
  MdeModulePkg/Universal/EbcDxe/EbcStackTracker.c    |  65 +++
  24 files changed, 2425 insertions(+), 76 deletions(-)
  create mode 100644 BaseTools/BinWrappers/WindowsLike/GenEbcSignature.bat
  create mode 100644 BaseTools/BinWrappers/WindowsLike/PatchEbcSignature.bat
  create mode 100644 BaseTools/Source/Python/GenEbcSignature/GenEbcSignature.py
  create mode 100644 BaseTools/Source/Python/GenEbcSignature/__init__.py
  create mode 100644 BaseTools/Source/Python/PatchEbcSignature/PatchEbcSignature.py
  create mode 100644 BaseTools/Source/Python/PatchEbcSignature/__init__.py
  create mode 100644 MdeModulePkg/Universal/EbcDxe/Arm/EbcLowLevel.S
  create mode 100644 MdeModulePkg/Universal/EbcDxe/Arm/EbcStackTracker.c
  create mode 100644 MdeModulePkg/Universal/EbcDxe/Arm/EbcSupport.c
  create mode 100644 MdeModulePkg/Universal/EbcDxe/EbcStackTracker.c

--
2.9.3.windows.2


