public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support
@ 2016-09-22  9:43 Pete Batard
  2016-09-22 10:06 ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread
From: Pete Batard @ 2016-09-22  9:43 UTC (permalink / raw)
  To: edk2-devel

Hi,

The following is an updated/fixed version of the patch(es), put forward 
by Ard Biesheuvel on August 9 ([1], [2]), and re-submitted for formal 
inclusion, so that the EDK2 can provide EBC functionality for all of 
IA32, IA64, X64, AARCH64 and ARM at last.

This updated patch now includes the necessary corollary dsc/fdf updates 
as well as fixes to the assembly's EbcLLCALLEXNative, as I found the 
following issues there:
- At least gcc5 didn't seem to like the manually optimized branching for 
all register args ("sub r1, r1, r3, lsr #1"), and one can never be sure 
of the actual size instructions will be assembled into, in case of 
assembler internal alignment/optimization, so I broke it down into 
actual labelled branches. There are only 4 of those anyway.
- For register + stack calls, while 8 x 64 bit registers on AARCH64 do 
equate to #64 bytes that need to be taken off the stack, on ARM the 4 x 
32 bit registers equate to #16 bytes, not #32
- Even after fixing the above, I found some issues with the manual stack 
duplication assembly code, so I switched to using a call to CopyMem(), 
like IA32 does.

With these changes, I believe that the ARM/EBC feature should be fully 
functional, especially as I have heavily tested multiparameter calls 
from EBC into native, using an fasmg-based EBC assembler [3], to confirm 
that they performed just as well with ARM as with AARCH64, IA32 or X64.

Regards,

/Pete

[1] https://patches.linaro.org/patch/73554/
[2] https://patches.linaro.org/patch/73559/
[3] https://github.com/pbatard/fasmg-ebc


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support
  2016-09-22  9:43 [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support Pete Batard
@ 2016-09-22 10:06 ` Ard Biesheuvel
  2016-09-22 10:13   ` Ard Biesheuvel
  2016-09-22 11:05   ` Pete Batard
  0 siblings, 2 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2016-09-22 10:06 UTC (permalink / raw)
  To: Pete Batard; +Cc: edk2-devel@lists.01.org

On 22 September 2016 at 10:43, Pete Batard <pete@akeo.ie> wrote:
> Hi,
>
> The following is an updated/fixed version of the patch(es), put forward by
> Ard Biesheuvel on August 9 ([1], [2]), and re-submitted for formal
> inclusion, so that the EDK2 can provide EBC functionality for all of IA32,
> IA64, X64, AARCH64 and ARM at last.
>
> This updated patch now includes the necessary corollary dsc/fdf updates as
> well as fixes to the assembly's EbcLLCALLEXNative, as I found the following
> issues there:
> - At least gcc5 didn't seem to like the manually optimized branching for all
> register args ("sub r1, r1, r3, lsr #1"), and one can never be sure of the
> actual size instructions will be assembled into, in case of assembler
> internal alignment/optimization, so I broke it down into actual labelled
> branches. There are only 4 of those anyway.
> - For register + stack calls, while 8 x 64 bit registers on AARCH64 do
> equate to #64 bytes that need to be taken off the stack, on ARM the 4 x 32
> bit registers equate to #16 bytes, not #32
> - Even after fixing the above, I found some issues with the manual stack
> duplication assembly code, so I switched to using a call to CopyMem(), like
> IA32 does.
>
> With these changes, I believe that the ARM/EBC feature should be fully
> functional, especially as I have heavily tested multiparameter calls from
> EBC into native, using an fasmg-based EBC assembler [3], to confirm that
> they performed just as well with ARM as with AARCH64, IA32 or X64.
>

Hello Pete,

Thanks a lot for this contribution. I had spotted (and fixed) some of
the above issues as well.

However, there is a fundamental issue with EBC on ARM that has not
been addressed yet, which makes EBC support problematic:
ARM uses natural alignment for 64-bit types, which means it leaves
gaps in the stack frame, and the thunking code has no way of dealing
with that.

I am pasting my analysis below, which I sent out internally a couple
of weeks ago. In summary, we need language spec and compiler updates
before we can fully support this on 32-bit ARM.

Thanks,
Ard.

--------------------------------------

This compares the EBC argument stack with the argument assignment across
registers and stack expected by the respective Procedure Call Standards
for AArch64 and AArch32.

Since the EBC thunking layer is not aware of the actual prototype signature
of the function that is being called (it does not even know which part of the
stack frame consists of outgoing arguments, and so it needs to assume that the
entire stack frame needs to be copied into arguments and the native stack), the
calls can only execute correctly if the EBC stack frame happens to align all
arguments natively, in which case the AAPCS happens to agree with the EBC
cdecl calling conventions (although the first 8 resp 4 arguments are passed
via registers). In the diagrams below, this is the case if the diagrams line
up horizontally.

In summary, EBC on AArch64 seems to be OK (although more testing is needed),
as long as we don't pass arguments whose size exceeds 64 bits (which the EBC
compiler is unlikely to support anyway)

EBC on AArch32 happens to work as long as no UINT64 values appear as the
return value or as an odd-numbered argument. Since it is impossible to infer
from EBC bytecode whether any such function calls are being performed, the only
way to fix this is to update the EBC spec (and the compiler) to insert hints
into the bytecode when such problematic values occur.

Below is a comparison between the stack frame layouts of various protocol
entry points that are relevant to EBC drivers, i.e., PCI I/O, block I/O and
network I/O)

--------------------------------------------------------------------------------

typedef
EFI_STATUS
(EFIAPI *EFI_BLOCK_READ)(
  IN EFI_BLOCK_IO_PROTOCOL          *This,
  IN UINT32                         MediaId,
  IN EFI_LBA                        Lba,
  IN UINTN                          BufferSize,
  OUT VOID                          *Buffer
  );

Executing on 64-bit (ok)
------------------------

         EBC stack                   AArch64 registers
0x00 +----------------+             +----------------+
     |     This       |          x0 |     This       |
0x08 +----------------+             +----------------+
     |     MediaId    |          x1 |     MediaId    |
0x10 +----------------+             +----------------+
     |      Lba       |          x2 |      Lba       |
0x18 +----------------+             +----------------+
     |   BufferSize   |          x3 |   BufferSize   |
0x20 +----------------+             +----------------+
     |     Buffer     |          x4 |     Buffer     |
0x28 +----------------+             +----------------+
             :                               :
     +----------------+             +----------------+
 R7  |  Return value  |          x0 |  Return value  |
     +----------------+             +----------------+

Executing on 32-bit (ok)
------------------------

          EBC stack                   AArch32 registers + stack
0x00 +----------------+             +----------------+
     |     This       |          r0 |     This       |
0x04 +----------------+             +----------------+
     |     MediaId    |          r1 |     MediaId    |
0x08 +----------------+             +----------------+
     |      Lba       |          r2 |      Lba       |
     |                |          r3 |                |
0x10 +----------------+         0x0 +================+
     |   BufferSize   |             |   BufferSize   |
0x14 +----------------+         0x4 +----------------+
     |     Buffer     |             |     Buffer     |
0x18 +----------------+         0x8 +----------------+
             :                               :
     +----------------+             +----------------+
 R7  |  Return value  |          r0 |  Return value  |
     +----------------+             +----------------+

--------------------------------------------------------------------------------

typedef
EFI_STATUS
(EFIAPI *EFI_PCI_IO_PROTOCOL_ALLOCATE_BUFFER)(
  IN EFI_PCI_IO_PROTOCOL           *This,
  IN  EFI_ALLOCATE_TYPE            Type,
  IN  EFI_MEMORY_TYPE              MemoryType,
  IN  UINTN                        Pages,
  OUT VOID                         **HostAddress,
  IN  UINT64                       Attributes
  );

Executing on 64-bit (ok)
------------------------

         EBC stack                   AArch64 registers
0x00 +----------------+             +----------------+
     |     This       |          x0 |     This       |
0x08 +----------------+             +----------------+
     |     Type       |          x1 |     Type       |
0x10 +----------------+             +----------------+
     |   MemoryType   |          x2 |   MemoryType   |
0x18 +----------------+             +----------------+
     |     Pages      |          x3 |     Pages      |
0x20 +----------------+             +----------------+
     |  HostAddress   |          x4 |  HostAddress   |
0x28 +----------------+             +----------------+
     |   Attributes   |          x5 |   Attributes   |
0x30 +----------------+             +----------------+
             :                               :
     +----------------+             +----------------+
 R7  |  Return value  |          x0 |  Return value  |
     +----------------+             +----------------+


Executing on 32-bit (FAIL)
--------------------------

         EBC stack                   AArch32 registers + stack
0x00 +----------------+             +----------------+
     |     This       |          r0 |     This       |
0x04 +----------------+             +----------------+
     |     Type       |          r1 |     Type       |
0x08 +----------------+             +----------------+
     |   MemoryType   |          r2 |   MemoryType   |
0x0c +----------------+             +----------------+
     |     Pages      |          r3 |     Pages      |
0x10 +----------------+        0x00 +================+
     |  HostAddress   |             |  HostAddress   |
0x14 +----------------+        0x04 +----------------+
     |   Attributes   |             |    <padding>   |  <----------
     |                |        0x08 +----------------+
0x1c +----------------+             |   Attributes   |
             :                      |                |
             :                 0x10 +----------------+
             :                               :
     +----------------+             +----------------+
 R7  |  Return value  |          r0 |  Return value  |
     +----------------+             +----------------+

--------------------------------------------------------------------------------

typedef
EFI_STATUS
(EFIAPI *EFI_SIMPLE_NETWORK_RECEIVE)(
  IN EFI_SIMPLE_NETWORK_PROTOCOL          *This,
  OUT UINTN                               *HeaderSize OPTIONAL,
  IN OUT UINTN                            *BufferSize,
  OUT VOID                                *Buffer,
  OUT EFI_MAC_ADDRESS                     *SrcAddr    OPTIONAL,
  OUT EFI_MAC_ADDRESS                     *DestAddr   OPTIONAL,
  OUT UINT16                              *Protocol   OPTIONAL
  );

Executing on 64-bit (ok)
------------------------

         EBC stack                   AArch64 registers
0x00 +----------------+             +----------------+
     |     This       |          x0 |     This       |
0x08 +----------------+             +----------------+
     |  HeaderSize    |          x1 |  HeaderSize    |
0x10 +----------------+             +----------------+
     |   BufferSize   |          x2 |   BufferSize   |
0x18 +----------------+             +----------------+
     |     Buffer     |          x3 |     Buffer     |
0x20 +----------------+             +----------------+
     |    SrcAddr     |          x4 |    SrcAddr     |
0x28 +----------------+             +----------------+
     |    DestAddr    |          x5 |    DestAddr    |
0x30 +----------------+             +----------------+
     |    Protocol    |          x6 |    Protocol    |
0x38 +----------------+             +----------------+

     +----------------+             +----------------+
 R7  |  Return value  |          x0 |  Return value  |
     +----------------+             +----------------+

Executing on 32-bit (ok)
------------------------

         EBC stack                   AArch32 registers + stack
0x00 +----------------+             +----------------+
     |     This       |          r0 |     This       |
0x04 +----------------+             +----------------+
     |  HeaderSize    |          r1 |  HeaderSize    |
0x08 +----------------+             +----------------+
     |   BufferSize   |          r2 |   BufferSize   |
0x0c +----------------+             +----------------+
     |     Buffer     |          r3 |     Buffer     |
0x10 +----------------+         0x0 +================+
     |    SrcAddr     |             |    SrcAddr     |
0x14 +----------------+         0x4 +----------------+
     |    DestAddr    |             |    DestAddr    |
0x18 +----------------+         0xc +----------------+
     |    Protocol    |             |    Protocol    |
0x1c +----------------+         0x8 +----------------+
             :                               :
     +----------------+             +----------------+
 R7  |  Return value  |          r0 |  Return value  |
     +----------------+             +----------------+

--------------------------------------------------------------------------------

typedef
EFI_STATUS
(EFIAPI *EFI_TIMER_SET_TIMER_PERIOD)(
  IN EFI_TIMER_ARCH_PROTOCOL    *This,
  IN UINT64                     TimerPeriod
  );

Executing on 64-bit (ok)
------------------------

         EBC stack                   AArch64 registers
0x00 +----------------+             +----------------+
     |     This       |          x0 |     This       |
0x08 +----------------+             +----------------+
     |  TimerPeriod   |          x1 |  TimerPeriod   |
0x10 +----------------+             +----------------+
             :                               :
     +----------------+             +----------------+
 R7  |  Return value  |          x0 |  Return value  |
     +----------------+             +----------------+


Executing on 32-bit (FAIL)
--------------------------

         EBC stack                   AArch32 registers
0x00 +----------------+             +----------------+
     |     This       |          r0 |     This       |
0x04 +----------------+             +----------------+
     |   TimerPeriod  |          r1 |    <padding>   |  <----------
     |                |             +----------------+
0x0c +----------------+          r2 |   TimerPeriod  |
             :                   r3 |                |
             :                      +----------------+
             :                               :
     +----------------+             +----------------+
 R7  |  Return value  |          r0 |  Return value  |
     +----------------+             +----------------+


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support
  2016-09-22 10:06 ` Ard Biesheuvel
@ 2016-09-22 10:13   ` Ard Biesheuvel
  2016-09-22 11:05   ` Pete Batard
  1 sibling, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2016-09-22 10:13 UTC (permalink / raw)
  To: Pete Batard; +Cc: edk2-devel@lists.01.org

On 22 September 2016 at 11:06, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 22 September 2016 at 10:43, Pete Batard <pete@akeo.ie> wrote:
>> Hi,
>>
>> The following is an updated/fixed version of the patch(es), put forward by
>> Ard Biesheuvel on August 9 ([1], [2]), and re-submitted for formal
>> inclusion, so that the EDK2 can provide EBC functionality for all of IA32,
>> IA64, X64, AARCH64 and ARM at last.
>>
>> This updated patch now includes the necessary corollary dsc/fdf updates as
>> well as fixes to the assembly's EbcLLCALLEXNative, as I found the following
>> issues there:
>> - At least gcc5 didn't seem to like the manually optimized branching for all
>> register args ("sub r1, r1, r3, lsr #1"), and one can never be sure of the
>> actual size instructions will be assembled into, in case of assembler
>> internal alignment/optimization, so I broke it down into actual labelled
>> branches. There are only 4 of those anyway.
>> - For register + stack calls, while 8 x 64 bit registers on AARCH64 do
>> equate to #64 bytes that need to be taken off the stack, on ARM the 4 x 32
>> bit registers equate to #16 bytes, not #32
>> - Even after fixing the above, I found some issues with the manual stack
>> duplication assembly code, so I switched to using a call to CopyMem(), like
>> IA32 does.
>>
>> With these changes, I believe that the ARM/EBC feature should be fully
>> functional, especially as I have heavily tested multiparameter calls from
>> EBC into native, using an fasmg-based EBC assembler [3], to confirm that
>> they performed just as well with ARM as with AARCH64, IA32 or X64.
>>
>
> Hello Pete,
>
> Thanks a lot for this contribution. I had spotted (and fixed) some of
> the above issues as well.
>
> However, there is a fundamental issue with EBC on ARM that has not
> been addressed yet, which makes EBC support problematic:
> ARM uses natural alignment for 64-bit types, which means it leaves
> gaps in the stack frame, and the thunking code has no way of dealing
> with that.
>
> I am pasting my analysis below, which I sent out internally a couple
> of weeks ago. In summary, we need language spec and compiler updates
> before we can fully support this on 32-bit ARM.
>

BTW, the EDK2 tree has an EBC version of the FAT filesystem driver,
which is what I have been using to test EBC. I have a Frankenstein
version of the 32-bit ARM one (shared below) that deals with the
padding of known protocol methods that contain UINT64 arguments at odd
positions, but it is not pretty, and a clear example why the spec
needs to be updated to accommodate ARM

https://git.linaro.org/people/ard.biesheuvel/uefi-next.git/shortlog/refs/heads/ebc3


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support
  2016-09-22 10:06 ` Ard Biesheuvel
  2016-09-22 10:13   ` Ard Biesheuvel
@ 2016-09-22 11:05   ` Pete Batard
  2016-09-22 11:14     ` Ard Biesheuvel
                       ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: Pete Batard @ 2016-09-22 11:05 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: edk2-devel@lists.01.org

On 2016.09.22 11:06, Ard Biesheuvel wrote:
> However, there is a fundamental issue with EBC on ARM that has not
> been addressed yet, which makes EBC support problematic:
> ARM uses natural alignment for 64-bit types, which means it leaves
> gaps in the stack frame, and the thunking code has no way of dealing
> with that.

I was hoping you would comment on this, as I believe the issue is larger 
than Arm (which is why I thought the Arm patch could be integrated), 
since I ran into something similar for X64 and AARCH64, with the 
conversion from stack to register parameters.

Let me start by showing a real-life example of what the current EBC 
implementation does, on different architectures, if you CALLEX from EBC 
into a native function call such as:

VOID MultiParamNative(
     UINT32,
     UINT64,
     UINT64,
     UINT64,
     UINT32,
     UINT32,
     UINT64
);

with values:

     0x1C1C1C1C,
     0x2B2B2B2B2A2A2A2A,
     0x3B3B3B3B3A3A3A3A,
     0x4B4B4B4B4A4A4A4A,
     0x5C5C5C5C,
     0x6C6C6C6C,
     0x7B7B7B7B7A7A7A7A

If you do that, then the parameter values seen by each Arch will be as 
follows:

IA32:
   p1 = 0x1C1C1C1C
   p2 = 0x2B2B2B2B2A2A2A2A
   p3 = 0x3B3B3B3B3A3A3A3A
   p4 = 0x4B4B4B4B4A4A4A4A
   p5 = 0x5C5C5C5C
   p6 = 0x6C6C6C6C
   p7 = 0x7B7B7B7B7A7A7A7A

X64:
   p1 = 0x1C1C1C1C
   p2 = 0x3A3A3A3A2B2B2B2B
   p3 = 0x4A4A4A4A3B3B3B3B
   p4 = 0x5C5C5C5C4B4B4B4B
   p5 = 0x6C6C6C6C
   p6 = 0x7B7B7B7B
   p7 = 0x06F23E4012345678

ARM:
   p1 = 0x1C1C1C1C
   p2 = 0x3A3A3A3A2B2B2B2B
   p3 = 0x4A4A4A4A3B3B3B3B
   p4 = 0x5C5C5C5C4B4B4B4B
   p5 = 0x6C6C6C6C
   p6 = 0x7A7A7A7A
   p7 = 0x446EEC467B7B7B7B

AA64:
   p1 = 0x1C1C1C1C
   p2 = 0x3A3A3A3A2B2B2B2B
   p3 = 0x4A4A4A4A3B3B3B3B
   p4 = 0x5C5C5C5C4B4B4B4B
   p5 = 0x6C6C6C6C
   p6 = 0x7B7B7B7B
   p7 = 0x00000000FFFFFF91

Note that these are real-life results gotten from a native set of 
drivers [1] + EBC sample [2], specifically designed to test the above.

So, as you can see, only IA32 currently retrieves the parameters with 
their expected values. All the other platforms, and not just Arm, have 
an issue with parameter retrieval.

I too performed some analysis [3], to understand the problem, the result 
of which can be summarized as follows:

Let's say you have native protocol function:

   ProtocolCall(UINT32, UINT64, UINT64)

to which you want to pass values:

   (0x1C1C1C1C, 0x2B2B2B2B2A2A2A2A, 0x3B3B3B3B3A3A3A3A)

With the EBC VM, the parameters then get stacked as (little endian, 
CDECL and using 32-bit longwords to represent the stack):

    +--------+
    |1C1C1C1C|
    +--------+
    |2A2A2A2A|
    +--------+
    |2B2B2B2B|
    +--------+
    |3A3A3A3A|
    +--------+
    |3B3B3B3B|
    +--------+
    +????????+
    +--------+

Now, if you are calling into an x86_32 arch, this is no issue, as the 
native call reads the parameters off the stack, and finds each one it 
its expected location.

But if, say, you are calling into native Arm, then the calling 
convention dictates that the first four 32 bits parameters must be 
placed into Arm registers r0-r3, rather than on the stack, and what's 
more, that if there exist 64 bit parameters among the register ones, 
they must start with an even register (r0 or r2).

What this means is that, with the current EBC handling, which simply 
maps the top of the stack onto registers for native CALLEX (as the VM 
cannot guess the parameter signature of the function it is calling into, 
and therefore will not do anything else but a blind mapping of the stack 
onto registers), the native Arm function effectively gets called with 
the following parameter mapping:

    +--------+
    |1C1C1C1C|  -> r0 (32-bit first parameter)
    +--------+
    |2A2A2A2A|  -> (r1/unused, since first parameter is 32-bit)
    +--------+
    |2B2B2B2B|  -> r2 (lower half of 64-bit second parameter)
    +--------+
    |3A3A3A3A|  -> r3 (upper half of 64-bit second parameter)
    +--------+
    |3B3B3B3B|  -> lower half of 64-bit third parameter (stack)
    +--------+
    +????????+  -> upper half of 64-bit third parameter (stack)
    +--------+

The end result is that, the Arm call ends up with these values:

   (0x1C1C1C1C, 0x3A3A3A3A2B2B2B2B, 0x????????3B3B3B3B)

However, while we used Arm for this example, this is not an Arm specific 
issue, as x86_64 and Arm64 also expect any of the first eight parameters 
to a native call, that are smaller than 64-bit, to get passed as a 
64-bit register, which means they too have the same issue as the one 
illustrated above.

Now, I'm not sure what the solution to that issue would be. I tend to 
agree that, short of including a parameter signature for function calls, 
this function argument marshalling issue between EBC and native will be 
difficult to solve. A possible half-workaround I thought of could be to 
keep track of all the PUSHes having been carried out before a CALLEX, 
and *assume* (or mandate in the specs) that all the arguments were 
pushed individually and that the size of the PUSH matches the desired 
size for a register argument, but even that would still add a lot of 
complexity and be prone to breakage...

The other solution of course is to ask EBC code developers to be aware 
of and compensate for the calling convention issue, and pad the stack as 
required depending on the ISA they are calling into, which is how I made 
my protocol.asm sample work [4], but this is still rather inconvenient, 
especially if not coding in EBC assembly, and defeats the point of 
trying to write Arch independent code.

Considering this, I'm not entirely convinced ARM EBC integration should 
be held back, as the problem we are faced with is part of a larger issue 
that we'll need to resolve for all archs (except IA32), and not just ARM...

Regards,

/Pete

[1] https://github.com/pbatard/fasmg-ebc/tree/master/Protocol
[2] https://github.com/pbatard/fasmg-ebc/blob/master/protocol.asm
[3] https://github.com/pbatard/fasmg-ebc/blob/master/protocol.asm#L113-L177
[4] https://github.com/pbatard/fasmg-ebc/blob/master/protocol.asm#L211-L219



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support
  2016-09-22 11:05   ` Pete Batard
@ 2016-09-22 11:14     ` Ard Biesheuvel
  2016-09-22 11:26       ` Pete Batard
  2016-09-22 20:27     ` Andrew Fish
  2016-09-22 21:24     ` Andrew Fish
  2 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2016-09-22 11:14 UTC (permalink / raw)
  To: Pete Batard; +Cc: edk2-devel@lists.01.org

On 22 September 2016 at 12:05, Pete Batard <pete@akeo.ie> wrote:
> On 2016.09.22 11:06, Ard Biesheuvel wrote:
>>
>> However, there is a fundamental issue with EBC on ARM that has not
>> been addressed yet, which makes EBC support problematic:
>> ARM uses natural alignment for 64-bit types, which means it leaves
>> gaps in the stack frame, and the thunking code has no way of dealing
>> with that.
>
>
> I was hoping you would comment on this, as I believe the issue is larger
> than Arm (which is why I thought the Arm patch could be integrated), since I
> ran into something similar for X64 and AARCH64, with the conversion from
> stack to register parameters.
>
> Let me start by showing a real-life example of what the current EBC
> implementation does, on different architectures, if you CALLEX from EBC into
> a native function call such as:
>
> VOID MultiParamNative(
>     UINT32,
>     UINT64,
>     UINT64,
>     UINT64,
>     UINT32,
>     UINT32,
>     UINT64
> );
>
> with values:
>
>     0x1C1C1C1C,
>     0x2B2B2B2B2A2A2A2A,
>     0x3B3B3B3B3A3A3A3A,
>     0x4B4B4B4B4A4A4A4A,
>     0x5C5C5C5C,
>     0x6C6C6C6C,
>     0x7B7B7B7B7A7A7A7A
>
> If you do that, then the parameter values seen by each Arch will be as
> follows:
>
> IA32:
>   p1 = 0x1C1C1C1C
>   p2 = 0x2B2B2B2B2A2A2A2A
>   p3 = 0x3B3B3B3B3A3A3A3A
>   p4 = 0x4B4B4B4B4A4A4A4A
>   p5 = 0x5C5C5C5C
>   p6 = 0x6C6C6C6C
>   p7 = 0x7B7B7B7B7A7A7A7A
>
> X64:
>   p1 = 0x1C1C1C1C
>   p2 = 0x3A3A3A3A2B2B2B2B
>   p3 = 0x4A4A4A4A3B3B3B3B
>   p4 = 0x5C5C5C5C4B4B4B4B
>   p5 = 0x6C6C6C6C
>   p6 = 0x7B7B7B7B
>   p7 = 0x06F23E4012345678
>
> ARM:
>   p1 = 0x1C1C1C1C
>   p2 = 0x3A3A3A3A2B2B2B2B
>   p3 = 0x4A4A4A4A3B3B3B3B
>   p4 = 0x5C5C5C5C4B4B4B4B
>   p5 = 0x6C6C6C6C
>   p6 = 0x7A7A7A7A
>   p7 = 0x446EEC467B7B7B7B
>
> AA64:
>   p1 = 0x1C1C1C1C
>   p2 = 0x3A3A3A3A2B2B2B2B
>   p3 = 0x4A4A4A4A3B3B3B3B
>   p4 = 0x5C5C5C5C4B4B4B4B
>   p5 = 0x6C6C6C6C
>   p6 = 0x7B7B7B7B
>   p7 = 0x00000000FFFFFF91
>
> Note that these are real-life results gotten from a native set of drivers
> [1] + EBC sample [2], specifically designed to test the above.
>
> So, as you can see, only IA32 currently retrieves the parameters with their
> expected values. All the other platforms, and not just Arm, have an issue
> with parameter retrieval.
>
> I too performed some analysis [3], to understand the problem, the result of
> which can be summarized as follows:
>
> Let's say you have native protocol function:
>
>   ProtocolCall(UINT32, UINT64, UINT64)
>
> to which you want to pass values:
>
>   (0x1C1C1C1C, 0x2B2B2B2B2A2A2A2A, 0x3B3B3B3B3A3A3A3A)
>
> With the EBC VM, the parameters then get stacked as (little endian, CDECL
> and using 32-bit longwords to represent the stack):
>
>    +--------+
>    |1C1C1C1C|
>    +--------+
>    |2A2A2A2A|
>    +--------+
>    |2B2B2B2B|
>    +--------+
>    |3A3A3A3A|
>    +--------+
>    |3B3B3B3B|
>    +--------+
>    +????????+
>    +--------+
>
> Now, if you are calling into an x86_32 arch, this is no issue, as the native
> call reads the parameters off the stack, and finds each one it its expected
> location.
>
> But if, say, you are calling into native Arm, then the calling convention
> dictates that the first four 32 bits parameters must be placed into Arm
> registers r0-r3, rather than on the stack, and what's more, that if there
> exist 64 bit parameters among the register ones, they must start with an
> even register (r0 or r2).
>
> What this means is that, with the current EBC handling, which simply maps
> the top of the stack onto registers for native CALLEX (as the VM cannot
> guess the parameter signature of the function it is calling into, and
> therefore will not do anything else but a blind mapping of the stack onto
> registers), the native Arm function effectively gets called with the
> following parameter mapping:
>
>    +--------+
>    |1C1C1C1C|  -> r0 (32-bit first parameter)
>    +--------+
>    |2A2A2A2A|  -> (r1/unused, since first parameter is 32-bit)
>    +--------+
>    |2B2B2B2B|  -> r2 (lower half of 64-bit second parameter)
>    +--------+
>    |3A3A3A3A|  -> r3 (upper half of 64-bit second parameter)
>    +--------+
>    |3B3B3B3B|  -> lower half of 64-bit third parameter (stack)
>    +--------+
>    +????????+  -> upper half of 64-bit third parameter (stack)
>    +--------+
>
> The end result is that, the Arm call ends up with these values:
>
>   (0x1C1C1C1C, 0x3A3A3A3A2B2B2B2B, 0x????????3B3B3B3B)
>
> However, while we used Arm for this example, this is not an Arm specific
> issue, as x86_64 and Arm64 also expect any of the first eight parameters to
> a native call, that are smaller than 64-bit, to get passed as a 64-bit
> register, which means they too have the same issue as the one illustrated
> above.
>
> Now, I'm not sure what the solution to that issue would be. I tend to agree
> that, short of including a parameter signature for function calls, this
> function argument marshalling issue between EBC and native will be difficult
> to solve. A possible half-workaround I thought of could be to keep track of
> all the PUSHes having been carried out before a CALLEX, and *assume* (or
> mandate in the specs) that all the arguments were pushed individually and
> that the size of the PUSH matches the desired size for a register argument,
> but even that would still add a lot of complexity and be prone to
> breakage...
>
> The other solution of course is to ask EBC code developers to be aware of
> and compensate for the calling convention issue, and pad the stack as
> required depending on the ISA they are calling into, which is how I made my
> protocol.asm sample work [4], but this is still rather inconvenient,
> especially if not coding in EBC assembly, and defeats the point of trying to
> write Arch independent code.
>
> Considering this, I'm not entirely convinced ARM EBC integration should be
> held back, as the problem we are faced with is part of a larger issue that
> we'll need to resolve for all archs (except IA32), and not just ARM...
>

Hello Peter,

For X64 and AARCH64, the issue does not exist because the EBC spec
mandates that all function arguments are widened to the native word
size. So when executing on a 64-bit architecture, the EBC stack looks
differently from what you describe above, and maps seamlessly onto the
register assignment mandated by the respective calling conventions.

-- 
Ard.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support
  2016-09-22 11:14     ` Ard Biesheuvel
@ 2016-09-22 11:26       ` Pete Batard
  2016-09-22 11:40         ` Ard Biesheuvel
  0 siblings, 1 reply; 12+ messages in thread
From: Pete Batard @ 2016-09-22 11:26 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: edk2-devel@lists.01.org

On 2016.09.22 12:14, Ard Biesheuvel wrote:
> For X64 and AARCH64, the issue does not exist because the EBC spec
> mandates that all function arguments are widened to the native word
> size. So when executing on a 64-bit architecture, the EBC stack looks
> differently from what you describe above, and maps seamlessly onto the
> register assignment mandated by the respective calling conventions.

Ah, I see that you're right, and that I was trying to solve an issue 
that shouldn't exist:

 From UEFI 2.6, paragraph 21.9.3:

"32-bit integers are pushed as natural size (since they
should be passed as 64-bit parameter values on 64-bit machines)."

I must admit I was a bit curious as to why this problem wouldn't have 
been picked before.


So that leaves only the issue you mentioned. But then I'm not too 
hopeful with the timeframe for Arm/EBC integration when you say "we need 
language spec and compiler updates before we can fully support this"...

Do you know if work has been started on this? Or are we just going to 
consider that this is too troublesome a problem to fix?

Regards,

/Pete


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support
  2016-09-22 11:26       ` Pete Batard
@ 2016-09-22 11:40         ` Ard Biesheuvel
  0 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2016-09-22 11:40 UTC (permalink / raw)
  To: Pete Batard; +Cc: edk2-devel@lists.01.org

On 22 September 2016 at 12:26, Pete Batard <pete@akeo.ie> wrote:
> On 2016.09.22 12:14, Ard Biesheuvel wrote:
>>
>> For X64 and AARCH64, the issue does not exist because the EBC spec
>> mandates that all function arguments are widened to the native word
>> size. So when executing on a 64-bit architecture, the EBC stack looks
>> differently from what you describe above, and maps seamlessly onto the
>> register assignment mandated by the respective calling conventions.
>
>
> Ah, I see that you're right, and that I was trying to solve an issue that
> shouldn't exist:
>
> From UEFI 2.6, paragraph 21.9.3:
>
> "32-bit integers are pushed as natural size (since they
> should be passed as 64-bit parameter values on 64-bit machines)."
>
> I must admit I was a bit curious as to why this problem wouldn't have been
> picked before.
>
>
> So that leaves only the issue you mentioned. But then I'm not too hopeful
> with the timeframe for Arm/EBC integration when you say "we need language
> spec and compiler updates before we can fully support this"...
>
> Do you know if work has been started on this? Or are we just going to
> consider that this is too troublesome a problem to fix?
>

We are debating this internally. Even on AArch64, there are numerous
issues with EBC drivers, even if the code itself executes fine (this
is mainly related to PCI drivers that don't bother to enable support
for 64-bit DMA since this is never needed on Intel platforms)

So while EBC is of high importance to many Linaro members, I am not
sure if that includes 32-bit ARM support.

-- 
Ard.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support
  2016-09-22 11:05   ` Pete Batard
  2016-09-22 11:14     ` Ard Biesheuvel
@ 2016-09-22 20:27     ` Andrew Fish
  2016-09-22 21:21       ` Ard Biesheuvel
  2016-09-22 21:40       ` Pete Batard
  2016-09-22 21:24     ` Andrew Fish
  2 siblings, 2 replies; 12+ messages in thread
From: Andrew Fish @ 2016-09-22 20:27 UTC (permalink / raw)
  To: Pete Batard; +Cc: Ard Biesheuvel, edk2-devel@lists.01.org


> On Sep 22, 2016, at 4:05 AM, Pete Batard <pete@akeo.ie> wrote:
> 
> On 2016.09.22 11:06, Ard Biesheuvel wrote:
>> However, there is a fundamental issue with EBC on ARM that has not
>> been addressed yet, which makes EBC support problematic:
>> ARM uses natural alignment for 64-bit types, which means it leaves
>> gaps in the stack frame, and the thunking code has no way of dealing
>> with that.
> 
> I was hoping you would comment on this, as I believe the issue is larger than Arm (which is why I thought the Arm patch could be integrated), since I ran into something similar for X64 and AARCH64, with the conversion from stack to register parameters.
> 
> Let me start by showing a real-life example of what the current EBC implementation does, on different architectures, if you CALLEX from EBC into a native function call such as:
> 
> VOID MultiParamNative(
>    UINT32,
>    UINT64,
>    UINT64,
>    UINT64,
>    UINT32,
>    UINT32,
>    UINT64
> );
> 
> with values:
> 
>    0x1C1C1C1C,
>    0x2B2B2B2B2A2A2A2A,
>    0x3B3B3B3B3A3A3A3A,
>    0x4B4B4B4B4A4A4A4A,
>    0x5C5C5C5C,
>    0x6C6C6C6C,
>    0x7B7B7B7B7A7A7A7A
> 
> If you do that, then the parameter values seen by each Arch will be as follows:
> 
> IA32:
>  p1 = 0x1C1C1C1C
>  p2 = 0x2B2B2B2B2A2A2A2A
>  p3 = 0x3B3B3B3B3A3A3A3A
>  p4 = 0x4B4B4B4B4A4A4A4A
>  p5 = 0x5C5C5C5C
>  p6 = 0x6C6C6C6C
>  p7 = 0x7B7B7B7B7A7A7A7A
> 
> X64:
>  p1 = 0x1C1C1C1C
>  p2 = 0x3A3A3A3A2B2B2B2B
>  p3 = 0x4A4A4A4A3B3B3B3B
>  p4 = 0x5C5C5C5C4B4B4B4B
>  p5 = 0x6C6C6C6C
>  p6 = 0x7B7B7B7B
>  p7 = 0x06F23E4012345678
> 
> ARM:
>  p1 = 0x1C1C1C1C
>  p2 = 0x3A3A3A3A2B2B2B2B
>  p3 = 0x4A4A4A4A3B3B3B3B
>  p4 = 0x5C5C5C5C4B4B4B4B
>  p5 = 0x6C6C6C6C
>  p6 = 0x7A7A7A7A
>  p7 = 0x446EEC467B7B7B7B
> 
> AA64:
>  p1 = 0x1C1C1C1C
>  p2 = 0x3A3A3A3A2B2B2B2B
>  p3 = 0x4A4A4A4A3B3B3B3B
>  p4 = 0x5C5C5C5C4B4B4B4B
>  p5 = 0x6C6C6C6C
>  p6 = 0x7B7B7B7B
>  p7 = 0x00000000FFFFFF91
> 
> Note that these are real-life results gotten from a native set of drivers [1] + EBC sample [2], specifically designed to test the above.
> 
> So, as you can see, only IA32 currently retrieves the parameters with their expected values. All the other platforms, and not just Arm, have an issue with parameter retrieval.
> 
> I too performed some analysis [3], to understand the problem, the result of which can be summarized as follows:
> 
> Let's say you have native protocol function:
> 
>  ProtocolCall(UINT32, UINT64, UINT64)
> 
> to which you want to pass values:
> 
>  (0x1C1C1C1C, 0x2B2B2B2B2A2A2A2A, 0x3B3B3B3B3A3A3A3A)
> 
> With the EBC VM, the parameters then get stacked as (little endian, CDECL and using 32-bit longwords to represent the stack):
> 
>   +--------+
>   |1C1C1C1C|
>   +--------+
>   |2A2A2A2A|
>   +--------+
>   |2B2B2B2B|
>   +--------+
>   |3A3A3A3A|
>   +--------+
>   |3B3B3B3B|
>   +--------+
>   +????????+
>   +--------+
> 
> Now, if you are calling into an x86_32 arch, this is no issue, as the native call reads the parameters off the stack, and finds each one it its expected location.
> 
> But if, say, you are calling into native Arm, then the calling convention dictates that the first four 32 bits parameters must be placed into Arm registers r0-r3, rather than on the stack, and what's more, that if there exist 64 bit parameters among the register ones, they must start with an even register (r0 or r2).
> 
> What this means is that, with the current EBC handling, which simply maps the top of the stack onto registers for native CALLEX (as the VM cannot guess the parameter signature of the function it is calling into, and therefore will not do anything else but a blind mapping of the stack onto registers), the native Arm function effectively gets called with the following parameter mapping:
> 
>   +--------+
>   |1C1C1C1C|  -> r0 (32-bit first parameter)
>   +--------+
>   |2A2A2A2A|  -> (r1/unused, since first parameter is 32-bit)
>   +--------+
>   |2B2B2B2B|  -> r2 (lower half of 64-bit second parameter)
>   +--------+
>   |3A3A3A3A|  -> r3 (upper half of 64-bit second parameter)
>   +--------+
>   |3B3B3B3B|  -> lower half of 64-bit third parameter (stack)
>   +--------+
>   +????????+  -> upper half of 64-bit third parameter (stack)
>   +--------+
> 
> The end result is that, the Arm call ends up with these values:
> 
>  (0x1C1C1C1C, 0x3A3A3A3A2B2B2B2B, 0x????????3B3B3B3B)
> 
> However, while we used Arm for this example, this is not an Arm specific issue, as x86_64 and Arm64 also expect any of the first eight parameters to a native call, that are smaller than 64-bit, to get passed as a 64-bit register, which means they too have the same issue as the one illustrated above.
> 
> Now, I'm not sure what the solution to that issue would be. I tend to agree that, short of including a parameter signature for function calls, this function argument marshalling issue between EBC and native will be difficult to solve. A possible half-workaround I thought of could be to keep track of all the PUSHes having been carried out before a CALLEX, and *assume* (or mandate in the specs) that all the arguments were pushed individually and that the size of the PUSH matches the desired size for a register argument, but even that would still add a lot of complexity and be prone to breakage...
> 

It seems like tracking the PUSHes would work? We could update the spec to require this behavior. But it brings up another issue in that you don't know how many PUSHes to go back? What is a calling convention PUSH vs. a state save push? Maybe we could also add a max number of args that are supported? 

Thanks,

Andrew Fish

PS I'm guessing that from a clang/LLVM point of view enforcing the push rules in the code gen should not be hard if the bit code gets optimized and then is converted to EBC. As the bit code for the external code call will contain all the info EBC needs to know about. For example:
%7 = call i32 @MultiParamNative(i32 %4, i64 %5, i32 %6)

Example:
~/work/Compiler/llvm>cat callit.c
typedef unsigned int UINT32;
typedef unsigned long long UINT64;

int MultiParamNative(UINT32, UINT64, UINT32);

int Example (UINT32 A, UINT64 B, UINT32 C)
{
  return MultiParamNative (A, B, C);
}
~/work/Compiler/llvm>clang -S -emit-llvm  callit.c 
~/work/Compiler/llvm>cat callit.ll
; ModuleID = 'callit.c'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.12.0"

; Function Attrs: nounwind ssp uwtable
define i32 @Example(i32 %A, i64 %B, i32 %C) #0 {
  %1 = alloca i32, align 4
  %2 = alloca i64, align 8
  %3 = alloca i32, align 4
  store i32 %A, i32* %1, align 4
  store i64 %B, i64* %2, align 8
  store i32 %C, i32* %3, align 4
  %4 = load i32* %1, align 4
  %5 = load i64* %2, align 8
  %6 = load i32* %3, align 4
  %7 = call i32 @MultiParamNative(i32 %4, i64 %5, i32 %6)
  ret i32 %7
}

declare i32 @MultiParamNative(i32, i64, i32) #1

attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.ident = !{!0}

!0 = metadata !{metadata !"Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)"}


> The other solution of course is to ask EBC code developers to be aware of and compensate for the calling convention issue, and pad the stack as required depending on the ISA they are calling into, which is how I made my protocol.asm sample work [4], but this is still rather inconvenient, especially if not coding in EBC assembly, and defeats the point of trying to write Arch independent code.
> 
> Considering this, I'm not entirely convinced ARM EBC integration should be held back, as the problem we are faced with is part of a larger issue that we'll need to resolve for all archs (except IA32), and not just ARM...
> 
> Regards,
> 
> /Pete
> 
> [1] https://github.com/pbatard/fasmg-ebc/tree/master/Protocol
> [2] https://github.com/pbatard/fasmg-ebc/blob/master/protocol.asm
> [3] https://github.com/pbatard/fasmg-ebc/blob/master/protocol.asm#L113-L177
> [4] https://github.com/pbatard/fasmg-ebc/blob/master/protocol.asm#L211-L219
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support
  2016-09-22 20:27     ` Andrew Fish
@ 2016-09-22 21:21       ` Ard Biesheuvel
  2016-09-22 21:40       ` Pete Batard
  1 sibling, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2016-09-22 21:21 UTC (permalink / raw)
  To: Andrew Fish; +Cc: Pete Batard, edk2-devel@lists.01.org

On 22 September 2016 at 21:27, Andrew Fish <afish@apple.com> wrote:
>
>> On Sep 22, 2016, at 4:05 AM, Pete Batard <pete@akeo.ie> wrote:
>>
>> On 2016.09.22 11:06, Ard Biesheuvel wrote:
>>>
>>> However, there is a fundamental issue with EBC on ARM that has not
>>> been addressed yet, which makes EBC support problematic:
>>> ARM uses natural alignment for 64-bit types, which means it leaves
>>> gaps in the stack frame, and the thunking code has no way of dealing
>>> with that.
[...]
>> Now, I'm not sure what the solution to that issue would be. I tend to agree
>> that, short of including a parameter signature for function calls, this
>> function argument marshalling issue between EBC and native will be difficult
>> to solve. A possible half-workaround I thought of could be to keep track of
>> all the PUSHes having been carried out before a CALLEX, and *assume* (or
>> mandate in the specs) that all the arguments were pushed individually and
>> that the size of the PUSH matches the desired size for a register argument,
>> but even that would still add a lot of complexity and be prone to
>> breakage...
>
> It seems like tracking the PUSHes would work? We could update the spec to
> require this behavior. But it brings up another issue in that you don't know
> how many PUSHes to go back? What is a calling convention PUSH vs. a state
> save push? Maybe we could also add a max number of args that are supported?
>

Forcing the compiler to emit every function call as a series of
in-order argument pushes may be overly restrictive, and only solves
half of the problem. Native to EBC calls suffer from the same issue,
and so we have to deal with this in the thunking layer.

The EBC code already issues BRK instructions (IIRC) to call into the
interpreter to generate the native to EBC thunks, so I think we should
extend that mechanism to publish prototype annotations for all entry
points subject to thunking (in either direction). The existing
implementations for x86 could ignore these entirely, or perhaps use
them to trim the size of the native to EBC stack copy (or vice versa).
ARM would require these annotations (or perhaps only for prototypes
where UINT64 arguments appear in odd positions)

-- 
Ard.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support
  2016-09-22 11:05   ` Pete Batard
  2016-09-22 11:14     ` Ard Biesheuvel
  2016-09-22 20:27     ` Andrew Fish
@ 2016-09-22 21:24     ` Andrew Fish
  2016-09-22 21:29       ` Ard Biesheuvel
  2 siblings, 1 reply; 12+ messages in thread
From: Andrew Fish @ 2016-09-22 21:24 UTC (permalink / raw)
  To: Pete Batard; +Cc: Ard Biesheuvel, edk2-devel@lists.01.org


> On Sep 22, 2016, at 4:05 AM, Pete Batard <pete@akeo.ie> wrote:
> 
> On 2016.09.22 11:06, Ard Biesheuvel wrote:
>> However, there is a fundamental issue with EBC on ARM that has not
>> been addressed yet, which makes EBC support problematic:
>> ARM uses natural alignment for 64-bit types, which means it leaves
>> gaps in the stack frame, and the thunking code has no way of dealing
>> with that.
> 
> I was hoping you would comment on this, as I believe the issue is larger than Arm (which is why I thought the Arm patch could be integrated), since I ran into something similar for X64 and AARCH64, with the conversion from stack to register parameters.
> 
> Let me start by showing a real-life example of what the current EBC implementation does, on different architectures, if you CALLEX from EBC into a native function call such as:
> 
> VOID MultiParamNative(
>    UINT32,
>    UINT64,
>    UINT64,
>    UINT64,
>    UINT32,
>    UINT32,
>    UINT64
> );
> 

Pete,

Stupid question. Does adding EFIAPI fix this issue? So:

VOID
EFIAPI
MultiParammNative(
   UINT32,
   UINT64,
   UINT64,
   UINT64,
   UINT32,
   UINT32,
   UINT64
   );

I'm sitting next to Mike Kinney at the UEFI Forum Plug Fest and we were wondering if the calling convention stuff was mucked up on C side. 

Thanks,

Andrew Fish


> with values:
> 
>    0x1C1C1C1C,
>    0x2B2B2B2B2A2A2A2A,
>    0x3B3B3B3B3A3A3A3A,
>    0x4B4B4B4B4A4A4A4A,
>    0x5C5C5C5C,
>    0x6C6C6C6C,
>    0x7B7B7B7B7A7A7A7A
> 
> If you do that, then the parameter values seen by each Arch will be as follows:
> 
> IA32:
>  p1 = 0x1C1C1C1C
>  p2 = 0x2B2B2B2B2A2A2A2A
>  p3 = 0x3B3B3B3B3A3A3A3A
>  p4 = 0x4B4B4B4B4A4A4A4A
>  p5 = 0x5C5C5C5C
>  p6 = 0x6C6C6C6C
>  p7 = 0x7B7B7B7B7A7A7A7A
> 
> X64:
>  p1 = 0x1C1C1C1C
>  p2 = 0x3A3A3A3A2B2B2B2B
>  p3 = 0x4A4A4A4A3B3B3B3B
>  p4 = 0x5C5C5C5C4B4B4B4B
>  p5 = 0x6C6C6C6C
>  p6 = 0x7B7B7B7B
>  p7 = 0x06F23E4012345678
> 
> ARM:
>  p1 = 0x1C1C1C1C
>  p2 = 0x3A3A3A3A2B2B2B2B
>  p3 = 0x4A4A4A4A3B3B3B3B
>  p4 = 0x5C5C5C5C4B4B4B4B
>  p5 = 0x6C6C6C6C
>  p6 = 0x7A7A7A7A
>  p7 = 0x446EEC467B7B7B7B
> 
> AA64:
>  p1 = 0x1C1C1C1C
>  p2 = 0x3A3A3A3A2B2B2B2B
>  p3 = 0x4A4A4A4A3B3B3B3B
>  p4 = 0x5C5C5C5C4B4B4B4B
>  p5 = 0x6C6C6C6C
>  p6 = 0x7B7B7B7B
>  p7 = 0x00000000FFFFFF91
> 
> Note that these are real-life results gotten from a native set of drivers [1] + EBC sample [2], specifically designed to test the above.
> 
> So, as you can see, only IA32 currently retrieves the parameters with their expected values. All the other platforms, and not just Arm, have an issue with parameter retrieval.
> 
> I too performed some analysis [3], to understand the problem, the result of which can be summarized as follows:
> 
> Let's say you have native protocol function:
> 
>  ProtocolCall(UINT32, UINT64, UINT64)
> 
> to which you want to pass values:
> 
>  (0x1C1C1C1C, 0x2B2B2B2B2A2A2A2A, 0x3B3B3B3B3A3A3A3A)
> 
> With the EBC VM, the parameters then get stacked as (little endian, CDECL and using 32-bit longwords to represent the stack):
> 
>   +--------+
>   |1C1C1C1C|
>   +--------+
>   |2A2A2A2A|
>   +--------+
>   |2B2B2B2B|
>   +--------+
>   |3A3A3A3A|
>   +--------+
>   |3B3B3B3B|
>   +--------+
>   +????????+
>   +--------+
> 
> Now, if you are calling into an x86_32 arch, this is no issue, as the native call reads the parameters off the stack, and finds each one it its expected location.
> 
> But if, say, you are calling into native Arm, then the calling convention dictates that the first four 32 bits parameters must be placed into Arm registers r0-r3, rather than on the stack, and what's more, that if there exist 64 bit parameters among the register ones, they must start with an even register (r0 or r2).
> 
> What this means is that, with the current EBC handling, which simply maps the top of the stack onto registers for native CALLEX (as the VM cannot guess the parameter signature of the function it is calling into, and therefore will not do anything else but a blind mapping of the stack onto registers), the native Arm function effectively gets called with the following parameter mapping:
> 
>   +--------+
>   |1C1C1C1C|  -> r0 (32-bit first parameter)
>   +--------+
>   |2A2A2A2A|  -> (r1/unused, since first parameter is 32-bit)
>   +--------+
>   |2B2B2B2B|  -> r2 (lower half of 64-bit second parameter)
>   +--------+
>   |3A3A3A3A|  -> r3 (upper half of 64-bit second parameter)
>   +--------+
>   |3B3B3B3B|  -> lower half of 64-bit third parameter (stack)
>   +--------+
>   +????????+  -> upper half of 64-bit third parameter (stack)
>   +--------+
> 
> The end result is that, the Arm call ends up with these values:
> 
>  (0x1C1C1C1C, 0x3A3A3A3A2B2B2B2B, 0x????????3B3B3B3B)
> 
> However, while we used Arm for this example, this is not an Arm specific issue, as x86_64 and Arm64 also expect any of the first eight parameters to a native call, that are smaller than 64-bit, to get passed as a 64-bit register, which means they too have the same issue as the one illustrated above.
> 
> Now, I'm not sure what the solution to that issue would be. I tend to agree that, short of including a parameter signature for function calls, this function argument marshalling issue between EBC and native will be difficult to solve. A possible half-workaround I thought of could be to keep track of all the PUSHes having been carried out before a CALLEX, and *assume* (or mandate in the specs) that all the arguments were pushed individually and that the size of the PUSH matches the desired size for a register argument, but even that would still add a lot of complexity and be prone to breakage...
> 
> The other solution of course is to ask EBC code developers to be aware of and compensate for the calling convention issue, and pad the stack as required depending on the ISA they are calling into, which is how I made my protocol.asm sample work [4], but this is still rather inconvenient, especially if not coding in EBC assembly, and defeats the point of trying to write Arch independent code.
> 
> Considering this, I'm not entirely convinced ARM EBC integration should be held back, as the problem we are faced with is part of a larger issue that we'll need to resolve for all archs (except IA32), and not just ARM...
> 
> Regards,
> 
> /Pete
> 
> [1] https://github.com/pbatard/fasmg-ebc/tree/master/Protocol
> [2] https://github.com/pbatard/fasmg-ebc/blob/master/protocol.asm
> [3] https://github.com/pbatard/fasmg-ebc/blob/master/protocol.asm#L113-L177
> [4] https://github.com/pbatard/fasmg-ebc/blob/master/protocol.asm#L211-L219
> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support
  2016-09-22 21:24     ` Andrew Fish
@ 2016-09-22 21:29       ` Ard Biesheuvel
  0 siblings, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2016-09-22 21:29 UTC (permalink / raw)
  To: Andrew Fish; +Cc: Pete Batard, edk2-devel@lists.01.org

On 22 September 2016 at 22:24, Andrew Fish <afish@apple.com> wrote:
>
[..]
> Pete,
>
> Stupid question. Does adding EFIAPI fix this issue? So:
>
> VOID
> EFIAPI
> MultiParammNative(
>    UINT32,
>    UINT64,
>    UINT64,
>    UINT64,
>    UINT32,
>    UINT32,
>    UINT64
>    );
>
> I'm sitting next to Mike Kinney at the UEFI Forum Plug Fest and we were
> wondering if the calling convention stuff was mucked up on C side.
>

Pete already confirmed that his analysis regarding X64 and AARCH64 was
incorrect. The only architecture that is affected by this issue is
ARM, and purely due to the fact that its calling convention leaves
'holes' in the register/stack slot assignment to ensure UINT64 values
appear naturally aligned (i.e., the protoype above would not use r1,
but pass the second argument in r2 and r3)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support
  2016-09-22 20:27     ` Andrew Fish
  2016-09-22 21:21       ` Ard Biesheuvel
@ 2016-09-22 21:40       ` Pete Batard
  1 sibling, 0 replies; 12+ messages in thread
From: Pete Batard @ 2016-09-22 21:40 UTC (permalink / raw)
  To: Andrew Fish; +Cc: Ard Biesheuvel, edk2-devel@lists.01.org

Hi Andrew,

On 2016.09.22 21:27, Andrew Fish wrote:
> It seems like tracking the PUSHes would work?

First of all, as got pointed out by Ard, I need to mention that much of 
my earlier analysis results, which you quoted, were due to a misread of 
the specs, as it does mandates pushing 32 bit parameters as native.
So the output I gave for AARCH64 and X64 was erroneous, since it was a 
result of using PUSH32, and the problematic output disappears when using 
PUSHn.

The only issue therefore, is with 32 bit ARM.

> We could update the spec to require this behavior.

If the idea is to add the tracking when processing POP/PUSH on Arm 
(which is how I originally saw it), please note that we may also have to 
track stack manipulation with MOV against R0, the EBC stack pointer, if 
someone issues something like 'MOV R0, R0(-1,0)' (equivalent to PUSHn) 
or 'MOV R0, R0(0,-8)' (equivalent to PUSH64) for some 32 or 64 bit 
optional parameter...

However, this brings us back to our original issue if there exists a 
succession of optional 32 and 64 bit parameters, which the application 
doesn't care about filling, and that may be handled with a single 'MOV 
R0, R0(-1,-8)' in the assembly. If we are faced with this in our 
tracking, then we still have no way of knowing which of the 32 or 64 bit 
parameter comes first.

So, notwithstanding other issues, we'd have to mandate something very 
specific against doing this in the UEFI/EBC specs (such as only using 
individual PUSHes)...

Regards,

/Pete





^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-09-22 21:40 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-09-22  9:43 [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support Pete Batard
2016-09-22 10:06 ` Ard Biesheuvel
2016-09-22 10:13   ` Ard Biesheuvel
2016-09-22 11:05   ` Pete Batard
2016-09-22 11:14     ` Ard Biesheuvel
2016-09-22 11:26       ` Pete Batard
2016-09-22 11:40         ` Ard Biesheuvel
2016-09-22 20:27     ` Andrew Fish
2016-09-22 21:21       ` Ard Biesheuvel
2016-09-22 21:40       ` Pete Batard
2016-09-22 21:24     ` Andrew Fish
2016-09-22 21:29       ` Ard Biesheuvel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox