public inbox for devel@edk2.groups.io
 help / color / mirror / Atom feed
* Issues with CLANGDWARF tool specification and X64 -- am I nuts or what?
@ 2022-01-10 20:56 Bill Paul
  2022-01-11 18:02 ` [edk2-devel] " Bill Paul
  0 siblings, 1 reply; 3+ messages in thread
From: Bill Paul @ 2022-01-10 20:56 UTC (permalink / raw)
  To: devel

Hello all:

Recently I discovered that you can enable CSM compatibility mode in OVMF and 
decided to build some images with this feature Because Of Reasons (tm). My 
platform is:

FreeBSD/amd64 12.2-RELEASE
Clang/LLVM 10.0.1
QEMU 6.2.0

These days FreeBSD uses Clang as its native compiler. 12.2-RELEASE comes with 
version 10.0.1, but you also have the option of installing more recent Clang/
LLVM versions up to 12.0.1 as supplemental packages. I noticed that CLANGDWARF 
is a supported tool spec for the X64 and IA32 build targets so I decided to 
try that.

Unfortunately I ran into a couple of problems:

1) Compilation of OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c fails due to 
uninitialized local variable.

/mnt/home/wpaul/edk2/edk2/OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c:
1875:9: error: variable 'Compacted' is used uninitialized whenever 'if' 
condition is false [-Werror,-Wsometimes-uninitialized]
    if (EcxIn == 1) {
        ^~~~~~~~~~
/mnt/home/wpaul/edk2/edk2/OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c:
1895:12: note: uninitialized use occurs here
           Compacted
           ^~~~~~~~~
/mnt/home/wpaul/edk2/edk2/OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c:
1875:5: note: remove the 'if' if its condition is always true
    if (EcxIn == 1) {
    ^~~~~~~~~~~~~~~~
/mnt/home/wpaul/edk2/edk2/OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c:
1871:37: note: initialize the variable 'Compacted' to silence this warning
    BOOLEAN                Compacted;
                                    ^
                                     = '\0'

I changed line 1871 to:

    BOOLEAN                Compacted = FALSE;

and that fixed it. I suspect this may be a case where Clang is being a bit 
more strict than GCC about uninitialized variables.

2) Linking fails with numerous errors of the following kind:

ld.lld: error: can't create dynamic relocation R_X86_64_64 against local 
symbol in readonly segment; recompile object files with -fPIC or pass '-Wl,-
z,notext' to allow text relocations in the output
>>> defined in /mnt/home/wpaul/edk2/edk2/Build/OvmfX64/RELEASE_CLANGDWARF/X64/
UefiCpuPkg/Library/CpuExceptionHandlerLib/SecPeiCpuExceptionHandlerLib/OUTPUT/
SecPeiCpuExceptionHandlerLib.lib(ExceptionHandlerAsm.obj)

If I edit tools_def.txt and change RELEASE_CLANGDWARF_X64_DLINK_FLAGS so that 
it includes -Wl,-z,notext as the error suggests, then everything links as 
expected. (The same thing is needed for DEBUG_CLANGDWARF_X64_DLINK_FLAGS and 
probably NOOPT_CLANGDWARF_X64_DLINK_FLAGS.)

This problem only occurs when building for X64. IA32 seems ok.

Is there any particular reason why these flags aren't there already?

3) Although fixing the above two problems allows me to produce a complete 
OVMF.fd image, it crashes at start-up with the runtime error:

!!!! X64 Exception Type - 06(#UD - Invalid Opcode)  CPU Apic ID - 00000000 
!!!!
RIP  - 000000007EC8A7DB, CS  - 0000000000000038, RFLAGS - 0000000000010206
RAX  - 0000000000000000, RCX - 000000007F5A0880, RDX - 000000007FF2BAE0
RBX  - 000000007FF2BBF0, RSP - 000000007FF2BB20, RBP - 000000005A1C5000
RSI  - 000000007FF2BB48, RDI - 0000000000000000
R8   - 0000000000000008, R9  - 0000000000000000, R10 - 0000000000000000
R11  - 000000007FF41C90, R12 - 0000000000058080, R13 - 000000007FF2BC20
R14  - 00000000000F6DB0, R15 - 0000000000000000
DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
GS   - 0000000000000030, SS  - 0000000000000030
CR0  - 0000000080010033, CR2 - 0000000000000000, CR3 - 000000007FC01000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 000000007F9EC000 0000000000000047, LDTR - 0000000000000000
IDTR - 000000007F5A4018 0000000000000FFF,   TR - 0000000000000000
FXSAVE_STATE - 000000007FF2B780
!!!! Find image based on IP(0x7EC8A7DB) (No PDB)  (ImageBase=000000007EC87000, 
EntryPoint=000000007EC8D7A6) !!!!

The same thing occurs for IA32:

!!!! IA32 Exception Type - 06(#UD - Invalid Opcode)  CPU Apic ID - 00000000 
!!!!
EIP  - 7ECF72D4, CS  - 00000010, EFLAGS - 00010206
EAX  - 00000008, ECX - 00000000, EDX - 7F8EEE10, EBX - 5000A19D
ESP  - 7FF33D44, EBP - 7FF33E4C, ESI - 7FF33D4C, EDI - 7ECFB04C
DS   - 00000008, ES  - 00000008, FS  - 00000008, GS  - 00000008, SS - 00000008
CR0  - 80010033, CR2 - 00000000, CR3 - 7FC01000, CR4 - 00000660
DR0  - 00000000, DR1 - 00000000, DR2 - 00000000, DR3 - 00000000
DR6  - FFFF0FF0, DR7 - 00000400
GDTR - 7F9EC000 00000047, IDTR - 7F5E5010 000007FF
LDTR - 00000000, TR - 00000000
FXSAVE_STATE - 7FF33A80
!!!! Find image based on IP(0x7ECF72D4) (No PDB)  (ImageBase=000000007ECF4000, 
EntryPoint=000000007ECF9FBD) !!!!

This one had me going for a while, but I eventually traced it down to 
LegacyBiosInt86() in OvmfPkg/Csm/LegacyBiosDxe/Thunk.c.

In that function, there is this code:

  UINT16  Segment;
  UINT16  Offset;
[...]
  //
  // The base address of legacy interrupt vector table is 0.
  // We use this base address to get the legacy interrupt handler.
  //
  ACCESS_PAGE0_CODE (
    Segment = (UINT16)(((UINT32 *)0)[BiosInt] >> 16);
    Offset  = (UINT16)((UINT32 *)0)[BiosInt];
    );

As the comment notes, here, we're trying to directly read from address 0 using 
BiosInt as an offset. This seems to trigger a problem with the Clang 
optimizer. The disassembled output looks like this:

0000000000003786 <LegacyBiosInt86>:
    3786:       56                      push   %rsi
    3787:       48 83 ec 60             sub    $0x60,%rsp
    378b:       41 0f b7 40 18          movzwl 0x18(%r8),%eax
    3790:       25 d4 0c 00 00          and    $0xcd4,%eax
    3795:       0d 02 30 00 00          or     $0x3002,%eax
    379a:       66 41 89 40 18          mov    %ax,0x18(%r8)
    379f:       48 8d 74 24 28          lea    0x28(%rsp),%rsi
    37a4:       48 83 66 18 00          andq   $0x0,0x18(%rsi)
    37a9:       48 8b 05 a8 59 00 00    mov    0x59a8(%rip),%rax        # 9158 
<gDS>
    37b0:       31 c9                   xor    %ecx,%ecx
    37b2:       48 89 f2                mov    %rsi,%rdx
    37b5:       ff 50 38                call   *0x38(%rax)
    37b8:       4c 8b 46 18             mov    0x18(%rsi),%r8
    37bc:       41 0f ba e0 0d          bt     $0xd,%r8d
    37c1:       73 18                   jae    37db <LegacyBiosInt86+0x55>
    37c3:       48 8b 05 8e 59 00 00    mov    0x598e(%rip),%rax        # 9158 
<gDS>
    37ca:       49 81 e0 ff df ff ff    and    $0xffffffffffffdfff,%r8
    37d1:       ba 00 10 00 00          mov    $0x1000,%edx
    37d6:       31 c9                   xor    %ecx,%ecx
    37d8:       ff 50 40                call   *0x40(%rax)
    37db:       0f 0b                   ud2    <--- er... what?

Note the "ud2" instruction. I have no idea what that's supposed to be, but it 
looks like Clang just gives up generating any further instructions once it 
hits this code.

I reduced this to a very simple sample C program that also reproduces the 
problem:

#include <stdint.h>

extern int somefunc (uint16_t s, uint16_t o);

int
foo (uint8_t BiosInt)
{
        uint16_t Segment;
        uint16_t Offset;

        Segment = (uint16_t)(((uint32_t *)0)[BiosInt] >> 16);
        Offset  = (uint16_t)((uint32_t *)0)[BiosInt];

        return somefunc (Segment, Offset);
}

If I compile this with:

% clang -O2 -c edk.c

then I get this:

0000000000000000 <foo>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   0f 0b                   ud2    

If I compile with GCC instead:

% gcc10 -O2 -c edk.c

then I get this:

0000000000000000 <foo>:
   0:   40 0f b6 ff             movzbl %dil,%edi
   4:   8b 3c bd 00 00 00 00    mov    0x0(,%rdi,4),%edi
   b:   0f b7 f7                movzwl %di,%esi
   e:   c1 ef 10                shr    $0x10,%edi
  11:   e9 00 00 00 00          jmp    16 <foo+0x16>

If I compile with Clang using -O0 to disable optimization, then it produces 
code:

0000000000000000 <foo>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 10             sub    $0x10,%rsp
   8:   31 c0                   xor    %eax,%eax
   a:   89 c1                   mov    %eax,%ecx
   c:   40 88 7d ff             mov    %dil,-0x1(%rbp)
  10:   0f b6 45 ff             movzbl -0x1(%rbp),%eax
  14:   89 c2                   mov    %eax,%edx
  16:   8b 04 91                mov    (%rcx,%rdx,4),%eax
  19:   c1 e8 10                shr    $0x10,%eax
  1c:   66 89 45 fc             mov    %ax,-0x4(%rbp)
  20:   0f b6 75 ff             movzbl -0x1(%rbp),%esi
  24:   89 f2                   mov    %esi,%edx
  26:   8b 34 91                mov    (%rcx,%rdx,4),%esi
  29:   66 89 75 fa             mov    %si,-0x6(%rbp)
  2d:   66 8b 45 fc             mov    -0x4(%rbp),%ax
  31:   0f b7 f8                movzwl %ax,%edi
  34:   0f b7 75 fa             movzwl -0x6(%rbp),%esi
  38:   e8 00 00 00 00          call   3d <foo+0x3d>
  3d:   48 83 c4 10             add    $0x10,%rsp
  41:   5d                      pop    %rbp
  42:   c3                      ret

Note that this is with Clang 10.0.1, however I observe exactly the same 
results with Clang 12.0.1.

I was able to fix this locally (for both X64 and IA32 targets) this by doing 
this:

  UINT16  Segment;
  UINT16  Offset;
  volatile UINT32 * Page0 = NULL;
[...]
  ACCESS_PAGE0_CODE (
    Segment = (UINT16)(Page0[BiosInt] >> 16);
    Offset  = (UINT16)Page0[BiosInt];
    );

Note that the addition of the volatile keyword seems to be the key. If you 
wanted to make the change simpler, you could also just do this:

  ACCESS_PAGE0_CODE (
    Segment = (UINT16)(((volatile UINT32 *)0)[BiosInt] >> 16);
    Offset  = (UINT16)((volatile UINT32 *)0)[BiosInt];
    );

(To my eyes, the other way is more readable, but others may disagree.)

With these three issues fixed, I'm able to boot and run my OVMF.fd images for 
both X64 and IA32, and they're able to also boot and run the loaders and OS 
images that I'm testing.

I'm not sure how much testing the CLANGDWARF toolspec is getting these days, 
but maybe it should get a little more. Is there any chance these issues could 
be addressed?

-Bill

-- 
=============================================================================
-Bill Paul            (510) 749-2329 | VxWorks Software Architect,
                 wpaul@windriver.com | Master of Unix-Fu - Wind River Systems
=============================================================================
   "I put a dollar in a change machine. Nothing changed." - George Carlin
=============================================================================




^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [edk2-devel] Issues with CLANGDWARF tool specification and X64 -- am I nuts or what?
  2022-01-10 20:56 Issues with CLANGDWARF tool specification and X64 -- am I nuts or what? Bill Paul
@ 2022-01-11 18:02 ` Bill Paul
  2022-01-13 17:12   ` Andrew Fish
  0 siblings, 1 reply; 3+ messages in thread
From: Bill Paul @ 2022-01-11 18:02 UTC (permalink / raw)
  To: devel

BTW, regarding 3) below, it looks like this is expected behavior from Clang/
LLVM. Dereferencing a NULL pointer is technically undefined in C, and by 
default the Clang optimizer will cause a trap to occur if you try to do it 
(because it assumes it's not safe for programs to do that).

The flag -fno-delete-null-pointer-checks can be used to avoid this behavior. 
This flag seems to be valid for Clang 9 and later. This is slightly more 
efficient than using the volatile keyword to defeat the optimizer. This flag 
is also for GCC.

-Bill

> Hello all:
> 
> Recently I discovered that you can enable CSM compatibility mode in OVMF and
> decided to build some images with this feature Because Of Reasons (tm). My
> platform is:
> 
> FreeBSD/amd64 12.2-RELEASE
> Clang/LLVM 10.0.1
> QEMU 6.2.0
> 
> These days FreeBSD uses Clang as its native compiler. 12.2-RELEASE comes
> with version 10.0.1, but you also have the option of installing more recent
> Clang/ LLVM versions up to 12.0.1 as supplemental packages. I noticed that
> CLANGDWARF is a supported tool spec for the X64 and IA32 build targets so I
> decided to try that.
> 
> Unfortunately I ran into a couple of problems:
> 
> 1) Compilation of OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c fails due to
> uninitialized local variable.
> 
> /mnt/home/wpaul/edk2/edk2/OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c:
> 1875:9: error: variable 'Compacted' is used uninitialized whenever 'if'
> condition is false [-Werror,-Wsometimes-uninitialized]
>     if (EcxIn == 1) {
>         ^~~~~~~~~~
> /mnt/home/wpaul/edk2/edk2/OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c:
> 1895:12: note: uninitialized use occurs here
>            Compacted
>            ^~~~~~~~~
> /mnt/home/wpaul/edk2/edk2/OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c:
> 1875:5: note: remove the 'if' if its condition is always true
>     if (EcxIn == 1) {
>     ^~~~~~~~~~~~~~~~
> /mnt/home/wpaul/edk2/edk2/OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c:
> 1871:37: note: initialize the variable 'Compacted' to silence this warning
>     BOOLEAN                Compacted;
>                                     ^
>                                      = '\0'
> 
> I changed line 1871 to:
> 
>     BOOLEAN                Compacted = FALSE;
> 
> and that fixed it. I suspect this may be a case where Clang is being a bit
> more strict than GCC about uninitialized variables.
> 
> 2) Linking fails with numerous errors of the following kind:
> 
> ld.lld: error: can't create dynamic relocation R_X86_64_64 against local
> symbol in readonly segment; recompile object files with -fPIC or pass '-Wl,-
> z,notext' to allow text relocations in the output
> 
> >>> defined in
> >>> /mnt/home/wpaul/edk2/edk2/Build/OvmfX64/RELEASE_CLANGDWARF/X64/
> 
> UefiCpuPkg/Library/CpuExceptionHandlerLib/SecPeiCpuExceptionHandlerLib/OUTPU
> T/ SecPeiCpuExceptionHandlerLib.lib(ExceptionHandlerAsm.obj)
> 
> If I edit tools_def.txt and change RELEASE_CLANGDWARF_X64_DLINK_FLAGS so
> that it includes -Wl,-z,notext as the error suggests, then everything links
> as expected. (The same thing is needed for DEBUG_CLANGDWARF_X64_DLINK_FLAGS
> and probably NOOPT_CLANGDWARF_X64_DLINK_FLAGS.)
> 
> This problem only occurs when building for X64. IA32 seems ok.
> 
> Is there any particular reason why these flags aren't there already?
> 
> 3) Although fixing the above two problems allows me to produce a complete
> OVMF.fd image, it crashes at start-up with the runtime error:
> 
> !!!! X64 Exception Type - 06(#UD - Invalid Opcode)  CPU Apic ID - 00000000
> !!!!
> RIP  - 000000007EC8A7DB, CS  - 0000000000000038, RFLAGS - 0000000000010206
> RAX  - 0000000000000000, RCX - 000000007F5A0880, RDX - 000000007FF2BAE0
> RBX  - 000000007FF2BBF0, RSP - 000000007FF2BB20, RBP - 000000005A1C5000
> RSI  - 000000007FF2BB48, RDI - 0000000000000000
> R8   - 0000000000000008, R9  - 0000000000000000, R10 - 0000000000000000
> R11  - 000000007FF41C90, R12 - 0000000000058080, R13 - 000000007FF2BC20
> R14  - 00000000000F6DB0, R15 - 0000000000000000
> DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
> GS   - 0000000000000030, SS  - 0000000000000030
> CR0  - 0000000080010033, CR2 - 0000000000000000, CR3 - 000000007FC01000
> CR4  - 0000000000000668, CR8 - 0000000000000000
> DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
> DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
> GDTR - 000000007F9EC000 0000000000000047, LDTR - 0000000000000000
> IDTR - 000000007F5A4018 0000000000000FFF,   TR - 0000000000000000
> FXSAVE_STATE - 000000007FF2B780
> !!!! Find image based on IP(0x7EC8A7DB) (No PDB) 
> (ImageBase=000000007EC87000, EntryPoint=000000007EC8D7A6) !!!!
> 
> The same thing occurs for IA32:
> 
> !!!! IA32 Exception Type - 06(#UD - Invalid Opcode)  CPU Apic ID - 00000000
> !!!!
> EIP  - 7ECF72D4, CS  - 00000010, EFLAGS - 00010206
> EAX  - 00000008, ECX - 00000000, EDX - 7F8EEE10, EBX - 5000A19D
> ESP  - 7FF33D44, EBP - 7FF33E4C, ESI - 7FF33D4C, EDI - 7ECFB04C
> DS   - 00000008, ES  - 00000008, FS  - 00000008, GS  - 00000008, SS -
> 00000008 CR0  - 80010033, CR2 - 00000000, CR3 - 7FC01000, CR4 - 00000660
> DR0  - 00000000, DR1 - 00000000, DR2 - 00000000, DR3 - 00000000
> DR6  - FFFF0FF0, DR7 - 00000400
> GDTR - 7F9EC000 00000047, IDTR - 7F5E5010 000007FF
> LDTR - 00000000, TR - 00000000
> FXSAVE_STATE - 7FF33A80
> !!!! Find image based on IP(0x7ECF72D4) (No PDB) 
> (ImageBase=000000007ECF4000, EntryPoint=000000007ECF9FBD) !!!!
> 
> This one had me going for a while, but I eventually traced it down to
> LegacyBiosInt86() in OvmfPkg/Csm/LegacyBiosDxe/Thunk.c.
> 
> In that function, there is this code:
> 
>   UINT16  Segment;
>   UINT16  Offset;
> [...]
>   //
>   // The base address of legacy interrupt vector table is 0.
>   // We use this base address to get the legacy interrupt handler.
>   //
>   ACCESS_PAGE0_CODE (
>     Segment = (UINT16)(((UINT32 *)0)[BiosInt] >> 16);
>     Offset  = (UINT16)((UINT32 *)0)[BiosInt];
>     );
> 
> As the comment notes, here, we're trying to directly read from address 0
> using BiosInt as an offset. This seems to trigger a problem with the Clang
> optimizer. The disassembled output looks like this:
> 
> 0000000000003786 <LegacyBiosInt86>:
>     3786:       56                      push   %rsi
>     3787:       48 83 ec 60             sub    $0x60,%rsp
>     378b:       41 0f b7 40 18          movzwl 0x18(%r8),%eax
>     3790:       25 d4 0c 00 00          and    $0xcd4,%eax
>     3795:       0d 02 30 00 00          or     $0x3002,%eax
>     379a:       66 41 89 40 18          mov    %ax,0x18(%r8)
>     379f:       48 8d 74 24 28          lea    0x28(%rsp),%rsi
>     37a4:       48 83 66 18 00          andq   $0x0,0x18(%rsi)
>     37a9:       48 8b 05 a8 59 00 00    mov    0x59a8(%rip),%rax        #
> 9158 <gDS>
>     37b0:       31 c9                   xor    %ecx,%ecx
>     37b2:       48 89 f2                mov    %rsi,%rdx
>     37b5:       ff 50 38                call   *0x38(%rax)
>     37b8:       4c 8b 46 18             mov    0x18(%rsi),%r8
>     37bc:       41 0f ba e0 0d          bt     $0xd,%r8d
>     37c1:       73 18                   jae    37db <LegacyBiosInt86+0x55>
>     37c3:       48 8b 05 8e 59 00 00    mov    0x598e(%rip),%rax        #
> 9158 <gDS>
>     37ca:       49 81 e0 ff df ff ff    and    $0xffffffffffffdfff,%r8
>     37d1:       ba 00 10 00 00          mov    $0x1000,%edx
>     37d6:       31 c9                   xor    %ecx,%ecx
>     37d8:       ff 50 40                call   *0x40(%rax)
>     37db:       0f 0b                   ud2    <--- er... what?
> 
> Note the "ud2" instruction. I have no idea what that's supposed to be, but
> it looks like Clang just gives up generating any further instructions once
> it hits this code.
> 
> I reduced this to a very simple sample C program that also reproduces the
> problem:
> 
> #include <stdint.h>
> 
> extern int somefunc (uint16_t s, uint16_t o);
> 
> int
> foo (uint8_t BiosInt)
> {
>         uint16_t Segment;
>         uint16_t Offset;
> 
>         Segment = (uint16_t)(((uint32_t *)0)[BiosInt] >> 16);
>         Offset  = (uint16_t)((uint32_t *)0)[BiosInt];
> 
>         return somefunc (Segment, Offset);
> }
> 
> If I compile this with:
> 
> % clang -O2 -c edk.c
> 
> then I get this:
> 
> 0000000000000000 <foo>:
>    0:   55                      push   %rbp
>    1:   48 89 e5                mov    %rsp,%rbp
>    4:   0f 0b                   ud2
> 
> If I compile with GCC instead:
> 
> % gcc10 -O2 -c edk.c
> 
> then I get this:
> 
> 0000000000000000 <foo>:
>    0:   40 0f b6 ff             movzbl %dil,%edi
>    4:   8b 3c bd 00 00 00 00    mov    0x0(,%rdi,4),%edi
>    b:   0f b7 f7                movzwl %di,%esi
>    e:   c1 ef 10                shr    $0x10,%edi
>   11:   e9 00 00 00 00          jmp    16 <foo+0x16>
> 
> If I compile with Clang using -O0 to disable optimization, then it produces
> code:
> 
> 0000000000000000 <foo>:
>    0:   55                      push   %rbp
>    1:   48 89 e5                mov    %rsp,%rbp
>    4:   48 83 ec 10             sub    $0x10,%rsp
>    8:   31 c0                   xor    %eax,%eax
>    a:   89 c1                   mov    %eax,%ecx
>    c:   40 88 7d ff             mov    %dil,-0x1(%rbp)
>   10:   0f b6 45 ff             movzbl -0x1(%rbp),%eax
>   14:   89 c2                   mov    %eax,%edx
>   16:   8b 04 91                mov    (%rcx,%rdx,4),%eax
>   19:   c1 e8 10                shr    $0x10,%eax
>   1c:   66 89 45 fc             mov    %ax,-0x4(%rbp)
>   20:   0f b6 75 ff             movzbl -0x1(%rbp),%esi
>   24:   89 f2                   mov    %esi,%edx
>   26:   8b 34 91                mov    (%rcx,%rdx,4),%esi
>   29:   66 89 75 fa             mov    %si,-0x6(%rbp)
>   2d:   66 8b 45 fc             mov    -0x4(%rbp),%ax
>   31:   0f b7 f8                movzwl %ax,%edi
>   34:   0f b7 75 fa             movzwl -0x6(%rbp),%esi
>   38:   e8 00 00 00 00          call   3d <foo+0x3d>
>   3d:   48 83 c4 10             add    $0x10,%rsp
>   41:   5d                      pop    %rbp
>   42:   c3                      ret
> 
> Note that this is with Clang 10.0.1, however I observe exactly the same
> results with Clang 12.0.1.
> 
> I was able to fix this locally (for both X64 and IA32 targets) this by doing
> this:
> 
>   UINT16  Segment;
>   UINT16  Offset;
>   volatile UINT32 * Page0 = NULL;
> [...]
>   ACCESS_PAGE0_CODE (
>     Segment = (UINT16)(Page0[BiosInt] >> 16);
>     Offset  = (UINT16)Page0[BiosInt];
>     );
> 
> Note that the addition of the volatile keyword seems to be the key. If you
> wanted to make the change simpler, you could also just do this:
> 
>   ACCESS_PAGE0_CODE (
>     Segment = (UINT16)(((volatile UINT32 *)0)[BiosInt] >> 16);
>     Offset  = (UINT16)((volatile UINT32 *)0)[BiosInt];
>     );
> 
> (To my eyes, the other way is more readable, but others may disagree.)
> 
> With these three issues fixed, I'm able to boot and run my OVMF.fd images
> for both X64 and IA32, and they're able to also boot and run the loaders
> and OS images that I'm testing.
> 
> I'm not sure how much testing the CLANGDWARF toolspec is getting these days,
> but maybe it should get a little more. Is there any chance these issues
> could be addressed?
> 
> -Bill


-- 
=============================================================================
-Bill Paul            (510) 749-2329 | VxWorks Software Architect,
                 wpaul@windriver.com | Master of Unix-Fu - Wind River Systems
=============================================================================
   "I put a dollar in a change machine. Nothing changed." - George Carlin
=============================================================================




^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [edk2-devel] Issues with CLANGDWARF tool specification and X64 -- am I nuts or what?
  2022-01-11 18:02 ` [edk2-devel] " Bill Paul
@ 2022-01-13 17:12   ` Andrew Fish
  0 siblings, 0 replies; 3+ messages in thread
From: Andrew Fish @ 2022-01-13 17:12 UTC (permalink / raw)
  To: edk2-devel-groups-io, wpaul

[-- Attachment #1: Type: text/plain, Size: 12762 bytes --]



> On Jan 11, 2022, at 10:02 AM, Bill Paul <wpaul@windriver.com> wrote:
> 
> BTW, regarding 3) below, it looks like this is expected behavior from Clang/
> LLVM. Dereferencing a NULL pointer is technically undefined in C, and by 
> default the Clang optimizer will cause a trap to occur if you try to do it 
> (because it assumes it's not safe for programs to do that).
> 
> The flag -fno-delete-null-pointer-checks can be used to avoid this behavior. 
> This flag seems to be valid for Clang 9 and later. This is slightly more 
> efficient than using the volatile keyword to defeat the optimizer. This flag 
> is also for GCC.
> 

Given it is undefined behavior and the compiler is behaving correctly I think it is better to write correct C code and use volatile if you mean to use a NULL pointer. 

Not to mention I worked hard to get clang to emit the trap. If was very painful debugging issues like this when the function just ends in the middle and you fall into the next function. 

Thanks,

Andrew Fish

> -Bill
> 
>> Hello all:
>> 
>> Recently I discovered that you can enable CSM compatibility mode in OVMF and
>> decided to build some images with this feature Because Of Reasons (tm). My
>> platform is:
>> 
>> FreeBSD/amd64 12.2-RELEASE
>> Clang/LLVM 10.0.1
>> QEMU 6.2.0
>> 
>> These days FreeBSD uses Clang as its native compiler. 12.2-RELEASE comes
>> with version 10.0.1, but you also have the option of installing more recent
>> Clang/ LLVM versions up to 12.0.1 as supplemental packages. I noticed that
>> CLANGDWARF is a supported tool spec for the X64 and IA32 build targets so I
>> decided to try that.
>> 
>> Unfortunately I ran into a couple of problems:
>> 
>> 1) Compilation of OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c fails due to
>> uninitialized local variable.
>> 
>> /mnt/home/wpaul/edk2/edk2/OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c:
>> 1875:9: error: variable 'Compacted' is used uninitialized whenever 'if'
>> condition is false [-Werror,-Wsometimes-uninitialized]
>>    if (EcxIn == 1) {
>>        ^~~~~~~~~~
>> /mnt/home/wpaul/edk2/edk2/OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c:
>> 1895:12: note: uninitialized use occurs here
>>           Compacted
>>           ^~~~~~~~~
>> /mnt/home/wpaul/edk2/edk2/OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c:
>> 1875:5: note: remove the 'if' if its condition is always true
>>    if (EcxIn == 1) {
>>    ^~~~~~~~~~~~~~~~
>> /mnt/home/wpaul/edk2/edk2/OvmfPkg/Library/VmgExitLib/VmgExitVcHandler.c:
>> 1871:37: note: initialize the variable 'Compacted' to silence this warning
>>    BOOLEAN                Compacted;
>>                                    ^
>>                                     = '\0'
>> 
>> I changed line 1871 to:
>> 
>>    BOOLEAN                Compacted = FALSE;
>> 
>> and that fixed it. I suspect this may be a case where Clang is being a bit
>> more strict than GCC about uninitialized variables.
>> 
>> 2) Linking fails with numerous errors of the following kind:
>> 
>> ld.lld: error: can't create dynamic relocation R_X86_64_64 against local
>> symbol in readonly segment; recompile object files with -fPIC or pass '-Wl,-
>> z,notext' to allow text relocations in the output
>> 
>>>>> defined in
>>>>> /mnt/home/wpaul/edk2/edk2/Build/OvmfX64/RELEASE_CLANGDWARF/X64/
>> 
>> UefiCpuPkg/Library/CpuExceptionHandlerLib/SecPeiCpuExceptionHandlerLib/OUTPU
>> T/ SecPeiCpuExceptionHandlerLib.lib(ExceptionHandlerAsm.obj)
>> 
>> If I edit tools_def.txt and change RELEASE_CLANGDWARF_X64_DLINK_FLAGS so
>> that it includes -Wl,-z,notext as the error suggests, then everything links
>> as expected. (The same thing is needed for DEBUG_CLANGDWARF_X64_DLINK_FLAGS
>> and probably NOOPT_CLANGDWARF_X64_DLINK_FLAGS.)
>> 
>> This problem only occurs when building for X64. IA32 seems ok.
>> 
>> Is there any particular reason why these flags aren't there already?
>> 
>> 3) Although fixing the above two problems allows me to produce a complete
>> OVMF.fd image, it crashes at start-up with the runtime error:
>> 
>> !!!! X64 Exception Type - 06(#UD - Invalid Opcode)  CPU Apic ID - 00000000
>> !!!!
>> RIP  - 000000007EC8A7DB, CS  - 0000000000000038, RFLAGS - 0000000000010206
>> RAX  - 0000000000000000, RCX - 000000007F5A0880, RDX - 000000007FF2BAE0
>> RBX  - 000000007FF2BBF0, RSP - 000000007FF2BB20, RBP - 000000005A1C5000
>> RSI  - 000000007FF2BB48, RDI - 0000000000000000
>> R8   - 0000000000000008, R9  - 0000000000000000, R10 - 0000000000000000
>> R11  - 000000007FF41C90, R12 - 0000000000058080, R13 - 000000007FF2BC20
>> R14  - 00000000000F6DB0, R15 - 0000000000000000
>> DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
>> GS   - 0000000000000030, SS  - 0000000000000030
>> CR0  - 0000000080010033, CR2 - 0000000000000000, CR3 - 000000007FC01000
>> CR4  - 0000000000000668, CR8 - 0000000000000000
>> DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
>> DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
>> GDTR - 000000007F9EC000 0000000000000047, LDTR - 0000000000000000
>> IDTR - 000000007F5A4018 0000000000000FFF,   TR - 0000000000000000
>> FXSAVE_STATE - 000000007FF2B780
>> !!!! Find image based on IP(0x7EC8A7DB) (No PDB) 
>> (ImageBase=000000007EC87000, EntryPoint=000000007EC8D7A6) !!!!
>> 
>> The same thing occurs for IA32:
>> 
>> !!!! IA32 Exception Type - 06(#UD - Invalid Opcode)  CPU Apic ID - 00000000
>> !!!!
>> EIP  - 7ECF72D4, CS  - 00000010, EFLAGS - 00010206
>> EAX  - 00000008, ECX - 00000000, EDX - 7F8EEE10, EBX - 5000A19D
>> ESP  - 7FF33D44, EBP - 7FF33E4C, ESI - 7FF33D4C, EDI - 7ECFB04C
>> DS   - 00000008, ES  - 00000008, FS  - 00000008, GS  - 00000008, SS -
>> 00000008 CR0  - 80010033, CR2 - 00000000, CR3 - 7FC01000, CR4 - 00000660
>> DR0  - 00000000, DR1 - 00000000, DR2 - 00000000, DR3 - 00000000
>> DR6  - FFFF0FF0, DR7 - 00000400
>> GDTR - 7F9EC000 00000047, IDTR - 7F5E5010 000007FF
>> LDTR - 00000000, TR - 00000000
>> FXSAVE_STATE - 7FF33A80
>> !!!! Find image based on IP(0x7ECF72D4) (No PDB) 
>> (ImageBase=000000007ECF4000, EntryPoint=000000007ECF9FBD) !!!!
>> 
>> This one had me going for a while, but I eventually traced it down to
>> LegacyBiosInt86() in OvmfPkg/Csm/LegacyBiosDxe/Thunk.c.
>> 
>> In that function, there is this code:
>> 
>>  UINT16  Segment;
>>  UINT16  Offset;
>> [...]
>>  //
>>  // The base address of legacy interrupt vector table is 0.
>>  // We use this base address to get the legacy interrupt handler.
>>  //
>>  ACCESS_PAGE0_CODE (
>>    Segment = (UINT16)(((UINT32 *)0)[BiosInt] >> 16);
>>    Offset  = (UINT16)((UINT32 *)0)[BiosInt];
>>    );
>> 
>> As the comment notes, here, we're trying to directly read from address 0
>> using BiosInt as an offset. This seems to trigger a problem with the Clang
>> optimizer. The disassembled output looks like this:
>> 
>> 0000000000003786 <LegacyBiosInt86>:
>>    3786:       56                      push   %rsi
>>    3787:       48 83 ec 60             sub    $0x60,%rsp
>>    378b:       41 0f b7 40 18          movzwl 0x18(%r8),%eax
>>    3790:       25 d4 0c 00 00          and    $0xcd4,%eax
>>    3795:       0d 02 30 00 00          or     $0x3002,%eax
>>    379a:       66 41 89 40 18          mov    %ax,0x18(%r8)
>>    379f:       48 8d 74 24 28          lea    0x28(%rsp),%rsi
>>    37a4:       48 83 66 18 00          andq   $0x0,0x18(%rsi)
>>    37a9:       48 8b 05 a8 59 00 00    mov    0x59a8(%rip),%rax        #
>> 9158 <gDS>
>>    37b0:       31 c9                   xor    %ecx,%ecx
>>    37b2:       48 89 f2                mov    %rsi,%rdx
>>    37b5:       ff 50 38                call   *0x38(%rax)
>>    37b8:       4c 8b 46 18             mov    0x18(%rsi),%r8
>>    37bc:       41 0f ba e0 0d          bt     $0xd,%r8d
>>    37c1:       73 18                   jae    37db <LegacyBiosInt86+0x55>
>>    37c3:       48 8b 05 8e 59 00 00    mov    0x598e(%rip),%rax        #
>> 9158 <gDS>
>>    37ca:       49 81 e0 ff df ff ff    and    $0xffffffffffffdfff,%r8
>>    37d1:       ba 00 10 00 00          mov    $0x1000,%edx
>>    37d6:       31 c9                   xor    %ecx,%ecx
>>    37d8:       ff 50 40                call   *0x40(%rax)
>>    37db:       0f 0b                   ud2    <--- er... what?
>> 
>> Note the "ud2" instruction. I have no idea what that's supposed to be, but
>> it looks like Clang just gives up generating any further instructions once
>> it hits this code.
>> 
>> I reduced this to a very simple sample C program that also reproduces the
>> problem:
>> 
>> #include <stdint.h>
>> 
>> extern int somefunc (uint16_t s, uint16_t o);
>> 
>> int
>> foo (uint8_t BiosInt)
>> {
>>        uint16_t Segment;
>>        uint16_t Offset;
>> 
>>        Segment = (uint16_t)(((uint32_t *)0)[BiosInt] >> 16);
>>        Offset  = (uint16_t)((uint32_t *)0)[BiosInt];
>> 
>>        return somefunc (Segment, Offset);
>> }
>> 
>> If I compile this with:
>> 
>> % clang -O2 -c edk.c
>> 
>> then I get this:
>> 
>> 0000000000000000 <foo>:
>>   0:   55                      push   %rbp
>>   1:   48 89 e5                mov    %rsp,%rbp
>>   4:   0f 0b                   ud2
>> 
>> If I compile with GCC instead:
>> 
>> % gcc10 -O2 -c edk.c
>> 
>> then I get this:
>> 
>> 0000000000000000 <foo>:
>>   0:   40 0f b6 ff             movzbl %dil,%edi
>>   4:   8b 3c bd 00 00 00 00    mov    0x0(,%rdi,4),%edi
>>   b:   0f b7 f7                movzwl %di,%esi
>>   e:   c1 ef 10                shr    $0x10,%edi
>>  11:   e9 00 00 00 00          jmp    16 <foo+0x16>
>> 
>> If I compile with Clang using -O0 to disable optimization, then it produces
>> code:
>> 
>> 0000000000000000 <foo>:
>>   0:   55                      push   %rbp
>>   1:   48 89 e5                mov    %rsp,%rbp
>>   4:   48 83 ec 10             sub    $0x10,%rsp
>>   8:   31 c0                   xor    %eax,%eax
>>   a:   89 c1                   mov    %eax,%ecx
>>   c:   40 88 7d ff             mov    %dil,-0x1(%rbp)
>>  10:   0f b6 45 ff             movzbl -0x1(%rbp),%eax
>>  14:   89 c2                   mov    %eax,%edx
>>  16:   8b 04 91                mov    (%rcx,%rdx,4),%eax
>>  19:   c1 e8 10                shr    $0x10,%eax
>>  1c:   66 89 45 fc             mov    %ax,-0x4(%rbp)
>>  20:   0f b6 75 ff             movzbl -0x1(%rbp),%esi
>>  24:   89 f2                   mov    %esi,%edx
>>  26:   8b 34 91                mov    (%rcx,%rdx,4),%esi
>>  29:   66 89 75 fa             mov    %si,-0x6(%rbp)
>>  2d:   66 8b 45 fc             mov    -0x4(%rbp),%ax
>>  31:   0f b7 f8                movzwl %ax,%edi
>>  34:   0f b7 75 fa             movzwl -0x6(%rbp),%esi
>>  38:   e8 00 00 00 00          call   3d <foo+0x3d>
>>  3d:   48 83 c4 10             add    $0x10,%rsp
>>  41:   5d                      pop    %rbp
>>  42:   c3                      ret
>> 
>> Note that this is with Clang 10.0.1, however I observe exactly the same
>> results with Clang 12.0.1.
>> 
>> I was able to fix this locally (for both X64 and IA32 targets) this by doing
>> this:
>> 
>>  UINT16  Segment;
>>  UINT16  Offset;
>>  volatile UINT32 * Page0 = NULL;
>> [...]
>>  ACCESS_PAGE0_CODE (
>>    Segment = (UINT16)(Page0[BiosInt] >> 16);
>>    Offset  = (UINT16)Page0[BiosInt];
>>    );
>> 
>> Note that the addition of the volatile keyword seems to be the key. If you
>> wanted to make the change simpler, you could also just do this:
>> 
>>  ACCESS_PAGE0_CODE (
>>    Segment = (UINT16)(((volatile UINT32 *)0)[BiosInt] >> 16);
>>    Offset  = (UINT16)((volatile UINT32 *)0)[BiosInt];
>>    );
>> 
>> (To my eyes, the other way is more readable, but others may disagree.)
>> 
>> With these three issues fixed, I'm able to boot and run my OVMF.fd images
>> for both X64 and IA32, and they're able to also boot and run the loaders
>> and OS images that I'm testing.
>> 
>> I'm not sure how much testing the CLANGDWARF toolspec is getting these days,
>> but maybe it should get a little more. Is there any chance these issues
>> could be addressed?
>> 
>> -Bill
> 
> 
> -- 
> =============================================================================
> -Bill Paul            (510) 749-2329 | VxWorks Software Architect,
>                 wpaul@windriver.com | Master of Unix-Fu - Wind River Systems
> =============================================================================
>   "I put a dollar in a change machine. Nothing changed." - George Carlin
> =============================================================================
> 
> 
> 
> 
> 
> 


[-- Attachment #2: Type: text/html, Size: 38714 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-01-13 17:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-01-10 20:56 Issues with CLANGDWARF tool specification and X64 -- am I nuts or what? Bill Paul
2022-01-11 18:02 ` [edk2-devel] " Bill Paul
2022-01-13 17:12   ` Andrew Fish

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox