基本概念

预编译—>编译—>汇编—>链接

宏定义代替,预编译命令,文本去注释—>.S—>.O—>ELF

文本变成中间文件, 也就是编译+汇编的过程:

词法分析,语法分析,语义分析,中间过程优化

ELF 文件

ELF 是一种二进制可执行文件.

目标文件(*.obj)和最终可执行文件(ELF 格式)统称为 ELF 文件.

另外*.a *.so coredump文件也都是 ELF 文件的不同类型.

ELF 文件格式解析

最重要的是,数据是分段存储,不同段的有不同的功能.段的类型: Screenshot from 2019-01-18 16-31-09-4af7c91a-c88c-4640-953e-113a0cfa0550

为了后面装载的方便,段在链接期间,是可以合并,优化空间的,

例子

以 a.c, b.c 为例子:

//in a.c
#include <stdio.h>
extern int shared;
int main(int argc, char* argv[])
{
        int a = 100;
        swap(&a, &shared);
        printf("shared %d\n", shared);
        return 0;
}

//in b.c
int shared = 99;
void swap(int* a, int* b)
{
        int tmp = *a;
        *a = *b;
        *b = tmp;
}

编译(包含中间文件):

gcc a.c b.c -c -o ab

查看所有 elf 信息:

➜  compile-link-lib readelf -a ab
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x5a0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          6584 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         29
  Section header string table index: 28

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         0000000000000238  00000238
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .note.ABI-tag     NOTE             0000000000000254  00000254
       0000000000000020  0000000000000000   A       0     0     4
  [ 3] .note.gnu.build-i NOTE             0000000000000274  00000274
       0000000000000024  0000000000000000   A       0     0     4
  [ 4] .gnu.hash         GNU_HASH         0000000000000298  00000298
       000000000000001c  0000000000000000   A       5     0     8
  [ 5] .dynsym           DYNSYM           00000000000002b8  000002b8
       00000000000000c0  0000000000000018   A       6     1     8
  [ 6] .dynstr           STRTAB           0000000000000378  00000378
       000000000000009f  0000000000000000   A       0     0     1
  [ 7] .gnu.version      VERSYM           0000000000000418  00000418
       0000000000000010  0000000000000002   A       5     0     2
  [ 8] .gnu.version_r    VERNEED          0000000000000428  00000428
       0000000000000030  0000000000000000   A       6     1     8
  [ 9] .rela.dyn         RELA             0000000000000458  00000458
       00000000000000c0  0000000000000018   A       5     0     8
  [10] .rela.plt         RELA             0000000000000518  00000518
       0000000000000030  0000000000000018  AI       5    22     8
  [11] .init             PROGBITS         0000000000000548  00000548
       0000000000000017  0000000000000000  AX       0     0     4
  [12] .plt              PROGBITS         0000000000000560  00000560
       0000000000000030  0000000000000010  AX       0     0     16
  [13] .plt.got          PROGBITS         0000000000000590  00000590
       0000000000000008  0000000000000008  AX       0     0     8
  [14] .text             PROGBITS         00000000000005a0  000005a0
       0000000000000222  0000000000000000  AX       0     0     16
  [15] .fini             PROGBITS         00000000000007c4  000007c4
       0000000000000009  0000000000000000  AX       0     0     4
  [16] .rodata           PROGBITS         00000000000007d0  000007d0
       000000000000000f  0000000000000000   A       0     0     4
  [17] .eh_frame_hdr     PROGBITS         00000000000007e0  000007e0
       0000000000000044  0000000000000000   A       0     0     4
  [18] .eh_frame         PROGBITS         0000000000000828  00000828
       0000000000000128  0000000000000000   A       0     0     8
  [19] .init_array       INIT_ARRAY       0000000000200db0  00000db0
       0000000000000008  0000000000000008  WA       0     0     8
  [20] .fini_array       FINI_ARRAY       0000000000200db8  00000db8
       0000000000000008  0000000000000008  WA       0     0     8
  [21] .dynamic          DYNAMIC          0000000000200dc0  00000dc0
       00000000000001f0  0000000000000010  WA       6     0     8
  [22] .got              PROGBITS         0000000000200fb0  00000fb0
       0000000000000050  0000000000000008  WA       0     0     8
  [23] .data             PROGBITS         0000000000201000  00001000
       0000000000000014  0000000000000000  WA       0     0     8
  [24] .bss              NOBITS           0000000000201014  00001014
       0000000000000004  0000000000000000  WA       0     0     1
  [25] .comment          PROGBITS         0000000000000000  00001014
       000000000000002a  0000000000000001  MS       0     0     1
  [26] .symtab           SYMTAB           0000000000000000  00001040
       0000000000000648  0000000000000018          27    44     8
  [27] .strtab           STRTAB           0000000000000000  00001688
       000000000000022d  0000000000000000           0     0     1
  [28] .shstrtab         STRTAB           0000000000000000  000018b5
       00000000000000fe  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000001f8 0x00000000000001f8  R      0x8
  INTERP         0x0000000000000238 0x0000000000000238 0x0000000000000238
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000950 0x0000000000000950  R E    0x200000
  LOAD           0x0000000000000db0 0x0000000000200db0 0x0000000000200db0
                 0x0000000000000264 0x0000000000000268  RW     0x200000
  DYNAMIC        0x0000000000000dc0 0x0000000000200dc0 0x0000000000200dc0
                 0x00000000000001f0 0x00000000000001f0  RW     0x8
  NOTE           0x0000000000000254 0x0000000000000254 0x0000000000000254
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_EH_FRAME   0x00000000000007e0 0x00000000000007e0 0x00000000000007e0
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000000db0 0x0000000000200db0 0x0000000000200db0
                 0x0000000000000250 0x0000000000000250  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .init_array .fini_array .dynamic .got .data .bss
   04     .dynamic
   05     .note.ABI-tag .note.gnu.build-id
   06     .eh_frame_hdr
   07
   08     .init_array .fini_array .dynamic .got

Dynamic section at offset 0xdc0 contains 27 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x548
 0x000000000000000d (FINI)               0x7c4
 0x0000000000000019 (INIT_ARRAY)         0x200db0
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x200db8
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x298
 0x0000000000000005 (STRTAB)             0x378
 0x0000000000000006 (SYMTAB)             0x2b8
 0x000000000000000a (STRSZ)              159 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x200fb0
 0x0000000000000002 (PLTRELSZ)           48 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x518
 0x0000000000000007 (RELA)               0x458
 0x0000000000000008 (RELASZ)             192 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000000000001e (FLAGS)              BIND_NOW
 0x000000006ffffffb (FLAGS_1)            Flags: NOW PIE
 0x000000006ffffffe (VERNEED)            0x428
 0x000000006fffffff (VERNEEDNUM)         1
 0x000000006ffffff0 (VERSYM)             0x418
 0x000000006ffffff9 (RELACOUNT)          3
 0x0000000000000000 (NULL)               0x0

Relocation section '.rela.dyn' at offset 0x458 contains 8 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200db0  000000000008 R_X86_64_RELATIVE                    6a0
000000200db8  000000000008 R_X86_64_RELATIVE                    660
000000201008  000000000008 R_X86_64_RELATIVE                    201008
000000200fd8  000100000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_deregisterTMClone + 0
000000200fe0  000400000006 R_X86_64_GLOB_DAT 0000000000000000 __libc_start_main@GLIBC_2.2.5 + 0
000000200fe8  000500000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
000000200ff0  000600000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_registerTMCloneTa + 0
000000200ff8  000700000006 R_X86_64_GLOB_DAT 0000000000000000 __cxa_finalize@GLIBC_2.2.5 + 0

Relocation section '.rela.plt' at offset 0x518 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200fc8  000200000007 R_X86_64_JUMP_SLO 0000000000000000 __stack_chk_fail@GLIBC_2.4 + 0
000000200fd0  000300000007 R_X86_64_JUMP_SLO 0000000000000000 printf@GLIBC_2.2.5 + 0

The decoding of unwind sections for machine type Advanced Micro Devices X86-64 is not currently supported.

Symbol table '.dynsym' contains 8 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab
     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __stack_chk_fail@GLIBC_2.4 (2)
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.2.5 (3)
     4: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (3)
     5: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     6: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
     7: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.2.5 (3)

Symbol table '.symtab' contains 67 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000238     0 SECTION LOCAL  DEFAULT    1
     2: 0000000000000254     0 SECTION LOCAL  DEFAULT    2
     3: 0000000000000274     0 SECTION LOCAL  DEFAULT    3
     4: 0000000000000298     0 SECTION LOCAL  DEFAULT    4
     5: 00000000000002b8     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000378     0 SECTION LOCAL  DEFAULT    6
     7: 0000000000000418     0 SECTION LOCAL  DEFAULT    7
     8: 0000000000000428     0 SECTION LOCAL  DEFAULT    8
     9: 0000000000000458     0 SECTION LOCAL  DEFAULT    9
    10: 0000000000000518     0 SECTION LOCAL  DEFAULT   10
    11: 0000000000000548     0 SECTION LOCAL  DEFAULT   11
    12: 0000000000000560     0 SECTION LOCAL  DEFAULT   12
    13: 0000000000000590     0 SECTION LOCAL  DEFAULT   13
    14: 00000000000005a0     0 SECTION LOCAL  DEFAULT   14
    15: 00000000000007c4     0 SECTION LOCAL  DEFAULT   15
    16: 00000000000007d0     0 SECTION LOCAL  DEFAULT   16
    17: 00000000000007e0     0 SECTION LOCAL  DEFAULT   17
    18: 0000000000000828     0 SECTION LOCAL  DEFAULT   18
    19: 0000000000200db0     0 SECTION LOCAL  DEFAULT   19
    20: 0000000000200db8     0 SECTION LOCAL  DEFAULT   20
    21: 0000000000200dc0     0 SECTION LOCAL  DEFAULT   21
    22: 0000000000200fb0     0 SECTION LOCAL  DEFAULT   22
    23: 0000000000201000     0 SECTION LOCAL  DEFAULT   23
    24: 0000000000201014     0 SECTION LOCAL  DEFAULT   24
    25: 0000000000000000     0 SECTION LOCAL  DEFAULT   25
    26: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    27: 00000000000005d0     0 FUNC    LOCAL  DEFAULT   14 deregister_tm_clones
    28: 0000000000000610     0 FUNC    LOCAL  DEFAULT   14 register_tm_clones
    29: 0000000000000660     0 FUNC    LOCAL  DEFAULT   14 __do_global_dtors_aux
    30: 0000000000201014     1 OBJECT  LOCAL  DEFAULT   24 completed.7696
    31: 0000000000200db8     0 OBJECT  LOCAL  DEFAULT   20 __do_global_dtors_aux_fin
    32: 00000000000006a0     0 FUNC    LOCAL  DEFAULT   14 frame_dummy
    33: 0000000000200db0     0 OBJECT  LOCAL  DEFAULT   19 __frame_dummy_init_array_
    34: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS a.c
    35: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS b.c
    36: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS crtstuff.c
    37: 000000000000094c     0 OBJECT  LOCAL  DEFAULT   18 __FRAME_END__
    38: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS
    39: 0000000000200db8     0 NOTYPE  LOCAL  DEFAULT   19 __init_array_end
    40: 0000000000200dc0     0 OBJECT  LOCAL  DEFAULT   21 _DYNAMIC
    41: 0000000000200db0     0 NOTYPE  LOCAL  DEFAULT   19 __init_array_start
    42: 00000000000007e0     0 NOTYPE  LOCAL  DEFAULT   17 __GNU_EH_FRAME_HDR
    43: 0000000000200fb0     0 OBJECT  LOCAL  DEFAULT   22 _GLOBAL_OFFSET_TABLE_
    44: 00000000000007c0     2 FUNC    GLOBAL DEFAULT   14 __libc_csu_fini
    45: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab
    46: 0000000000201000     0 NOTYPE  WEAK   DEFAULT   23 data_start
    47: 0000000000201014     0 NOTYPE  GLOBAL DEFAULT   23 _edata
    48: 00000000000007c4     0 FUNC    GLOBAL DEFAULT   15 _fini
    49: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __stack_chk_fail@@GLIBC_2
    50: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND printf@@GLIBC_2.2.5
    51: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@@GLIBC_
    52: 0000000000201000     0 NOTYPE  GLOBAL DEFAULT   23 __data_start
    53: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
    54: 0000000000201008     0 OBJECT  GLOBAL HIDDEN    23 __dso_handle
    55: 00000000000007d0     4 OBJECT  GLOBAL DEFAULT   16 _IO_stdin_used
    56: 0000000000000750   101 FUNC    GLOBAL DEFAULT   14 __libc_csu_init
    57: 0000000000201018     0 NOTYPE  GLOBAL DEFAULT   24 _end
    58: 00000000000005a0    43 FUNC    GLOBAL DEFAULT   14 _start
    59: 0000000000201014     0 NOTYPE  GLOBAL DEFAULT   24 __bss_start
    60: 00000000000006aa   113 FUNC    GLOBAL DEFAULT   14 main
    61: 0000000000201018     0 OBJECT  GLOBAL HIDDEN    23 __TMC_END__
    62: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
    63: 000000000000071b    45 FUNC    GLOBAL DEFAULT   14 swap
    64: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@@GLIBC_2.2
    65: 0000000000000548     0 FUNC    GLOBAL DEFAULT   11 _init
    66: 0000000000201010     4 OBJECT  GLOBAL DEFAULT   23 shared

Version symbols section '.gnu.version' contains 8 entries:
 Addr: 0000000000000418  Offset: 0x000418  Link: 5 (.dynsym)
  000:   0 (*local*)       0 (*local*)       2 (GLIBC_2.4)     3 (GLIBC_2.2.5)
  004:   3 (GLIBC_2.2.5)   0 (*local*)       0 (*local*)       3 (GLIBC_2.2.5)

Version needs section '.gnu.version_r' contains 1 entry:
 Addr: 0x0000000000000428  Offset: 0x000428  Link: 6 (.dynstr)
  000000: Version: 1  File: libc.so.6  Cnt: 2
  0x0010:   Name: GLIBC_2.2.5  Flags: none  Version: 3
  0x0020:   Name: GLIBC_2.4  Flags: none  Version: 2

Displaying notes found in: .note.ABI-tag
  Owner                 Data size       Description
  GNU                  0x00000010       NT_GNU_ABI_TAG (ABI version tag)
    OS: Linux, ABI: 3.2.0

Displaying notes found in: .note.gnu.build-id
  Owner                 Data size       Description
  GNU                  0x00000014       NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 1f04fda3dce8f1ed6bd62b23f5c98a9188ea3c3e

看 a.o 目标文件的段:

➜  compile-link-lib readelf -S a.o
There are 13 section headers, starting at offset 0x3e0:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       0000000000000071  0000000000000000  AX       0     0     1
  [ 2] .rela.text        RELA             0000000000000000  000002d0
       0000000000000090  0000000000000018   I      10     1     8
  [ 3] .data             PROGBITS         0000000000000000  000000b1
       0000000000000000  0000000000000000  WA       0     0     1
  [ 4] .bss              NOBITS           0000000000000000  000000b1
       0000000000000000  0000000000000000  WA       0     0     1
  [ 5] .rodata           PROGBITS         0000000000000000  000000b1
       000000000000000b  0000000000000000   A       0     0     1
  [ 6] .comment          PROGBITS         0000000000000000  000000bc
       000000000000002b  0000000000000001  MS       0     0     1
  [ 7] .note.GNU-stack   PROGBITS         0000000000000000  000000e7
       0000000000000000  0000000000000000           0     0     1
  [ 8] .eh_frame         PROGBITS         0000000000000000  000000e8
       0000000000000038  0000000000000000   A       0     0     8
  [ 9] .rela.eh_frame    RELA             0000000000000000  00000360
       0000000000000018  0000000000000018   I      10     8     8
  [10] .symtab           SYMTAB           0000000000000000  00000120
       0000000000000168  0000000000000018          11     9     8
  [11] .strtab           STRTAB           0000000000000000  00000288
       0000000000000044  0000000000000000           0     0     1
  [12] .shstrtab         STRTAB           0000000000000000  00000378
       0000000000000061  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

符号表

symtab 是代码中的用到的符号(函数,变量,elf 文件本身用到的字符串),组成的表.

obj 和最终 elf 都有自己的 symtab.

- elf文件中用到的符号作为单独的1个段来管理.
- 最终的symtab包含了目标文件里的符号表: 符号名,大小,类型(绑定信息)

符号内容包含 value,类型和绑定信息:

Screenshot from 2019-01-18 17-04-26-a2aa6b8a-206c-4c0e-8e73-4fdfea6fd446

符号的值:

- obj文件中,可能是符号/函数的fake地址,段偏移,或其他的属性;
- elf执行文件中,就是符号的虚拟地址,是链接后被修正过的.

来看 a.o 的符号表情况:

➜  compile-link-lib readelf -s a.o

Symbol table '.symtab' contains 15 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS a.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    7
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    8
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    6
     9: 0000000000000000   113 FUNC    GLOBAL DEFAULT    1 main
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND shared
    11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _GLOBAL_OFFSET_TABLE_
    12: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND swap
    13: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
    14: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND __stack_chk_fail

符号修饰和函数签名:

- 不同文件可能定义同1符号,需要保证符号的唯一性,最终的符号导出时,要进行修饰 'func-->?func@@HMT@HZ'之类的
- 不同编译器搞法不一样,这是不同编译器编译后的文件不能相互link的原因

强符号和弱符号:

in file 1:
int a = 0;
int file 2:
int a = 0; //两个强符号,compile error
int a; //未初始化的全局变量为弱符号, ok 跟随file 1的强符号的定义
__attribute(weak) int a = 0; //手动指定弱符号 ,ok,同上

强引用和弱引用:

__attribute(weakref) void func(); //弱引用链接时没找到定义也不报错,(运行时则直接报错) 强引用则链接时就报错,

使用场景:某些库的模块需要扩展模块,扩展模块用弱引用, 没有扩展也能用(扩展不存在时,func 的地址为 0,可以判断出来。)

静态编译的链接过程

拼接分散的模块(*.obj),再修正跨模块引用符号地址,那些符号称为”重定位入口”,在链接阶段被”重定位”,

缺点:空间浪费,不方便修改

优点: 不依赖他人,方便使用,快

地址和空间分配(Address and Space Allocation)

- 相似段(a .text and b.text)叠加合并,重新调整段的大小,位置;
- 所有符号表也合并成1个global symbol table
- 给各个段(.text .data等)分配进程空间的虚拟地址

了解个更深的概念,叫函数级别链接,gcc 参数-ffunction-sections or -fdata-sections 它把每个函数都放到单独的段,这样链接器针对函数级别的段进行链接,自然那些用不到的函数的段,就不会最终合并,也就可减少最终文件的大小,代价是编译和链接过程变慢,而且更复杂了,链接器也需要维护不同函数的调用关系,目标文件自然也增大,

符号决议(Symbol Resolution) + 重定位(Relocation)

先看结果. 比较下 a.o 和 ab.o 里,可以看到符号修正的情况:

objdump 可以 dump 指定的段:

a.o:

objdump -d a.o
0000000000000000 <main>:
0:   55                      push   %rbp
1:   48 89 e5                mov    %rsp,%rbp
4:   48 83 ec 20             sub    $0x20,%rsp
8:   89 7d ec                mov    %edi,-0x14(%rbp)
b:   48 89 75 e0             mov    %rsi,-0x20(%rbp)
f:   64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
16:   00 00
18:   48 89 45 f8             mov    %rax,-0x8(%rbp)
1c:   31 c0                   xor    %eax,%eax
1e:   c7 45 f4 64 00 00 00    movl   $0x64,-0xc(%rbp)
25:   48 8d 45 f4             lea    -0xc(%rbp),%rax
29:   48 8d 35 00 00 00 00    lea    0x0(%rip),%rsi        # 30 <main+0x30>
30:   48 89 c7                mov    %rax,%rdi
33:   b8 00 00 00 00          mov    $0x0,%eax
38:   e8 00 00 00 00          callq  3d <main+0x3d>
3d:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 43 <main+0x43>
43:   89 c6                   mov    %eax,%esi
45:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 4c <main+0x4c>
4c:   b8 00 00 00 00          mov    $0x0,%eax
51:   e8 00 00 00 00          callq  56 <main+0x56>
56:   b8 00 00 00 00          mov    $0x0,%eax
5b:   48 8b 55 f8             mov    -0x8(%rbp),%rdx
5f:   64 48 33 14 25 28 00    xor    %fs:0x28,%rdx
66:   00 00
68:   74 05                   je     6f <main+0x6f>
6a:   e8 00 00 00 00          callq  6f <main+0x6f>
6f:   c9                      leaveq
70:   c3                      retq

line 29 : swap 调用前将 shared 压栈,此时 shared 地址为 0x00000000,显示并不是有效的地址;

line 38: 通过 callq 指令,调用 swap,但 swap 的地址也为 0x00000000,无意义;

再看符号重定位后的 elf 执行文件 ab:

objdump -d ab
00000000000006aa <main>:
 6aa:   55                      push   %rbp
 6ab:   48 89 e5                mov    %rsp,%rbp
 6ae:   48 83 ec 20             sub    $0x20,%rsp
 6b2:   89 7d ec                mov    %edi,-0x14(%rbp)
 6b5:   48 89 75 e0             mov    %rsi,-0x20(%rbp)
 6b9:   64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
 6c0:   00 00
 6c2:   48 89 45 f8             mov    %rax,-0x8(%rbp)
 6c6:   31 c0                   xor    %eax,%eax
 6c8:   c7 45 f4 64 00 00 00    movl   $0x64,-0xc(%rbp)
 6cf:   48 8d 45 f4             lea    -0xc(%rbp),%rax
 6d3:   48 8d 35 36 09 20 00    lea    0x200936(%rip),%rsi        # 201010 <shared>
 6da:   48 89 c7                mov    %rax,%rdi
 6dd:   b8 00 00 00 00          mov    $0x0,%eax
 6e2:   e8 34 00 00 00          callq  71b <swap>
 6e7:   8b 05 23 09 20 00       mov    0x200923(%rip),%eax        # 201010 <shared>
 6ed:   89 c6                   mov    %eax,%esi
 6ef:   48 8d 3d de 00 00 00    lea    0xde(%rip),%rdi        # 7d4 <_IO_stdin_used+0x4>
 6f6:   b8 00 00 00 00          mov    $0x0,%eax
 6fb:   e8 80 fe ff ff          callq  580 <printf@plt>
 700:   b8 00 00 00 00          mov    $0x0,%eax
 705:   48 8b 55 f8             mov    -0x8(%rbp),%rdx
 709:   64 48 33 14 25 28 00    xor    %fs:0x28,%rdx
 710:   00 00
 712:   74 05                   je     719 <main+0x6f>
 714:   e8 57 fe ff ff          callq  570 <__stack_chk_fail@plt>
 719:   c9                      leaveq
 71a:   c3                      retq

line 6d3: shared 已经有具体的虚拟地址:0x00200936

line 6e2: swap 有具体的虚拟地址:0x00000034

具体怎么重定位的? 就是利用重定位表和段地址,符号的偏移值,修正规则等, 计算一般符号的虚拟地址.

来看重定位表,即 rela.开头的段:

➜  compile-link-lib objdump -r a.o

a.o:     file format elf64-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE
000000000000002c R_X86_64_PC32     shared-0x0000000000000004
0000000000000039 R_X86_64_PLT32    swap-0x0000000000000004
000000000000003f R_X86_64_PC32     shared-0x0000000000000004
0000000000000048 R_X86_64_PC32     .rodata-0x0000000000000004
0000000000000052 R_X86_64_PLT32    printf-0x0000000000000004
000000000000006b R_X86_64_PLT32    __stack_chk_fail-0x0000000000000004

RELOCATION RECORDS FOR [.eh_frame]:
OFFSET           TYPE              VALUE
0000000000000020 R_X86_64_PC32     .text

第 1 列的 offset 就是重定位入口在 a.o 中相对短起始的偏移位置.

第 2 列是重定位符号类型,决定了它的地址修正方式:

A:保存在被修正位置的值,
P:被修正的位置(相对于段开始的偏移量)
S:符号的实际地址 (符号肯定是存在的,它的实际位置是放在sysmtab中)
修正方式:
_32: 绝对地址寻址修正 S+A
_PC32: 相对地址寻址修正 S+A-P
_PLT32:

来看这个例子,shared 的修正方式是相对寻址修正.

  • S 先看 shared 符号在符号表中的地址 S = 0x00201010
➜  compile-link-lib objdump -t ab |grep shared
0000000000201010 g     O .data  0000000000000004              shared
  • A 保存在被修正位置的值,也就是未重定位前的原值是 A = 0x00000000.这就是要修改的目标 一般都是 0x0.
  • P 表示 shared 要被修正的位置,它位于 main 中,所以先找到 main 的实际地址
00000000000006aa g     F .text  0000000000000071              main

修正的指令的偏移在重定位表里为 offset = 0x2c

所以被修正的位置值: P = 0x6aa + 0x2c

那么计算相对寻址的方式是:

modified_A = S + A - P - 4

计算结果:

>>> hex(0x201010 + 0x0 -(0x6aa+0x2c) - 4)
'0x200936'

最后看实际的代码,是正确的:

6d3:   48 8d 35 36 09 20 00    lea    0x200936(%rip),%rsi        # 201010 <shared>

注意:这个 lea + rip 大有学问..

lea 指令用来取地址的

如果是绝对寻址,修正值直接就是符号表里的地址(when A=0x0),这是很自然的:

modified_A  = S + A(A=0 usually)

相对寻址修正,这两个的区别自然也就是这个 P 上.

为何需要相对寻址修正?

也许在汇编过程中,编译器为了其他指令使用相对寻址的方式来寻访 access 的方便.

静态库链接

lib.a 就是一堆的.o,所以本质上和 obj 静态链接没有区别.