C++ exception stack unwinding is Turing-complete
The
DWARF debugging information format is complex, which is needed in order to describe optimized code. For example, an expression such as
i+1
may be placed in a register
reg
, and
i
is not available anywhere in the registers or memory in the optimized program. But the debugger should be able to display the value of
i
, so the debug information must be able to express that
i
has the value
reg-1
. And optimized code can be transformed in much more complex ways than this, so DWARF does (surprisingly) define a stack machine that can do arbitrary computation in order to be able to represent the information.
Exception handling need to unwind the stack and restore registers, which is functionality that can be represented by the debug information, so DWARF was adopted (with some minor modifications) for the exception handling. The stack unwinding code can thus execute programs "hidden" in the DWARF information while unwinding the stack.
The Turing-completeness of this is not strange in the same way as the others I have written about previously; DWARF is designed to be general! But it is probably surprising to most developers that this functionality exists... The paper "
Exploiting the hard-working DWARF: Trojan and Exploit Techniques With No Native Executable Code" looks at what can be done with this. The stack machine is rather easy to work with — here is an example from the paper that calculates the length of a string that is located just below the stack frame:
# Value at −0x8(%rbp) on stack
DW_OP_breg6 −8
DW_OP_lit0 # initial strlen
DW_OP_swap
DW_OP_dup
LOOP:
DW_OP_deref_size 1
# Branch if top of stack nonzero
DW_OP_bra NOT_DONE
DW_OP_skip FINISH
NOT_DONE:
# Increment the counted length
DW_OP_swap
DW_OP_lit1
DW_OP_plus
DW_OP_swap
# Add length to char pointer
DW_OP_plus
DW_OP_skip LOOP
FINISH:
# Finally put the character count on the top of the stack
# as return value
DW_OP_swap
ELF metadata is Turing-complete
The Runtime Linker/Loader (RTLD) is responsible for loading shared libraries into a process' address space and updating the symbol references in the object code using the relocation and symbol tables. The RTLD can be viewed as a machine that is executing the relocation tables, and the paper "
“Weird Machines” in ELF: A Spotlight on the Underappreciated Metadata" shows that this machine is Turing-complete by constructing an assembly-like language composed of three instructions
mov
— move
add
— addition
jnz
— jump if not zero
where each instruction is built from a sequence of relocations.
There are two kinds of structures involved in the relocation:
- The symbol tables contain information about the symbols, where each symbol is described by the
ELF64_Sym
structure
typedef struct
{
Elf64_Word st_name; /* Symbol name (string table index) */
unsigned char st_info; /* Symbol type and binding */
unsigned char st_other; /* Unused */
Elf64_Section st_shndx; /* Section index */
Elf64_Addr st_value; /* Symbol value */
Elf64_Xword st_size; /* Symbol size */
} Elf64_Sym;
- The relocation tables specify how to modify the binary, where each relocation is described by the
Elf64_Rela
structure
typedef struct
{
Elf64_Addr r_offset; /* Address to patch */
Elf64_Xword r_info; /* Relocation type and symbol index */
Elf64_Sxword r_addend; /* Addend */
} Elf64_Rela;
The construct of the machine is using three kinds of relocations,
R_X86_64_COPY
,
R_X86_64_64
, and
R_X86_64_RELATIVE
, which have the functionality
R_X86_64_COPY memcpy(base + r.r_offset, s.st_value, s.st_size)
R_X86_64_64 *(base + r.r_offset) = s.st_value + r.addend + base
R_X86_64_RELATIVE *(base + r.r_offset) = r.r_addend + base
where
r
is the relocation,
s
is the symbol referenced by the relocation, and
base
is the ELF object’s base address (that is typically
0
for an executable). These relocations directly correspond to move and addition instructions of the form
mov *0xbeef0000, $0x02 # *(uint64_t*)0xbeef0000 = 2;
mov *0xbeef0000, [%foo] # *(uint64_t*)0xbeef0000 = *(uint64_t*)foo;
add *0xbeef0000, %foo, $0x02 # *(uint64_t*)0xbeef0000 = foo + 2;
For example, an addition instruction
add *0xbeef0000, %foo, $0x02
is encoded as a relocation entry with
r_offset=0xbeef0000
,
r_addend=2
, and
r_info
specifying relocation type
R_X86_64_64
and the symbol
foo
.
Branch instructions are implemented by modifying the data RTLD is using while processing the relocations.
The RTLD is maintaining a doubly linked list of
link_map
structures containing information for each ELF object, and it iterates over this list in reverse order when relocating the binary
while (lm != NULL) {
r = lm->dyn[DT_RELA];
end = r + lm->dyn[DT_RELASZ];
for (r ; r < end; r++)
relocate(lm, r, &dyn[DT_SYM]);
lm = lm->prev;
}
A branch instruction can be implemented by relocations modifying
lm->prev
,
lm->dyn[DT_RELASZ]
, or
lm->dyn[DT_RELA]
so that an
lm
structure is skipped, handled partially, or handled multiple times. There are some complications such as finding the address to modify, and to save/restore the original value to be able to exit the program. I refer to the paper for the details.
The only thing remaining is a conditional branch instruction. The paper constructs the
jnz
instruction by noting that symbols of the type
STT_GNU_IFUNC
are treated differently depending on if the section index is
0
or not, and the symbol can be initialized so that an
add
instruction of the form
add *0xbeef0000, %foo, $0x00
has the effect
*(uint64_t*)0xbeef0000 = (s.st_shindx != 0) ? CONSTANT : 0;
where
s
is the
Elf64_Sym
structure describing
foo
. This can now be used when rewriting the end marker
lm->dyn[DT_RELASZ]
so that RTLD does not enter the for-loop for the link map when the condition is
0
(and the machine is thus not jumping to the part of the program defined by those relocations), by relocations first writing the condition to test to
s.st_shindx
of a temporary symbol
s
, and then writing the symbol
(by adding
0
to it) to
lm->dyn[DT_RELASZ]
.
1. I'm playing fast and loose with the concept “Turing-complete”, as does most papers I have read. I had expected computer scientists to be more careful, and use terms like "linear bounded automaton" or similar...
Comments
Post a Comment