Hello, World!

Cool, we can locate functions by name, but how do we figure out what they actually do?

As expected, reverse engineering from here on is a bit more difficult because we are digging through the assembly code contained in Instructions objects.

Instead of using a modern compiler backend like clang, Dart actually uses its JIT compiler for code generation, but with a couple of AOT-specific optimizations.

If you have never worked with JIT code, it is a bit bloated in some places compared to what a C compiler would produce for equivalent code. That is not to say Dart is doing a bad job: it’s designed to be generated quickly at runtime, and the hand-written assembly for common instructions often beats clang/gcc in terms of performance.

Generated code being less micro-optimized actually works heavily to our advantage, since it more closely resembles the higher-level IR used to generate it.

Most of the relevant code generation can be found in:

  • vm/compiler/backend/il_<arch>.cc
  • vm/compiler/assembler/assembler_<arch>.cc
  • vm/compiler/asm_intrinsifier_<arch>.cc
  • vm/compiler/graph_intrinsifier_<arch>.cc

Here are the register layout and calling conventions for Dart’s A64 assembler:

#lint reg-tbl
       r0 |     | Returns
r0  -  r7 |     | Arguments
r0  - r14 |     | General purpose
      r15 | sp  | Dart stack pointer
      r16 | ip0 | Scratch register
      r17 | ip1 | Scratch register
      r18 |     | Platform register
r19 - r25 |     | General purpose
r19 - r28 |     | Callee saved registers
      r26 | thr | Current thread
      r27 | pp  | Object pool
      r28 | brm | Barrier mask
      r29 | fp  | Frame pointer
      r30 | lr  | Link register
      r31 | zr  | Zero / CSP

This ABI follows the standard AArch64 calling conventions, but with a few global registers:

  • R26 / THR: Pointer to the running VM Thread, see vm/thread.h
  • R27 / PP: Pointer to the ObjectPool of the current context, see vm/object.h
  • R28 / BRM: The barrier mask, used for incremental garbage collection

Similarly, this is the register layout for A32:

#lint reg-tbl
r0 -  r1 |     | Returns
r0 -  r9 |     | General purpose
r4 - r10 |     | Callee saved registers
      r5 | pp  | Object pool
     r10 | thr | Current thread
     r11 | fp  | Frame pointer
     r12 | ip  | Scratch register
     r13 | sp  | Stack pointer
     r14 | lr  | Link register
     r15 | pc  | Program counter

While A64 is the more common target, I’ll mostly be covering A32 since it is simpler to read and disassemble.

You can view the IR along with the disassembly by passing --disassemble-optimized to gen_snapshot, but note this only works on the debug/release targets and not product.

As an example, when compiling hello world:

void hello() {
  print("Hello, World!");
}

Scrolling down a bit in the disassembly you will find:

#lint dartvm-dasm
Code for optimized function 'package:dectest/hello_world.dart_::_hello' {
        ;; B0
        ;; B1
        ;; Enter frame
0xf69ace60    e92d4800               stmdb sp!, {fp, lr}
0xf69ace64    e28db000               add fp, sp, #0
        ;; CheckStackOverflow:8(stack=0, loop=0)
0xf69ace68    e59ac024               ldr ip, [thr, #+36]
0xf69ace6c    e15d000c               cmp sp, ip
0xf69ace70    9bfffffe               blls +0 ; 0xf69ace70
        ;; PushArgument(v3)
0xf69ace74    e285ca01               add ip, pp, #4096
0xf69ace78    e59ccfa7               ldr ip, [ip, #+4007]
0xf69ace7c    e52dc004               str ip, [sp, #-4]!
        ;; StaticCall:12( print<0> v3)
0xf69ace80    ebfffffe               bl +0 ; 0xf69ace80
0xf69ace84    e28dd004               add sp, sp, #4
        ;; ParallelMove r0 <- C
0xf69ace88    e59a0060               ldr r0, [thr, #+96]
        ;; Return:16(v0)
0xf69ace8c    e24bd000               sub sp, fp, #0
0xf69ace90    e8bd8800               ldmia sp!, {fp, pc}
0xf69ace94    e1200070               bkpt #0x0

What is printed here is slightly different from a snapshot built in product, but the important part is that we can see the IR instructions alongside the assembly.

Breaking it down:

#lint dartvm-dasm
        ;; Enter frame
0xf6a6ce60    e92d4800               stmdb sp!, {fp, lr}
0xf6a6ce64    e28db000               add fp, sp, #0

This is a standard function prologue: the caller’s frame pointer and the link register are pushed to the stack, after which the frame pointer is set to the bottom of the function’s stack frame.

As with the standard ARM ABI, this uses a full-descending stack, meaning it grows downward toward lower addresses in memory.
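
To make the push order concrete, here is a toy Python model of the two prologue instructions. The starting register values and the helper name stmdb_sp are invented for illustration; only the instruction semantics (decrement before storing, lowest-numbered register at the lowest address) come from the ARM architecture.

```python
# Toy model of the A32 prologue on a full-descending stack.
# The initial values below are made up; they are not from a real process.
regs = {"sp": 0x1000, "fp": 0x2000, "lr": 0xCAFE}
memory = {}

# Register numbers for the two names used here (fp = r11, lr = r14).
REG_NUMBER = {"fp": 11, "lr": 14}

def stmdb_sp(names):
    """Model `stmdb sp!, {...}`: move sp down first, then store the registers.

    The lowest-numbered register always ends up at the lowest address.
    """
    ordered = sorted(names, key=REG_NUMBER.__getitem__)
    regs["sp"] -= 4 * len(ordered)
    for index, name in enumerate(ordered):
        memory[regs["sp"] + 4 * index] = regs[name]

stmdb_sp(["fp", "lr"])        # stmdb sp!, {fp, lr}
regs["fp"] = regs["sp"] + 0   # add fp, sp, #0

print("sp =", hex(regs["sp"]), "fp =", hex(regs["fp"]))
for address, value in sorted(memory.items()):
    print(hex(address), "->", hex(value))
```

After running this, sp has moved down by 8 bytes, the caller’s fp sits at the new sp and the return address sits 4 bytes above it.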

#lint dartvm-dasm
        ;; CheckStackOverflow:8(stack=0, loop=0)
0xf6a6ce68    e59ac024               ldr ip, [thr, #+36]
0xf6a6ce6c    e15d000c               cmp sp, ip
0xf6a6ce70    9bfffffe               blls +0 ; 0xf6a6ce70

This is a simple routine which does what you probably guessed: it checks whether the stack has overflowed.

Sadly, the VM’s disassembler does not annotate either thread fields or branch targets, so you have to do some digging.

A list of field offsets can be found in vm/compiler/runtime_offsets_extracted.h, which defines Thread_stack_limit_offset = 36, telling us that the field accessed is the thread’s stack limit.
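
Once the offsets are known, annotating thread loads by hand takes only a few lines. The following is a minimal sketch, not part of the Dart SDK: the function names are hypothetical, the register aliases are taken from the A32 table above, and the two offsets are the ones quoted in this article, so they only hold for this particular build.

```python
# Dart's A32 register aliases, indexed by register number.
DART_A32_REGS = ["r0", "r1", "r2", "r3", "r4", "pp", "r6", "r7",
                 "r8", "r9", "thr", "fp", "ip", "sp", "lr", "pc"]

# Offsets quoted in this article; they change between VM versions.
THREAD_FIELDS = {36: "stack limit", 96: "null object"}

def decode_ldr_imm(word):
    """Decode an A32 `ldr Rt, [Rn, #imm12]` (word load, immediate offset)."""
    is_imm_transfer = ((word >> 25) & 0b111) == 0b010
    is_load = bool((word >> 20) & 1)
    is_byte = bool((word >> 22) & 1)
    if not is_imm_transfer or not is_load or is_byte:
        raise ValueError("not an immediate-offset word LDR")
    sign = 1 if (word >> 23) & 1 else -1   # U bit: add or subtract the offset
    rn = DART_A32_REGS[(word >> 16) & 0xF]
    rt = DART_A32_REGS[(word >> 12) & 0xF]
    return rt, rn, sign * (word & 0xFFF)

def annotate(word):
    """Render the load and name the thread field when the offset is known."""
    rt, rn, offset = decode_ldr_imm(word)
    text = f"ldr {rt}, [{rn}, #{offset:+d}]"
    if rn == "thr" and offset in THREAD_FIELDS:
        text += f" ; thread {THREAD_FIELDS[offset]}"
    return text

print(annotate(0xE59AC024))
print(annotate(0xE59A0060))
```

The two words passed in are the thread loads from the listing above; anything that is not a plain word load raises an error instead of being decoded incorrectly.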

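After the stack pointer is compared, it calls the stackOverflowStubWithoutFpuRegsStub stub if the stack has overflowed, that is, if sp is at or below the limit (blls is a branch-with-link taken on an unsigned lower-or-same result). The branch target in the disassembly appears to be un-patched, but we can still inspect the binary afterwards to confirm.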
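
The un-patched target can be verified from the encoding itself. Below is a small sketch with hypothetical helper names that decodes an A32 B/BL instruction; it relies only on the architectural rule that pc reads as the instruction address plus 8.

```python
# Condition suffixes indexed by the top four bits; index 14 means "always".
CONDITIONS = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
              "hi", "ls", "ge", "lt", "gt", "le", ""]

def decode_branch(address, word):
    """Return (mnemonic, target) for an A32 B or BL instruction."""
    cond = word >> 28
    if ((word >> 25) & 0b111) != 0b101 or cond == 0xF:
        raise ValueError("not a conditional B/BL instruction")
    mnemonic = ("bl" if (word >> 24) & 1 else "b") + CONDITIONS[cond]
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:          # sign-extend the 24-bit field
        imm24 -= 1 << 24
    # In ARM state, pc reads as the instruction address plus 8.
    target = (address + 8 + (imm24 << 2)) & 0xFFFFFFFF
    return mnemonic, target

def is_unresolved(address, word):
    """A branch that targets its own address is treated as a placeholder here."""
    return decode_branch(address, word)[1] == address

mnemonic, target = decode_branch(0xF69ACE70, 0x9BFFFFFE)
print(mnemonic, hex(target), is_unresolved(0xF69ACE70, 0x9BFFFFFE))
```

The immediate 0xfffffe encodes an offset of -8, which cancels the pc bias exactly, so the instruction branches to itself until the real target is filled in.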

#lint dartvm-dasm
        ;; PushArgument(v3)
0xf6a6ce74    e285ca01               add ip, pp, #4096
0xf6a6ce78    e59ccfa7               ldr ip, [ip, #+4007]
0xf6a6ce7c    e52dc004               str ip, [sp, #-4]!

Here an object from the object pool is pushed onto the stack. Since the offset is too big to fit in an ldr offset encoding, it uses an extra add instruction.

This object is in fact our “Hello, World!” string as a RawOneByteString* stored in the globalObjectPool of our isolate at offset 8103.
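
The 8103 comes from adding the two immediates together. An ldr offset field is only 12 bits wide (at most 4095), and the add uses ARM’s “modified immediate” form, an 8-bit value rotated right by twice a 4-bit amount. The sketch below (helper names are my own choice) recomputes both parts from the raw instruction words:

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF

def arm_modified_immediate(imm12):
    """Expand a 12-bit A32 modified immediate: imm8 rotated right by 2 * rot4."""
    return ror32(imm12 & 0xFF, 2 * (imm12 >> 8))

add_word = 0xE285CA01   # add ip, pp, #4096
ldr_word = 0xE59CCFA7   # ldr ip, [ip, #+4007]

high = arm_modified_immediate(add_word & 0xFFF)
low = ldr_word & 0xFFF
print(high, low, high + low)
```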

You may have noticed that the offsets are misaligned. This is because object pointers are tagged with kHeapObjectTag from vm/pointer_tagging.h; in this case, all of the pointers to RawObjects in compiled code are offset by 1.
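
A short sketch of how the tag and the odd-looking displacement cancel out. The pool address is invented for illustration, and the helper names are hypothetical; the only fact taken from the VM is that the tag adds 1 to the pointer.

```python
HEAP_OBJECT_TAG = 1   # kHeapObjectTag: heap pointers are stored as address + 1

def untag(tagged_pointer):
    """Recover the real, aligned address of a heap object."""
    return tagged_pointer - HEAP_OBJECT_TAG

def real_offset(displacement):
    """Convert a displacement used against a tagged pointer to a real offset."""
    return displacement + HEAP_OBJECT_TAG

# Hypothetical tagged object pool pointer (note the low bit is set).
pp = 0x0A001001
displacement = 4096 + 4007            # the add and ldr immediates combined

effective_address = pp + displacement
print(hex(effective_address), effective_address % 4)
print(real_offset(displacement), real_offset(displacement) % 4)
```

The odd displacement and the odd pointer add up to a 4-byte aligned address, and the slot lives at real offset 8104 from the start of the untagged pool object.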

#lint dartvm-dasm
        ;; StaticCall:12( print<0> v3)
0xf6a6ce80    ebfffffe               bl +0 ; 0xf6a6ce80
0xf6a6ce84    e28dd004               add sp, sp, #4

Here print is called, after which the string argument is popped from the stack.

Like before, the branch hasn’t been resolved; it is a relative branch to the entry point for print in dart:core.

#lint dartvm-dasm
        ;; ParallelMove r0 <- C
0xf69ace88    e59a0060               ldr r0, [thr, #+96]

Null is loaded into the return register, 96 being the offset to the null object field in a Thread.

#lint dartvm-dasm
        ;; Return:16(v0)
0xf69ace8c    e24bd000               sub sp, fp, #0
0xf69ace90    e8bd8800               ldmia sp!, {fp, pc}
0xf69ace94    e1200070               bkpt #0x0

And finally the function epilogue: the stack frame is restored along with any callee-saved registers. Since lr was saved in the prologue, popping that saved value into pc causes the function to return.
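
The pairing between prologue and epilogue is visible in the register-list bits of the two instructions. Here is a minimal decoder for A32 block transfers; the function name is hypothetical and the register aliases follow the A32 table above.

```python
# Dart's A32 register aliases, indexed by register number.
DART_A32_REGS = ["r0", "r1", "r2", "r3", "r4", "pp", "r6", "r7",
                 "r8", "r9", "thr", "fp", "ip", "sp", "lr", "pc"]

def decode_block_transfer(word):
    """Return (mnemonic, base, registers) for an A32 LDM/STM instruction."""
    if ((word >> 25) & 0b111) != 0b100:
        raise ValueError("not an LDM/STM instruction")
    before = (word >> 24) & 1      # P bit: adjust the address before each access
    up = (word >> 23) & 1          # U bit: addresses increase
    load = (word >> 20) & 1        # L bit: load rather than store
    mnemonic = ("ldm" if load else "stm") \
        + ("i" if up else "d") + ("b" if before else "a")
    base = DART_A32_REGS[(word >> 16) & 0xF]
    # Bit n of the low halfword selects register n.
    registers = [DART_A32_REGS[n] for n in range(16) if (word >> n) & 1]
    return mnemonic, base, registers

print(decode_block_transfer(0xE92D4800))   # prologue
print(decode_block_transfer(0xE8BD8800))   # epilogue
```

The store selects bits 11 and 14 (fp, lr) while the load selects bits 11 and 15 (fp, pc), so the slot that held the return address is loaded directly into the program counter.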

From now on I’ll be using snippets from my own disassembler, which has fewer problems than the built-in one.