Tags: c, assembly, arm, gdb, arm64

A hand-coded branch to a bounce bed is not taken, but is taken with a debugger attached


About one month ago I tried to implement a bounce-bed-like struct in C that encapsulates a short piece of assembly code which hooks one function with another.

The idea was simple: whenever a bounce_bed is created, I replace the first instruction of the hooked function with a branch to the bounce_bed. In the bounce_bed I encode a short assembly routine that executes the displaced stack operations, saves the argument registers, calls the hook, restores the registers, and finally jumps back into the hooked function.

Here is my code:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

typedef uint32_t u32;

static inline u32
encode_b (unsigned long target_addr)
{
  return 0x14000000 | (0x03ffffff & (target_addr >> 2));
}

static inline u32
encode_bl (unsigned long target_addr)
{
  return 0x94000000 | (0x03ffffff & (target_addr >> 2));
}

/**
 * a bounce_bed is an entry to a hook function
 * use trace(func, hook) to set it up.
 */
struct __attribute__ ((aligned (4))) bounce_bed
{
  u32 first_inst;  // 0xa9bc7bfd
  u32 second_inst; // 0x910003fd
  u32 stp_x0_x1;   // 0xa9be07e0
  u32 stp_x2_x3;   // 0xa9be0fe2
  u32 bl_trace_hook;
  u32 ldp_x2_x3; // 0xa8c20fe2
  u32 ldp_x0_x1; // 0xa8c207e0
  u32 b_trace_point;
};

struct bounce_bed *
trace (void *func_addr, void *hook_addr)
{
  u32 *ptr = func_addr;
  size_t page_size = getpagesize ();
  void *page_aligned_addr;
  struct bounce_bed *tnode = malloc (sizeof (struct bounce_bed));

  tnode->first_inst = ptr[0];    // normally stp fp, lr, [sp, #-32]!
  tnode->second_inst = ptr[1];   // normally mov fp, sp
  tnode->stp_x0_x1 = 0xa9be07e0; // stp x0, x1, [sp, #-32]!
  tnode->stp_x2_x3 = 0xa9be0fe2; // stp x2, x3, [sp, #-32]!
  tnode->bl_trace_hook
      = encode_bl ((void *)hook_addr - (void *)&tnode->bl_trace_hook);
  tnode->ldp_x2_x3 = 0xa8c20fe2; // ldp x2, x3, [sp], #32
  tnode->ldp_x0_x1 = 0xa8c207e0; // ldp x0, x1, [sp], #32
  tnode->b_trace_point
      = encode_b (((void *)func_addr + 8) - (void *)&tnode->b_trace_point);

  page_aligned_addr = (void *)((uintptr_t)tnode & ~(page_size - 1));
  mprotect (page_aligned_addr, page_size, PROT_READ | PROT_WRITE | PROT_EXEC);

  page_aligned_addr = (void *)((uintptr_t)func_addr & ~(page_size - 1));
  mprotect (page_aligned_addr, page_size, PROT_READ | PROT_WRITE | PROT_EXEC);

  ptr[0] = encode_b ((void *)tnode - (void *)func_addr); // b tnode
  ptr[1] = 0xd5033fdf;                                   // isb

  return tnode;
}

/** computes x power y */
int
mypow (int x, int y)
{
  if (y > 0)
    return x * mypow (x, y - 1);
  else
    return 1;
}

/** this is called before hooked function */
void
myhook (void)
{
  unsigned long x1;

  asm ("mov %0, x1" : "=r"(x1));
  printf ("%s: x1 = %#lx\n", __func__, x1);
  return;
}

int
main (int argc, char *argv[])
{
  if (argc != 3)
    return -1;

  /* read arguments */
  int x = strtod (argv[1], NULL);
  int y = strtod (argv[2], NULL);

  /* attach myhook to mypow */
  struct bounce_bed *tp = trace (mypow, myhook);
  /* test */
  int c = mypow (x, y);
  printf ("%d\n", c);
  free (tp);
  return 0;
}

All the tests were run on an arm64 board (RK3588 with the Linux BSP kernel 5.10). I later repeated them on an arm64 server (Ampere, kernel 5.10, gcc-12.0), and the result was the same.

I compiled it with the -g option using gcc-13.1 and ran it with the arguments 2 4. It returned normally, but the expected prints from myhook did not appear. That means myhook was not called; in fact, the jump to the bounce bed was not taken.

But when I stepped through it in gdb, every step appeared correct: the prints from myhook showed up as expected, and the program returned normally. I checked the addresses and found that everything was correct.

What's more, I also ran it under valgrind to check for memory leaks. There the prints from myhook appeared as well, and no memory leak was detected.

I tried disabling ASLR (address space layout randomization), but nothing changed.

It is so strange. I have checked related questions on Stack Overflow but found nothing helpful.


Solution

  • Indeed, this program lacks cache maintenance operations, but that is not why it runs safe and sound under gdb. The key cause is this function:

    static inline u32
    encode_b (unsigned long target_addr)
    {
      return 0x14000000 | (0x03ffffff & (target_addr >> 2));
    }
    

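    This function encodes an unconditional branch instruction, whose immediate field is only 26 bits wide. The field holds a signed word offset, so the branch can reach at most ±128 MiB from its own address.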

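    However, the distance between .text and the data where the bounce_bed lives (here, memory returned by malloc) is usually larger than that. The 0x03ffffff mask then silently truncates the offset, which produces an incorrect branch instruction that jumps to an invalid address.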

    Under gdb, however, all segments are allocated by gdb, so the distance between the segment addresses becomes much smaller.

    So things are a little bit zig-zag. We have to create an empty function in the .text section to serve as the intermediate bounce bed. An alternative is to allocate that intermediate structure on the stack rather than on the heap, as gcc does for its nested function extension.