linux-kernel, ftrace

Restoring task pt_regs when returning to the original function from an ftrace handler


With a loadable kernel module (LKM), the Linux kernel's ftrace API lets you set the FTRACE_OPS_FL_SAVE_REGS and FTRACE_OPS_FL_IPMODIFY flags. This essentially lets you redirect any kernel function whose symbol address you can find, like this:

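static void notrace my_ftrace_handler(unsigned long ip, unsigned long parent_ip,
        struct ftrace_ops *fops, struct pt_regs *regs) {
    /* Requires FTRACE_OPS_FL_SAVE_REGS | FTRACE_OPS_FL_IPMODIFY on the
     * ftrace_ops. new_addr is the replacement function's address. */
    regs->ip = (unsigned long)new_addr;
}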

Here new_addr is the address of the replacement function. The kpatch tool uses this technique, although it never returns to the original function.

If at the end of the function pointed to by new_addr I try this:

task_pt_regs(current)->ip = orig_addr + MCOUNT_INSN_SIZE;

Some functions proceed without a problem, but most cause the calling process to segfault.

The ftrace code restores the current task's pt_regs on return to the original function, which is why I can jump to my own function and still receive the arguments. At this point in my code, however, ftrace is no longer involved. How can I tell the kernel not to reset the current registers, so that the function at the new return address can use them?


Solution

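  • After posting this, I realized I could read the arguments directly from the pt_regs *regs pointer inside the ftrace handler, and that works. Rather than always redirecting to another function, the handler inspects the arguments and rewrites regs->ip only when the original function should be skipped. If regs->ip is left unchanged, the original function runs with its registers and return address intact. If it is pointed at a stub, the stub returns straight to the original caller: on x86_64 the ftrace call is the first instruction of the traced function, so the caller's return address is still on top of the stack: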

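    #include <linux/binfmts.h>
    #include <linux/errno.h>
    #include <linux/ftrace.h>

    /* Runs instead of security_bprm_check() and returns straight to its
     * caller, so the exec fails with -EACCES. */
    static int donotexec(struct linux_binprm *bprm)
    {
            return -EACCES;
    }

    /* Pre-5.11 callback signature (newer kernels pass struct ftrace_regs *);
     * the ftrace_ops needs FTRACE_OPS_FL_SAVE_REGS | FTRACE_OPS_FL_IPMODIFY. */
    static void notrace my_ftrace_handler(unsigned long ip, unsigned long parent_ip,
                        struct ftrace_ops *fops, struct pt_regs *regs) {

        /* x86_64: the first argument is passed in rdi */
        struct linux_binprm *bprm = (struct linux_binprm *)regs->di;

        /* Leave regs->ip alone to run the original function unchanged. */
        if (bprm->file && !allowed_to_exec(bprm->file))
                regs->ip = (unsigned long)donotexec;
    }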
    

    This hooks security_bprm_check(), whose single argument, struct linux_binprm *bprm, arrives in the rdi register and is therefore read from regs->di. allowed_to_exec() is a separate function (not shown) that checks bprm->file.

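    This is architecture-dependent (see struct pt_regs in arch/x86/include/asm/ptrace.h). On x86_64, the first six integer or pointer arguments are passed in rdi, rsi, rdx, rcx, r8 and r9, so only those can be read directly from regs; any further arguments are on the stack.

    This also explains the earlier segfaults: task_pt_regs(current) is not the register set ftrace saved, but the user-space registers stored when the task entered the kernel. Changing its ip changes where the process resumes in user space, not where the kernel function returns.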