How to get the return code of the syscall using SECCOMP_RET_DATA and PTRACE_GETEVENTMSG


I'm a little confused about how to obtain a syscall's return value using ptrace + seccomp.

man 4 bpf says:

 FILTER MACHINE
A filter program is an array of instructions, with all branches forwardly
directed, terminated by a return instruction

man 2 ptrace says:

 PTRACE_O_TRACESECCOMP  
While this triggers a PTRACE_EVENT stop, it is
similar to a syscall-enter-stop, in that the tracee has not yet
entered the syscall that seccomp triggered on. The seccomp event
message data (from the SECCOMP_RET_DATA portion of the seccomp filter
rule) can be retrieved with PTRACE_GETEVENTMSG.

 PTRACE_GETEVENTMSG 
For PTRACE_EVENT_SECCOMP, this is the seccomp(2)
filter's SECCOMP_RET_DATA associated with the triggered rule.

man 2 seccomp says:

 SECCOMP_RET_TRACE
The tracer will be notified of a
PTRACE_EVENT_SECCOMP and the SECCOMP_RET_DATA
portion of the filter's return value will be available to
the tracer via PTRACE_GETEVENTMSG
 [...]
The seccomp check will not be run again after the tracer is notified.
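
To make the quoted passages concrete, here is a minimal sketch of how the filter's 32-bit return value is laid out. The action sits in the upper bits and the data in the low 16 bits; SECCOMP_RET_DATA is the mask for those low bits, not a flag, so OR-ing it into the return value simply sets all 16 data bits. The helper names (make_trace_ret, ret_data, ret_action) are made up for illustration; only the SECCOMP_RET_* constants come from <linux/seccomp.h>.

```c
#include <linux/seccomp.h>
#include <stdint.h>

/* Hypothetical helper: build a SECCOMP_RET_TRACE return value that carries
 * a 16-bit tag chosen by the filter author. SECCOMP_RET_DATA (0x0000ffff)
 * is used here as a mask to keep the tag inside the data field. */
uint32_t make_trace_ret(uint16_t tag)
{
    return SECCOMP_RET_TRACE | ((uint32_t)tag & SECCOMP_RET_DATA);
}

/* Hypothetical helper: the data portion of a filter return value. This is
 * the part the man pages say the tracer reads with PTRACE_GETEVENTMSG at a
 * PTRACE_EVENT_SECCOMP stop. It is fixed when the filter is written. */
uint32_t ret_data(uint32_t ret)
{
    return ret & SECCOMP_RET_DATA;
}

/* Hypothetical helper: the action portion of a filter return value. */
uint32_t ret_action(uint32_t ret)
{
    return ret & SECCOMP_RET_ACTION;
}
```

Because the tag is a constant baked into the filter before the syscall runs, it can identify which rule fired, but it cannot carry anything computed by the syscall itself.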

It turns out that a BPF program cannot do anything after its BPF_RET statement. So when the tracee is stopped on SECCOMP_RET_TRACE, it is in a state similar to syscall-enter-stop and the syscall has not yet been made, so there is no return code to read yet. I expected that after a subsequent PTRACE_SYSCALL call the tracee would be in the syscall-exit-stop state and the tracer would be able to get the result of the syscall using PTRACE_GETEVENTMSG. But that doesn't work in my sample:

#include <linux/filter.h>
#include <linux/seccomp.h>
#include <linux/unistd.h>
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    pid_t pid;
    int status;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s <prog> <arg1> ... <argN>\n", argv[0]);
        return 1;
    }

    if ((pid = fork()) == 0) {
        ptrace(PTRACE_TRACEME, 0, 0, 0);

        struct sock_filter filter[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS, (offsetof(struct seccomp_data, nr))),
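            // open: skip the next instruction and trace; otherwise fall through to the openat check
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_open, 1, 0),
            // openat: trace; otherwise skip to SECCOMP_RET_ALLOW
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_openat, 0, 1),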
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRACE | SECCOMP_RET_DATA),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .filter = filter,
            .len = (unsigned short) (sizeof(filter)/sizeof(filter[0])),
        };

        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
            return 2;
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == -1)
            return 3;

        kill(getpid(), SIGSTOP);
        return execvp(argv[1], argv + 1);
    } else {
        waitpid(pid, &status, 0);
        ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACESECCOMP);
        ptrace(PTRACE_CONT, pid, 0, 0);

        int status = 0;
        unsigned long ret_data = 0;
        while(1) {
            while (1) {
                waitpid(pid, &status, 0);
                fprintf(stderr, "status = %08x\n", status);

                if (status >> 8 == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8)))
                    break;

                if (WIFEXITED(status))
                    return 0;
                ptrace(PTRACE_CONT, pid, 0, 0);
            }
            // restart stopped tracee
            ptrace(PTRACE_SYSCALL, pid, 0, 0);
            // wait for SIGTRAP, when tracee will be in the syscall-exit-stop state
            waitpid(pid, &status, 0);

            ptrace(PTRACE_GETEVENTMSG, pid, 0, &ret_data);
            fprintf(stderr, "retdat = %lu\n", ret_data);

            ptrace(PTRACE_CONT, pid, 0, 0);
        }
        return 0;
    }
}

I am able to get the syscall's return code by inspecting the registers:

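    // ptrace(PTRACE_GETEVENTMSG, pid, 0, &ret_data);
    struct user_regs_struct regs;  // x86-64 only; needs #include <sys/user.h>
    ptrace(PTRACE_GETREGS, pid, 0, &regs);
    // Print as signed so that failures show up as a negative errno value
    fprintf(stderr, "retdat = %lld\n", (long long) regs.rax);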

but I wonder how to do it in the way specified in the documentation.


Solution

  • How to get the return code of the syscall using SECCOMP_RET_DATA and PTRACE_GETEVENTMSG?

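    The simple answer is you can't. The seccomp event is sent before the system call is even entered, so there is no result to see at that point. To get one, you must resume the process with PTRACE_SYSCALL after you have received your seccomp event until it reaches the syscall-exit-stop. The code below does this twice; note that ptrace(2) documents a change in Linux 4.8 to the order of the seccomp stop and the syscall-enter-stop, so the number of resumes needed may differ on newer kernels: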

    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <csignal>
    #include <iostream>
    
    bool WaitForSyscallExit(const pid_t pid)
    {
      bool entered = false;
      int  status  = 0;
    
      while (true)
      {
        ptrace(PTRACE_SYSCALL, pid, 0, 0);
        waitpid(pid, &status, 0);
    
        // Check for termination first: WSTOPSIG() is only meaningful when WIFSTOPPED() is true
        if (WIFEXITED(status) || WIFSIGNALED(status))
        {
          std::cerr << "The child has unexpectedly exited." << std::endl;
    
          return false;
        }
    
        if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP)
        {
          if (entered)
          {
            // If we had already entered before, then the current SIGTRAP means exiting
            break;
          }
          entered = true;
        }
      }
    
      return true;
    }
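
A plain SIGTRAP test cannot tell a syscall stop from other traps. As a sketch of one way to make the distinction explicit, the helper below classifies a waitpid() status. It assumes the tracer has set PTRACE_O_TRACESYSGOOD in addition to PTRACE_O_TRACESECCOMP, in which case syscall stops are reported with signal SIGTRAP | 0x80. The enum and the function name classify_status are hypothetical, introduced only for this example.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical classification of a waitpid() status seen by a tracer. */
enum stop_kind { STOP_TERMINATED, STOP_SECCOMP, STOP_SYSCALL, STOP_OTHER };

/* Assumes PTRACE_O_TRACESECCOMP | PTRACE_O_TRACESYSGOOD were set with
 * PTRACE_SETOPTIONS; without TRACESYSGOOD, syscall stops show up as a
 * plain SIGTRAP and land in STOP_OTHER here. */
enum stop_kind classify_status(int status)
{
    /* Termination must be checked before looking at the stop signal. */
    if (WIFEXITED(status) || WIFSIGNALED(status))
        return STOP_TERMINATED;
    if (!WIFSTOPPED(status))
        return STOP_OTHER;
    /* Seccomp event stop: the event code sits above the SIGTRAP byte. */
    if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8)))
        return STOP_SECCOMP;
    /* Syscall-enter-stop or syscall-exit-stop under PTRACE_O_TRACESYSGOOD. */
    if (WSTOPSIG(status) == (SIGTRAP | 0x80))
        return STOP_SYSCALL;
    return STOP_OTHER;
}
```

The status alone does not say whether a STOP_SYSCALL is an entry or an exit, so the tracer still has to keep track of that itself, as the entered flag does above.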
    

    Because PTRACE_SYSCALL is used, the process stops twice: once on entering the system call and once on exiting it. The result is available only after the system call has actually finished, that is, at the second stop. And yes, you can only get it by reading the registers manually: the seccomp data (SECCOMP_RET_DATA) is only meaningful at the PTRACE_EVENT_SECCOMP stop, and struct seccomp_data itself has no field for the system call's result. The man pages don't mention getting the result value through PTRACE_GETEVENTMSG either.
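
Once the register value has been read at the syscall-exit-stop, it still has to be interpreted. The following is a small sketch for x86-64 Linux, where the kernel signals failure by returning a value in the range -4095..-1 (the negated errno) in rax. The struct and the function name decode_ret are hypothetical, chosen for this example.

```c
#include <errno.h>

/* Hypothetical result type: value is the syscall result, or -1 on failure
 * with err holding the errno code (0 on success). */
struct syscall_result {
    long value;
    int  err;
};

/* Decode the raw rax value read with PTRACE_GETREGS at syscall-exit-stop.
 * Assumes x86-64 Linux, where long is 64 bits wide and the usual
 * two's-complement conversion applies. */
struct syscall_result decode_ret(unsigned long long raw)
{
    struct syscall_result r;
    long long v = (long long)raw;

    if (v < 0 && v >= -4095) {
        /* Kernel error convention: -errno in the return register. */
        r.value = -1;
        r.err = (int)-v;
    } else {
        r.value = (long)v;
        r.err = 0;
    }
    return r;
}
```

For example, a raw value of 3 from openat decodes to file descriptor 3 with err 0, while a raw value equal to -2 decodes to value -1 with err ENOENT.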