Tags: c, debugging, gdb, corruption, memory-corruption

gdb - how to trap corruption using gdb


I am trying to find where and when corruption occurs in a new program. The program is only 495 lines, and gdb is not helping me debug it (at least, not with my current knowledge). Consider the following:

    > gdb psgrep-2020
    (comments omitted)
    Reading symbols from psgrep-2020...
    (gdb) b 466
    Breakpoint 1 at 0x3073: file psgrep-2020.c, line 466.
    (gdb) run -F dnsmasq
    Starting program: /usr/local/src/psgrep-2022/psgrep-2020 -F dnsmasq
    
    Breakpoint 1, showProcess (pid=893) at psgrep-2020.c:466
    466         if (printCmdline) {
    (gdb) step
    467             procNameFromCmdline(pid, strWork, sizeof(strWork), TRUE) ;
    (gdb) p pid
    $1 = 893
    (gdb) p strWork
    $2 = '\000' <repeats 1023 times>
    (gdb) print sizeof(strWork)
    $3 = 1024
    (gdb) step
    procNameFromCmdline (pid=0, result=0x0, resultLen=0, fullCmd=0 '\000') at psgrep-2020.c:58
    58  int procNameFromCmdline(pid_t pid, char *result, int resultLen, BOOL fullCmd) {
(gdb) 

At the inception of the called process (procNameFromCmdline) we can see that every parameter is incorrect (TRUE is #defined as 1). Sometimes gdb shows it like this:

    procNameFromCmdline (pid=0, result=0x19c5b4 <error: Cannot access memory at address 0x19c5b4>, resultLen=1689012, fullCmd=0 '\000')

I'm not trying to get someone else to find the problem for me. What I want is a way to detect when the program's memory has been corrupted. I believe all my memset(), snprintf() and similar calls are correctly bounded. Clearly, though, something has gone awry.

In case it's any help and to put things in perspective, here is the surrounding code from before the call...

    fpProcFile = fopen(sProcPath, "rt") ; // Open the stat file for reading text
    if (fpProcFile) {
        fscanf(fpProcFile
            , "%d %s %c %d %d %d %d %d %u %lu %lu %lu %lu %lu %lu %ld %ld %ld %ld %ld "
              "%ld %llu %lu %ld %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %d %d %u "
              "%u %llu %lu %ld %lu %lu %lu %lu %lu %lu %lu %d"
            , &(s->pid),           s->comm,       &(s->state),       &(s->ppid),        &(s->pgrp)
            , &(s->session),     &(s->tty_nr),    &(s->tpgid),       &(s->flags),       &(s->minflt)
            , &(s->cminflt),     &(s->majflt),    &(s->cmajflt),     &(s->utime),       &(s->stime)
            , &(s->cutime),      &(s->cstime),    &(s->priority),    &(s->nice),        &(s->num_threads)
            , &(s->itrealvalue), &(s->starttime), &(s->vsize),       &(s->rss),         &(s->rsslim)
            , &(s->startcode),   &(s->endcode),   &(s->startstack),  &(s->kstkesp),     &(s->kstkeip)
            , &(s->signal),      &(s->blocked),   &(s->sigignore),   &(s->sigcatch),    &(s->wchan)
            , &(s->nswap),       &(s->cnswap),    &(s->exit_signal), &(s->processor),   &(s->rt_priority)
            , &(s->policy),      &(s->delayacct_blkio_ticks)
                                               ,  &(s->guest_time),  &(s->cguest_time), &(s->start_data)
            , &(s->end_data),    &(s->start_brk), &(s->arg_start),   &(s->arg_end),     &(s->env_start)
            , &(s->env_end),     &(s->exit_code)
             ) ;
        fclose(fpProcFile) ;
        processName = s->comm ;
        memset(strWork, 0x00, sizeof(strWork)) ;
        if (printCmdline) {
            procNameFromCmdline(pid, strWork, sizeof(strWork), TRUE) ;

(The only %s in that fscanf points to a char[65535], and the value of that field in /proc/893/stat is 9 characters long plus a terminator. According to the documentation, 16 bytes is enough. But that's not the point anyway.)

Is there a way? Do I need a more professional debugger?

(Although I am not looking for someone to solve the problem in this program, the question seems to have gained some interest, so I am posting the struct used in the referenced code.) The fields are documented in the Linux kernel source (fs/proc/array.c), which (not my version) can be seen [here][1] and in many other places.

    struct myProcStat {
        int pid ;       // Process ID
        char comm[65535] ; // Command name limited to 16 bytes
        char state ;    // R=Running S=Sleeping D=WaitingDisk Z=Zombie T=Stopped 
                        // t=TracingStopped W=Paging X=Dead x=Dead K=Wakekill
                        // W=Waking P=Parked
        int ppid ;      // Parent process ID
        int pgrp ;      // Process group ID
        int session ;   // Session ID
        int tty_nr ;    // Controlling terminal
        int tpgid ;     // Foreground process group
        unsigned int flags ; // Kernel flags
        unsigned long int minflt ; // Number of minor faults
        unsigned long int cminflt ; // Children's minor faults
        unsigned long int majflt ; // Number of major faults
        unsigned long int cmajflt ; // Children's major faults
        unsigned long int utime ; // Amount of time scheduled in user mode
        unsigned long int stime ; // Amount of time scheduled in kernel mode
        long int cutime ;   // Amount of time waited-for children scheduled in user mode
        long int cstime ;   // Amount of time waited-for children scheduled in kernel mode
        long int priority ; // Priority running real-time scheduling policy
        long int nice ;     // Nice value
        long int num_threads ; // Number of threads in this process
        long int itrealvalue ; // Time in jiffies before next SIGALRM is sent
    // 21 above, 22 next ...
        unsigned long long int starttime ; // Start time (in clock ticks) after system boot (divide by sysconf(_SC_CLK_TCK))
        unsigned long int vsize ; // Virtual memory size in bytes
        long int rss ;  // Resident set size
        unsigned long int rsslim ; // Current soft limit in bytes on rss
        unsigned long int startcode ; // Address above which text can be run
        unsigned long int endcode ;   // Address below which text can be run
        unsigned long int startstack ; // Address of the start (bottom) of the stack
        unsigned long int kstkesp ; // Current stack pointer from kernel perspective
        unsigned long int kstkeip ; // Current EIP (instruction pointer)
        unsigned long int signal ; // Bitmap of pending signals as a decimal number. Obsolete, use /proc/[pid]/status instead
        unsigned long int blocked ; // Bitmap of blocked signals. Obsolete, use /proc/[pid]/status instead
        unsigned long int sigignore ; // Bitmap of ignored signals. Obsolete, use /proc/[pid]/status instead
        unsigned long int sigcatch ; // Bitmap of caught signals. Use /proc/[pid]/status instead
        unsigned long int wchan ; // Channel in which process is waiting. Use with /proc/[pid]/wchan
        unsigned long int nswap ; // Number of pages swapped (not maintained - ignore)
        unsigned long int cnswap ; // Number of child process pages swapped (not maintained - ignore)
        int exit_signal ; // Signal to be sent to parent upon death
        int processor ;   // CPU last executed on
        unsigned int rt_priority ; // Real-time scheduling priority
        unsigned int policy ; // Scheduling policy for real-time scheduling
        unsigned long long int delayacct_blkio_ticks ; // Aggregated block I/O delays, in clock ticks
        unsigned long int guest_time ; // Guest time (time spent running virtual CPU for guest OS)
        unsigned long int cguest_time ; // Guest time of process's children
        unsigned long int start_data ; // Address above which program BSS data are placed
        unsigned long int end_data ; // Address below which program BSS data are placed
        unsigned long int start_brk ; // Address above which program heap can be expanded
        unsigned long int arg_start ; // Address above which program command-line arguments (argv) are placed
        unsigned long int arg_end ; // Address below which argv are placed
        unsigned long int env_start ; // Address above which environment is placed
        unsigned long int env_end ; // Address below which environment is placed
        int exit_code ; // The thread's exit status in the form reported by waitpid(2)
        } ;


  [1]: https://elixir.bootlin.com/linux/latest/source/fs/proc/array.c

Solution

  • At the inception of the called process (procNameFromCmdline) we can see that every parameter is incorrect

    This most likely means that GDB didn't skip the function prologue, as it is supposed to, probably because of a GDB bug. Until the prologue has run, GDB reads the parameters from locations that do not yet hold them.

    If you do another step or next (so that the prologue has executed), the parameters will suddenly show their correct values again.

    Note that the above bug has been fixed in newer GDB versions, so updating GDB is another solution.