I am trying to find where and when corruption occurs in a new program. The program is only 495 lines, and gdb is not helping me debug it (at least, not with my current knowledge). Consider the following:
> gdb psgrep-2020
(comments omitted)
Reading symbols from psgrep-2020...
(gdb) b 466
Breakpoint 1 at 0x3073: file psgrep-2020.c, line 466.
(gdb) run -F dnsmasq
Starting program: /usr/local/src/psgrep-2022/psgrep-2020 -F dnsmasq
Breakpoint 1, showProcess (pid=893) at psgrep-2020.c:466
466 if (printCmdline) {
(gdb) step
467 procNameFromCmdline(pid, strWork, sizeof(strWork), TRUE) ;
(gdb) p pid
$1 = 893
(gdb) p strWork
$2 = '\000' <repeats 1023 times>
(gdb) print sizeof(strWork)
$3 = 1024
(gdb) step
procNameFromCmdline (pid=0, result=0x0, resultLen=0, fullCmd=0 '\000') at psgrep-2020.c:58
58 int procNameFromCmdline(pid_t pid, char *result, int resultLen, BOOL fullCmd) {
(gdb)
At the inception of the called process (procNameFromCmdline) we can see that every parameter is incorrect (TRUE equates to 1 via #define). Sometimes gdb shows this like:
procNameFromCmdline (pid=0, result=0x19c5b4 <error: Cannot access memory at address 0x19c5b4>, resultLen=1689012, fullCmd=0 '\000')
I'm not trying to get someone else to find the problem for me; what I want to do is to find a way that I can detect when the program has been corrupted. I believe that all my memset, snprintf() and so on are correctly constrained; clearly though something has gone awry.
In case it's any help and to put things in perspective, here is the surrounding code from before the call...
fpProcFile = fopen(sProcPath, "rt") ; // Open the stat file for reading text
if (fpProcFile) {
fscanf(fpProcFile
, "%d %s %c %d %d %d %d %d %u %lu %lu %lu %lu %lu %lu %ld %ld %ld %ld %ld "
"%ld %llu %lu %ld %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %lu %d %d %u "
"%u %llu %lu %ld %lu %lu %lu %lu %lu %lu %lu %d"
, &(s->pid), s->comm, &(s->state), &(s->ppid), &(s->pgrp)
, &(s->session), &(s->tty_nr), &(s->tpgid), &(s->flags), &(s->minflt)
, &(s->cminflt), &(s->majflt), &(s->cmajflt), &(s->utime), &(s->stime)
, &(s->cutime), &(s->cstime), &(s->priority), &(s->nice), &(s->num_threads)
, &(s->itrealvalue), &(s->starttime), &(s->vsize), &(s->rss), &(s->rsslim)
, &(s->startcode), &(s->endcode), &(s->startstack), &(s->kstkesp), &(s->kstkeip)
, &(s->signal), &(s->blocked), &(s->sigignore), &(s->sigcatch), &(s->wchan)
, &(s->nswap), &(s->cnswap), &(s->exit_signal), &(s->processor), &(s->rt_priority)
, &(s->policy), &(s->delayacct_blkio_ticks)
, &(s->guest_time), &(s->cguest_time), &(s->start_data)
, &(s->end_data), &(s->start_brk), &(s->arg_start), &(s->arg_end), &(s->env_start)
, &(s->env_end), &(s->exit_code)
) ;
fclose(fpProcFile) ;
processName = s->comm ;
memset(strWork, 0x00, sizeof(strWork)) ;
if (printCmdline) {
procNameFromCmdline(pid, strWork, sizeof(strWork), TRUE) ;
(The only %s in that fscanf points to a char[65535], and the value for that field in /proc/893/stat has length 9 plus a terminator. As per the documentation, 16 is enough. But that's not the point anyway.)
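For reference, this is a minimal sketch of how the %s conversion could be bounded and the conversion count checked. It is not the code from psgrep-2020: `struct miniStat` and `parseStatHead` are hypothetical names, only the first three fields are parsed, and `sscanf` on a string stands in for `fscanf` on the file.

```c
#include <stdio.h>

/* Hypothetical, trimmed-down struct: only the first three stat fields. */
struct miniStat {
    int  pid;       /* Process ID */
    char comm[64];  /* Command name, including the surrounding parentheses */
    char state;     /* Single-character process state */
};

/* Parse the first three fields of a /proc/[pid]/stat line.
 * Returns 0 if all three conversions succeeded, -1 otherwise. */
int parseStatHead(const char *line, struct miniStat *s)
{
    /* %63s stores at most 63 characters plus the terminator in comm[64]. */
    int n = sscanf(line, "%d %63s %c", &s->pid, s->comm, &s->state);
    return (n == 3) ? 0 : -1;
}
```

With the width in place, an over-long field can no longer write past the buffer, and the return value shows whether every conversion matched. A command name containing a space would still be split by %s, and this sketch does not handle that case.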
Is there a way? Do I need a more professional debugger?
(Although I am not looking for someone to solve the problem with this program, it seems to have attracted some interest. In that light, I am posting the struct used in the referenced code.) The fields are documented in the Linux kernel source (fs/proc/array.c), which can be seen (not my version) [here][1] and in many other places.
struct myProcStat {
int pid ; // Process ID
char comm[65535] ; // Command name limited to 16 bytes
char state ; // R=Running S=Sleeping D=WaitingDisk Z=Zombie T=Stopped
// t=TracingStopped W=Paging X=Dead x=Dead K=Wakekill
// W=Waking P=Parked
int ppid ; // Parent process ID
int pgrp ; // Process group ID
int session ; // Session ID
int tty_nr ; // Controlling terminal
int tpgid ; // Foreground process group
unsigned int flags ; // Kernel flags
unsigned long int minflt ; // Number of minor faults
unsigned long int cminflt ; // Children's minor faults
unsigned long int majflt ; // Number of major faults
unsigned long int cmajflt ; // Children's major faults
unsigned long int utime ; // Amount of time scheduled user mode
unsigned long int stime ; // Amount of time scheduled kernel mode
long int cutime ; // Amount of time waited-for children scheduled user mode
long int cstime ; // Amount of time waited-for children scheduled kernel mode
long int priority ; // Priority running real-time scheduling policy
long int nice ; // Nice value
long int num_threads ; // Number of threads in this process
long int itrealvalue ; // Time in jiffies before next SIGALRM is sent
// 21 above, 22 next ...
unsigned long long int starttime ; // Start time (in clock ticks) after system boot (divide by sysconf(_SC_CLK_TCK))
unsigned long int vsize ; // Virtual memory size in bytes
long int rss ; // Resident set size
unsigned long int rsslim ; // Current soft limit in bytes on rss
unsigned long int startcode ; // address above which text can be run
unsigned long int endcode ; // Address below which text can be run
unsigned long int startstack ; // Address of the start (bottom) of the stack
unsigned long int kstkesp ; // Current stack pointer from kernel perspective
unsigned long int kstkeip ; // Current EIP (instruction pointer)
unsigned long int signal ; // Bitmap of pending signals as a decimal number. Obsolete. use /proc/[pid]/status instead.
unsigned long int blocked ; // Bitmap of blocked signals. Obsolete. Use /proc/[pid]/status instead
unsigned long int sigignore ; // Bitmap of ignored signals. Obsolete, use /proc/[pid]/status instead
unsigned long int sigcatch ; // Bitmap of caught signals. Use /proc/[pid]/status instead
unsigned long int wchan ; // Channel in which process is waiting. Use with /proc/[pid]/wchan
unsigned long int nswap ; // Number of pages swapped (not maintained - ignore)
unsigned long int cnswap ; // Number of child process pages swapped (not maintained - ignore)
int exit_signal ; // Signal to be sent to parent upon death
int processor ; // CPU last executed on
unsigned int rt_priority ; // Real-time scheduling priority
unsigned int policy ; // Scheduling policy for real-time scheduling
unsigned long long int delayacct_blkio_ticks ; // Aggregated block I/O delays, in clock ticks
unsigned long int guest_time ; // Guest time (time spent running virtual CPU for guest OS)
unsigned long int cguest_time ; // Guest time of processes' children
unsigned long int start_data ; // Address above which program BSS data are placed
unsigned long int end_data ; // Address below which program BSS data are placed
unsigned long int start_brk ; // Address above which program heap can be expanded
unsigned long int arg_start ; // Address above which program command-line arguments (argv) are placed
unsigned long int arg_end ; // Address below which argv are placed
unsigned long int env_start ; // Address above which environment is placed
unsigned long int env_end ; // Address below which environment is placed
int exit_code ; // The thread's exit status in form reported by waitpid(2)
} ;
[1]: https://elixir.bootlin.com/linux/latest/source/fs/proc/array.c
> At the inception of the called process (procNameFromCmdline) we can see that every parameter is incorrect
This most likely means that GDB didn't skip the function prologue (as it's supposed to), probably because of a GDB bug.
If you do another `step` or `next`, the parameters will suddenly become correct again.
Note that the above bug has been fixed in newer GDB versions, so updating GDB is another solution.
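If you want to confirm this independently of the debugger, one option is to log the arguments from inside the callee. The sketch below is only an illustration: `formatArgs` is a hypothetical helper, the `BOOL` typedef is an assumption (gdb's `0 '\000'` display suggests a char-sized type), and the function body is a stand-in for the real one.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

typedef unsigned char BOOL;  /* Assumption: the real definition is not shown in the question. */

/* Hypothetical helper: format the arguments exactly as the callee received them. */
static int formatArgs(char *out, size_t outLen, pid_t pid,
                      const char *result, int resultLen, BOOL fullCmd)
{
    assert(out != NULL && outLen > 0);
    return snprintf(out, outLen, "pid=%ld result=%s resultLen=%d fullCmd=%d",
                    (long)pid, result ? "non-NULL" : "NULL", resultLen, (int)fullCmd);
}

/* Stand-in with the same signature as the question's procNameFromCmdline. */
int procNameFromCmdline(pid_t pid, char *result, int resultLen, BOOL fullCmd)
{
    char trace[128];

    /* Runs after the prologue, so the values are what the caller passed. */
    formatArgs(trace, sizeof(trace), pid, result, resultLen, fullCmd);
    fprintf(stderr, "procNameFromCmdline: %s\n", trace);

    /* ... the real body would go here ... */
    return 0;
}
```

If the trace shows pid=893 and resultLen=1024 while GDB reports zeros on entry, the arguments are intact and the display at the breakpoint is the only thing that is wrong.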