I am working on an embedded Linux system (kernel 5.10.24), and I am running a C program to stress-test file copying. The code uses stdio
to read and write the files, as follows:
#include <stdio.h>
#include <stdlib.h> // For exit()
#include <unistd.h>
#include <string.h>

#ifndef BUF_SIZE /* Allow "cc -D" to override definition */
#define BUF_SIZE 1024
#endif

static unsigned char buf[BUF_SIZE];
static char *infname = "/tmp/src_file.bin";
static char *ofname = "/tmp/dst_file.bin";

static int create_dest_file(const char *fname)
{
    FILE *fp = fopen(fname, "w");
    if (fp == NULL) {
        printf("Failed to create/truncate %s\n", fname);
        return 1;
    }
    fclose(fp);
    return 0;
}

static int copy_file(const char *src, const char *dest)
{
    FILE *fp, *fp2;
    int rlen = 0, wlen = 0, rc = 0;

    // Open one file for reading
    fp = fopen(src, "r");
    if (fp == NULL)
    {
        printf("Cannot open file %s\n", src);
        return 1;
    }
    fp2 = fopen(dest, "ab");
    if (fp2 == NULL) {
        fclose(fp);
        return 1;
    }
    while (1) {
        rlen = fread(buf, 1, sizeof(buf), fp);
        if (rlen > 0) {
            wlen = fwrite(buf, 1, rlen, fp2);
            if (wlen != rlen) {
                printf("Wrote len: %d, read len: %d\n", wlen, rlen);
                rc = 1;
                break;
            }
        } else {
            break;
        }
    }
    fclose(fp);
    fclose(fp2);
    return rc;
}

int main(int argc, char **argv)
{
    int i = 0, rc = 0;
    int us = 500000;

    if (argc != 4) {
        printf("Usage: %s srcfile dstfile delay_in_us\n", argv[0]);
        return 1;
    }
    infname = argv[1];
    ofname = argv[2];
    us = atoi(argv[3]);
    printf("Copying %s to %s\n", infname, ofname);
    for (i = 0; i < 1000; i++) {
        create_dest_file(ofname);
        rc = copy_file(infname, ofname);
        printf("XXXXXXXXXXX %d, rc: %d\n", i, rc);
        usleep(us);
    }
    return 0;
}
After compiling it and running it with ./filecopy /root/16MB_src.bin /root/dest.bin 250000, I get several different types of errors, such as Segmentation fault, Bus error, Illegal instruction, and so on.
I installed GDB and ran filecopy under it, and got the following error:
XXXXXXXXXXX 45, rc: 0
XXXXXXXXXXX 46, rc: 0
Fatal error: glibc detected an invalid stdio handle
Program received signal SIGABRT, Aborted.
0x77cdfd44 in ?? () from /lib/libc.so.6
(gdb) bt
#0 0x77cdfd44 in ?? () from /lib/libc.so.6
#1 0x77c964ac in raise () from /lib/libc.so.6
#2 0x77c97ae4 in abort () from /lib/libc.so.6
warning: GDB can't find the start of the function at 0x77cd0c97.
#3 0x77cd0c98 in ?? () from /lib/libc.so.6
(gdb)
I checked the code and asked colleagues to review it, but no error was found :-(.
Judging from the error types, the failures look random, but I cannot find the root cause.
The system has 64MB of RAM, the source file is about 18MB, and the libc is glibc 2.38.
After many tests, I found that if the source file is about 1MB, the program runs well and hits no errors.
If the source file is about 8MB or 18MB, the program hits errors.
If the file (18MB) is read from NAND and written to RAM, it runs well.
If the file (18MB) is read from RAM and written to NAND, it hits errors.
Running free -k several times during the test showed:
# free -k
              total        used        free      shared  buff/cache   available
Mem:          54580       13228        9404         344       31948       38260
Swap:             0           0           0
# free -k
              total        used        free      shared  buff/cache   available
Mem:          54580       13252       17480         344       23848       38240
Swap:             0           0           0
# free -k
              total        used        free      shared  buff/cache   available
Mem:          54580       13260       11936         344       29384       38232
Swap:             0           0           0
# free -k
              total        used        free      shared  buff/cache   available
Mem:          54580       13252        6656         344       34672       38240
Swap:             0           0           0
# free -k
              total        used        free      shared  buff/cache   available
Mem:          54580       13244       19304         344       22032       38224
The memory is NOT used up.
There doesn't seem to be anything wrong with your program; your problem likely lies elsewhere.
You mention getting several different types of errors, such as Segmentation fault, Bus error, and Illegal instruction.
All of these could be the result of bad memory (reading back values other than what was written previously).
I suggest running a memory checker program to confirm or rule out this possibility.
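To illustrate what such a checker does, here is a minimal sketch in C, not a substitute for a real memory-testing tool. It writes an index-dependent pattern into a large buffer and reads it back, counting cells that come back different. The names pattern_pass and memory_pattern_test are hypothetical and not part of the original program. Keep in mind that a user-space test only exercises the pages the kernel hands to the process, and CPU caches can mask faults, so a clean run does not prove the RAM is good.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: fill `words` 32-bit cells with a pattern derived
 * from the cell index, then read every cell back.
 * Returns the number of cells whose value differs from what was written. */
static size_t pattern_pass(volatile uint32_t *mem, size_t words, uint32_t seed)
{
    size_t i, bad = 0;

    for (i = 0; i < words; i++)              /* write phase */
        mem[i] = seed ^ (uint32_t)i;

    for (i = 0; i < words; i++) {            /* verify phase */
        uint32_t expect = seed ^ (uint32_t)i;
        uint32_t got = mem[i];

        if (got != expect) {
            if (bad < 10)                    /* report only the first few */
                printf("mismatch at word %zu: expected 0x%08x, got 0x%08x\n",
                       i, (unsigned)expect, (unsigned)got);
            bad++;
        }
    }
    return bad;
}

/* Hypothetical driver: run `passes` passes over a freshly allocated buffer
 * of `bytes` bytes, alternating between two complementary bit patterns.
 * Returns the total mismatch count, or (size_t)-1 if allocation failed. */
static size_t memory_pattern_test(size_t bytes, unsigned passes)
{
    size_t words = bytes / sizeof(uint32_t);
    size_t bad = 0;
    unsigned p;
    uint32_t *mem = malloc(words * sizeof(uint32_t));

    if (mem == NULL)
        return (size_t)-1;

    for (p = 0; p < passes; p++) {
        uint32_t seed = (p & 1) ? 0xAAAAAAAAu : 0x55555555u;
        bad += pattern_pass(mem, words, seed);
    }
    free(mem);
    return bad;
}
```

Calling memory_pattern_test() from main with a buffer size well above the CPU cache size (for example 16MB on this 64MB system, an assumed value) and a few dozen passes should return 0 on healthy hardware; any non-zero count points at the memory path rather than at the copy program.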
You could also run other memory-intensive applications that are known to be stable (such as gcc itself). If you see crashes in them as well, bad memory is a very likely root cause.
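Silent corruption can also be looked for directly in the stress test: an iteration may report rc: 0 and still leave a destination file that differs from the source. The following is a minimal sketch of a byte-for-byte comparison that could be called after copy_file() in the loop. The name files_differ is hypothetical, and the sketch assumes both paths are regular files.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: compare two regular files byte for byte.
 * Returns 0 if they are identical, 1 if they differ,
 * and -1 if either file cannot be opened or read. */
static int files_differ(const char *a, const char *b)
{
    unsigned char ba[1024], bb[1024];
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int result = 0;

    if (fa == NULL || fb == NULL) {
        if (fa != NULL)
            fclose(fa);
        if (fb != NULL)
            fclose(fb);
        return -1;
    }

    for (;;) {
        size_t na = fread(ba, 1, sizeof(ba), fa);
        size_t nb = fread(bb, 1, sizeof(bb), fb);

        /* A different chunk length or different bytes means a mismatch. */
        if (na != nb || memcmp(ba, bb, na) != 0) {
            result = 1;
            break;
        }
        if (na == 0)                 /* both files reached EOF together */
            break;
    }

    if (ferror(fa) || ferror(fb))    /* a read error overrides the verdict */
        result = -1;

    fclose(fa);
    fclose(fb);
    return result;
}
```

Note that a freshly written destination file will usually be served from the page cache, so a mismatch found this way says more about RAM than about the NAND write path.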