I'm getting a segmentation fault when running the code below. It should read a .csv file with over 3M lines and then do some further processing (not relevant to the problem), but after 207746 iterations it crashes with a segmentation fault. If I remove the p = strsep(&line,"|"); call and just print the whole line, it prints all >3M lines.
int ReadCSV (int argc, char *argv[]){
    char *line = NULL, *p;
    size_t len = 0;
    unsigned long count = 0;
    FILE *data;
    if (argc < 2) return 1;
    if((data = fopen(argv[1], "r")) == NULL){
        printf("the CSV file cannot be open");
        exit(0);
    }
    while (getline(&line, &len, data)>0) {
        p = strsep(&line,"|");
        printf("Line number: %lu \t p: %s\n", count, p);
        count++;
    }
    free(line);
    fclose(data);
    return 0;
}
I guess it has to do with memory allocation, but I can't figure out how to fix it.
Combining getline and strsep often causes confusion, because both functions modify the pointer whose address you pass as the first argument. If you hand a pointer that strsep has already advanced back to getline, you run the risk of undefined behavior on the next iteration.
Consider an example: getline allocates 101 bytes for line and reads a 100-character string into it. Note that len is now set to 101. You call strsep, which finds '|' in the middle of the string, so it points line to what used to be line+50. Now you call getline again. It sees another 100-character line and concludes that it is OK to copy it into the buffer, because len is still 101. However, since line now points to the middle of the buffer, writing 100 characters runs past the end of the allocation, which is undefined behavior. If a later line is longer than len, getline will also try to realloc a pointer that malloc never returned, which is undefined as well and may be why the crash only shows up after many iterations.
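To make the pointer movement concrete, here is a minimal sketch that measures how far strsep advances the pointer it is given. The helper name strsep_advance is hypothetical, and the _DEFAULT_SOURCE define assumes a glibc toolchain (BSD and macOS expose strsep without it).

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes strsep() */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reports how many bytes strsep() moved the pointer.
 * Returns -1 when no delimiter was found, because strsep() then sets the
 * pointer to NULL. The buffer is modified in place ('|' becomes '\0'). */
ptrdiff_t strsep_advance(char *buf, const char *delim)
{
    char *cursor = buf;          /* work on a copy so buf itself stays valid */
    strsep(&cursor, delim);      /* cursor now points past the delimiter */
    return cursor ? cursor - buf : -1;
}
```

Called on a writable buffer holding "abc|def", the pointer ends up 4 bytes past the start. That offset is exactly what getline would inherit if the same variable were passed to both functions.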
Make a copy of line before calling strsep, and pass the copy instead:
while (getline(&line, &len, data)>0) {
    char *copy = line;
    p = strsep(&copy, "|");
    printf("Line number: %lu \t p: %s\n", count, p);
    count++;
}
Now the line pointer that you pass to getline is preserved between loop iterations.
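For completeness, here is one way the whole read loop might look with this fix applied. This is only a sketch: the function name print_first_fields and the choice to take an already-open FILE * are assumptions made for illustration, not part of the original code, and the _DEFAULT_SOURCE define again assumes glibc.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; exposes strsep() and getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>    /* ssize_t, the return type of getline() */

/* Hypothetical helper: prints the first '|'-separated field of each line
 * read from data and returns the number of lines processed. */
unsigned long print_first_fields(FILE *data)
{
    char *line = NULL;    /* buffer owned by getline(); never passed to strsep() */
    size_t len = 0;       /* capacity of that buffer, maintained by getline() */
    unsigned long count = 0;

    while (getline(&line, &len, data) != -1) {
        char *cursor = line;              /* scratch pointer for strsep() to move */
        char *p = strsep(&cursor, "|");   /* p is the first field; line is untouched */
        printf("Line number: %lu \t p: %s\n", count, p);
        count++;
    }

    free(line);           /* still the pointer getline() allocated, so this is valid */
    return count;
}
```

The caller would open the file with fopen, pass the handle in, and fclose it afterwards. Because only cursor is ever modified by strsep, both the next getline call and the final free see the original allocation.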