
getline() / strsep() combination causes segmentation fault


I'm getting a segmentation fault when running the code below.

It should basically read a .csv file with over 3M lines and do other stuff afterwards (not relevant to the problem), but after 207746 iterations it crashes with a segmentation fault. If I remove the p = strsep(&line,"|"); line and just print the whole line, it prints all of the >3M lines.

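#define _GNU_SOURCE   /* exposes getline() and strsep() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int ReadCSV (int argc, char *argv[]){

    char *line = NULL, *p;
    size_t len = 0;              /* buffer size, maintained by getline() */
    unsigned long count = 0;

    FILE *data;
    if (argc < 2) return 1;
    if((data = fopen(argv[1], "r")) == NULL){
        printf("the CSV file cannot be opened\n");
        return 1;
    }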


    while (getline(&line, &len, data)>0) {

        p = strsep(&line,"|");  

        printf("Line number: %lu \t p: %s\n", count, p);
        count++;
    }

    free(line);
    fclose(data);

    return 0;
}

I guess it has to do with memory allocation, but I can't figure out how to fix it.


Solution

  • Combining getline and strsep often causes confusion, because both functions may modify the pointer whose address you pass as the first argument: getline may reallocate the buffer, and strsep advances the pointer past the delimiter it finds. If you pass a pointer that has been through strsep back to getline, you risk undefined behavior from the second iteration on.

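    Consider an example: suppose getline allocates 101 bytes for line and reads a 100-character string into it, setting len to 101. You call strsep, which finds '|' at index 49, so it points line at what used to be line+50. Now you call getline again. It sees another 100-character line and concludes that it is OK to copy it into the buffer, because len is still 101. However, line now points into the middle of the buffer, where only 51 bytes remain, so writing 100 characters plus the terminating '\0' overruns the allocation, which is undefined behavior. If a later line does not fit even in len bytes, getline has to call realloc on line, which is no longer a pointer returned by malloc; that is undefined behavior too, and a common cause of crashes.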

    Make a copy of the line pointer (not of the string) before calling strsep, and let strsep advance the copy:

    while (getline(&line, &len, data)>0) {
        char *copy = line;           /* strsep advances copy; line is left alone */
        p = strsep(&copy, "|");      /* p is the first '|'-separated field */
        printf("Line number: %lu \t p: %s\n", count, p);
        count++;
    }
    

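    Now the line pointer you pass to getline is preserved between loop iterations: it always points to the start of the buffer getline allocated, len correctly describes that buffer, and the final free(line) releases the pointer that getline returned.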
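The pointer movement described in the 101-byte example can be observed without triggering any undefined behavior. The sketch below is illustrative only: the names observe_strsep and struct strsep_effect are hypothetical. It reads one line with getline, calls strsep on line itself exactly as the question does, records how far line moved and whether len changed, and then frees the pointer that getline actually returned.

```c
#define _GNU_SOURCE              /* getline() and strsep() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: what one strsep() call does to getline()'s state. */
struct strsep_effect {
    long offset;       /* line - start after strsep; -1 if line became NULL,
                          -2 if no line could be read */
    size_t len_before; /* len as set by getline() */
    size_t len_after;  /* len after strsep() (strsep never receives it) */
};

static struct strsep_effect observe_strsep(FILE *in)
{
    struct strsep_effect e = { -2, 0, 0 };
    char *line = NULL;
    size_t len = 0;

    if (getline(&line, &len, in) < 0) {
        free(line);              /* getline may allocate even when it fails */
        return e;
    }

    char *start = line;          /* the pointer getline() actually allocated */
    e.len_before = len;

    strsep(&line, "|");          /* the question's call: moves line itself */
    e.offset = (line == NULL) ? -1 : (long)(line - start);
    e.len_after = len;

    free(start);                 /* free the original pointer, never the moved one */
    return e;
}
```

Reading a line such as aaaa|bbbb yields an offset of 5 with len_before equal to len_after: line has moved forward while len still describes the whole buffer, which is the mismatch the next getline call trips over. A line without '|' yields -1, because strsep then sets line to NULL; in the question's loop, the next getline would allocate a fresh buffer and the old one would leak.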
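The copy-the-pointer fix extends naturally to reading every '|'-separated field rather than only the first: keep line for getline alone, give strsep a separate cursor, and call strsep until it returns NULL. This is a sketch, not the original poster's code; the function name split_lines, its out-parameter, and the newline stripping are illustrative assumptions, and printing is omitted so the function is easy to check.

```c
#define _GNU_SOURCE              /* getline() and strsep() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads every line from `in`, splits each one on '|',
   stores the total number of fields in *fields_out, and returns the number
   of lines read. */
static unsigned long split_lines(FILE *in, unsigned long *fields_out)
{
    char *line = NULL;           /* owned by getline(); may be realloc'd */
    size_t len = 0;
    unsigned long count = 0, fields = 0;

    while (getline(&line, &len, in) > 0) {
        line[strcspn(line, "\r\n")] = '\0';  /* drop the trailing newline */

        char *cursor = line;     /* strsep() advances this copy only */
        char *field;
        while ((field = strsep(&cursor, "|")) != NULL) {
            /* process `field` here; "a||b" yields an empty field "" */
            fields++;
        }
        count++;
    }

    free(line);                  /* still the pointer getline() returned */
    *fields_out = fields;
    return count;
}
```

Because cursor is a fresh copy on every iteration, getline always receives the start of its own buffer, even when a line contains no delimiter and strsep sets cursor to NULL.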