
C malloc/free + fgets performance


As I loop through lines in file A, I am parsing the line and putting each string (char*) into a char**.

At the end of a line, I then run a procedure that consists of opening file B, using fgets, fseek and fgetc to grab characters from that file. I then close file B.

I reopen and close file B once for each line of file A.

What I would like to know is:

  1. Is there a significant performance hit from using malloc and free, such that I should use something static like myArray[NUM_STRINGS][MAX_STRING_WIDTH] instead of a dynamic char** myArray?

  2. Is there significant performance overhead from opening and closing file B (conceptually, many thousands of times)? If my file A is sorted, is there a way for me to use fseek to move "backwards" in file B, to reset where I was previously located in file B?

EDIT: It turns out that a two-part approach greatly reduced the runtime:

  1. My file B is actually one of twenty-four files. Instead of opening the same file B1 a thousand times, then B2 a thousand times, and so on, I open file B1 once, close it, open B2 once, close it, etc. This reduces many thousands of fopen and fclose operations to roughly 24.

  2. I used rewind() to reset the file pointer.

This yielded a roughly 60-fold speed improvement, which is more than sufficient. Thanks for pointing me to rewind().
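
A minimal sketch of the "open once, rewind per line" pattern described in the edit above. The function names (count_char_from_start, process_files) and the per-line work (counting how often the first character of each line of A occurs in B) are hypothetical stand-ins for the real lookup, which the question does not show.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-line procedure: counts occurrences of `target` in the
   already-open stream `b`, scanning from the beginning every time. */
long count_char_from_start(FILE *b, int target)
{
    long count = 0;
    int ch;

    assert(b != NULL);
    rewind(b);  /* back to offset 0; also clears the EOF and error indicators */
    while ((ch = fgetc(b)) != EOF) {
        if (ch == target)
            count++;
    }
    return count;
}

/* Opens file B once, reuses it for every line of file A, then closes it once.
   Returns the sum of all per-line counts, or -1 if either file cannot be opened.
   Assumption: lines of A fit in the buffer; longer lines are read in chunks. */
long process_files(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "r");
    if (a == NULL)
        return -1;

    FILE *b = fopen(path_b, "r");
    if (b == NULL) {
        fclose(a);
        return -1;
    }

    char line[1024];
    long total = 0;
    while (fgets(line, sizeof line, a) != NULL) {
        if (line[0] != '\n' && line[0] != '\0')
            total += count_char_from_start(b, line[0]);
    }

    fclose(b);
    fclose(a);
    return total;
}
```

If the search should resume from a remembered position rather than from the start of B, saving the offset with ftell and restoring it with fseek(b, pos, SEEK_SET) plays the same role as rewind.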


Solution

  • If your dynamic array grows over time, some reallocs incur a copy cost. With the "always double" heuristic, the total copy cost for n elements is O(n), i.e. amortized O(1) per append, so it is not horrible. If you know the size ahead of time, a fixed-size array (static or stack-allocated) will still be faster.

    For the second question, read about rewind(). It sets the file position back to the start of the stream, which should be much faster than opening and closing the file every time, and it leaves you with less resource management to do.
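
The "always double" heuristic from the first point can be sketched roughly as follows. The StrVec type and the strvec_push / strvec_free names are hypothetical, and the initial capacity of 8 is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of heap-allocated strings. */
typedef struct {
    char **items;  /* pointer array, grown by doubling */
    size_t len;    /* number of strings stored */
    size_t cap;    /* number of slots allocated */
} StrVec;

/* Appends a copy of s. Returns 0 on success, -1 on allocation failure. */
int strvec_push(StrVec *v, const char *s)
{
    assert(v != NULL && s != NULL);

    if (v->len == v->cap) {
        /* Double the capacity, so a copy happens only on every 2^k-th append. */
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        char **grown = realloc(v->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;  /* the old array is still valid */
        v->items = grown;
        v->cap = new_cap;
    }

    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, s, n);
    v->items[v->len++] = copy;
    return 0;
}

/* Frees every stored string and the pointer array, leaving v empty. */
void strvec_free(StrVec *v)
{
    assert(v != NULL);
    for (size_t i = 0; i < v->len; i++)
        free(v->items[i]);
    free(v->items);
    v->items = NULL;
    v->len = 0;
    v->cap = 0;
}
```

A StrVec initialized with `StrVec v = {0};` starts empty; realloc on a null pointer behaves like malloc, so the first push allocates the initial 8 slots.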