crealloccoredumpcalloc

How do I read a large file of numbers and store its content in an array?


My original task is: Given a file of numbers, I have to find all pairs (a pair is just 2 numbers; they cannot be consecutive) that fulfil a certain condition.

To do that, I've decided to create an array where I will store all the numbers. I have a ~11000 Kbyte file of numbers (the number of numbers is ~1,5*10^9), and I need to read all of them from the file and store them in an array.

Numbers in the file are arranged like this; 0 < number <= 100000:

10
20
30
40
50
60

The first thing I decided to do was to create just a regular array int arr[150000000]. But this won't do due to stack overflow.

Then I decided to use calloc.

#include<stdio.h>
#include<stdlib.h>

int main()
{
    FILE *file = fopen("nums.txt", "r");
    
    int str_size = 8;
    char str[str_size]; // 0 < number <= 100000, so using the length 8 (6 digits + '\n' + null-terminating byte) is enough.
    
    int size = 2000; //Initial capacity
    int *a = calloc(size, sizeof(int)); //The array

    int i = 0;
    while( ((fgets(str, str_size, file) ) != NULL)) { //read a line from the file (a line = 1 number)
        if(atoi(str)!=0) // '\n' is on the end of each line, so fgets() function will read an empty line and return 0, we don't want that.
        {
            a[i] = atoi(str); //Convert a string to int and storing the value in the array
            i++;
        }
        a = realloc(a, sizeof(int)*(size*2) ); //And then I realloc my array, multiplying the size by 2.
        size = size * 2;
    }

    a[i] = -1; //When the flow of numbers ended, I mark the end with -1.

    fclose(file);
    free(a);
    return 0;
}

This code works well for smaller files (100 Kbyte files), but fails for larger files, returning the following:

realloc(): invalid next size
Aborted (core dumped)

As far as I understand, this means that my computer cannot allocate more memory. So what can I do?

EDIT

I understand my first error. I reallocated the array's size, but I didn't increase the size variable itself. But it didn't help me, now I get the Segmentation fault (core dumped) instead.


Solution

  • fopen, calloc, and realloc can all fail, returning NULL.

    In the case of realloc, the original allocation will remain untouched, so immediately overwriting the original pointer with the newly returned value will leak memory and throw away potentially useful data.

    Reallocating every iteration is rather costly. A better strategy is to reallocate when the buffer is full, increasing the memory by some factor. Doubling the current size is a simple approach.

    Strictly sized buffers like char [6] are prone to filling up too quickly. Be more generous with your input buffer sizes.

    The primary issue is that size is never changed during the loop, so the allocated memory does not keep up with the data read. The buffer overflows and you invoke Undefined Behaviour, likely overwriting some data you should not have touched.

    Run this program as ./a.out nums.txt, or omit the argument to read from stdin.

    #include <stdio.h>
    #include <stdlib.h>
    
    int main(int argc, char **argv)
    {
        char line[128];
        FILE *file = argc > 1 ? fopen(argv[1], "r") : stdin;
    
        if (!file) {
            perror(argv[1]);
            return EXIT_FAILURE;
        }
    
        size_t size = 1024;
        size_t length = 0;
        int *list = malloc(sizeof *list * size);
    
        if (!list) {
            perror("malloc");
            return EXIT_FAILURE;
        }
    
        while (fgets(line, sizeof line, file)) {
            int value = atoi(line);
    
            if (!value)
                continue;
    
            if (length == size) {
                size *= 2;
                void *memory = realloc(list, sizeof *list * size);
    
                if (!memory) {
                    perror("realloc");
                    break;
                }
    
                list = memory;
            }
    
            list[length++] = value;
        }
    
        if (argc > 1)
            fclose(file);
    
        /* arbitrary use of data */
        size_t odds = 0;
    
        for (size_t i = 0; i < length; i++)
            if (list[i] & 1)
                odds++;
    
        printf("Number of odd values: %zu\n", odds);
    
        free(list);
    }