
I want to read a text file with one piece of movie information on each line and save it into a dynamic array of structures


I'm really new to C programming, and I'm trying this as an exercise in reading a file and saving its contents to a dynamic array of structs. The information in the text file is:

Movie id:1448
title:The movie
surname of director: lorez
name of director: john
date: 3
month: september
year: 1997

The structs should look like this:

typedef struct date
{
  int day, month, year;
} date;

typedef struct director_info
{
  char* director_surname, director_name;
} director_info;

typedef struct movie
{
  int id;
  char* title;
  director_info* director;
  date* release_date;
} movie;

All I know is that I should read it with fgets. I think something like the following is the way to start, but I can't figure out how to create the structs and save the data in them:

    FILE *readingText;
    readingText = fopen("movies.txt", "r");

    if (readingText == NULL)
    {
        printf("Can't open file\n");
        return 1;
    }

    while (fgets(line, sizeof(line), readingText) != NULL)
    {
        .....
    }
    fclose(readingText);

Solution

  • Reading multi-line input can be a bit challenging, and coupling that with allocating for nested structures makes for a good learning experience in file I/O and dynamic memory allocation. But before looking at your task, there is a misconception to clear up:

    char* director_surname, director_name;
    

    Does NOT declare two pointers-to-char. It declares a single pointer (director_surname) and then a single character (director_name). The lesson: the unary '*' that indicates the level of pointer indirection goes with the variable, NOT the type. Why? Just as you experienced:

    char* a, b, c;
    

    Does NOT declare three pointers-to char, it declares one pointer and two char variables. Using:

    char *a, b, c;
    

    Makes that clear.
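
    If you want to check this for yourself, one option is C11's _Generic, which selects a value based on the type of an expression. The sketch below is illustrative only: the helper names first_is_pointer() and second_is_pointer() are hypothetical and not part of your code, and it assumes a C11 compiler.

```c
#include <stddef.h>

/* Hypothetical helpers: report whether each variable declared by
 * "char* a, b;" ends up with pointer type (requires C11 _Generic). */
int first_is_pointer (void)
{
    char* a = NULL, b = 'x';    /* the '*' binds to a only */
    (void)b;                    /* silence unused-variable warning */
    return _Generic (a, char *: 1, default: 0);
}

int second_is_pointer (void)
{
    char* a = NULL, b = 'x';    /* b is a plain char */
    (void)a;
    return _Generic (b, char *: 1, default: 0);
}
```

    Here first_is_pointer() returns 1 and second_is_pointer() returns 0, which is the same thing the compiler sees for director_surname and director_name in your original declaration.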

    The Multi-Line Read

    When you have to coordinate data from multiple lines, you must validate that you obtained the needed information from each line in a group BEFORE you consider the input for that group valid. There are a number of approaches, but perhaps one of the more straightforward is simply to use temporary variables to hold each input, and keep a counter that you increment each time a successful input is received. If you fill all your temporary variables, and your counter reflects the correct number of inputs, you can then allocate memory for each of the structs and copy the temporary variables to their permanent storage. You then reset your counter to zero, and repeat until you run out of lines in your file.

    Most of your reads are straightforward; the exception is the month, which is read as a lower-case month name that you must then convert to int for storage in your struct date. Probably the easiest way to handle that is to create a lookup table (e.g. a constant array of pointers to string literals, one for each of the twelve months). Then, after reading the month string, you can loop over the array using strcmp() to map the matching index to the month number for your struct (adding +1 so that january is month 1, february is month 2, etc.). For example, you can use something like:

    const char *months[] = { "january", "february", "march", "april",
                            "may", "june", "july", "august", "september",
                            "october", "november", "december" };
    #define NMONTHS (int)(sizeof months / sizeof *months)
    

    Where the macro NMONTHS is 12 for the number of elements in months.
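
    If you prefer to keep the lookup out of your read loop, it can be wrapped in a small function. The following is only a sketch: month_number() is a hypothetical helper name, and it assumes the month name arrives in lower case exactly as it appears in the table.

```c
#include <string.h>

const char *months[] = { "january", "february", "march", "april",
                        "may", "june", "july", "august", "september",
                        "october", "november", "december" };
#define NMONTHS (int)(sizeof months / sizeof *months)

/* Hypothetical helper: return 1..12 for a lower-case month name,
 * or -1 if the name is not in the lookup table. */
int month_number (const char *name)
{
    for (int i = 0; i < NMONTHS; i++)
        if (strcmp (name, months[i]) == 0)
            return i + 1;       /* index 0 is january, so add 1 */

    return -1;                  /* not found */
}
```

    With that in place, month_number ("september") gives 9, and an unrecognized name such as "sept" gives -1, which you can treat as a failed line.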

    Then for reading your file, your basic approach will be to read each line with fgets() and then parse the needed information from the line with sscanf() validating every input, conversion and allocation along the way. Validation is key to any successful piece of code and especially crucial for multi-line reads with conversions.

    For instance given your structs, you could declare your additional needed constants and declare and initialize your temporary variables, and open the file given as the first argument and validate it is open for reading with:

    ...
    #define MAXC 1024       /* if you need a constant, #define one (or more) */
    #define MAXN  128
    #define AVAIL   2
    ...
    int main (int argc, char **argv) {
        
        char line[MAXC], tmptitle[MAXN], tmpsurnm[MAXN], tmpnm[MAXN], tmpmo[MAXN];
        int good = 0, tmpid;
        date tmpdt = { .day = 0 };      /* temporary date struct to fill */
        movie *movies = NULL;
        size_t avail = AVAIL, used = 0;
        /* use filename provided as 1st argument (stdin by default) */
        FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
        
        if (!fp) {  /* validate file open for reading */
            perror ("file open failed");
            return 1;
        }
    

    Above, your good variable is the counter you increment for each good read and conversion of data from each of the seven lines that make up one input block. When good == 7 you have confirmed you have all the data associated with one movie, and you can allocate and fill final storage with all the temporary values.

    The used and avail counters track how many allocated struct movie you have available and, out of those, how many are used. When used == avail, you know it is time to realloc() your block of movies to add more. That's how dynamic allocation schemes work. You allocate some anticipated number of objects. You loop reading and filling objects until you fill up what you have allocated, then you reallocate more and keep going.

    You can add as much additional memory each time as you want, but the general scheme is to double your allocated memory each time a reallocation is needed. That provides a good balance between the number of allocations required and the growth of the number of objects available.

    (Memory operations are relatively expensive, so you want to avoid allocating for each new outer struct. Although realloc() implementations have gotten better at extending a block in place rather than creating a new one and copying each time, a scheme that allocates larger blocks will still be more efficient in the end.)
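
    To see the used/avail bookkeeping in isolation, here is a minimal sketch on a plain array of int rather than struct movie. The helper name push_int() and its parameters are hypothetical; the only library call it relies on is realloc(), which behaves like malloc() when handed a NULL pointer.

```c
#include <stdlib.h>

/* Hypothetical helper: append value to a growable int array.
 * Returns 1 on success, 0 if realloc() fails (the array is left intact). */
int push_int (int **arr, size_t *used, size_t *avail, int value)
{
    if (*used == *avail) {                          /* block is full */
        size_t newavail = *avail ? *avail * 2 : 2;  /* start at 2, then double */
        void *tmp = realloc (*arr, newavail * sizeof **arr);

        if (!tmp)                   /* original block is still valid */
            return 0;

        *arr = tmp;                 /* assign only after success */
        *avail = newavail;
    }
    (*arr)[(*used)++] = value;      /* store and advance the used count */

    return 1;
}
```

    Starting from a NULL pointer with used and avail both 0, five calls grow the block to 2, then 4, then 8 elements, so only three reallocations are needed for five stores.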

    Now with your temporary variables and counter declared, you can start your multi-line read. Let's take the first id line as an example:

        while (fgets (line, MAXC, fp)) {    /* read each line */
            /* read ID line & validate conversion */
            if (good == 0 && sscanf (line, "Movie id: %d", &tmpid) == 1)
                good++;     /* increment good line counter */
    

    You read the line and check if good == 0 to coordinate the read with the id line. You attempt a conversion to int and validate both. If you successfully store an integer in your temporary id, you increment your good counter.

    Your read of the Title line will be similar, except this time it will be an else if instead of a plain if. The id line above and the read of title from the next line would be:

         while (fgets (line, MAXC, fp)) {    /* read each line */
            /* read ID line & validate conversion */
            if (good == 0 && sscanf (line, "Movie id: %d", &tmpid) == 1)
                good++;     /* increment good line counter */
            /* read Title line & validate conversion */
            else if (good == 1 && sscanf (line, "title:%127[^\n]", tmptitle) == 1)
                good++;     /* increment good line counter */
    

    (note: any time you read a character string into an array with any of the scanf() family of functions, you must use the field-width modifier (127 above) to limit the read to what your array can hold, leaving room for the terminating '\0', to protect your array bounds from being overwritten. If you fail to include the field-width modifier, then the scanf() functions are no safer than gets(). See: Why gets() is so dangerous it should never be used!)
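
    A quick way to convince yourself that the field width really caps the read is to try it with a deliberately tiny buffer. This is a sketch only: TSZ and read_title() are hypothetical names chosen for the illustration, and the width 7 is simply TSZ - 1.

```c
#include <stdio.h>

#define TSZ 8   /* deliberately tiny buffer for illustration */

/* Hypothetical helper: copy the text after "title:" into out (TSZ bytes).
 * The field width 7 is TSZ - 1, leaving room for the terminating '\0'.
 * Returns 1 on a successful conversion, 0 otherwise. */
int read_title (const char *line, char out[TSZ])
{
    return sscanf (line, "title:%7[^\n]", out) == 1;
}
```

    Given the line "title:The movie", only the first seven characters ("The mov") are stored and the array is nul-terminated, so nothing is written past the end of the buffer. A line that does not begin with "title:" makes the helper return 0.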

    With each line read and successfully converted and stored, good will be incremented to set up the read of the next line's values into the proper temporary variable.

    Note I said you have a bit more work to do with the month read and conversion due to reading, e.g. "september", but needing to store the integer 9 in your struct. Using your lookup-table from the beginning, you would read and obtain the string for the month name and then loop to find the index in your lookup-table (you will want to add +1 to the index so that january == 1, and so on). You could do it like:

            /* read Month line and loop comparing with array to map index */
            else if (good == 5 && sscanf (line, "month: %127s", tmpmo) == 1) {
                tmpdt.month = -1;   /* set month -1 as flag to test if tmpmo found */
                for (int i = 0; i < NMONTHS; i++) {
                    if (strcmp (tmpmo, months[i]) == 0) {
                        tmpdt.month = i + 1;    /* add 1 to make january == 1, etc... */
                        break;
                    }
                }
                if (tmpdt.month > -1)   /* if not found, flag still -1 - failed */
                    good++;
                else
                    good = 0;
            }
    

    After your last else if for the year, you include an else so that a failure on any one line in the block will reset good = 0; the loop will then try to match the next id line in the file, e.g.

            /* read Year line & validate */
            else if (good == 6 && sscanf (line, "year: %d", &tmpdt.year) == 1)
                good++;
            else
                good = 0;
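
    To watch the counter logic work without a file, the same if / else if chain can be run over an array of strings. The sketch below is a cut-down illustration, not your full parser: count_records() is a hypothetical helper, and it assumes a shortened three-line record (id, title, year) to keep the example small.

```c
#include <stdio.h>

/* Hypothetical helper: count complete 3-line records (id, title, year)
 * in an array of lines, using the same "good" counter technique. */
size_t count_records (const char **lines, size_t n)
{
    char title[128];
    int good = 0, id, year;
    size_t complete = 0;

    for (size_t i = 0; i < n; i++) {
        if (good == 0 && sscanf (lines[i], "Movie id: %d", &id) == 1)
            good++;
        else if (good == 1 && sscanf (lines[i], "title:%127[^\n]", title) == 1)
            good++;
        else if (good == 2 && sscanf (lines[i], "year: %d", &year) == 1)
            good++;
        else
            good = 0;           /* any mismatch discards the partial record */

        if (good == 3) {        /* all three lines validated in sequence */
            complete++;
            good = 0;           /* ready for the next id line */
        }
    }
    return complete;
}
```

    If the first record has a bad year line (say "year: x") and the second is well formed, only the second is counted: the failed conversion resets good to 0 and the next "Movie id" line starts a fresh record.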
    

    Dynamic Allocation

    Dynamic allocation for your nested structs isn't hard, but you must keep clear in your mind how you will approach it. Your outer struct, struct movie is the one you will allocate and reallocate using used == avail, etc... You will have to allocate for struct date and struct director_info each time you have all seven of your temporary variables filled and validated and ready to be put in final storage. You would start your allocation block by checking if your struct movie block had been allocated yet, if not allocate it. If it had, and used == avail, you reallocate it.

    Every time you realloc(), use a temporary pointer. That way, when (not if) realloc() fails and returns NULL, you don't overwrite your pointer to the currently allocated storage with NULL, which would lose the block and create a memory leak. The initial handling of allocating or reallocating for your struct movie would look like:

            /* if good 7, all sequential lines and values for movie read */
            if (good == 7) {
                director_info *newinfo;     /* declare new member pointers */
                date *newdate;
                size_t len;
                
                /* if 1st allocation for movies, allocate AVAIL no. of movie struct */
                if (movies == NULL) {
                    movies = malloc (avail * sizeof *movies);
                    if (!movies) {                  /* validate EVERY allocation */
                        perror ("malloc-movies");
                        exit (EXIT_FAILURE);
                    }
                }
                /* if movies needs reallocation */
                if (used == avail) {
                    /* ALWAYS realloc with a temporary pointer */
                    void *tmp = realloc (movies, 2 * avail * sizeof *movies);
                    if (!tmp) {
                        perror ("realloc-movies");
                        break;
                    }
                    movies = tmp;
                    avail *= 2;
                }
    

    Now you have a valid block of struct movie where you can directly store an id, allocate for the title, and assign the allocated block holding the title to the title pointer in each struct movie. We allocate two struct movie to begin with, so you start with used == 0 and avail == 2 (see the AVAIL constant at the top for where the 2 comes from). Handling id and allocating for title would work as:

                movies[used].id = tmpid;    /* set movie ID to tmpid */
                
                /* get length of tmptitle, allocate, copy to movie title */
                len = strlen (tmptitle);
                if (!(movies[used].title = malloc (len + 1))) {
                    perror ("malloc-movies[used].title");
                    break;
                }
                memcpy (movies[used].title, tmptitle, len + 1);
    

    (note: when you allocate multiple structs in a block of memory and use [..] to index each struct, the [..] acts as a dereference of the pointer, so you use the '.' operator to access the member following the [..], not the '->' operator as you normally would to dereference a struct pointer to access the member; the dereference is already done by [..])
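
    The equivalence between the two forms can be seen in a tiny sketch. The struct item below is a hypothetical stand-in for struct movie with a single member, and both helper names are made up for the illustration.

```c
#include <stddef.h>

typedef struct item { int id; } item;   /* hypothetical stand-in for struct movie */

/* Both helpers read the same member of the element at index i. */
int id_by_index (const item *items, size_t i)
{
    return items[i].id;         /* [] already dereferences, so use '.' */
}

int id_by_pointer (const item *items, size_t i)
{
    return (items + i)->id;     /* pointer arithmetic, then '->' */
}
```

    Both functions return the same value for the same index; the indexed form is usually the easier one to read when you are filling movies[used].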

    Also, since you know the len, there is no reason to use strcpy() to copy tmptitle to movies[used].title and have strcpy() scan the string looking for the nul-terminating character at the end. You already know the number of characters, so just use memcpy() to copy len + 1 bytes. (note: if you have strdup() you can allocate and copy in a single call, but strdup() is a POSIX function and isn't part of the standard C library in C11.)
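
    Since the same strlen() / malloc() / memcpy() sequence is repeated for the title, surname and name, you could factor it into one function if strdup() is not available to you. This is only a sketch; dupstr() is a hypothetical name, not a standard library function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a standard function): allocate storage for a
 * copy of s and copy it. Returns NULL on allocation failure; the caller
 * is responsible for calling free() on the result. */
char *dupstr (const char *s)
{
    size_t len = strlen (s);
    char *p = malloc (len + 1);     /* +1 for the nul-terminating character */

    if (p)
        memcpy (p, s, len + 1);     /* length is known, so no second scan */

    return p;
}
```

    You would still validate the return, e.g. checking that movies[used].title is not NULL after assigning dupstr (tmptitle) to it, exactly as you do after each malloc() call.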

    The allocation for your struct director_info for each struct movie element is straight-forward. You allocate the struct director_info and then use strlen() to get the length of the names and then allocate storage for each and memcpy() as we did above.

                /* allocate director_info struct & validate */
                if (!(newinfo = malloc (sizeof *newinfo))) {
                    perror ("malloc-newinfo");
                    break;
                }
                
                len = strlen (tmpsurnm);    /* get length of surname, allocate, copy */
                if (!(newinfo->director_surname = malloc (len + 1))) {
                    perror ("malloc-newinfo->director_surname");
                    break;
                }
                memcpy (newinfo->director_surname, tmpsurnm, len + 1);
                
                len = strlen (tmpnm);       /* get length of name, allocate, copy */
                if (!(newinfo->director_name = malloc (len + 1))) {
                    perror ("malloc-newinfo->director_name");
                    break;
                }
                memcpy (newinfo->director_name, tmpnm, len + 1);
                
                movies[used].director = newinfo;    /* assign allocated struct as member */
    

    Handling allocation and filling the new struct date is even easier. You just allocate and assign 3 integer values and then assign the address for the allocated struct date to the pointer in your struct movie, e.g.

                /* allocate new date struct & validate */
                if (!(newdate = malloc (sizeof *newdate))) {
                    perror ("malloc-newdate");
                    break;
                }
                
                newdate->day = tmpdt.day;       /* populate date struct from tmpdt struct */
                newdate->month = tmpdt.month;
                newdate->year = tmpdt.year;
                        
                movies[used++].release_date = newdate;  /* assign newdate as member */
                good = 0;
            }
    

    That's it, you increment used++ when you assign the last pointer in your struct movie so you are set up to fill the next element in that block with the next seven lines from the file. You reset good = 0; to ready the read loop to read the next id line from the file.
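
    One thing the fragments above do not handle is cleanup when a later allocation fails after earlier ones for the same movie have succeeded. As written, breaking out of the loop at that point leaves those earlier blocks unreachable by a free loop that runs up to used, because used has not yet been incremented. A possible way to deal with that is sketched below. The helpers copystr(), movie_clear() and movie_fill() are hypothetical names and are not part of the code in this answer; the structs are repeated only so the sketch stands alone.

```c
#include <stdlib.h>
#include <string.h>

typedef struct date { int day, month, year; } date;
typedef struct director_info { char *director_surname, *director_name; } director_info;
typedef struct movie {
    int id;
    char *title;
    director_info *director;
    date *release_date;
} movie;

/* Hypothetical helper: allocate and copy a string, NULL on failure. */
static char *copystr (const char *s)
{
    size_t len = strlen (s);
    char *p = malloc (len + 1);

    if (p)
        memcpy (p, s, len + 1);
    return p;
}

/* Hypothetical helper: release everything owned by one movie element. */
void movie_clear (movie *m)
{
    if (m->director) {
        free (m->director->director_surname);
        free (m->director->director_name);
    }
    free (m->director);         /* free (NULL) is a harmless no-op */
    free (m->release_date);
    free (m->title);
    m->title = NULL;
    m->director = NULL;
    m->release_date = NULL;
}

/* Hypothetical helper: fill *m from the temporary values.
 * Returns 1 on success; on failure frees partial allocations, returns 0. */
int movie_fill (movie *m, int id, const char *title,
                const char *surname, const char *name, date d)
{
    m->id = id;
    m->title = copystr (title);
    m->director = malloc (sizeof *m->director);
    m->release_date = malloc (sizeof *m->release_date);

    if (m->director) {          /* set both members before any cleanup */
        m->director->director_surname = copystr (surname);
        m->director->director_name = copystr (name);
    }
    if (!m->title || !m->release_date || !m->director ||
        !m->director->director_surname || !m->director->director_name) {
        movie_clear (m);        /* undo whatever did succeed */
        return 0;
    }
    *m->release_date = d;       /* struct assignment copies all three ints */
    return 1;
}
```

    In the read loop you would then call movie_fill (&movies[used], tmpid, tmptitle, tmpsurnm, tmpnm, tmpdt) and increment used only when it returns 1.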

    Putting It All Together

    If you fill in the pieces and put the code all together, you would end up with something similar to:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define MAXC 1024       /* if you need a constant, #define one (or more) */
    #define MAXN  128
    #define AVAIL   2
    
    const char *months[] = { "january", "february", "march", "april",
                            "may", "june", "july", "august", "september",
                            "october", "november", "december" };
    #define NMONTHS (int)(sizeof months / sizeof *months)
    
    typedef struct date {
          int day, month, year;
    } date;
    
    typedef struct director_info {
          char *director_surname, *director_name;
    } director_info;
    
    typedef struct movie {
      int id;
      char *title;
      director_info *director;
      date *release_date;
    } movie;
    
    void prnmovies (movie *movies, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            printf ("\nMovie ID : %4d\n"
                    "Title    : %s\n"
                    "Director : %s %s\n"
                    "Released : %02d/%02d/%4d\n",
                    movies[i].id, movies[i].title, 
                    movies[i].director->director_name, movies[i].director->director_surname,
                    movies[i].release_date->day, movies[i].release_date->month,
                    movies[i].release_date->year);
    }
    
    void freemovies (movie *movies, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            free (movies[i].title);
            free (movies[i].director->director_surname);
            free (movies[i].director->director_name);
            free (movies[i].director);
            free (movies[i].release_date);
        }
        free (movies);
    }
    
    int main (int argc, char **argv) {
        
        char line[MAXC], tmptitle[MAXN], tmpsurnm[MAXN], tmpnm[MAXN], tmpmo[MAXN];
        int good = 0, tmpid;
        date tmpdt = { .day = 0 };      /* temporary date struct to fill */
        movie *movies = NULL;
        size_t avail = AVAIL, used = 0;
        /* use filename provided as 1st argument (stdin by default) */
        FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
        
        if (!fp) {  /* validate file open for reading */
            perror ("file open failed");
            return 1;
        }
        
        while (fgets (line, MAXC, fp)) {    /* read each line */
            /* read ID line & validate conversion */
            if (good == 0 && sscanf (line, "Movie id: %d", &tmpid) == 1)
                good++;     /* increment good line counter */
            /* read Title line & validate conversion */
            else if (good == 1 && sscanf (line, "title:%127[^\n]", tmptitle) == 1)
                good++;     /* increment good line counter */
            /* read director Surname line & validate */
            else if (good == 2 && sscanf (line, "surname of director: %127[^\n]", 
                                            tmpsurnm) == 1)
                good++;
            /* read director Name line & validate */
            else if (good == 3 && sscanf (line, "name of director: %127[^\n]", tmpnm) == 1)
                good++;
            /* read Day line & validate */
            else if (good == 4 && sscanf (line, "date: %d", &tmpdt.day) == 1)
                good++;
            /* read Month line and loop comparing with array to map index */
            else if (good == 5 && sscanf (line, "month: %127s", tmpmo) == 1) {
                tmpdt.month = -1;   /* set month -1 as flag to test if tmpmo found */
                for (int i = 0; i < NMONTHS; i++) {
                    if (strcmp (tmpmo, months[i]) == 0) {
                        tmpdt.month = i + 1;    /* add 1 to make january == 1, etc... */
                        break;
                    }
                }
                if (tmpdt.month > -1)   /* if not found, flag still -1 - failed */
                    good++;
                else
                    good = 0;
            }
            /* read Year line & validate */
            else if (good == 6 && sscanf (line, "year: %d", &tmpdt.year) == 1)
                good++;
            else
                good = 0;
            
            /* if good 7, all sequential lines and values for movie read */
            if (good == 7) {
                director_info *newinfo;     /* declare new member pointers */
                date *newdate;
                size_t len;
                
                /* if 1st allocation for movies, allocate AVAIL no. of movie struct */
                if (movies == NULL) {
                    movies = malloc (avail * sizeof *movies);
                    if (!movies) {                  /* validate EVERY allocation */
                        perror ("malloc-movies");
                        exit (EXIT_FAILURE);
                    }
                }
                /* if movies needs reallocation */
                if (used == avail) {
                    /* ALWAYS realloc with a temporary pointer */
                    void *tmp = realloc (movies, 2 * avail * sizeof *movies);
                    if (!tmp) {
                        perror ("realloc-movies");
                        break;
                    }
                    movies = tmp;
                    avail *= 2;
                }
                
                movies[used].id = tmpid;    /* set movie ID to tmpid */
                
                /* get length of tmptitle, allocate, copy to movie title */
                len = strlen (tmptitle);
                if (!(movies[used].title = malloc (len + 1))) {
                    perror ("malloc-movies[used].title");
                    break;
                }
                memcpy (movies[used].title, tmptitle, len + 1);
                
                
                /* allocate director_info struct & validate */
                if (!(newinfo = malloc (sizeof *newinfo))) {
                    perror ("malloc-newinfo");
                    break;
                }
                
                len = strlen (tmpsurnm);    /* get length of surname, allocate, copy */
                if (!(newinfo->director_surname = malloc (len + 1))) {
                    perror ("malloc-newinfo->director_surname");
                    break;
                }
                memcpy (newinfo->director_surname, tmpsurnm, len + 1);
                
                len = strlen (tmpnm);       /* get length of name, allocate, copy */
                if (!(newinfo->director_name = malloc (len + 1))) {
                    perror ("malloc-newinfo->director_name");
                    break;
                }
                memcpy (newinfo->director_name, tmpnm, len + 1);
                
                movies[used].director = newinfo;    /* assign allocated struct as member */
                
                /* allocate new date struct & validate */
                if (!(newdate = malloc (sizeof *newdate))) {
                    perror ("malloc-newdate");
                    break;
                }
                
                newdate->day = tmpdt.day;       /* populate date struct from tmpdt struct */
                newdate->month = tmpdt.month;
                newdate->year = tmpdt.year;
                        
                movies[used++].release_date = newdate;  /* assign newdate as member */
                good = 0;
            }
            
        }
        if (fp != stdin)   /* close file if not stdin */
            fclose (fp);
    
        prnmovies (movies, used);       /* print stored movies */
        freemovies (movies, used);      /* free all allocated memory */
    }
    

    (note: the addition of prnmovies() to output all stored movies and freemovies() to free all allocated memory)

    Example Input File

    Rather than just one block of seven lines for one movie, let's add another to make sure the code will loop through a file, e.g.

    $ cat dat/moviegroups.txt
    Movie id:1448
    title:The movie
    surname of director: lorez
    name of director: john
    date: 3
    month: september
    year: 1997
    Movie id:1451
    title:Election - Is the Cheeto Tossed?
    surname of director: loreza
    name of director: jill
    date: 3
    month: november
    year: 2020
    

    Example Use/Output

    Processing the input file dat/moviegroups.txt, which holds two movies' worth of data, you would have:

    $ ./bin/movieinfo dat/moviegroups.txt
    
    Movie ID : 1448
    Title    : The movie
    Director : john lorez
    Released : 03/09/1997
    
    Movie ID : 1451
    Title    : Election - Is the Cheeto Tossed?
    Director : jill loreza
    Released : 03/11/2020
    

    Memory Use/Error Check

    In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

    It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

    For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through the checker.

    $ valgrind ./bin/movieinfo dat/moviegroups.txt
    ==9568== Memcheck, a memory error detector
    ==9568== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
    ==9568== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
    ==9568== Command: ./bin/movieinfo dat/moviegroups.txt
    ==9568==
    
    Movie ID : 1448
    Title    : The movie
    Director : john lorez
    Released : 03/09/1997
    
    Movie ID : 1451
    Title    : Election - Is the Cheeto Tossed?
    Director : jill loreza
    Released : 03/11/2020
    ==9568==
    ==9568== HEAP SUMMARY:
    ==9568==     in use at exit: 0 bytes in 0 blocks
    ==9568==   total heap usage: 14 allocs, 14 frees, 5,858 bytes allocated
    ==9568==
    ==9568== All heap blocks were freed -- no leaks are possible
    ==9568==
    ==9568== For counts of detected and suppressed errors, rerun with: -v
    ==9568== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
    

    Always confirm that you have freed all memory you have allocated and that there are no memory errors.

    There is a LOT of information in this answer (and it always turns out longer than I anticipated), but giving a fair explanation of what is going on takes a little time. Go through it slowly, understand what each bit of code is doing, and understand how the allocations are handled (it will take time to digest). If you get stuck, drop a comment and I'm happy to explain further.