c getline strtok

strtok string from file to array but missing first line


I'm trying to read a .txt file and save every sentence that ends with ., ! or ? into an array. I use getline and strtok to do this. Saving the sentences seems to work, but when I later retrieve them by index, the first one is missing.

The input is in a file, input.txt, with the content below:

The wandering earth! In 2058, the aging Sun? is about to turn into a red .giant and threatens to engulf the Earth's orbit!

Below is my code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main() {
    FILE *fp = fopen("input.txt", "r+");
    char *line = NULL;
    size_t len = 0;
    char *sentences[100];

    if (fp == NULL) {
        perror("Cannot open file!");
        exit(1);
    }

    char delimit[] = ".!?";
    int i = 0;

    while (getline(&line, &len, fp) != -1) { 
        char *p = strtok(line, delimit);
        
        while (p != NULL) {
            sentences[i] = p;
            printf("sentences [%d]=%s\n", i, sentences[i]);
            i++;
            p = strtok(NULL, delimit);
        }
    }

    for (int k = 0; k < i; k++) {
        printf("sentence is ----%s\n", sentences[k]);
    }

    return 0;
}

The output is:

sentences [0]=The wandering earth
sentences [1]= In 2058, the aging Sun
sentences [2]= is about to turn into a red 
sentences [3]=giant and threatens to engulf the Earth's orbit
sentence is ----
sentence is ---- In 2058, the aging Sun
sentence is ---- is about to turn into a red 
sentence is ----giant and threatens to engulf the Earth's orbit

When I use strtok to split a string directly, it works fine.


Solution

    1. Changed the mode from "r+" to "r", since the file is only read.
    2. Changed the list of delimiters from a variable to a constant, DELIMITERS, and added '\n'. You may or may not want that '\n' in there; I would need to see the expected output now that you have supplied the input. vim, at least, ends the last line with a '\n', which would otherwise generate at least one '\n' token at the end. The other option is to remove leading and trailing white space and, if you end up with an empty string, not add it as a sentence.
    3. Introduced a constant for the number of sentences, and ignored any additional sentences beyond what we have space for.
    4. Combined the two strtok() calls (DRY).
    5. Eliminated the two memory leaks.
    6. If your input contains multiple lines, the contents of line will be overwritten, because getline() reuses the buffer on each call. This means the pointers in sentences no longer make sense. The easiest fix is to strdup() each string. Another approach would be to retain an array of line pointers (for a subsequent free()) and have getline() allocate a new line each time by resetting len = 0 and line = NULL.

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    #define DELIMITERS ".!?\n"
    #define SENTENCES_LEN 100
    
    int main() {
        FILE *fp = fopen("input.txt", "r");
        if (!fp) {
            perror("Cannot open file!");
            return 1;
        }
    
        char *line = NULL;
        size_t len = 0;
        char *sentences[SENTENCES_LEN];
        int i = 0;
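        while (getline(&line, &len, fp) != -1) {
            // strtok() takes the string on the first call and NULL afterwards
            char *s = line;
            for(; i < SENTENCES_LEN; i++) {
                char *sentence = strtok(s, DELIMITERS);
                if(!sentence)
                    break;
                // copy the token; getline() reuses line on the next call
                sentences[i] = strdup(sentence);
                printf("sentences [%d]=%s\n", i, sentences[i]);
                s = NULL;
            }
        }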
    
        for (int k = 0; k < i; k++) {
            printf("sentence is ----%s\n", sentences[k]);
            free(sentences[k]);
        }
    
        free(line);
        fclose(fp);
    }
    

    Using the supplied input file, the matching output is:

    sentences [0]=The wandering earth
    sentences [1]= In 2058, the aging Sun
    sentences [2]= is about to turn into a red
    sentences [3]=giant and threatens to engulf the Earth's orbit
    sentence is ----The wandering earth
    sentence is ---- In 2058, the aging Sun
    sentence is ---- is about to turn into a red
    sentence is ----giant and threatens to engulf the Earth's orbit
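
For comparison, here is a rough sketch of the alternative from point 6 (keep every buffer that getline() allocates instead of calling strdup()), combined with the white-space trimming mentioned in point 2. The names split_sentences(), trim() and LINES_LEN are made up for this illustration and are not part of the code above; the cap of 100 lines is an arbitrary assumption.

```c
#define _POSIX_C_SOURCE 200809L  /* asks for the getline() declaration */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DELIMITERS ".!?\n"
#define SENTENCES_LEN 100
#define LINES_LEN 100   /* hypothetical cap on retained line buffers */

/* Strip leading and trailing white space in place; returns a pointer into s. */
static char *trim(char *s) {
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/*
 * Read fp line by line. Every line buffer is kept in lines[], so the tokens
 * stored in sentences[] (which point into those buffers) stay valid until the
 * caller frees the lines. Returns the number of sentences stored; *n_lines
 * receives the number of buffers the caller has to free().
 */
static size_t split_sentences(FILE *fp, char *lines[], size_t *n_lines,
                              char *sentences[]) {
    size_t n = 0;
    *n_lines = 0;
    while (*n_lines < LINES_LEN && n < SENTENCES_LEN) {
        char *line = NULL;  /* NULL and 0 make getline() allocate */
        size_t len = 0;     /* a fresh buffer on every call */
        if (getline(&line, &len, fp) == -1) {
            free(line);     /* the buffer may be allocated even on failure */
            break;
        }
        lines[(*n_lines)++] = line;
        /* first strtok() call gets the line, later calls get NULL */
        for (char *s = line; n < SENTENCES_LEN; s = NULL) {
            char *token = strtok(s, DELIMITERS);
            if (!token)
                break;
            token = trim(token);
            if (*token)     /* skip tokens that were only white space */
                sentences[n++] = token;
        }
    }
    return n;
}
```

With this variant the caller uses sentences[0] through sentences[n - 1] and, when done, calls free() on each entry of lines[] rather than on each sentence. The trade-off is that whole lines stay in memory, but there is one allocation per line instead of one per sentence.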