
Processing an Input File Line-By-Line Using fgets/strtok


I am trying to create a C program that processes an input file and reports word statistics (number of words, length of the longest word, the most frequent word length and its frequency, etc.).

I have a rough idea of how to do it, but when I use fgets as the condition of a loop to process the input file line by line, my program never even reaches the body of the loop and produces unexpected results.

So far I have the following code:

// This program reads all lines of an input file and generates 
// a report including: 
//      Number of words in the file 
//      Which word size occurs the most and how many times 
//      Largest word length and its frequency 
//      All words of the longest word length of the file 
//       (duplicates not reported)

#include <stdio.h> 
#include <ctype.h> 
#include <string.h>

#define MAXW 300    // max total words
#define MAXC 17     // max chars in a word 
#define MAXLINEW 82 // max chars to a line 
#define MAXLINE 30  // max number of lines

const char *clean(char *src); 
int getWords(char (*words)[MAXC], FILE *f);

int main(char **argv) {

    char words[MAXC][MAXW] = {{0}};
    int num_words = 0;
    size_t i;

    FILE *f = fopen("input.txt", "r");

    if (!f) {
        fprintf (stderr, "ERROR: Unable to open file '%s'.\n", argv[1]);
        return 1;
    }

    num_words = getWords(words, f);
    printf("Num words = %d\n", num_words);

    fclose(f);
}

const char *clean(char *src) { 
    char *dst; 
    for (; *src; ++src) { 
        if (!ispunct((unsigned char)*src)) 
            *dst++ = tolower((unsigned char)*src); 
            *dst = 0;
    }
    return dst; 
}

int getWords(char (*words)[MAXC], FILE *f) { 
    int word_cnt = 0; 
    int r; 
    char p = NULL; 
    char lines[MAXLINE][MAXLINEW]; 
    char buf[MAXLINEW]; 
    static const char delims[] = " \n"; 
    r = 0; 
    while (fgets(buf, MAXLINEW, f)) { 
        // find the next word 
        if (p == NULL) { 
            p = strtok(buf, delims); 
            while (p) { 
                const char c = clean(p); 
                strcpy(words[word_cnt], c); 
                word_cnt++; 
                p = strtok(NULL, delims); 
            } 
        } 
    } 
}

I'm trying to read the input file one line at a time using fgets, then split each line into words using strtok (delimited by a space or newline). I want to pass each word tokenized by strtok to the clean function, which should remove any punctuation and make everything lowercase. Once the word is cleaned, I want to copy it into the final array of all words, which I can later use to produce the desired output of this program (word counts, lengths, frequencies, etc.).

As I said before, my program never even reaches the body of the while loop inside getWords, and I'm not sure why.

I don't have much experience with C, but I do know C++ so I'm sorry if my code is missing anything glaringly obvious.

Any help would be greatly appreciated, thank you!


Solution

  • The obvious mistake is in clean: it writes the cleaned word through dst, but that pointer is never initialized, so the writes go to an indeterminate address, which is undefined behavior.

    You should instead modify the source array in place:

    char *clean(char *src) { 
        char *result = src; 
        char *dst = src; 
        for (; *src; ++src) { 
            if (!ispunct((unsigned char)*src)) 
                *dst++ = tolower((unsigned char)*src); 
        }
        *dst = '\0';
        return result; 
    }
    

    Another mistake: const char c = clean(p); should be

    const char *c = clean(p);
    

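    Also, int main(char **argv) is missing the argc parameter. Since the file name is hard-coded, use int main(void) and print that name in the error message instead of argv[1].

    A few other problems will also get in the way: p in getWords must be declared as char *p rather than char p, the array should be declared as char words[MAXW][MAXC] to match the char (*words)[MAXC] parameter, and getWords never returns word_cnt.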

    Note, however, that there is no need to store the words in order to compute the requested statistics. Tallying the word lengths in an array and doing some simple math is enough to produce the expected output.
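
As a rough sketch of that last point, the tallying could look something like the code below. The names struct word_stats, clean_len, tally_line, tally_file, most_common_len and longest_len are hypothetical and not from the question. It also assumes words are separated by spaces, tabs or newlines, and that tokens longer than MAXC - 1 characters can simply be skipped.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAXC 17      /* max chars in a word incl. terminator (from the question) */
#define MAXLINEW 82  /* max chars in a line (from the question) */

/* Hypothetical container for the tallies; not part of the original code. */
struct word_stats {
    int total;         /* number of words seen */
    int by_len[MAXC];  /* by_len[n] = number of words of length n */
};

/* Strip punctuation and lowercase in place; return the cleaned length. */
size_t clean_len(char *s) {
    char *dst = s;
    for (const char *src = s; *src; ++src) {
        if (!ispunct((unsigned char)*src))
            *dst++ = (char)tolower((unsigned char)*src);
    }
    *dst = '\0';
    return (size_t)(dst - s);
}

/* Split one line on whitespace and add its words to the tallies.
   The line is modified by strtok. */
void tally_line(char *line, struct word_stats *st) {
    static const char delims[] = " \t\r\n";  /* assumed delimiter set */
    for (char *p = strtok(line, delims); p; p = strtok(NULL, delims)) {
        size_t len = clean_len(p);
        if (len == 0 || len >= MAXC)
            continue;  /* skip pure punctuation and over-long tokens */
        st->by_len[len]++;
        st->total++;
    }
}

/* Reset the tallies, then read f line by line with fgets. */
void tally_file(FILE *f, struct word_stats *st) {
    char buf[MAXLINEW];
    memset(st, 0, sizeof *st);
    while (fgets(buf, sizeof buf, f))
        tally_line(buf, st);
}

/* Most frequent word length (smallest length wins ties); 0 if no words. */
int most_common_len(const struct word_stats *st) {
    int best = 0;  /* by_len[0] is always 0, so any real length beats it */
    for (int n = 1; n < MAXC; ++n)
        if (st->by_len[n] > st->by_len[best])
            best = n;
    return best;
}

/* Length of the longest word seen; 0 if no words. */
int longest_len(const struct word_stats *st) {
    for (int n = MAXC - 1; n > 0; --n)
        if (st->by_len[n] > 0)
            return n;
    return 0;
}
```

After fopen succeeds, calling tally_file(f, &st) fills in st: st.total is the word count, most_common_len(&st) is the most frequent length, and st.by_len[most_common_len(&st)] is its frequency. Listing the longest words themselves (the last item in the question's report) would still require keeping those particular words around.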