
How to count the number of lines in a text file that start with a date


I have a file with the following content:

2004-10-07     cva        create file ...
2003-11-11     cva        create version ...
2003-11-11     cva        create version ...
2003-11-11     cva        create branch ...

Now I want to count the number of lines in this file that start with a date. How can I do that?

If I use wc -l <file.txt>, it gives me the total number of lines (5 in my case, whereas the count I want is 4).


Solution

  • Given:

    $ cat file
    2004-10-07     cva        create file ...
    no date
    2003-11-11     cva        create version ...
    no date
    2003-11-11     cva        create version ...
    no date
    2003-11-11     cva        create branch ...
    

    First, figure out how to run a regex against each line of the file. sed is a reasonable choice since it is fairly standard and fast; awk, grep, bash, or perl would also work.

    Here is a sed solution:

    $ sed -nE '/^[12][0-9]{3}-[0-9]{2}-[0-9]{2}/p' file
    2004-10-07     cva        create file ...
    2003-11-11     cva        create version ...
    2003-11-11     cva        create version ...
    2003-11-11     cva        create branch ...
    

    Then pipe that to wc:

    $ sed -nE '/^[12][0-9]{3}-[0-9]{2}-[0-9]{2}/p' file | wc -l
          4
    

    Or, you can use the same pattern in awk and not need to use wc:

    $ awk '/^[12][0-9]{3}-[0-9]{2}-[0-9]{2}/{lc++} END{ print lc }' file
    4
    

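    Or, if you want the count for each date (awk does not guarantee the iteration order of for-in, so the output lines may appear in a different order):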

    $ awk '/^[12][0-9]{3}-[0-9]{2}-[0-9]{2}/{cnt[$1]++} END{ for (e in cnt) print e, cnt[e] }' file
    2003-11-11 3
    2004-10-07 1
    

    Or, same pattern, with grep:

    $ grep -cE '^[12][0-9]{3}-[0-9]{2}-[0-9]{2}' file
    4
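
The shell itself was listed above as an option, so here is a minimal sketch of that route. It swaps the regex for a `case` glob, which can express the same fixed-width "digits-dash-digits" shape and runs in bash or any POSIX sh. The function name `count_date_lines` is made up for illustration.

```shell
# Count the lines of a file that begin with a date-shaped prefix (NNNN-NN-NN).
# count_date_lines is a hypothetical helper name, not a standard command.
# The body runs in a subshell ( ... ) so its variables do not leak out.
count_date_lines() (
    count=0
    # "|| [ -n "$line" ]" also picks up a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            # Glob equivalent of ^[12][0-9]{3}-[0-9]{2}-[0-9]{2}
            [12][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*) count=$((count + 1)) ;;
        esac
    done < "$1"
    printf '%s\n' "$count"
)
```

Calling `count_date_lines file` on the sample file above should print 4. For large files this loop will likely be much slower than grep, sed, or awk, so it is mainly useful when no external tools are wanted.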
    

    (Note: it is unclear whether your date format is YYYY-MM-DD or YYYY-DD-MM. You can make the pattern more specific if this is known.)
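
As a sketch of what "more specific" could look like, the version below assumes the format is YYYY-MM-DD (the question does not confirm this). It restricts the month to 01-12 and the day to 01-31, but does not check per-month day limits or leap years. The function name `count_iso_date_lines` is hypothetical.

```shell
# Count lines starting with a plausible YYYY-MM-DD date.
# Assumption: the format is YYYY-MM-DD; swap the two groups for YYYY-DD-MM.
# count_iso_date_lines is a hypothetical helper name.
count_iso_date_lines() {
    # Month: 01-09 or 10-12. Day: 01-09, 10-29, or 30-31.
    grep -cE '^[12][0-9]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])' "$1"
}
```

A line such as `2003-13-11 ...` or `2003-11-32 ...` matches the looser pattern but is not counted here. Note that grep -c still prints 0 when nothing matches, although its exit status is then non-zero.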