I have a file with the following content:
2004-10-07 cva create file ...
2003-11-11 cva create version ...
2003-11-11 cva create version ...
2003-11-11 cva create branch ...
Now I want to count the number of lines in this file that start with a date. How can I do that?
If I use wc -l file.txt, it gives me the total number of lines (5 in my case), whereas I want the count to be 4.
Given:
$ cat file
2004-10-07 cva create file ...
no date
2003-11-11 cva create version ...
no date
2003-11-11 cva create version ...
no date
2003-11-11 cva create branch ...
First, figure out how to run a regex on each line of the file. Suppose you use sed, since it is fairly standard and fast. You could also use awk, grep, bash, or perl.
Here is a sed solution:
$ sed -nE '/^[12][0-9]{3}-[0-9]{2}-[0-9]{2}/p' file
2004-10-07 cva create file ...
2003-11-11 cva create version ...
2003-11-11 cva create version ...
2003-11-11 cva create branch ...
Then pipe that to wc:
$ sed -nE '/^[12][0-9]{3}-[0-9]{2}-[0-9]{2}/p' file | wc -l
4
Or you can use the same pattern in awk and not need wc at all:
$ awk '/^[12][0-9]{3}-[0-9]{2}-[0-9]{2}/{lc++} END{ print lc+0 }' file
4
Or, if you want the count for each date:
$ awk '/^[12][0-9]{3}-[0-9]{2}-[0-9]{2}/{cnt[$1]++} END{ for (e in cnt) print e, cnt[e] }' file
2003-11-11 3
2004-10-07 1
Or, the same pattern with grep:
$ grep -cE '^[12][0-9]{3}-[0-9]{2}-[0-9]{2}' file
4
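The bash option mentioned earlier can look like this: a minimal sketch using only bash built-ins (the [[ =~ ]] regex test and a read loop), with no external tools. The function name count_dated_lines is hypothetical, and it reuses the same pattern as above.

```shell
# Hypothetical helper: count lines that begin with a YYYY-MM-DD-style date,
# using only bash built-ins (no sed, awk, grep or wc).
count_dated_lines() {
    local re='^[12][0-9]{3}-[0-9]{2}-[0-9]{2}'
    local n=0 line
    # IFS= and -r keep each line intact; the "|| [[ -n $line ]]" part also
    # counts a final line that has no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            n=$((n + 1))
        fi
    done < "$1"
    echo "$n"
}
```

Usage:
$ count_dated_lines file
4

This is much slower than sed, awk, or grep on large files, since bash processes the file one line at a time in the shell itself.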
(Note: it is unclear whether your date format is YYYY-MM-DD or YYYY-DD-MM. You can make the pattern more specific if you know which one it is.)
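For example, assuming the format is YYYY-MM-DD, a sketch of a tighter grep pattern that restricts the month to 01-12 and the day to 01-31 (for YYYY-DD-MM, swap the two groups). The function name count_iso_dated_lines is hypothetical. It still accepts impossible dates such as 2003-02-30; a regex alone cannot fully validate a calendar date.

```shell
# Hypothetical helper, assuming YYYY-MM-DD: month 01-12, day 01-31.
# grep -c prints the number of matching lines (and exits 1 if that is 0).
count_iso_dated_lines() {
    grep -cE '^[12][0-9]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])' "$1"
}
```

Usage: count_iso_dated_lines file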