
Finding all punctuation in a text file and printing a count of each


I have come close to counting all occurrences of punctuation; however, punctuation characters that are right next to each other get counted as a single item.

Like so:

cat filename.txt |
tr -sc '[:punct:]' '\n' |
sort |
uniq -c |
sort -bnr

Which prints something like this:

15 ,
 9 !
 5 .
 2 ;
 2 !"
 2 '
 1 -
 1 --
 1 :
 1 ?

It is clearly only counting punctuation, but how would I separate those that are right next to each other?


Solution

  • The problem is this part:

    tr -sc '[:punct:]' '\n' 
    

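    This replaces every non-punctuation character with \n, and -s squeezes each run of newlines into one. When there is no other character between two punctuation characters, nothing separates them, so they stay on the same line and are counted together as one item, such as !" or --.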

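    You want something like this instead, which deletes everything except punctuation and then puts each remaining character on its own line:

    cat filename.txt | tr -cd '[:punct:]' | fold -w 1 | sort | uniq -c | sort -bnr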
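To see the difference on a small input, here is a minimal sketch that runs both pipelines on the same line of text. The sample string and the function name count_punct are made up for illustration; only the pipeline inside the function comes from the answer above.

```shell
# Hypothetical sample text, used only for illustration.
sample='Wait -- "stop!" he said; "now!"'

# Hypothetical helper wrapping the corrected pipeline:
# keep only punctuation, put one character per line, then count.
count_punct() {
    tr -cd '[:punct:]' | fold -w 1 | sort | uniq -c | sort -bnr
}

echo "Original pipeline (adjacent punctuation stays grouped):"
printf '%s\n' "$sample" | tr -sc '[:punct:]' '\n' | sort | uniq -c | sort -bnr

echo "Corrected pipeline (every character counted separately):"
printf '%s\n' "$sample" | count_punct
```

With the original pipeline, groups such as !" and -- show up as their own entries. With the corrected one, the sample yields four double quotes, two exclamation marks, two hyphens, and one semicolon; the order of entries with equal counts may vary with the locale.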