Unix uniq command to CSV file


I have a text file (list.txt) containing single and multi-word English phrases. My goal is to do a word count for each word and write the results to a CSV file.

I have figured out a command that counts the occurrences of each unique word and sorts the results from most to least frequent. That command is:

$ tr 'A-Z' 'a-z' < list.txt | tr -sc 'A-Za-z' '\n' | sort | uniq -c | sort -n -r | less > output.txt

The problem is the way the new file (output.txt) is formatted. Each line has 3 leading spaces, followed by the number of occurrences, followed by a space, followed by the word. Example:

   9784 the
   6368 and
   4211 for
   2929 to

What would I need to do to get the results in a more useful format, such as CSV? For example, I'd like it to be:

9784,the
6368,and
4211,for
2929,to

Even better would be:

the,9784
and,6368
for,4211
to,2929

Is there a way to do this with a Unix command, or do I need to do some post-processing within a text editor or Excel?


Solution

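  • Use awk to swap the two fields and join them with a comma. By default awk splits each line on whitespace, so the leading spaces from uniq -c are ignored: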

    $ cat input
       9784 the
       6368 and
       4211 for
       2929 to
    $ awk '{ print $2 "," $1 }' input
    the,9784
    and,6368
    for,4211
    to,2929
    

    Your full pipeline will be:

    $ tr 'A-Z' 'a-z' < list.txt | tr -sc 'A-Za-z' '\n' | sort | uniq -c | sort -n -r | awk '{ print $2 "," $1}' > output.txt
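    To try the whole thing without list.txt, the pipeline can be wrapped in a small function and fed sample text. This is only a sketch: the function name word_counts_csv and the sample sentence are made up for illustration, the NF == 2 guard (which drops the empty "word" produced when the input starts with a non-letter) is an addition, and the tie-breaking sort keys are a choice that differs from the plain sort -n -r above.

```shell
# Hypothetical helper: reads text on stdin, writes "word,count" CSV lines.
word_counts_csv() {
  tr 'A-Z' 'a-z' |                  # fold everything to lowercase
    tr -sc 'a-z' '\n' |             # turn each run of non-letters into one newline
    sort |
    uniq -c |                       # prefix each distinct word with its count
    LC_ALL=C sort -k1,1nr -k2 |     # highest count first, ties in alphabetical order
    awk 'NF == 2 { print $2 "," $1 }'   # skip lines with no word, emit word,count
}

# Sample input standing in for list.txt.
printf 'The cat and the dog\nthe end\n' | word_counts_csv
# prints:
# the,3
# and,1
# cat,1
# dog,1
# end,1
```

    With the real file, the call would be word_counts_csv < list.txt > output.txt.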