awk

Calculate subtotals of the output of sort-unique command


I have a file that is generated using command | sort | uniq -c

city.txt
2 mumbaiXa
3 mumbaiXb
1 mumbaiXp
5 delhiXn
4 delhiXz
1 parisXs
7 parisXt
1 parisXa
9 parisXe

I am trying to split on X and get the count of each city:

expected output:
mumbai 6
delhi 9
paris 18

I tried this, but it did not return the expected result:

grep 'X' city.txt | awk '{print $2}' | awk -F 'X' '{print $1}' | sort | uniq -c

Update:

The data file looks like this...

   1904 mumbaiXa
   1167 mumbaiXa
    830 mumbaiXb
    565 mumbaiXp
    424 delhiXn
    423 delhiXz

I gave a simplified version and changed the text.


Solution

  • I have a file that is generated using command | sort | uniq -c

    city.txt
    2 mumbaiXa
    3 mumbaiXb
    1 mumbaiXp
    5 delhiXn
    4 delhiXz
    1 parisXs
    7 parisXt
    1 parisXa
    9 parisXe
    

    If you can run command again and it produces exactly the same output, you can get the desired totals by dropping X and everything after it before passing the lines to sort | uniq -c, for example:

    command | awk 'BEGIN{FS="X"}{print $1}' | sort | uniq -c
    

    Otherwise, if you want to keep using ... | sort | uniq -c on the already counted file, repeat each city name as many times as its count. Let the content of city.txt be

    2 mumbaiXa
    3 mumbaiXb
    1 mumbaiXp
    5 delhiXn
    4 delhiXz
    1 parisXs
    7 parisXt
    1 parisXa
    9 parisXe
    

    then

    awk 'sub(/X.*/,""){for(i=1;i<=$1;i+=1){print $2}}' city.txt | sort | uniq -c
    

    gives output

      9 delhi
      6 mumbai
     18 paris
    

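    Explanation: sub(/X.*/,"") removes X and everything after it from the line and returns the number of substitutions made, so the action runs only for lines where a substitution happened. For each such line, the for loop prints the 2nd field as many times as the 1st field specifies.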
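
    A different technique, not the one used above, is to sum the counts directly in an awk associative array instead of repeating lines and counting them again. This avoids printing one line per counted occurrence, which may matter for counts in the thousands like those in the update. The following is only a minimal sketch: it assumes that every line has the count in the 1st field and the cityXsuffix name in the 2nd, and that city names themselves contain no X. The array name sum and the variable totals are chosen here for illustration. The final sort is there only because awk's for-in loop visits keys in an unspecified order.

```shell
# Sample input copied from the question's simplified data.
cat > city.txt <<'EOF'
2 mumbaiXa
3 mumbaiXb
1 mumbaiXp
5 delhiXn
4 delhiXz
1 parisXs
7 parisXt
1 parisXa
9 parisXe
EOF

# Strip "X" and everything after it from field 2, then add the count in
# field 1 to a running total for that city; print all totals at the end.
totals=$(awk '{ sub(/X.*/, "", $2); sum[$2] += $1 }
              END { for (city in sum) print city, sum[city] }' city.txt | sort)

printf '%s\n' "$totals"
```

    With the sample data this prints delhi 9, mumbai 6 and paris 18, one per line, in the "city total" layout asked for in the question. Leading whitespace, as in the real data shown in the update, does not matter because awk's default field splitting ignores it.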