shell unix awk uniq

uniq -c in one column


Suppose we have a text file like the following:

Input:

a1 D1
b1 D1
c1 D1
a1 D2
a1 D3
c1 D3

I want to count the number of times each element in the first column appears, while also keeping the information in the second column in some way. Two possible output formats are shown below, but any coherent alternative is also acceptable:

Possible output 1:

3 a1 D1,D2,D3
1 b1 D1
2 c1 D1,D3

Possible output 2:

3 a1 D1
1 b1 D1
2 c1 D1
3 a1 D2
3 a1 D3
2 c1 D3

How can I do this? I guess some combination like sort -k 1 input | uniq -c <keep col2>, or perhaps awk, but I was not able to write anything that works. All answers are welcome.


Solution

  • I would use GNU AWK for this task as follows. Let the content of file.txt be

    a1 D1
    b1 D1
    c1 D1
    a1 D2
    a1 D3
    c1 D3
    

    then

    awk 'FNR==NR{arr[$1]+=1;next}{print arr[$1],$0}' file.txt file.txt
    

    gives output

    3 a1 D1
    1 b1 D1
    2 c1 D1
    3 a1 D2
    3 a1 D3
    2 c1 D3
    

    Explanation: this is a two-pass solution (note that file.txt is given twice). During the first pass FNR==NR holds, so the code counts the occurrences of each first-column value and stores them in the array arr. During the second pass it prints the count from the array, followed by the whole line.

    (tested in GNU Awk 5.0.1)