arrays bash awk uniq

Find duplicates in array, print count with pair


I have an array of value,location pairs

arr=(test,meta my,amazon test,amazon this,meta test,google my,google hello,microsoft)

I want to print the duplicate values, the number/count of them, along with the location.

For example:

3 test: meta, amazon, google
2 my: amazon, google
1 this: meta
1 hello: microsoft

Here, test appears 3 times: in meta, amazon, and google.

So far, this code prints each item, but only with its first location:

printf '%s\n' "${arr[@]}" | awk -F"," '!_[$1]++'
test,meta
my,amazon
this,meta
hello,microsoft

This prints a count, but it treats each value,location pair as a single value:

printf '%s\n' "${arr[@]}" | sort | uniq -c | sort -r
   1 my,amazon
   1 my,google
   1 this,meta
   1 test,meta
   1 test,google
   1 test,amazon
   1 hello,microsoft

Solution

  • You may consider this solution, which should work with any version of awk:

    printf '%s\n' "${arr[@]}" |
    awk -F, '
    {
       row[$1] = (fq[$1]++ ? row[$1] ", " : "") $2
    }
    END {
       for (k in fq)
          print fq[k], k ":", row[k]
    }' | sort -rn -k1
    
    3 test: meta, amazon, google
    2 my: amazon, google
    1 this: meta
    1 hello: microsoft
    

    Note that I used sort to match the order of your expected output, since awk does not guarantee any particular order for a for (k in fq) loop. If you don't care about ordering, you can remove the sort command.
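
If awk is not a hard requirement, the same grouping can also be sketched in pure bash with associative arrays. This is only a sketch under a few assumptions: it needs bash 4 or later (for declare -A / local -A), it assumes the value itself never contains a comma, and the names count_pairs, count, and places are made up for illustration rather than taken from the question.

```shell
#!/usr/bin/env bash
# Sketch: group value,location pairs without awk (assumes bash 4+).
arr=(test,meta my,amazon test,amazon this,meta test,google my,google hello,microsoft)

# count_pairs is a hypothetical helper name, not part of the original answer.
count_pairs() {
    local -A count=() places=()
    local pair value location
    for pair in "$@"; do
        value=${pair%%,*}      # text before the first comma
        location=${pair#*,}    # text after the first comma
        count[$value]=$(( ${count[$value]:-0} + 1 ))
        # prepend ", " only when a location is already stored for this value
        places[$value]+="${places[$value]:+, }$location"
    done
    # iteration order of an associative array is unspecified, so sort afterwards
    for value in "${!count[@]}"; do
        printf '%s %s: %s\n' "${count[$value]}" "$value" "${places[$value]}"
    done
}

count_pairs "${arr[@]}" | sort -rn -k1
```

As with the awk version, the trailing sort is only there to put the highest counts first; drop it if the order does not matter.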