bash unix uniq

uniq -cd but as percentage


I have a file containing these lines:

"RedfishVersion":"1.6.0"
"RedfishVersion":"1.6.0"
"RedfishVersion":"1.6.0"
"RedfishVersion":"1.6.0"
"RedfishVersion":"1.6.0"
"RedfishVersion":"1.6.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"
"RedfishVersion":"1.15.0"

Is there a Unix way to get a percentage histogram of these lines, based on how many times each one is repeated? This is my attempt:

sort bmc-versions.txt | uniq -cd
    321 "RedfishVersion":"1.0.0"
     19 "RedfishVersion":"1.0.2"

I want output like this:

"1.0.0"  50%
"1.0.2"  40%

Solution

  • Sorted by percentage (highest first) using GNU awk:

    awk 'BEGIN{FS=":"; PROCINFO["sorted_in"] = "@val_num_desc"} {a[$2]++} END{for (i in a) {print i "  " int(a[i] / NR * 100 + 0.5) "%"}}' test.txt
    "1.15.0"  54%
    "1.6.0"  46%
    

    Nicer formatting:

    awk 'BEGIN {
        FS = ":"
        PROCINFO["sorted_in"] = "@val_num_desc"
    }
    
    {
        a[$2]++
    }
    
    END {
        for (i in a) {
            print i "  " int(a[i] / NR * 100 + 0.5) "%"
        }
    }' test.txt
    "1.15.0"  54%
    "1.6.0"  46%
    

    Sorted by percentage (highest first) using non-GNU awk (e.g. POSIX awk):

    awk 'BEGIN{FS=":"} {a[$2]++} END{for (i=NR; i>=0; i--) {for (h in a) {if(a[h] == i) {print h, int(a[h] / NR * 100 + 0.5), "%"}}}}' test.txt
    "1.15.0" 54 %
    "1.6.0" 46 %
    

    Nicer formatting:

    awk 'BEGIN {
        FS = ":"
    }
    
    {
        a[$2]++
    }
    
    END {
        for (i = NR; i >= 0; i--) {
            for (h in a) {
                if (a[h] == i) {
                    print h, int(a[h] / NR * 100 + 0.5), "%"
                }
            }
        }
    }' test.txt
    "1.15.0" 54 %
    "1.6.0" 46 %
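
If you would rather keep the `sort | uniq -c` pipeline from the question, the counts can be turned into percentages by a small awk step and ordered with a second `sort`. This is only a sketch under a few assumptions: the lines contain no whitespace (so `uniq -c` output splits cleanly into count and line), the temporary file is a hypothetical stand-in that recreates the 13 sample lines from the question, and `-d` is dropped so that values occurring only once are still counted.

```shell
# Hypothetical sample file recreating the 13 lines shown in the question.
tmp=$(mktemp)
for i in 1 2 3 4 5 6; do echo '"RedfishVersion":"1.6.0"'; done > "$tmp"
for i in 1 2 3 4 5 6 7; do echo '"RedfishVersion":"1.15.0"'; done >> "$tmp"

# 1. sort | uniq -c : one line per distinct value, prefixed by its count
# 2. awk            : remember each count, sum the total, print rounded percentages
# 3. sort -k2,2nr   : order by the percentage column, highest first
out=$(sort "$tmp" | uniq -c |
  awk '{ n[$2] = $1; total += $1 }
       END {
           for (k in n) {
               split(k, f, ":")   # f[2] is the quoted version string
               printf "%s  %d%%\n", f[2], int(n[k] / total * 100 + 0.5)
           }
       }' |
  sort -k2,2nr)

printf '%s\n' "$out"
rm -f "$tmp"
```

Because the ordering is done by `sort` rather than by `PROCINFO["sorted_in"]`, this variant does not depend on GNU awk.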