
How to sort a file containing two sets into two alphabetically ordered sets in Linux shell?


I have two similar sets of data in a single text file, and I want to sort the contents of each set alphabetically, independently of the other. The following block is part of the file, which uses the format name : value:

set1 :
scripts : 1168
virt : 541
firmware : 15
init : 315
security : 1529

set2 :    
scripts : 873
init : 84
virt : 402
security : 1720
firmware : 6

I want the output to be sorted as follows:

set1 :
firmware : 15
init : 315
scripts : 1168
security : 1529
virt : 541

set2 :    
firmware : 6
init : 84
scripts : 873
security : 1720
virt : 402

After some quick research, I found the following approaches:

  1. Split the file into two halves and sort each one independently. This is not preferred here because I need to collect these results from more than 1000 files.
  2. Use the GNU sort command, which has a problem: it sorts all lines of the file together, so rows with the same name from both sets end up next to each other. This means that, using GNU sort, I lose the set membership of the rows.

The result from GNU sort is shown below. It is sorted, but it is not correct in the context of this question:

firmware : 15
firmware : 6
init : 315
init : 84
scripts : 1168
scripts : 873
security : 1529
security : 1720
set1 :
set2 :
virt : 402
virt : 541
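
The membership is lost because sort compares whole lines and has no notion of which set a line came from. One possible workaround, shown here only as a minimal sketch, is to tag every line with its set number before sorting and strip the tags afterwards. It assumes that each set starts with a header line of the form `setN :` and that no data line has that form. The function name `sort_sets` is hypothetical, and `LC_ALL=C` is an assumption made to get plain byte-order sorting.

```shell
#!/bin/sh
# sort_sets FILE: hypothetical helper that sorts the lines inside each
# "setN :" block of FILE without mixing the blocks.
sort_sets() {
    awk '
        # Header line: start a new block; rank 0 keeps it first in its block
        /^set[0-9]+ *:/ { block++; print block "\t0\t" $0; next }
        # Blank line: rank 2 places it after the data lines of its block
        !NF             { print block "\t2\t"; next }
        # Data line: rank 1
                        { print block "\t1\t" $0 }
    ' "$1" |
    # Sort by block number, then rank, then the original line text
    LC_ALL=C sort -t "$(printf '\t')" -k1,1n -k2,2n -k3 |
    # Drop the two tag columns again
    cut -f3-
}
```

Called as `sort_sets input.txt`, this keeps each header on top of its own block. It uses only POSIX awk, sort, and cut, at the cost of running three processes per file.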

Solution

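  • Save the following as sort.awk. It needs gawk, because asort() is a gawk extension.

    BEGIN {
        # Each record is the text between two occurrences of "set".
        # NOTE: a data line containing "set" would also split the record.
        RS = "set"
        FS = "\n"    # each line of a record becomes one field
    }

    {
        # Skip empty records, such as the text before the first "set"
        if ($0 == "") next

        # $1 is the rest of the header line, e.g. "1 :"
        print "set" $1

        n = 0
        # Start at field 2 to skip the header. Use <= NF so the last
        # line is kept even if the file has no trailing newline.
        for (x = 2; x <= NF; x++) {
            # Ignore blank lines in the set
            if ($x != "") data[++n] = $x
        }
        n = asort(data)  # NOTE: asort indexes from 1
        for (i = 1; i <= n; i++) {
            print data[i]
        }
        delete data
    }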
    

    Running the script on the OP's input:

    awk -f sort.awk input.txt

    set1 :
    firmware : 15
    init : 315
    scripts : 1168
    security : 1529
    virt : 541
    set2 :
    firmware : 6
    init : 84
    scripts : 873
    security : 1720
    virt : 402
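
Since the question mentions more than 1000 files, the script can be applied in a loop. The following is a minimal sketch, not part of the original answer: the function name `sort_all`, the `.txt` extension, and the directory arguments are all assumptions for illustration.

```shell
#!/bin/sh
# sort_all SCRIPT SRC_DIR OUT_DIR: hypothetical helper that runs an awk
# script (for example sort.awk) on every *.txt file in SRC_DIR and writes
# each result to OUT_DIR under the same file name.
sort_all() {
    script=$1 src=$2 out=$3
    mkdir -p "$out" || return 1
    for f in "$src"/*.txt; do
        [ -e "$f" ] || continue    # the glob matched nothing
        awk -f "$script" "$f" > "$out/$(basename "$f")" || return 1
    done
}
```

A call such as `sort_all sort.awk ./data ./sorted` would leave the input files untouched and write one sorted copy per file. Writing to a separate directory is a deliberate choice here, so that a failed run never overwrites the original data.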