awk

Recursive function with awk


Given a CSV file in which each line holds the first and last name of a child, followed by the first and last name of its parent:

$ cat /var/tmp/hier
F2 L2,F1 L1
F3 L3,F1 L1
F4 L4,F2 L2
F5 L5,F2 L2
F6 L6,F3 L3

I want to print:

F1 L1
    F2 L2
        F4 L4
        F5 L5
    F3 L3
        F6 L6

I wrote the following script:

#!/bin/bash
print_node() {
        echo "awk -F, '\$2=="\"$@\"" {print \$1}' /var/tmp/hier"
        for node in `eval "awk -F, '\$2=="\"$@\"" {print \$1}'     /var/tmp/hier"`
        do
                echo -e "\t"$node
                print_node "$node"
        done
}
print_node "$1"

Running the script gives:

$ ./print_tree.sh "F1 L1"
awk -F, '$2=="F1 L1" {print $1}' /var/tmp/hier
awk: syntax error near line 1
awk: bailing out near line 1

It seems that the awk command is malformed, but if I run the command shown in the debug output by hand, it works:

$ awk -F, '$2=="F1 L1" {print $1}' /var/tmp/hier
F2 L2
F3 L3

What might be causing this error?


Solution

  • I would personally reach for Perl here. Python would work too, as would any similar-level language that happens to be installed, such as Ruby or Tcl, but Perl and Python are almost universally preinstalled. These languages have built-in nested data structures, which make it easy to cache the tree in navigable form instead of re-parsing the parent links every time you want to fetch a node's children. (GNU awk has arrays of arrays, but BSD awk doesn't.)
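
    That said, a plain awk can still cache the tree without arrays of arrays, by keeping each parent's children as one string joined with SUBSEP and splitting it when the node is printed. The following is only a sketch under some assumptions: the awk supports user-defined functions (true of POSIX awk, gawk, mawk and nawk, but not of some very old system awks), names contain no commas, and the data has no cycles. The names print_tree, show, kids, has_parent, seen and order are made up for this example. Roots and children are listed in file order.

    ```shell
    #!/bin/sh
    # print_tree FILE: print the hierarchy stored in FILE ("child,parent" lines).
    # Sketch only: assumes no commas inside names and no cycles in the data.
    print_tree() {
        awk -F, '
        # Extra parameters (i, n, list) act as local variables in awk.
        function show(node, indent,    i, n, list) {
            print indent node
            n = split(kids[node], list, SUBSEP)
            for (i = 1; i <= n; i++)
                show(list[i], indent "    ")
        }
        NF == 2 {
            # Append the child to the list kept for its parent, as one joined string.
            kids[$2] = ($2 in kids) ? kids[$2] SUBSEP $1 : $1
            has_parent[$1] = 1
            # Remember each parent once, in order of first appearance.
            if (!($2 in seen)) { seen[$2] = 1; order[++count] = $2 }
        }
        END {
            # Roots are parents that never appear as a child.
            for (i = 1; i <= count; i++)
                if (!(order[i] in has_parent))
                    show(order[i], "")
        }
        ' "$1"
    }
    ```

    Called as print_tree /var/tmp/hier, this reads the file once and should print the tree from the question with four spaces per level.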

    Anyway, here's one Perl solution:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    
    # Map each child to its parent.
    my %parent;
    
    while (<>) {
      chomp;
      my ($child, $parent) = split ',';
      $parent{$child} = $parent;
    }
    
    # Invert the map into per-parent child lists; a root is a parent
    # that has no parent of its own.
    my (%children, %roots);
    
    while (my ($child, $parent) = each %parent) {
      push @{$children{$parent} ||= []}, $child;
      $roots{$parent} = 1 unless $parent{$parent};
    }
    
    foreach my $root (sort keys %roots) {
      show($root);
    }
    
    # Print a node, then its sorted children one indent level deeper.
    sub show {
      my ($node, $indent) = (@_,'');   # $indent defaults to ''
      print "$indent$node\n";
      foreach my $child (sort(@{$children{$node}||[]})) {
        show($child, "    $indent");
      }
    }
    

    I saved the above as print_tree.pl and ran it like this on your data:

    $ perl print_tree.pl *csv
    

    You could also make it executable with chmod +x print_tree.pl and run it without explicitly calling perl:

    $ ./print_tree.pl *csv
    

    Anyway, on your sample data, it produces this output:

    F1 L1
        F2 L2
            F4 L4
            F5 L5
        F3 L3
            F6 L6
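
    As for the syntax error in the original script, the likely culprit is the backquotes. Inside `...` the shell removes the backslash from \$ before it runs the command, so \$2 and \$1 turn into the function's own positional parameters, and awk ends up with a program like =="F1 L1" {print F1 L1}. The echo line is not inside backquotes, which is why the debug output looks fine. The for loop would also split F2 L2 into two words. Below is a hedged sketch of the same recursion without eval or backquotes. It takes the file name as an extra argument and carries the indent in a third argument (both are changes made for this example, not part of the original script), and it hands the name to awk with -v.

    ```shell
    #!/bin/sh
    # print_node FILE NODE [INDENT]: print the descendants of NODE, one per line.
    # Hypothetical rewrite of the function from the question.
    print_node() {
        # awk -v passes the name as an awk variable, so no eval or nested
        # quoting is needed. Caveat: -v interprets backslash escapes in the value.
        awk -F, -v parent="$2" '$2 == parent { print $1 }' "$1" |
        while IFS= read -r node; do
            # read -r keeps "F2 L2" as a single value instead of two words.
            printf '%s%s\n' "${3-}    " "$node"
            print_node "$1" "$node" "${3-}    "
        done
    }
    ```

    Like the original, this prints only the descendants, so print the root first: echo "F1 L1"; print_node /var/tmp/hier "F1 L1". It still re-reads the file for every node, which is the cost the cached-tree approach avoids.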