Given a CSV file in which each line holds the first and last name of a child, followed by the first and last name of its parent:
$ cat /var/tmp/hier
F2 L2,F1 L1
F3 L3,F1 L1
F4 L4,F2 L2
F5 L5,F2 L2
F6 L6,F3 L3
I want to print:
F1 L1
    F2 L2
        F4 L4
        F5 L5
    F3 L3
        F6 L6
I wrote the following script:
#!/bin/bash
print_node() {
    echo "awk -F, '\$2=="\"$@\"" {print \$1}' /var/tmp/hier"
    for node in `eval "awk -F, '\$2=="\"$@\"" {print \$1}' /var/tmp/hier"`
    do
        echo -e "\t"$node
        print_node "$node"
    done
}
print_node "$1"
Running the script gives:
$ ./print_tree.sh "F1 L1"
awk -F, '$2=="F1 L1" {print $1}' /var/tmp/hier
awk: syntax error near line 1
awk: bailing out near line 1
It seems that the awk command is malformed, but if I run the command shown in the debug output directly, it works:
$ awk -F, '$2=="F1 L1" {print $1}' /var/tmp/hier
F2 L2
F3 L3
What might be causing this error?
I would personally reach for Perl here; Python would work just as well (as would any similar-level language that happens to be installed, such as Ruby or Tcl, but Perl and Python are almost universally preinstalled). These languages have built-in nested data structures, which make it easy to cache the tree in navigable form instead of re-parsing the parent links every time you want to fetch a node's children. (GNU awk has arrays of arrays, but BSD awk doesn't.)
Anyway, here's one Perl solution:
#!/usr/bin/env perl
use strict;
use warnings;

# Read "child,parent" lines and record each child's parent.
my %parent;
while (<>) {
    chomp;
    my ($child, $parent) = split ',';
    $parent{$child} = $parent;
}

# Invert the links into parent => [children].
# A root is a parent that has no parent of its own.
my (%children, %roots);
while (my ($child, $parent) = each %parent) {
    push @{$children{$parent} ||= []}, $child;
    $roots{$parent} = 1 unless $parent{$parent};
}

foreach my $root (sort keys %roots) {
    show($root);
}

# Print a node, then its sorted children one level deeper.
sub show {
    my ($node, $indent) = (@_, '');
    print "$indent$node\n";
    foreach my $child (sort(@{$children{$node} || []})) {
        show($child, " $indent");
    }
}
I saved the above as print_tree.pl and ran it like this on your data:
$ perl print_tree.pl *csv
You could also make it executable with chmod +x print_tree.pl and run it without explicitly calling perl:
$ ./print_tree.pl *csv
Anyway, on your sample data, it produces this output:
F1 L1
 F2 L2
  F4 L4
  F5 L5
 F3 L3
  F6 L6
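As for the error itself, as far as I can tell the backquotes are to blame. Inside backquotes the shell strips the backslash from \$2 and \$1 before eval ever runs, so $2 and $1 are then expanded by the shell (to nothing), and awk ends up with a program like =="F1 L1" {print }. The echo line is not inside backquotes, which is why the debug output looks correct. Below is a minimal sketch of the original recursive approach that sidesteps the quoting by dropping eval and passing the name with awk -v. It assumes an awk that supports -v (POSIX awk does, some legacy awks don't); HIER and TAB are hypothetical variable names.

```shell
#!/bin/bash
# HIER (the data file) and TAB are hypothetical names;
# HIER defaults to the path used in the question.
HIER=${HIER:-/var/tmp/hier}
TAB=$(printf '\t')

# Usage: print_node NAME [INDENT]
print_node() {
    printf '%s%s\n' "${2-}" "$1"
    # -v hands the name to awk as a variable, so no shell quoting
    # has to survive inside the awk program text.
    awk -F, -v parent="$1" '$2 == parent { print $1 }' "$HIER" |
    while IFS= read -r child; do
        print_node "$child" "${2-}$TAB"
    done
}

# Only run when a root name is given on the command line.
if [ "$#" -gt 0 ]; then
    print_node "$1"
fi
```

Reading the children line by line with read, instead of word-splitting a command substitution in a for loop, also keeps names such as "F2 L2" in one piece. This version still re-reads the file once per node, so it only suits small inputs.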