Full-outer-join for multiple files

Is there a good way to perform 'full outer join' multiple files on Unix ideally using GNU coreutils. My files were produced by uniq -c, below is a mock example the can be generated by:

echo "12 aa\n3 bb" > file1
echo "5 aa\n6 bb\n1 cc" > file2
echo "11 aa\n7 bb\n4 dd" > file3

and looks like:

tail -n +1 file*
==> file1 <==
12 aa
3 bb

==> file2 <==
5 aa
6 bb
1 cc

==> file3 <==
11 aa
7 bb
4 dd

I want to merge them one by one, using the sequence (column 2) as key filling in a 0 if the key is not included (outer join). I.e. the desired output would look like this

12 5 11 aa
3 6 6 bb
0 1 0 cc
0 0 4 dd

So far I found join to be doing the job at least for pairwise merging:

join -j2 file1 file2 -a 2 -a 2 -e '0' -o '1.1,2.1,0' > merged
# 12 5 aa
# 3 6 bb
# 0 1 cc

Note: j2 : looking at second column for key (for both files) -a FILENUM: also print unpairable lines from file FILENUM, where FILENUM is 1 or 2, corresponding to FILE1 or FILE2

But I don't know how to generalize it for multiple files, i.e. this one doesn't work, which means I can't easily put it in a loop:

join -j2 merged file3 -a 2 -a 2 -e '0' -o '1.1,2.1,0' > merged2

I'd be ideally not want to use SQL to achive this, but would be ok if there wasn't any other way.

Solution

Finally figured out an efficient solution using sort packed in the following bash script multi_join_from_uniq.sh:

#!/bin/sh

join_rec() {
  file1=$1
  file2=$2
  shift 2
  if [ $# -gt 0 ]; then
    join -e0 -a 1 -a 2 "$file1" -o auto "$file2" | join_rec - "$@"
  else
    join -e0 -a 1 -a 2 "$file1" -o auto "$file2"
  fi
}

join_rec "$@"

However, the script only works only for sorted and swapped columns, which can be done as process substitution to make this example work:

sh multi_join_from_uniq.sh <(awk '{print $2,$1}' file1 | sort) <(awk '{print $2,$1}' file2 | sort ) <(awk '{print $2,$1}' file3 | sort )

resulting in the following output:

aa 12 5 11
bb 3 6 7
cc 0 1 0
dd 0 0 4

Of course, this can be also done when piping the the output of the uniq command e.g. this way cat data.txt | uniq -c | awk '{print $2,$1}' | sort > file1 or so.