I have two files:
File1:
chr1 100736194 46 0.731 + 100735713 100736636
chr1 100736194 49 0.879 + 100735723 100736646
chr1 100736196 54 0.952 + 100735753 100736666
File2:
chr1 100735713 100736636 + BMCHAS
chr1 100735723 100736646 + ATCGSG
chr1 100735753 100736666 + BDUSUS
I want to join file 1 and 2 based on
to output something like:
chr1 100736194 46 0.731 + 100735713 100736636 BMCHAS
chr1 100736194 49 0.879 + 100735723 100736646 ATCGSG
chr1 100736196 54 0.952 + 100735753 100736666 BDUSUS
I don't know how to use join for more than a single column and I looked up an awk solution and have been trying to implement that but I think I am not right somewhere, the command that I am using is: awk 'NR==FNR{a[$1,$2]=$5;next}{$3=a[$7,$8];print}' OFS='\t' file2 file1,
Also, it is important to match not one, but two columns from each file, because the single columns are not unique enough. This means I cannot use single-column matching solutions such as only matching file1's column 6 with file2's column 2 only, it must use the multiple specified columns above.
Thanks in advance.
You can still use join
. The trick is to attach the two columns together so they get treated as one.
I've done it here with sed
but you could use awk
or whatever you like.
The <(..)
syntax is bash-specific and means approximately "create a temporary named pipe and pass that as a file".
join \
-1 6 \
-2 2 \
-o "1.1 1.2 1.3 1.4 1.5 0 2.4" \
<(sed 's/\([0-9]\{9\}\) *\([0-9]\{9\}\)/\1-\2/' file1) \
<(sed 's/\([0-9]\{9\}\) *\([0-9]\{9\}\)/\1-\2/' file2) \
| sed 's/-/ /'
This works with GNU sed (I have 4.2.2); you may need to make adjustments for other sed implementations.
The input sed
commands convert your input files to look like this:
chr1 100736194 46 0.731 + 100735713-100736636
The output sed
command undoes the change.
The output format (-o "1.1 1.2 1.3 1.4 1.5 0 2.4"
) gives the output you asked for:
chr1 100736194 46 0.731 + 100735713 100736636 BMCHAS
chr1 100736194 49 0.879 + 100735723 100736646 ATCGSG
chr1 100736196 54 0.952 + 100735753 100736666 BDUSUS