I have n bed files in the format:
n.bed
chr1 | 0 | 10000 | 4 | 331 |
chr1 | 10000 | 20000 | 6 | 154 |
chr1 | 20000 | 30000 | 3 | 12 |
I would like to take column 4 (4, 6, 3) from each bed file and output a single table file (csv/tsv, the exact format doesn't matter), where columns 4 through 3+n are labelled with the name of each bed file and contain that file's column 4.
For example, take two bed files:
1.bed :
chr1 | 0 | 10000 | 4 | 331 |
chr1 | 10000 | 20000 | 6 | 154 |
chr1 | 20000 | 30000 | 3 | 12 |
2.bed :
chr1 | 0 | 10000 | 2 | 412 |
chr1 | 10000 | 20000 | 7 | 14 |
chr1 | 20000 | 30000 | 2 | 155 |
I would like the output to be:
chrom | start | end | 1.bed | 2.bed |
---|---|---|---|---|
chr1 | 0 | 10000 | 4 | 2 |
chr1 | 10000 | 20000 | 6 | 7 |
chr1 | 20000 | 30000 | 3 | 2 |
My current attempt has been to use bedops:
$ bedops --everything *.bed \
| bedmap --echo-map - \
| awk '(split($0, a, ";") == 3)' - \
| sed 's/\;/\n/g' - \
| sort-bed - \
| uniq - \
> answer.bed
However this produces the output:
Error: Unable to find file: 1.bed
Assumptions:
- all files have the same chrom, start and end values
- and these chrom + start + end rows appear in the same order in every file
One awk idea:
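awk '
BEGIN  { FS=OFS="\t"                         # assumes tab-delimited input
         hdr = "chrom" OFS "start" OFS "end"
       }
FNR==1 { hdr = hdr OFS FILENAME }            # first line of each file: add the file name to the header
       { key = $1 OFS $2 OFS $3
         # the first file seeds each row with its key; later files append their column 4
         lines[FNR] = (FNR==NR ? key : lines[FNR]) OFS $4
       }
END    { print hdr
         for (i=1;i<=FNR;i++)                # FNR still holds the row count of the last file
             print lines[i]
       }
' *.bed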
NOTES:
- the awk script replaces OP's current bedops | bedmap | awk | sed | sort-bed | uniq code
- *.bed files already exist and are not the output from bedops | bedmap
This generates:
chrom start end 1.bed 2.bed
chr1 0 10000 4 2
chr1 10000 20000 6 7
chr1 20000 30000 3 2
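
The awk script above pairs rows by line number, so it relies on every file listing the same intervals in the same order. If that cannot be guaranteed, one possible variant is to key each value on chrom + start + end instead. This is only a sketch: the function name merge_bed_col4 and the NA placeholder for intervals missing from a file are my own choices for illustration, and the input is assumed to be tab-delimited.

```shell
# merge_bed_col4 is a hypothetical helper name, not part of awk or bedops.
# Usage: merge_bed_col4 1.bed 2.bed > merged.tsv
merge_bed_col4() {
    awk '
    BEGIN    { FS = OFS = "\t" }                      # assumes tab-delimited input
    FNR == 1 { nfiles++; names[nfiles] = FILENAME }   # a new (non-empty) input file starts
    {
        key = $1 OFS $2 OFS $3
        if (!(key in seen)) {                         # remember intervals in first-seen order
            seen[key] = 1
            order[++nkeys] = key
        }
        val[key, nfiles] = $4                         # column 4 of this file for this interval
    }
    END {
        hdr = "chrom" OFS "start" OFS "end"
        for (f = 1; f <= nfiles; f++)
            hdr = hdr OFS names[f]
        print hdr
        for (k = 1; k <= nkeys; k++) {
            line = order[k]
            for (f = 1; f <= nfiles; f++)             # NA marks an interval absent from file f
                line = line OFS (((order[k], f) in val) ? val[order[k], f] : "NA")
            print line
        }
    }
    ' "$@"
}
```

Rows come out in the order the intervals were first seen, so sort the result afterwards if a particular order is needed. An empty input file gets no column at all, because the FNR == 1 rule never fires for it.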