bash bed

Concatenate a column from many bed files into a single bed file


I have n bed files in the format:

n.bed

chr1 0 10000 4 331
chr1 10000 20000 6 154
chr1 20000 30000 3 12

I would like to take column 4 (4, 6, 3) from each bed file and write the result as a single table (csv/tsv; the exact format doesn't matter), where columns 4 through 3+n are labelled with the name of each bed file and contain that file's column 4.

For example, take two bed files:

1.bed :

chr1 0 10000 4 331
chr1 10000 20000 6 154
chr1 20000 30000 3 12

2.bed :

chr1 0 10000 2 412
chr1 10000 20000 7 14
chr1 20000 30000 2 155

I would like the output to be:

chrom start end 1.bed 2.bed
chr1 0 10000 4 2
chr1 10000 20000 6 7
chr1 20000 30000 3 2

My current attempt has been to use bedops:

$ bedops --everything *.bed \
    | bedmap --echo-map - \
    | awk '(split($0, a, ";") == 3)' - \
    | sed 's/\;/\n/g' - \
    | sort-bed - \
    | uniq - \
    > answer.bed

However, this produces the error:

Error: Unable to find file: 1.bed

Solution

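  • Assumptions (implied by the code below): the input files are tab-delimited, and every file lists the same intervals in the same order, so rows can be matched by line number.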

    One awk idea:

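    awk '
    BEGIN  { FS=OFS="\t"                                # tab-delimited input and output
             hdr = "chrom" OFS "start" OFS "end"
           }
    FNR==1 { hdr = hdr OFS FILENAME }                   # first line of each file: add its name to the header
           { key = $1 OFS $2 OFS $3
             # first file seeds the row with chrom/start/end; every file appends its column 4
             lines[FNR] = (FNR==NR ? key : lines[FNR]) OFS $4
           }
    END    { print hdr
             for (i=1;i<=FNR;i++)                       # rows 1..FNR (line count of the last file)
                 print lines[i]
           }
    ' *.bed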
    

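    NOTES: the chrom/start/end columns are taken from the first file only; each later file contributes only its column 4, matched by line number.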

    This generates:

    chrom   start   end     1.bed   2.bed
    chr1    0       10000   4       2
    chr1    10000   20000   6       7
    chr1    20000   30000   3       2
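
    If the files are not guaranteed to list the same intervals in the same order, matching rows by line number would misalign the table. Below is a minimal sketch of a key-based variant that matches rows on chrom/start/end instead. It assumes tab-delimited input and fills intervals missing from a file with "NA"; the function name merge_col4 and the "NA" placeholder are hypothetical choices, not part of the question. Rows are printed in the order their intervals are first seen, not sorted.

```shell
#!/usr/bin/env bash
# Sketch: merge column 4 of several BED files, matching rows by
# chrom/start/end rather than by line number.
# Assumptions: tab-delimited input; an interval missing from a file
# is reported as "NA" for that file.
merge_col4() {
    awk '
    BEGIN  { FS = OFS = "\t" }
    FNR==1 { nfiles++; hdr = hdr OFS FILENAME }      # new file: extend the header
           { key = $1 OFS $2 OFS $3
             if (!(key in seen)) {                   # remember first-seen order of intervals
                 seen[key] = 1
                 order[++nkeys] = key
             }
             val[key, nfiles] = $4                   # column 4 of this interval in this file
           }
    END    { print "chrom" OFS "start" OFS "end" hdr
             for (i = 1; i <= nkeys; i++) {
                 row = order[i]
                 for (f = 1; f <= nfiles; f++)
                     row = row OFS (((order[i], f) in val) ? val[order[i], f] : "NA")
                 print row
             }
           }
    ' "$@"
}
```

    Usage would look like `merge_col4 *.bed > merged.tsv`. Unlike the line-number version, this keeps every value in memory under a two-part key, so it trades some memory for robustness to reordered or missing rows.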