linux bioinformatics gwas

Lifting over a GWAS summary statistics file from build 38 to build 37


I am using the UCSC liftOver tool and the associated chain file to lift over my GWAS summary statistics file (a tab-separated file) from build 38 to build 37. The GWAS summary statistics file looks like:

1 chr1_17626_G_A 17626 A G 0.016 -0.0332 0.0237 0.161
1 chr_20184_G_A  20184 A G 0.113 -0.185  0.023 0.419

Following is the UCSC tool with the associated chain I am using:

I want to create a BED-format file from the GWAS summary statistics file, since that is the input the tool requires. The first three columns should be tab-separated, and the rest of the columns should be merged into a single column using a non-tab separator such as ":" so that they are preserved while running the liftover. The first three columns of the input BED file would be:

awk 'BEGIN {OFS="\t"} {print "chr"$1, $3-1, $3}' GWAS_summary_stat_file > ucsc.input.file

# column 1 = chrX, where X is the chromosome number
# column 2 = bp position - 1 (0-based start) for SNPs
# column 3 = bp position (hg38) for SNPs

The above three are the required columns for the tool.

My questions are:

  1. How can I use a non-tab separator, say ":", to merge the rest of the columns of the GWAS summary statistics file into one column?
  2. After running the liftover, how can I unpack the columns separated by ":"?

Solution

  • I am not sure if this answers your questions but please take a look.

    You can use awk to merge multiple columns with ":" as the separator:

    awk '{print $1 ":" $2 ":" $3}' file
    

    Then, to replace ":" with a tab inside $1, leave the field separator at its default (with -F ":" awk would split on the colons and $1 would no longer contain any):

    awk '{gsub(/:/,"\t",$1)}1' file
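
    Putting the two steps together for your file, here is a rough sketch of the round trip. The file names (sumstats.tsv, ucsc.input.bed, lifted.bed, lifted.unpacked.tsv) are made up for illustration, the two rows are the ones from your question, and the cp line only stands in for the real liftOver run, which I have not executed here. It assumes ":" never occurs inside any of your values and that the tool carries the fourth BED column through unchanged.

    ```shell
    # Hypothetical input: the two sample rows from the question, tab-separated.
    printf '1\tchr1_17626_G_A\t17626\tA\tG\t0.016\t-0.0332\t0.0237\t0.161\n' > sumstats.tsv
    printf '1\tchr_20184_G_A\t20184\tA\tG\t0.113\t-0.185\t0.023\t0.419\n' >> sumstats.tsv

    # Pack: chrom, 0-based start, end, then column 2 and columns 4..NF joined by ":".
    awk 'BEGIN { FS = OFS = "\t" }
    {
        packed = $2
        for (i = 4; i <= NF; i++) packed = packed ":" $i
        print "chr" $1, $3 - 1, $3, packed
    }' sumstats.tsv > ucsc.input.bed

    # The real liftOver call would go here, reading ucsc.input.bed and writing lifted.bed.
    # Stand-in for illustration only: pretend the lifted file equals the input.
    cp ucsc.input.bed lifted.bed

    # Unpack: turn every ":" in column 4 back into a tab.
    # Assigning to $4 makes awk rebuild the line with OFS (a tab).
    awk 'BEGIN { FS = OFS = "\t" }
    {
        gsub(/:/, "\t", $4)
        print
    }' lifted.bed > lifted.unpacked.tsv
    ```

    After unpacking, the first three columns hold the lifted coordinates and the remaining columns are your original ones (everything except the chromosome and position, which are already in the first three). If ":" can appear in your data, pick another character that does not.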
    

    Is this of any help?