I work with genetic data. I just found a supercomputer to help with genetic analysis, but I need to convert the data to exactly the format the supercomputer wants: two columns, one with chromosome information and one with the p-value. The p-value column must not contain any letters, but some of my data is in scientific notation, like so:
rs191895619 1.052e-05
rs140779862 0.4406
rs11127542 0.9771
rs112183333 0.02569
rs191067167 0.427
rs111321342 1.042e-05
which puts several e's in the column that must not have letters in it.
I tried to use grep to move those lines into their own file, using grep "*e*" filename.txt > outputfilename.txt as well as grep "*e-05" filename.txt > outputfilename.txt, but both gave me a blank output file. Even if all 5000 lines of scientific-notation data had been moved into their own file, I don't know how to change them to decimal notation except by editing each line individually, which would take several days for each file.
Is there a command I can give to plink so that the data it gives me is not in scientific notation in the first place? Or a command I can use in plink or Unix to convert the scientific notation I have into decimal notation?
You can use awk to convert scientific notation to decimal:
awk '{printf "%s %f\n", $1, $2}' file
Outputs:
rs191895619 0.000011
rs140779862 0.440600
rs11127542 0.977100
rs112183333 0.025690
rs191067167 0.427000
rs111321342 0.000010
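You can adjust the precision by changing the %f part in printf. By default %f prints six digits after the decimal point, which is why 1.052e-05 is rounded to 0.000011 above; a format such as %.10f keeps more digits.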
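As a rough sketch of the precision adjustment described above, the following uses %.8f so that the small p-values keep their significant digits. The file name pvalues.txt and the choice of eight decimal places are assumptions made for illustration; pick a precision that suits the smallest p-value in your data.

```shell
# Sample input copied from the question; pvalues.txt is a hypothetical file name.
cat > pvalues.txt <<'EOF'
rs191895619 1.052e-05
rs140779862 0.4406
rs111321342 1.042e-05
EOF

# %.8f prints eight digits after the decimal point instead of the default six.
converted=$(awk '{printf "%s %.8f\n", $1, $2}' pvalues.txt)
echo "$converted"
```

With this format the first line should come out as rs191895619 0.00001052 rather than the rounded 0.000011. Note that any fixed number of decimal places will still print very small p-values (for example 1e-20 with %.8f) as all zeros, so check the range of your data first.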