bash sed replace

Bash: Complex examples of sed command


I have a file with 2 space-delimited columns that looks like this:

chr1.21.imputed_info:1   100880328
chr1.31.imputed_info:1   10566215
chr1.23.imputed_info:--- 110198129
chr1.23.imputed_info:--- 114445880
chr1.24.imputed_info:--- 118141492
chr1.25.imputed_info:--- 120257110
chr1.25.imputed_info:1   121280613
chr1.30.imputed_info:--- 121287994
chr1.30.imputed_info:--- 145604302

I want to extract the number following "chr" (which ranges from 1 to 22) along with the second column. So my output would look like so:

    1 100880328
    1 10566215
    1 110198129
    1 114445880
    1 118141492
    1 120257110
    1 121280613
    1 121287994
    1 145604302

A few important considerations:

I have come up with this in Bash:

cat file.txt | sed 's/chr//g' | sed 's/.imputed_info://g'

This gets me very close but it does this:

1.211    100880328
1.31     10566215
1.23---  110198129
1.23---  114445880
1.24---  118141492
1.25---  120257110
1.251    121280613
1.25---  121287994
1.30---  145604302
1.301    149906413

I know there would be ways to do this in R and Python, but I should say this is a huge file, so going through Bash would be a great time saver. So if anyone has a nice (and ideally clean) solution, it would be great. I do realise my sed command is kinda ugly. Thanks.


Solution

  • Shorter way:

    sed 's/^chr//;s/\..* / /' filename
    

    EDIT:
    Translation: remove the leading "chr" (if it's there), and replace everything from the first '.' to the last space (that is, a '.' followed by anything, followed by ' ') with a single space.
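
    To see the two substitutions at work, here is a minimal sketch that runs the same command on three rows copied from the question. The variable names `input` and `result` are only for illustration, and it assumes a sed that accepts `;` between commands (GNU and BSD sed both do).

```shell
# Sample rows copied from the question; the columns are separated by
# either three spaces or one space.
input='chr1.21.imputed_info:1   100880328
chr1.23.imputed_info:--- 110198129
chr1.30.imputed_info:--- 145604302'

# s/^chr//     removes the leading "chr" (only at the start of the line).
# s/\..* / /   matches from the first literal "." through the LAST space
#              (".*" is greedy) and replaces that whole span with one space.
result=$(printf '%s\n' "$input" | sed 's/^chr//;s/\..* / /')

printf '%s\n' "$result"
```

    Each row comes out as the chromosome number, a single space, and the second column (for example `1 100880328`). Because `.*` is greedy, the runs of spaces between the columns are consumed by the same match, so no extra whitespace clean-up is needed.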