[SOLVED] Separating the last number in each line from the letters

I have a long file with provisional SNP IDs and alleles, which looks like this:

14_611646T,C
14_881226CT,C
14_861416.1GGC,GGCGCGCGCG

I would like to separate the last number in each line from the letters (separate SNP ID from alleles). So to look like this:

14_611646 T,C
14_881226 CT,C
14_861416.1 GGC,GGCGCGCGCG

I tried both awk and sed, but the underscore keeps causing problems. For example:

sed 's/^[0-9][0-9]*/& /' File1 > File2

gave me

14 _611646T,C
14 _881226CT,C
14 _861416.1GGC,GGCGCGCGCG

Can anyone help me?

Solution

The simplest approach is to avoid a regex that matches the whole line. Match only the part that needs to change, here the leading SNP ID (digits, underscores and dots), and append a space to it:

sed -E 's/^[0-9_.]+/& /' file

14_611646 T,C
14_881226 CT,C
14_861416.1 GGC,GGCGCGCGCG

Node	Explanation
`^`	the beginning of the string anchor
`[0-9_.]+`	any character of: '0' to '9', '_', '.' (1 or more times, matching as many as possible)

In the replacement part of sed's substitution, & stands for the whole text matched by the pattern.
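
To see what & does, here is a minimal sketch that runs the same substitution two ways: once with &, and once with an explicit capture group and \1. The function names split_with_amp and split_with_group are made up for this illustration, and the input is just the three sample lines from the question.

```shell
# Hypothetical helpers for illustration; both read lines on stdin.

# '&' in the replacement is the whole text matched by the pattern.
split_with_amp() {
    sed -E 's/^[0-9_.]+/& /'
}

# Same idea with an explicit capture group: \1 is what the parentheses matched.
split_with_group() {
    sed -E 's/^([0-9_.]+)/\1 /'
}

# Sample lines taken from the question.
printf '%s\n' '14_611646T,C' '14_881226CT,C' '14_861416.1GGC,GGCGCGCGCG' | split_with_amp
printf '%s\n' '14_611646T,C' '14_881226CT,C' '14_861416.1GGC,GGCGCGCGCG' | split_with_group
```

Both forms should give the same result; & just saves you the parentheses when you want the entire match back.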

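Alternatively, insert a space before the first uppercase letter on each line:

sed 's/[[:upper:]]/ &/' file

[[:upper:]] is a POSIX character class that matches any uppercase letter. Without the g flag, sed replaces only the first match on each line, so only the start of the alleles gets a space.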
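
A short sketch of how the [[:upper:]] variant behaves, plus an awk version since the question mentions awk. The function names are hypothetical, and the lowercase test line is invented here to show the limitation: this variant assumes the alleles always start with an uppercase letter.

```shell
# Hypothetical helpers for illustration; both read lines on stdin.

# Put a space before the first uppercase letter (no g flag, so first match only).
split_at_upper() {
    sed 's/[[:upper:]]/ &/'
}

# awk sketch: match() sets RLENGTH to the length of the leading ID,
# then the line is printed as ID, space, rest. Non-matching lines pass through.
split_with_awk() {
    awk 'match($0, /^[0-9_.]+/) {
             print substr($0, 1, RLENGTH) " " substr($0, RLENGTH + 1)
             next
         }
         { print }'
}

# Sample line from the question.
printf '%s\n' '14_881226CT,C' | split_at_upper

# Invented lowercase line: no uppercase letter, so sed leaves it unchanged.
printf '%s\n' '14_611646t,c' | split_at_upper

# The awk version keys on the ID characters, so letter case does not matter.
printf '%s\n' '14_611646t,c' | split_with_awk
```

If the file could ever contain lowercase alleles, matching the ID characters (as in the first sed command or the awk sketch) is probably the safer choice.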
