My file looks like this:
1-0039.1 EMBL transcript 1 1524 . + . transcript_id "1-0039.1.2"; gene_id "1-0039.1.2"; gene_name "dnaA"
1-0039.1 EMBL CDS 1 1524 . + 0 transcript_id "1-0039.1.2"; gene_name "dnaA";
1-0039.1 EMBL transcript 1646 1972 . + . transcript_id "1-0039.1.5"; gene_id "1-0039.1.5"; gene_name "ORF0009"
I want to change all "1-0039.1" values in the first column to 1, so I tried:
awk -vOFS='\t' '{$1="1"; print}' 1-0039.gtf > 1-0039_modified.gtf
And the output looks like this:
1 EMBL transcript 1 1524 . + . transcript_id "1-0039.1.2"; gene_id "1-0039.1.2"; gene_name "dnaA"
1 EMBL CDS 1 1524 . + 0 transcript_id "1-0039.1.2"; gene_name "dnaA";
1 EMBL transcript 1646 1972 . + . transcript_id "1-0039.1.5"; gene_id "1-0039.1.5"; gene_name "ORF0009"
1 EMBL CDS 1646 1972 . + 0 transcript_id "1-0039.1.5"; gene_name "ORF0009";
1 EMBL transcript 2023 2940 . + . transcript_id "1-0039.1.7"; gene_id "1-0039.1.7"; gene_name "ORF0586"
1 EMBL CDS 2023 2940 . + 0 transcript_id "1-0039.1.7"; gene_name "ORF0586";
1 EMBL transcript 2897 3223 . + . transcript_id "1-0039.1.9"; gene_id "1-0039.1.9"; gene_name "ORF0009"
As you can see, the values in the last column were space-separated, but now they are tab-separated. How do I change only the first column without messing up the other columns?
awk '{sub(/^1-0039\.1/,"1"); print}' 1-0039.gtf > 1-0039_modified.gtf
But the sed solutions in the comments will do the same job faster.
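A quick way to check that the awk sub() call and a sed substitution give the same result on one sample line. The sed command is an assumed equivalent of the comment suggestions (the comments themselves are not reproduced here), and the function names are hypothetical helpers for illustration. The dot is escaped in both so that it only matches a literal ".".

```shell
# Hypothetical helper: awk replaces only the matched prefix, the rest of $0 is untouched
rename_awk() { awk '{ sub(/^1-0039\.1/, "1"); print }'; }
# Hypothetical helper: assumed sed equivalent, same anchored substitution
rename_sed() { sed 's/^1-0039\.1/1/'; }

# One sample GTF line: tabs between fields, spaces inside the attribute column
sample=$(printf '1-0039.1\tEMBL\tCDS\t1\t1524\t.\t+\t0\ttranscript_id "1-0039.1.2"; gene_name "dnaA";')

printf '%s\n' "$sample" | rename_awk
printf '%s\n' "$sample" | rename_sed
```

Both commands print the line with only the leading "1-0039.1" replaced by "1"; the transcript_id values are not touched because the pattern is anchored with ^.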
Unfortunately, the question gives contradictory information: the sample input looks space-separated, yet an identical view can be produced with tab separation, one tab per field at a tab width of 8 spaces.
So the solution has to handle both possibilities.
This is why my solution does not use awk's field splitting at all; it only matches a pattern at the start of the line, i.e. the first column.
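For contrast, a field-based sketch that would also work, but only under the assumption that the file really is tab-delimited (as the GTF format specifies). With FS set to a tab, the spaces inside the attribute column are not field separators, so rebuilding the record with OFS set to a tab leaves them alone. If the file were actually space-separated, the whole line would become $1 and be replaced by "1", which is exactly the assumption the pattern-based answer avoids. The helper name is hypothetical.

```shell
# Hypothetical helper: split and rebuild on tabs only (assumes a tab-delimited file)
rename_by_field() { awk 'BEGIN { FS = OFS = "\t" } { $1 = "1"; print }'; }

# Like the original attempt, this replaces every first column, whatever its value
printf '1-0039.1\tEMBL\ttranscript\t1\t1524\t.\t+\t.\ttranscript_id "1-0039.1.2"; gene_id "1-0039.1.2"; gene_name "dnaA"\n' | rename_by_field
```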
This way the solution does not depend on any assumption about the delimiter to work properly. The delimiter can be of any type and count, and the solution will still do the job.
In particular, it will not change the existing column delimiter(s).
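A small sketch of that property: the same sub() call applied to a tab-separated line and to a line with single and repeated spaces. The helper name is hypothetical; the separators after the first column come out exactly as they went in, because awk never rebuilds the record when only sub() on $0 is used.

```shell
# Hypothetical helper: prefix replacement on the whole line, no field assignment
rename_prefix() { awk '{ sub(/^1-0039\.1/, "1"); print }'; }

# Tab-separated input: tabs are preserved
printf '1-0039.1\tEMBL\tCDS\n' | rename_prefix
# Space-separated input with a run of spaces: spaces are preserved
printf '1-0039.1 EMBL   CDS\n' | rename_prefix
```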
Thanks for the comments below. They have a point, but my first goal was to keep the answer simple to understand.
So here is an alternative version that allows more flexibility in the first column:
awk '{sub(/^1-[^ \t]*/,1); print}' 1-0039.gtf > 1-0039_modified.gtf
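A sketch of what the more flexible pattern matches: any first column that starts with "1-" is replaced up to the first space or tab, and lines whose first column starts differently pass through unchanged. The sample IDs other than "1-0039.1" are made up for illustration, as is the helper name.

```shell
# Hypothetical helper: replace "1-" plus everything up to the first space or tab
rename_any() { awk '{ sub(/^1-[^ \t]*/, "1"); print }'; }

# Made-up sample IDs: two start with "1-", one does not
printf '1-0039.1\tEMBL\tCDS\n1-0040.7 EMBL gene\n2-0001.1\tEMBL\tCDS\n' | rename_any
```

The first two lines become "1" followed by their original separator; the "2-0001.1" line is printed as-is.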
Because this variant stops at the first space, even one that is part of the ID rather than a delimiter, the following version also accepts a single space directly after the leading "1-" as part of the first column:
awk '{sub(/^1- ?[^ \t]*/,1); print}' 1-0039.gtf > 1-0039_modified.gtf
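A sketch contrasting the two patterns on a made-up first column "1- 0039.1" that contains a space. The previous pattern stops right after "1-" and leaves " 0039.1" behind; the optional-space pattern consumes the whole ID. Both helper names are hypothetical.

```shell
# Hypothetical helpers for the two variants
rename_any()       { awk '{ sub(/^1-[^ \t]*/, "1"); print }'; }
rename_opt_space() { awk '{ sub(/^1- ?[^ \t]*/, "1"); print }'; }

# Previous variant: prints "1 0039.1<TAB>EMBL<TAB>CDS"
printf '1- 0039.1\tEMBL\tCDS\n' | rename_any
# Optional-space variant: prints "1<TAB>EMBL<TAB>CDS" for both inputs
printf '1- 0039.1\tEMBL\tCDS\n1-0039.1\tEMBL\tCDS\n' | rename_opt_space
```

One caveat of this choice: in a space-separated file, a first column that is exactly "1-" would make the pattern swallow the following field as well.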