I've got volumetric data from different brain regions and I'm trying to tidy it up to make the analysis easier. To give you an idea, this is part of what I've got:
LT_Putamen 5075 5075.000000
LT_Temporal 84593 84593.000000
LT_Thalamus 7720 7720.000000
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 64.000000 10.2700
RT_Amygdala 2252 2252.000000
RT_Amygdala overlaps 2133.000000 94.7100
I want to modify it so that the output would be:
LT_Putamen 5075 5075.000000
LT_Putamen overlaps 0 0
LT_Temporal 84593 84593.000000
LT_Temporal overlaps 0 0
LT_Thalamus 7720 7720.000000
LT_Thalamus overlaps 0 0
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 64.000000 10.2700
RT_Amygdala 2252 2252.000000
RT_Amygdala overlaps 2133.000000 94.7100
I just want to have an "overlaps" line in each record.
I'm rather a newbie in programming, but I came up with something like this:
awk '{
    if (NR == 1) {
        # Initialize the first region (using the first word in a line)
        region = $1
        print $0
    } else {
        if ($1 != region) {
            # Finalize the old region - printing "overlaps" line with 0 0
            printf("%s overlaps 0 0\n", region)
            # Start the new region
            region = $1
        }
        # Print the current line (for the current region)
        print $0
    }
}
END {
    # For the last region
    if (region) {
        printf("%s 0 0\n", region)
    }
}'
The outcome is close to what I want:
LT_Putamen 5075 5075.000000
LT_Putamen overlaps 0 0
LT_Temporal 84593 84593.000000
LT_Temporal overlaps 0 0
LT_Thalamus 7720 7720.000000
LT_Thalamus overlaps 0 0
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 0 0
RT_Accumbens overlaps 64.000000 10.2700
RT_Amygdala 2252 2252.000000
RT_Amygdala overlaps 0 0
RT_Amygdala overlaps 2133.000000 94.7100
But I get extra "overlaps" lines in regions that already had one. Could you please help me? What should I do to make it work? I'd be very grateful for any help! Thanks,
Marcin
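Assumptions/Understandings:
- each region has one data line, optionally followed directly by an "overlaps" line for the same region
- an "overlaps" line is identified by the literal string overlaps in the 2nd field
One awk idea: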
awk '
{ if ($1 != prev && NR > 1 && ! overlaps)    # if different $1 and previous line did not contain string "overlaps" then ...
     print prev,"overlaps",0,0               # print new line
  overlaps = ($2 == "overlaps" ? 1 : 0)      # set flag
  prev = $1                                  # save current $1
}
1                                            # print current line
END { if (! overlaps)                        # if last line of file did not contain string "overlaps" then ...
         print prev,"overlaps",0,0           # print new line
    }
' volume.dat
This generates:
LT_Putamen 5075 5075.000000
LT_Putamen overlaps 0 0
LT_Temporal 84593 84593.000000
LT_Temporal overlaps 0 0
LT_Thalamus 7720 7720.000000
LT_Thalamus overlaps 0 0
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 64.000000 10.2700
RT_Amygdala 2252 2252.000000
RT_Amygdala overlaps 2133.000000 94.7100
To demonstrate correct processing where the last line is not an "overlaps" line:
Setup:
$ cat volume.dat
LT_Putamen 5075 5075.000000
LT_Temporal 84593 84593.000000
LT_Thalamus 7720 7720.000000
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 64.000000 10.2700
RT_Amygdala 2252 2252.000000
RT_Amygdala overlaps 2133.000000 94.7100
XX_Last_Line 1234 6789.00000
The same code generates:
LT_Putamen 5075 5075.000000
LT_Putamen overlaps 0 0
LT_Temporal 84593 84593.000000
LT_Temporal overlaps 0 0
LT_Thalamus 7720 7720.000000
LT_Thalamus overlaps 0 0
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 64.000000 10.2700
RT_Amygdala 2252 2252.000000
RT_Amygdala overlaps 2133.000000 94.7100
XX_Last_Line 1234 6789.00000
XX_Last_Line overlaps 0 0
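As a variation on the code above, here is a minimal sketch that wraps the same idea in a shell function and tracks a "pending" flag (a region line was printed but no "overlaps" line has followed yet). The names fill_overlaps, region and pending are made up for illustration. Like the code above, it assumes that an "overlaps" line, when present, directly follows the line for its region. It prints nothing for an empty input.

```shell
# fill_overlaps: hypothetical helper name; reads the named files, or stdin if none are given
fill_overlaps() {
    awk '
        # previous line was a region line and this line is not its "overlaps" line
        pending && !($1 == region && $2 == "overlaps") { print region, "overlaps", 0, 0 }

        { print }                                  # always print the current line

        { region = $1                              # remember the current region
          pending = ($2 != "overlaps") }           # 1 while an "overlaps" line is still expected

        END { if (pending) print region, "overlaps", 0, 0 }   # last region had no "overlaps" line
    ' "$@"
}
```

Usage would be fill_overlaps volume.dat, or some_command | fill_overlaps when the data comes from a pipe.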