Awk matching patterns and removing adjacent lines

I have volumetric data from different brain regions and I'm trying to tidy it up to make the analysis easier. To give you an idea, this is part of what I've got:

LT_Putamen 5075 5075.000000
LT_Temporal 84593 84593.000000
LT_Thalamus 7720 7720.000000
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 64.000000 10.2700
RT_Amygdala 2252 2252.000000
RT_Amygdala overlaps 2133.000000 94.7100

I want to modify it so that the output is:

LT_Putamen 5075 5075.000000
LT_Putamen overlaps 0 0
LT_Temporal 84593 84593.000000
LT_Temporal overlaps 0 0
LT_Thalamus 7720 7720.000000
LT_Thalamus overlaps 0 0
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 64.000000 10.2700
RT_Amygdala 2252 2252.000000
RT_Amygdala overlaps 2133.000000 94.7100

I just want every region to have an "overlaps" line.

I'm rather new to programming, but I came up with something like this:

awk '{
    if (NR == 1) {
        # Initialize the first region (using the first word in the line)
        region = $1
        print $0
    } else {
        if ($1 != region) {
            # Finalize the old region - printing "overlaps" line with 0 0
            printf("%s overlaps 0 0\n", region)
            # Start the new region
            region = $1
        }
        # Print the current line (for the current region)
        print $0

    }
}
END {
    # For the last region
    if (region) {
        printf("%s 0 0\n", region)
    }
}'

The outcome is close to what I want:

LT_Putamen 5075 5075.000000
LT_Putamen overlaps 0 0
LT_Temporal 84593 84593.000000
LT_Temporal overlaps 0 0
LT_Thalamus 7720 7720.000000
LT_Thalamus overlaps 0 0
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 0 0
RT_Accumbens overlaps 64.000000 10.2700
RT_Amygdala 2252 2252.000000
RT_Amygdala overlaps 0 0
RT_Amygdala overlaps 2133.000000 94.7100

But I get extra "overlaps" lines for regions that already had one. Could you please help me? What should I do to make it work? I'd be very grateful for any help. Thanks!

Marcin

Solution

Assumptions/Understandings:

input file is already sorted by the 1st field
for a given value in the 1st field there will be at most 2 lines in the input file with said value
for a given value in the 1st field and only one input line containing said value, said line will not contain the string "overlaps"
for a given value in the 1st field there will be exactly 2 lines in the output with said value
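
If you are not sure your real file satisfies these assumptions, a quick check along the following lines may help. This is only a sketch: the function name check_assumptions is made up for illustration, and it tests that lines for a region are contiguous (which is all the approach needs) rather than strictly sorted.

```shell
# check_assumptions: hypothetical helper, not part of the original answer.
# Prints one message per violated assumption and exits non-zero if any was found.
check_assumptions() {
    awk '
        NF == 0 { next }                       # ignore blank lines
        $1 != prev {                           # first line of a new region
            if ($1 in seen) {                  # region already appeared earlier
                printf "line %d: region %s is not contiguous\n", NR, $1
                bad = 1
            }
            if ($2 == "overlaps") {            # region must start with its volume line
                printf "line %d: region %s starts with an overlaps line\n", NR, $1
                bad = 1
            }
            seen[$1] = 1
            count = 0
            prev = $1
        }
        ++count > 2 {                          # at most 2 lines per region
            printf "line %d: region %s has more than 2 lines\n", NR, $1
            bad = 1
        }
        END { exit bad + 0 }
    ' "$@"
}

# Small demo on inline data; a file name can be passed instead of stdin.
check_assumptions <<'EOF' && echo "input matches the assumptions"
LT_Putamen 5075 5075.000000
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 64.000000 10.2700
EOF
```

If the check prints anything, the input needs to be sorted or cleaned up before the code below can be trusted.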

One awk idea:

awk '
    { if ($1 != prev && NR > 1 && ! overlaps)       # if different $1 and previous line did not contain string "overlaps" then ...    
         print prev,"overlaps",0,0                  # print new line
      overlaps = ($2 == "overlaps" ? 1 : 0)         # set flag
      prev = $1                                     # save current $1
    }
1                                                   # print current line
END { if (! overlaps)                               # if last line of file did not contain string "overlaps" then ...  
         print prev,"overlaps",0,0                  # print new line
    }
' volume.dat

This generates:

LT_Putamen 5075 5075.000000
LT_Putamen overlaps 0 0
LT_Temporal 84593 84593.000000
LT_Temporal overlaps 0 0
LT_Thalamus 7720 7720.000000
LT_Thalamus overlaps 0 0
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 64.000000 10.2700
RT_Amygdala 2252 2252.000000
RT_Amygdala overlaps 2133.000000 94.7100

To demonstrate correct processing where the last line is not an "overlaps" line:

Setup:

$ cat volume.dat
LT_Putamen 5075 5075.000000
LT_Temporal 84593 84593.000000
LT_Thalamus 7720 7720.000000
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 64.000000 10.2700
RT_Amygdala 2252 2252.000000
RT_Amygdala overlaps 2133.000000 94.7100
XX_Last_Line 1234 6789.00000

The same code generates:

LT_Putamen 5075 5075.000000
LT_Putamen overlaps 0 0
LT_Temporal 84593 84593.000000
LT_Temporal overlaps 0 0
LT_Thalamus 7720 7720.000000
LT_Thalamus overlaps 0 0
RT_Accumbens 623 623.000000
RT_Accumbens overlaps 64.000000 10.2700
RT_Amygdala 2252 2252.000000
RT_Amygdala overlaps 2133.000000 94.7100
XX_Last_Line 1234 6789.00000
XX_Last_Line overlaps 0 0
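
One edge case not covered by the runs above is an empty input file: the END block still fires, the flag is unset, and a stray " overlaps 0 0" line is printed. A possible variant that guards against this is sketched below. The wrapper name add_overlaps is an assumption made for illustration, and the logic is otherwise meant to behave the same as the code above.

```shell
# add_overlaps: hypothetical wrapper around a variant of the awk code above.
add_overlaps() {
    awk '
        $1 != prev {                                   # region changed
            if (NR > 1 && !overlaps)                   # previous region had no overlaps line
                print prev, "overlaps", 0, 0
            prev = $1
        }
        { overlaps = ($2 == "overlaps"); print }       # remember the line type, then print the line
        END {
            if (NR > 0 && !overlaps)                   # skip entirely when the input was empty
                print prev, "overlaps", 0, 0
        }
    ' "$@"
}
```

It can be called as add_overlaps volume.dat, or used at the end of a pipeline since it reads standard input when no file is given.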