I have a set of data that consists of seismic wave travel times and their corresponding information (i.e. the source that produced the wave and the time for that wave to arrive at each geophone along the spread). I am trying to format the data to fit my code so I can do some tomography with it, but I'm still relatively new to awk. I now need to insert the number of receivers for each shot/source into that shot/source's information line, but the number varies each time. Is there a way to have awk count the number of rows and insert that count into the proper field?
My data is formatted like the following.
Each line that documents a source/shot:
s 0.01 0 0 -1 0
Every other line that follows the source/shot information:
r 0.1 0 0 1.218 0.01
r 0.15 0 0 1.214 0.01
r 0.2 0 0 1.213 0.01
I can use the "s" as a flag for the shot lines, and I would like to count the number of "r" lines for each source/shot and insert that number into the corresponding "s" line.
The number of "r" lines for each "s" line varies greatly.
Given this sample input:
s 0.01 0 0 -1 0
r 0.1 0 0 1.218 0.01
r 0.15 0 0 1.214 0.01
r 0.2 0 0 1.213 0.01
s 1.01 0 0 -1 0
r 0.05 0 0 1.159 0.01
r 0.1 0 0 1.127 0.01
r 0.15 0 0 1.106 0.01
r 0.2 0 0 1.115 0.01
r 0.25 0 0 1.107 0.01
The expected output is:
s 0.01 0 3 -1 0
r 0.1 0 0 1.218 0.01
r 0.15 0 0 1.214 0.01
r 0.2 0 0 1.213 0.01
s 1.01 0 5 -1 0
r 0.05 0 0 1.159 0.01
r 0.1 0 0 1.127 0.01
r 0.15 0 0 1.106 0.01
r 0.2 0 0 1.115 0.01
r 0.25 0 0 1.107 0.01
Note the 3 as $4 in the first s line and the 5 as $4 in the second one. The counted number of "r" rows should go in the 4th column of each "s" line.
My experience with awk is limited to just rearranging/indexing columns, so I don't really know where to begin with this. I've tried googling help with awk, but it's very difficult to find answered awk questions that actually pertain to my specific situation (hence why I have decided to ask it myself).
I'm also new to using stackoverflow, so if I need to include more example data, please let me know. My data consists of approximately 4000 lines.
EDIT: The desired result differs slightly from the earlier example of my data because there are hundreds of lines for each "s" line, and including all of them in the question seemed excessive. I have cut out the majority of the data for ease of reading.
A simple method is to read the file backwards: on each r line, increment a counter; on each s line, substitute the counter into $4 and reset it. Then reverse the result:
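tac input |
awk '
# reading bottom-up, so every r line is seen before its s line
/^r/ { n++ }
# n + 0 writes 0 instead of an empty field when no r lines follow
/^s/ { $4 = n + 0; n = 0 }
{ print }
' |
tac > output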
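`tac` is part of GNU coreutils and may be missing on some systems (stock macOS, for example). A minimal sketch that avoids it, assuming the input is a regular file that awk can read twice (not a pipe), is to count on the first pass and rewrite on the second. The function name `add_counts` is hypothetical.

```shell
# Hypothetical helper: add_counts FILE
# Assumes FILE is a regular file: it is read twice, so a pipe will not work.
# Pass 1 counts the r lines in each s group; pass 2 writes that count into $4.
add_counts() {
  awk '
    # Pass 1: NR == FNR is true only while reading the first copy of FILE
    NR == FNR {
      if (/^s/) g++               # a new s line starts a new group
      else if (/^r/) cnt[g]++     # count r lines in the current group
      next
    }
    # Pass 2: number the s lines the same way and fill in $4
    /^s/ { g2++; $4 = cnt[g2] + 0 }
    { print }
  ' "$1" "$1"
}
```

Usage: `add_counts input > output`. Reading the file twice keeps memory use constant, which is irrelevant at ~4000 lines but means nothing has to be buffered or reversed.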
You can read the file forwards but that involves maintaining state:
awk '
/^s/ {
    # this prints the *previous* group of lines
    if (NR > 1)
        print c1, c2, c3, n, c5, c6 r   # "c6 r" concatenates, so no trailing space after c6
    # save s columns, initialise n counter and r string
    c1 = $1; c2 = $2; c3 = $3; n = 0; c5 = $5; c6 = $6; r = ""
}
/^r/ {
    n++
    r = r RS $0    # append this r line, preceded by a newline
}
END {
    # print final group
    print c1, c2, c3, n, c5, c6 r
}
' input > output
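With roughly 4000 lines it may be worth checking the result before passing it to the tomography code. The following is a small sketch, assuming the same layout as above (each group starts with an s line and the count lives in $4); the name `check_counts` is hypothetical.

```shell
# Hypothetical helper: check_counts FILE
# For every "s" line, compares $4 with the number of "r" lines that
# follow it (up to the next "s" line) and prints any mismatch.
# Exit status is 1 if at least one mismatch was found, else 0.
check_counts() {
  awk '
    BEGIN { bad = 0 }
    # compare the stored count of the group that just ended
    function report() {
      if (sline && want != got) {
        printf "line %d: $4=%s but found %d r lines\n", sline, want, got
        bad = 1
      }
    }
    /^s/ { report(); sline = NR; want = $4; got = 0; next }
    /^r/ { got++ }
    END  { report(); exit bad }
  ' "$1"
}
```

Usage: `check_counts output && echo "all counts match"`. Any mismatch is reported with the line number of the offending s line.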