The short version of the question: to which value should RS be set in awk to split records at each line whose n-th field is empty? (If the line were completely empty, i.e. without the Timestamp field in my examples, then setting RS="\n\n" would do.)
The long version:
This is what my log file looks like (note the interleaved sections for **amd64** and **arm64**):
...
2023-12-29T16:05:20.3032116Z
2023-12-29T16:05:20.3040485Z #10 [linux/arm64 builder 1/8] WORKDIR /app
2023-12-29T16:05:20.4084773Z #10 DONE 0.8s
2023-12-29T16:05:20.4085104Z
2023-12-29T16:05:20.4085552Z #11 [linux/amd64 builder 1/8] WORKDIR /app
2023-12-29T16:05:20.5499792Z #11 DONE 0.1s
2023-12-29T16:05:20.5505699Z
2023-12-29T16:05:20.5509862Z #12 [linux/amd64 builder 2/8] RUN apk add --no-cache libc6-compat
2023-12-29T16:05:20.5512029Z #12 0.138 fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/main/x86_64/APKINDEX.tar.gz
2023-12-29T16:05:20.6982466Z #12 ...
2023-12-29T16:05:20.6983744Z
2023-12-29T16:05:21.2474882Z #16 [linux/arm64 runner 2/7] RUN addgroup -S -g 1001 nodejs
2023-12-29T16:05:21.3971789Z #16 ...
2023-12-29T16:05:21.3972318Z
...
As can be seen, each section ends with a line that contains nothing but a timestamp.
The goal is to print the sections (lines) for amd64 and for arm64 separately, e.g. for amd64:
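A quick way to check this: with awk's default field splitting, such a separator line has fewer than two fields, so `NF < 2` selects exactly those lines. A minimal sketch, assuming a shortened copy of the excerpt above saved to a temporary file (the `log` variable name is just for illustration):

```shell
#!/bin/sh
# Shortened copy of the log excerpt above, written to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
2023-12-29T16:05:20.3032116Z
2023-12-29T16:05:20.3040485Z #10 [linux/arm64 builder 1/8] WORKDIR /app
2023-12-29T16:05:20.4084773Z #10 DONE 0.8s
2023-12-29T16:05:20.4085104Z
2023-12-29T16:05:20.4085552Z #11 [linux/amd64 builder 1/8] WORKDIR /app
2023-12-29T16:05:20.5499792Z #11 DONE 0.1s
2023-12-29T16:05:20.5505699Z
EOF

# The default FS splits on runs of whitespace (ignoring leading/trailing
# whitespace), so a timestamp-only line has NF == 1 and a truly empty line
# has NF == 0; both satisfy NF < 2.
awk 'NF < 2' "$log"
```

This prints the three timestamp-only lines.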
2023-12-29T16:05:20.4085104Z <-- ideally present in the output
2023-12-29T16:05:20.4085552Z #11 [linux/amd64 builder 1/8] WORKDIR /app
2023-12-29T16:05:20.5499792Z #11 DONE 0.1s
2023-12-29T16:05:20.5505699Z <-- ideally present in the output
2023-12-29T16:05:20.5509862Z #12 [linux/amd64 builder 2/8] RUN apk add --no-cache libc6-compat
2023-12-29T16:05:20.5512029Z #12 0.138 fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/main/x86_64/APKINDEX.tar.gz
The ideal solution would use awk (rather than sed & co.), unless an awk version turns out to be real overkill and more 'script-like'.
The following solution works only partially, and only if the log has no fields on the "empty" lines (e.g. no Timestamp field):
awk -vRS='\n\n' -vORS='\n\n' '/amd64 builder/ 1' logfile
As an extra question: why does this solution print the searched keyword (amd64 in my case) twice in the first section of the output, while subsequent sections contain it only once, as expected? And how can that be corrected?
Thanks
LE: I just realized that without preserving the line that contains only the timestamp, the output is hard to read. @Ed Morton and @markp-fuso, could you adjust your answers a little to preserve that line? Thank you!
$ awk -v tgt='amd64' 'NF<2{f=""; next} !f{f=($3 ~ ("/"tgt"$"))} f' file
2023-12-29T16:05:20.4085552Z #11 [linux/amd64 builder 1/8] WORKDIR /app
2023-12-29T16:05:20.5499792Z #11 DONE 0.1s
2023-12-29T16:05:20.5509862Z #12 [linux/amd64 builder 2/8] RUN apk add --no-cache libc6-compat
2023-12-29T16:05:20.5512029Z #12 0.138 fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/main/x86_64/APKINDEX.tar.gz
2023-12-29T16:05:20.6982466Z #12 ...
$ awk -v tgt='arm64' 'NF<2{f=""; next} !f{f=($3 ~ ("/"tgt"$"))} f' file
2023-12-29T16:05:20.3040485Z #10 [linux/arm64 builder 1/8] WORKDIR /app
2023-12-29T16:05:20.4084773Z #10 DONE 0.8s
2023-12-29T16:05:21.2474882Z #16 [linux/arm64 runner 2/7] RUN addgroup -S -g 1001 nodejs
2023-12-29T16:05:21.3971789Z #16 ...
- `NF<2{f=""; next}` clears the flag `f` when there's only a timestamp on the line
- `!f{f=($3 ~ ("/"tgt"$"))}` sets `f` to 1 (if `tgt` is present) or 0 (otherwise) when each line that looks like `#11 [linux/amd64 builder 1/8]` is read
- `f` causes the current line to be printed when `f` is 1
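For the LE request, here is a sketch of the same flag approach with one change that is an assumption, not part of the answer's one-liner: when a timestamp-only line arrives while the flag is still set, it is printed before the flag is cleared, so each matching section keeps its closing timestamp line. The program is spread over several lines with comments, and `print_sections` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print_sections TARGET FILE
print_sections() {
    awk -v tgt="$1" '
        # Timestamp-only line (fewer than 2 fields): it closes a section.
        # Print it if that section matched, then clear the flag.
        NF < 2 { if (f) print; f = ""; next }
        # Flag unset: set it to 1 if the 3rd field (e.g. "[linux/amd64")
        # matches the dynamic regex "/" tgt "$", otherwise to 0.
        !f { f = ($3 ~ ("/" tgt "$")) }
        # Print the current line while the flag is set.
        f
    ' "$2"
}

# Shortened copy of the log excerpt from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
2023-12-29T16:05:20.3032116Z
2023-12-29T16:05:20.3040485Z #10 [linux/arm64 builder 1/8] WORKDIR /app
2023-12-29T16:05:20.4084773Z #10 DONE 0.8s
2023-12-29T16:05:20.4085104Z
2023-12-29T16:05:20.4085552Z #11 [linux/amd64 builder 1/8] WORKDIR /app
2023-12-29T16:05:20.5499792Z #11 DONE 0.1s
2023-12-29T16:05:20.5505699Z
2023-12-29T16:05:20.5509862Z #12 [linux/amd64 builder 2/8] RUN apk add --no-cache libc6-compat
2023-12-29T16:05:20.6982466Z #12 ...
2023-12-29T16:05:20.6983744Z
EOF

print_sections amd64 "$log"
```

This uses only POSIX awk features. Note that it keeps the timestamp line that closes each matching section, not the one that opens it.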
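I don't know why you thought setting RS to \n\n would work for you; it fails because it's unrelated to your problem: your sections are not separated by empty lines but by lines that still contain a timestamp, so the file never contains two consecutive newlines for RS to match.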
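To see this, count the records awk reads with RS='\n\n' on a few lines of the log: the whole file ends up as a single record. A sketch, assuming an awk that treats a multi-character RS as a regular expression (GNU awk and mawk do; POSIX leaves a multi-character RS unspecified):

```shell
#!/bin/sh
log=$(mktemp)
# Five lines from the question: every separator line holds a timestamp,
# so the file never contains two newlines in a row.
printf '%s\n' \
    '2023-12-29T16:05:20.4085104Z' \
    '2023-12-29T16:05:20.4085552Z #11 [linux/amd64 builder 1/8] WORKDIR /app' \
    '2023-12-29T16:05:20.5499792Z #11 DONE 0.1s' \
    '2023-12-29T16:05:20.5505699Z' \
    '2023-12-29T16:05:20.5509862Z #12 [linux/amd64 builder 2/8] RUN apk add --no-cache libc6-compat' \
    > "$log"

# RS never matches, so NR is 1: the whole file is one record.
awk -v RS='\n\n' 'END { print NR }' "$log"

# For comparison, with a truly empty separator line there are 2 records.
printf 'a\n\nb\n' | awk -v RS='\n\n' 'END { print NR }'
```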
Given your comments, it sounds like this is what you're looking for (using GNU awk for multi-char RS, RT, and \S/\s):
$ awk -v RS='\n\\S+\\s*\n' -v ORS= '/amd64/{print $0 RT}' file
2023-12-29T16:05:20.4085552Z #11 [linux/amd64 builder 1/8] WORKDIR /app
2023-12-29T16:05:20.5499792Z #11 DONE 0.1s
2023-12-29T16:05:20.5505699Z
2023-12-29T16:05:20.5509862Z #12 [linux/amd64 builder 2/8] RUN apk add --no-cache libc6-compat
2023-12-29T16:05:20.5512029Z #12 0.138 fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/main/x86_64/APKINDEX.tar.gz
2023-12-29T16:05:20.6982466Z #12 ...
2023-12-29T16:05:20.6983744Z
$ awk -v RS='\n\\S+\\s*\n' -v ORS= '/arm64/{print $0 RT}' file
2023-12-29T16:05:20.3032116Z
2023-12-29T16:05:20.3040485Z #10 [linux/arm64 builder 1/8] WORKDIR /app
2023-12-29T16:05:20.4084773Z #10 DONE 0.8s
2023-12-29T16:05:20.4085104Z
2023-12-29T16:05:21.2474882Z #16 [linux/arm64 runner 2/7] RUN addgroup -S -g 1001 nodejs
2023-12-29T16:05:21.3971789Z #16 ...
2023-12-29T16:05:21.3972318Z
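About the doubled backslashes in that RS value: a value passed with -v undergoes the same escape-sequence processing as an awk string literal, so \n becomes a newline and \\S reaches gawk as \S. A quick check that should work in any POSIX awk, since it only prints the variables:

```shell
#!/bin/sh
# The -v value is escape-processed: each "\\" becomes a single backslash.
awk -v re='\\S+\\s*' 'BEGIN { print re }'
# "\n" becomes a single newline character, so its length is 1.
awk -v nl='\n' 'BEGIN { print length(nl) }'
```

The first command prints \S+\s* and the second prints 1.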