regex sed

Matching boot times from log using sed -E


I want to compute average boot times from journalctl on a Linux machine. I have already used grep to keep only the relevant lines, which look like this:

Sep 15 23:37:31 x systemd[1]: Startup finished in 15.751s (firmware) + 3.034s (loader) + 8.797s (kernel) + 17.169s (userspace) = 44.753s.
Sep 17 23:01:24 x systemd[1]: Startup finished in 13.976s (firmware) + 1.998s (loader) + 13.169s (kernel) + 12.014s (userspace) = 41.159s.
Sep 20 09:22:15 x systemd[1]: Startup finished in 19.200s (firmware) + 5.931s (loader) + 21min 29.490s (kernel) + 12.643s (userspace) = 22min 7.266s.
Sep 23 10:21:06 x systemd[1]: Startup finished in 13.140s (firmware) + 5.571s (loader) + 3min 44.479s (kernel) + 12.065s (userspace) = 4min 15.256s.
Sep 23 15:18:53 x systemd[1]: Startup finished in 15.277s (firmware) + 3.152s (loader) + 10.616s (kernel) + 33.766s (userspace) = 1min 2.812s.

Now I want to use sed -E to extract the total startup time, i.e. the last number on each line. This is complicated by the fact that some startup times are over a minute and some are under. Here is what I have tried so far:

1. sed -E "s/.*systemd\[1\]:.*(([0-9]+min)?[[:space:]]*[0-9]+\.[0-9]{3}s)/\1/"
2. sed -E "s/.*systemd\[1\]:.*(([0-9]+min)+[[:space:]]*[0-9]+\.[0-9]{3}s)/\1/"
3. sed -E "s/.*systemd\[1\]:.*((([0-9]+min[[:space:]]+)?[0-9]+\.[0-9]{3}s))/\1/"
4. sed -E "s/.*systemd\[1\]:.*([0-9]+min[[:space:]]+[0-9]+\.[0-9]{3}s|[0-9]+\.[0-9]{3}s)/\1/"
5. sed -E 's/.*systemd\[1\]:.*(([0-9]+min[[:space:]]+[0-9]+\.[0-9]{3}s)|([0-9]+\.[0-9]{3}s)).*/\1\3/'

Here is what happens:

  • 1, 3, and 4 match only the 2.812s part, regardless of whether the min part is present.
  • 2 matches only if the min part is present (this makes sense).
  • 5 behaves like 1, 3, and 4, but outputs the result twice, which makes sense in a way.

So I am at a loss as to how to match both the lines where the min part is present and the lines where it is absent. Help?


Solution

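  • Because the minutes are optional, the greedy .* in front of the capture group is free to consume them: sed extends .* as far as it can while the rest of the pattern still matches, so the optional group ends up matching nothing. The easy fix is to require some sort of anchor immediately before the text you want to capture, such as the equals sign and space that precede it.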

    sed -E "s/.*systemd\[1\]:.*= (([0-9]+min[[:space:]])?[0-9]+\.[0-9]{3}s)/\1/"
    

    Notice also how I refactored the parentheses around the optional minutes, so that the whitespace after min belongs to the optional group.

    Generally, I would recommend single quotes around your sed scripts if you are using a Unix-style shell, but I did not change that here.

    Demo: https://ideone.com/Dwu5Xl
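
To see the difference on two of the sample lines from the question, here is a minimal sketch. It runs attempt 3 unanchored, then the anchored expression, and finally converts the captured totals to seconds and averages them with awk. The trailing .* in the anchored expression (which drops the final period) and the awk conversion are my own additions, not part of the command above, and the variable names sample, greedy, anchored, and average are purely illustrative.

```shell
#!/bin/sh
# Two sample lines from the question: one total under a minute, one over.
sample='Sep 15 23:37:31 x systemd[1]: Startup finished in 15.751s (firmware) + 3.034s (loader) + 8.797s (kernel) + 17.169s (userspace) = 44.753s.
Sep 23 15:18:53 x systemd[1]: Startup finished in 15.277s (firmware) + 3.152s (loader) + 10.616s (kernel) + 33.766s (userspace) = 1min 2.812s.'

# Attempt 3 from the question, unanchored: the second .* is greedy, so the
# capture group is left with only the minimum it needs to match.
greedy=$(printf '%s\n' "$sample" |
  sed -E 's/.*systemd\[1\]:.*((([0-9]+min[[:space:]]+)?[0-9]+\.[0-9]{3}s))/\1/')

# Anchored on "= ". The trailing .* is an addition that also removes the
# final period of the log line.
anchored=$(printf '%s\n' "$sample" |
  sed -E 's/.*systemd\[1\]:.*= (([0-9]+min[[:space:]])?[0-9]+\.[0-9]{3}s).*/\1/')

# Convert "Nmin S.SSSs" or "S.SSSs" to seconds, then average over all lines.
# awk uses the leading numeric prefix of a field when it is used as a number.
average=$(printf '%s\n' "$anchored" | awk '
  {
    secs = 0
    for (i = 1; i <= NF; i++) {
      if ($i ~ /min$/) secs += 60 * ($i + 0)   # "1min"   -> 60
      else             secs += $i + 0          # "2.812s" -> 2.812
    }
    sum += secs; n++
  }
  END { if (n) printf "%.2f\n", sum / n }')

printf 'greedy:\n%s\nanchored:\n%s\naverage: %ss\n' "$greedy" "$anchored" "$average"
```

With these two lines, the unanchored version should print 4.753s. and 2.812s. (the greedy .* also eats the leading 4 of 44.753s), while the anchored version prints 44.753s and 1min 2.812s, for an average of 53.78 seconds.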