awk

Is a /start/,/end/ range expression ever useful in awk?


I've always contended that you should never use a range expression like:

/start/,/end/

in awk because, although it makes the trivial case (printing the matching text including the start and end lines) slightly briefer than the alternative*:

/start/{f=1} f{print; if (/end/) f=0}

when you want to tweak it even slightly to do anything else, it requires a complete rewrite or results in duplicated or otherwise undesirable code. For example, if you want to print the matching text excluding the range delimiters, with the second form above you'd just rearrange the components:

f{if (/end/) f=0; else print} /start/{f=1}

but if you started with /start/,/end/ you'd either have to abandon that approach in favor of the flag-based form I just posted, or write something like:

/start/,/end/{ if (!/start|end/) print }

i.e. duplicate the conditions, which is undesirable.
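
To make the comparison concrete, here is a minimal sketch that runs the inclusive flag form, the reordered exclusive flag form, and the range-based exclusive form against a made-up input. The sample lines and the shell variable names are assumptions for illustration only:

```shell
# Hypothetical sample input: two delimited blocks with unrelated lines around them.
input='before
start
a
b
end
between
start
c
end
after'

# Inclusive flag form: prints the delimiters and everything between them.
inclusive=$(printf '%s\n' "$input" | awk '/start/{f=1} f{print; if (/end/) f=0}')

# Exclusive flag form: same pieces, reordered so neither delimiter is printed.
exclusive_flag=$(printf '%s\n' "$input" | awk 'f{if (/end/) f=0; else print} /start/{f=1}')

# Exclusive range form: the delimiters have to be tested a second time.
exclusive_range=$(printf '%s\n' "$input" | awk '/start/,/end/{ if (!/start|end/) print }')

printf 'inclusive:\n%s\n\nexclusive (flag):\n%s\n\nexclusive (range):\n%s\n' \
    "$inclusive" "$exclusive_flag" "$exclusive_range"
```

On this input both exclusive forms print a, b and c; the only difference is that the range version has to repeat the start and end conditions.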

Then I saw a question that required identifying the LAST end in a file. A range expression was used in the solution, and I thought it might have some value there (see https://stackoverflow.com/a/21145009/1745001).

Now, though, I'm back to thinking that it's just not worth bothering with range expressions at all and a solution that doesn't use range expressions would have worked just as well for that case.

So - does anyone have an example where a range expression actually adds noticeable value to a solution?

*I used to use:

/start/{f=1} f; /end/{f=0}

but too many times I found I had to do something additional when f is true and /end/ is found (or, to put it another way, ONLY do something when /end/ is found IF f is true), so now I just try to stick to the slightly less brief but much more robust and extensible:

/start/{f=1} f{print; if (/end/) f=0}
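
A sketch of the kind of case I mean, assuming a made-up input with stray end lines outside any block and a hypothetical counter n that should only count an end that actually closes a block. The print is dropped here so that only the count is shown:

```shell
# Hypothetical input: one real block, plus an "end" before it and one after it.
input='end
start
a
end
end'

# Old form with the extra work attached to /end/: it fires on every "end"
# line, whether or not a block is open.
loose=$(printf '%s\n' "$input" | awk '/start/{f=1} /end/{f=0; n++} END{print n+0}')

# Robust form: the /end/ test lives inside the f block, so the extra work
# only happens when an "end" closes an open block.
strict=$(printf '%s\n' "$input" | awk '/start/{f=1} f{if (/end/) {f=0; n++}} END{print n+0}')

printf 'loose=%s strict=%s\n' "$loose" "$strict"
```

With this input the first command counts 3 ends and the second counts 1, which is the behavior I usually want.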

Solution

  • Interesting. I also often start with a range expression and then later on switch to using a variable.

    I think a situation where this could be useful, aside from the pure range-only situations, is if you want to print a match, but only if it lies within a certain range. It also has the advantage that it is immediately obvious what it does. For example:

    awk '/start/,/end/{if(/ppp/)print}' file
    

    with this input:

    start
    dfgd gd
    ppp 1
    gfdg
    fd gfd
    end
    ppp 2 
    ppp 3
    start
    ppp 4
    ppp 5
    end
    ppp 6
    ppp 7
    gfdgdgd
    

    will produce:

    ppp 1
    ppp 4
    ppp 5
    

    One could of course also use:

    awk '/start/{f=1} /ppp/ && f; /end/{f=0}' file
    

    But it is longer and somewhat less readable.
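
The two commands can be checked against each other with a small sketch like the following. It feeds the sample input from this answer to both through a pipe instead of a file; the shell variable names are assumptions for illustration:

```shell
# The sample input from the answer, held in a variable instead of a file.
input='start
dfgd gd
ppp 1
gfdg
fd gfd
end
ppp 2
ppp 3
start
ppp 4
ppp 5
end
ppp 6
ppp 7
gfdgdgd'

# Range form: print ppp lines that fall inside a start..end range.
range_out=$(printf '%s\n' "$input" | awk '/start/,/end/{if(/ppp/)print}')

# Flag form: track the range by hand with f.
flag_out=$(printf '%s\n' "$input" | awk '/start/{f=1} /ppp/ && f; /end/{f=0}')

if [ "$range_out" = "$flag_out" ]; then
    printf 'same output:\n%s\n' "$range_out"
else
    printf 'outputs differ\n'
fi
```

Both forms select ppp 1, ppp 4 and ppp 5 on this input, so the choice between them here comes down to readability rather than behavior.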